Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #44839 > unrolled thread

find overlapping lines & output times observed

Started byLinsey Raaijmakers <lm.raaijmakers@gmail.com>
First post2013-05-06 11:39 -0700
Last post2013-05-06 21:32 +0100
Articles 2 — 2 participants

Back to article view | Back to comp.lang.python


Contents

  find overlapping lines & output times observed Linsey Raaijmakers <lm.raaijmakers@gmail.com> - 2013-05-06 11:39 -0700
    Re: find overlapping lines & output times observed Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-05-06 21:32 +0100

#44839 — find overlapping lines & output times observed

FromLinsey Raaijmakers <lm.raaijmakers@gmail.com>
Date2013-05-06 11:39 -0700
Subjectfind overlapping lines & output times observed
Message-ID<05aadf46-6cf2-4040-84e6-4e480528f6fd@googlegroups.com>
Hello,

I have a file like this:
action start    end
50	5321 	5321
7	5323	        5347
12	5339	        5351
45	5373  	5373
45	5420	        5420
25	5425	        5425
26	5425	        5425
50	5451	        5451
45	5452  	5452
14	5497	        5503
26	5513	        5513
25	5517	        5517
45	5533	        5533
50	5533	        5533
5	5537	        5540
25	5580	        5580
45	5586  	5586
26	5595	        5595
45	5603	        5603
50	5634	        5634
45	5645	        5645
7	5657	        5689
25	5682	        5682
26	5682	        5690
26	5708	        5708
45	5717     	5717
50	5740	        5740
45	5777	        5777
45	5804	        5804
7	5805	        5845

and want to find how many times combinations occur in a time frame (between column 2 and 3 ). This can be multiple combinations, which is my problem now.
I have no problems finding overlap between 2 actions. 
I want to start with the first line, action 50. and check for all lines in the rest of the file if there are lines that overlap this action.
So the first line has no overlap, but action 25 and 26 would be a combination that overlaps, and 45 and 50 (5533	-5533). but 7,25,26 would be a combination of 3(5682-5682 & 5682-5690 & 5657-5689 because these three overlap each other.

I have a script now that identifies overlap between two actions (see bottom page), but how can I change this so that it outputs all possible combinations?

My desired output would be:

action    times observed    apex
50         5                          5321, 5451, 5533, 5634,  5740
50,45    1                          5533;5533
7           4                          5347, 5689, 5688, 5845
7,25      2                          5347;5425, 5689;5682
7,25,26 1                          5689;5682;5690

CODE: 

from collections import Counter
f = open('and.txt','r');

action_list = []
onset_list = []
apex_list = []
offset_list = []
action_counter = 0
combination_list = []


for line in f:
  fields = line.split("\t")
  for col in fields:
    action = fields[0]
    onset = fields[1]
    apex = fields[2]
    offset = fields[3]

  action_list.append(action)
  onset_list.append(onset)
  apex_list.append(apex)
  offset_list.append(offset)

action_cnvrt = map(int, action_list)
c = Counter(action_cnvrt)

filtered = list(set(action_list))
filtered_cnvrt = map(int, filtered)

for a in filtered_cnvrt:
  action_count = str(a)+"\t"+str(c[a])
  print action_count

for i in range (0,len(onset_list)):
  combination_list.append(action_list[i])
  for j in range(0,len(apex_list)):
    if i != j:
      if onset_list[j]>= onset_list[i] and apex_list[j] <= apex_list[i]:
        print action_list[j]+","+action_list[i]+'\t'+onset_list[j]+'\t'+apex_list[j]+'\t'+onset_list[i]+'\t'+apex_list[i]


I hope somebody can help me :)

[toc] | [next] | [standalone]


#44849

FromOscar Benjamin <oscar.j.benjamin@gmail.com>
Date2013-05-06 21:32 +0100
Message-ID<mailman.1384.1367872382.3114.python-list@python.org>
In reply to#44839
On 6 May 2013 19:39, Linsey Raaijmakers <lm.raaijmakers@gmail.com> wrote:
> I have a file like this:
> action start    end
> 50      5321    5321
> 7       5323            5347
> 12      5339            5351
> 45      5373    5373
> 45      5420            5420
> 25      5425            5425
[snip]

your code below suggests that your file also has an "apex" column. If
that's correct what do you what is it and what do you want to do with
it?

[snip]
> I have a script now that identifies overlap between two actions (see bottom page), but how can I change this so that it outputs all possible combinations?

I looked at that code and I don't think it does what you describe. Are
you sure that it works?

> My desired output would be:
>
> action    times observed    apex
> 50         5                          5321, 5451, 5533, 5634,  5740
> 50,45    1                          5533;5533
> 7           4                          5347, 5689, 5688, 5845
> 7,25      2                          5347;5425, 5689;5682
> 7,25,26 1                          5689;5682;5690
>
> CODE:
>
> from collections import Counter
> f = open('and.txt','r');
>
> action_list = []
> onset_list = []
> apex_list = []
> offset_list = []
> action_counter = 0
> combination_list = []
>
>
> for line in f:
>   fields = line.split("\t")
>   for col in fields:
>     action = fields[0]
>     onset = fields[1]
>     apex = fields[2]
>     offset = fields[3]

The above code suggests that the file has four columns. Also since
you're not actually using the loop variable "col" you can just delete
the "for" line and unindent the rest. In fact the easiest way to do
all of this is just:

    action, onset, apex, offset = line.split()

>
>   action_list.append(action)
>   onset_list.append(onset)
>   apex_list.append(apex)
>   offset_list.append(offset)
>
> action_cnvrt = map(int, action_list)
> c = Counter(action_cnvrt)
>
> filtered = list(set(action_list))

There's no need to convert this back to a list if you're just going to
iterate over it again with map.

> filtered_cnvrt = map(int, filtered)
>
> for a in filtered_cnvrt:
>   action_count = str(a)+"\t"+str(c[a])
>   print action_count

The above counts the number of times each event occurs which is one
part of your desired output.

>
> for i in range (0,len(onset_list)):
>   combination_list.append(action_list[i])
>   for j in range(0,len(apex_list)):
>     if i != j:
>       if onset_list[j]>= onset_list[i] and apex_list[j] <= apex_list[i]:
>         print action_list[j]+","+action_list[i]+'\t'+onset_list[j]+'\t'+apex_list[j]+'\t'+onset_list[i]+'\t'+apex_list[i]

What is combination_list for? It should just end up containing the
same thing as action_list if I understand correctly.

It's generally better in Python to loop directly over things rather
than using indices so, instead of something like:

  for i in range(len(onset_list)):
    print offset_list[i] - onset_list[i]

you should do something like

  for offset, onset in zip(offset_list, onset_list):
    print offset - onset

and your code will be a lot clearer.

The algorithm you are using is to loop over all events and then to
loop over all other events comparing all pairs of events. This will
not scale very well if you want to look at a large file or to compare
simultaneous occurrences of more than two events.

It looks as if your input data is ordered by the onset column. Is that
the case? If so then you can use an algorithm that just loops once
over all the events. The way the algorithm works is that you store
which events are currently active and loop through the events keeping
track of the start time of the most recently added event adding and
removing events as they start and stop. In pseudocode:

now = start of first event
active = [first event]
for next_starting in events (not including first):
    next_ending = event that ends soonest from active
    while start of next_starting > end of next_ending:
        report active events from now to end of next_ending
        now = end of next_ending
        remove next_ending from active
    report active events from now until start of next_starting
    now = start of next_starting
    add next_starting to active

And some more code to deal with what happens when you get to the end
of the list of events...

The report steps probably mean adding to a Counter or dict to remember
that the currently active events were active during each particular
time window.


Oscar

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web