Groups > comp.lang.python > #50170 > unrolled thread

make sublists of a list broken at nth certain list items

Started by	CM <cmpython@gmail.com>
First post	2013-07-08 13:52 -0700
Last post	2013-07-08 22:27 +0100
Articles	4 — 3 participants


  make sublists of a list broken at nth certain list items CM <cmpython@gmail.com> - 2013-07-08 13:52 -0700
    Re: make sublists of a list broken at nth certain list items Fábio Santos <fabiosantosart@gmail.com> - 2013-07-08 22:13 +0100
    Re: make sublists of a list broken at nth certain list items Joshua Landau <joshua.landau.ws@gmail.com> - 2013-07-08 22:24 +0100
    Re: make sublists of a list broken at nth certain list items Joshua Landau <joshua.landau.ws@gmail.com> - 2013-07-08 22:27 +0100

#50170 — make sublists of a list broken at nth certain list items

From	CM <cmpython@gmail.com>
Date	2013-07-08 13:52 -0700
Subject	make sublists of a list broken at nth certain list items
Message-ID	<9d0cd072-3cf7-4156-8e84-884faeef7048@googlegroups.com>

I'm looking for a Pythonic way to do the following:

I have data in the form of a long list of tuples. I would like to break that list into four sub-lists. The break points would be based on the nth occurrence of a particular tuple. (The list represents behavioral data trials; the particular tuple represents the break between trials; I want to collect 20 trials at a time, so at every 20th break between trials, start a new sublist.)

So say I have this data:  

data_list = [(0.0, 1.0), (1.0, 24.0), (24.0, 9.0), (9.0, 17.0), (17.0, 5.0), (5.0, 0.0), (5.0, 0.0), (5.0, 24.0), (24.0, 13.0), (13.0, 0.0), (13.0, 21.0), (21.0, 0.0), (21.0, 0.0), (21.0, 23.0), (23.0, 24.0), (24.0, 10.0), (10.0, 18.0), (18.0, 4.0), (4.0, 22.0), (22.0, 1.0), (1.0, 0.0), (1.0, 24.0), (24.0, 6.0), (6.0, 14.0), (14.0, 5.0), (5.0, 0.0), (5.0, 0.0), (5.0, 0.0), (5.0, 0.0), (5.0, 0.0), (5.0, 0.0), (5.0, 0.0), (5.0, 0.0), (5.0, 0.0), (5.0, 24.0), (24.0, 6.0), (6.0, 14.0), (14.0, 4.0), (4.0, 0.0), (4.0, 22.0), (22.0, 1.0), (1.0, 0.0), (1.0, 24.0), (24.0, 9.0), (9.0, 17.0), (17.0, 4.0), (4.0, 0.0), (4.0, 22.0), (22.0, 1.0), (1.0, 0.0), (1.0, 0.0), (1.0, 24.0), (24.0, 12.0), (12.0, 4.0), (4.0, 0.0), (4.0, 22.0)]  #rest of data truncated...

I'd like to break the list into sublists at the 20th, 40th, and 60th occurrences of any tuple that begins with 1.0, for example (1.0, 0.0). This will produce four sub-lists, for trials 1-20, 21-40, 41-60, and 61-80.

Here is what I have so far, just to get the break points within data_list; it is not working:

trial_break_indexes_list = []  #holds the index in data_list for trials 1-20, 21-40, 41-60, and 61-80
trial_count = 0  #keep count of which trial we're on

for tup in data_list:
    if tup[0] == 1.0: #Therefore the start of a new trial

        #We have a match!  Therefore get the index in the data_list
        data_list_index = data_list.index(tup)

        trial_count += 1  #update the trial count.

        if trial_count % 20 == 0:  #this will match on 0, 20, 40, 60, 80
            trial_break_indexes_list.append(data_list_index)

print 'This is trial_break_indexes_list: ', trial_break_indexes_list

Unfortunately, the final output here is:

>>> 
This is trial_break_indexes_list:  [1, 20, 20, 20, 20, 1, 20, 1]

I sense there is a way more elegant/simpler/Pythonic way to approach this, let alone one that is actually correct, but I don't know of it.  Suggestions appreciated!

Thanks.


#50174

From	Fábio Santos <fabiosantosart@gmail.com>
Date	2013-07-08 22:13 +0100
Message-ID	<mailman.4400.1373318330.3114.python-list@python.org>
In reply to	#50170


You don't want to use index() to figure out the index of the tuples. It
rescans the list on every call, so it is slower, and it returns the position
of the first equal item, so it will not find the item you want if there is
more than one of the same. For example,

[1, 4, 4, 4].index(4)

will always be 1, no matter how many times you loop through it.
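The difference is easy to see side by side: index() reports the first matching position on every call, while enumerate() pairs each element with its real position.

```python
items = [1, 4, 4, 4]

# index() always reports the first match, however many times it is called
first_hits = [items.index(x) for x in items]

# enumerate() pairs every element with its actual position
real_positions = [i for i, x in enumerate(items)]

print(first_hits)      # [0, 1, 1, 1]
print(real_positions)  # [0, 1, 2, 3]
```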

Instead, use enumerate() to keep track of the index. Replace your loop with:

for index, tup in enumerate(data_list):
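Put together, the corrected loop might look like the sketch below. It is wrapped in a hypothetical helper, find_break_indexes(), with an assumed `every` parameter (defaulting to the 20 trials from the question) so it can be tried on a short sample; the sample data is made up for illustration.

```python
def find_break_indexes(data_list, every=20):
    """Return the index of every `every`-th tuple whose first element is 1.0
    (hypothetical helper wrapping the corrected loop)."""
    trial_break_indexes_list = []
    trial_count = 0
    for index, tup in enumerate(data_list):  # index is the real position
        if tup[0] == 1.0:  # start of a new trial
            trial_count += 1
            if trial_count % every == 0:  # the every-th, 2*every-th, ... trial
                trial_break_indexes_list.append(index)
    return trial_break_indexes_list

# Made-up sample: tuples starting with 1.0 sit at indexes 1, 3, 5 and 6
sample = [(0.0, 1.0), (1.0, 24.0), (24.0, 9.0), (1.0, 0.0),
          (5.0, 0.0), (1.0, 24.0), (1.0, 0.0)]
print(find_break_indexes(sample, every=2))  # [3, 6]
```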

This should fix your problem. After you have the correct indices, look into
list slicing syntax.
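A minimal sketch of that slicing step, assuming each break index should begin a new sublist (use i + 1 for each break index instead if the break tuple should close the previous sublist). The helper name split_at is hypothetical.

```python
def split_at(data_list, break_indexes):
    """Split data_list so that each index in break_indexes starts a new sublist
    (hypothetical helper)."""
    # Pair up consecutive boundaries: (0, b1), (b1, b2), ..., (bn, len)
    bounds = [0] + list(break_indexes) + [len(data_list)]
    return [data_list[start:stop] for start, stop in zip(bounds, bounds[1:])]

data = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(split_at(data, [3, 6]))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

With break indexes at the 20th, 40th and 60th markers this gives the four sub-lists asked for.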


#50175

From	Joshua Landau <joshua.landau.ws@gmail.com>
Date	2013-07-08 22:24 +0100
Message-ID	<mailman.4401.1373318697.3114.python-list@python.org>
In reply to	#50170

On 8 July 2013 21:52, CM <cmpython@gmail.com> wrote:
> I'm looking for a Pythonic way to do the following:
>
> I have data in the form of a long list of tuples.  I would like to break that list into four sub-lists.  The break points would be based on the nth occasion of a particular tuple.  (The list represents behavioral data trials; the particular tuple represents the break between trials; I want to collect 20 trials at a time, so every 20th break between trials, start a new sublist).

I would do this like so:

from collections import deque

# Fast and hacky -- just how I like it
exhaust_iterable = deque(maxlen=0).extend

def chunk_of(data, *, length):
    count = 0
    for datum in data:
        count += datum[0] == 1

        yield datum

        if count == 60:
            break

def chunked(data):
    data = iter(data)
    while True:
        chunk = chunk_of(data, length=20)
        yield chunk
        exhaust_iterable(chunk)

Use "chunked(data)" and iterate over the 'chunks' it yields. Each chunk is a
lazy iterator over the same underlying data, so if you go to the next chunk
before finishing the one you're on, the rest of the previous chunk is
consumed and lost; convert it to a permanent form (e.g. with list()) first.

Looking at your code:

> for tup in data_list:
>     if tup[0] == 1.0: #Therefore the start of a new trial
>
>         #We have a match!  Therefore get the index in the data_list
>         data_list_index = data_list.index(tup)

This is no good (ninja'd by Fábio). The proper way to keep an index is by:

for index, tup in enumerate(data_list):

>         trial_count += 1  #update the trial count.
>
>         if trial_count % 20 == 0:  #this will match on 0, 20, 40, 60, 80
>             trial_break_indexes_list.append(data_list_index)
>
> print 'This is trial_break_indexes_list: ', trial_break_indexes_list
...
> I sense there is a way more elegant/simpler/Pythonic way to approach this, let alone one that is actually correct, but I don't know of it.  Suggestions appreciated!

Yup.
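One more option, not proposed in the thread: itertools.groupby() with a key function that counts marker tuples as it goes, so each element is labelled with the block it belongs to. The helper name split_every_nth_marker and its is_marker parameter are hypothetical; it assumes, as in the question, that a marker is any tuple starting with 1.0 and that every `every`-th marker begins a new sublist.

```python
from itertools import groupby

def split_every_nth_marker(data, every=20, is_marker=lambda tup: tup[0] == 1.0):
    """Start a new sublist at every `every`-th marker tuple (hypothetical helper)."""
    count = 0  # markers seen so far, including the current element

    def block_number(tup):
        nonlocal count
        if is_marker(tup):
            count += 1
        return count // every  # non-decreasing, so each block is one group

    # groupby calls the key once per element, in order
    return [list(group) for _, group in groupby(data, key=block_number)]

sample = [(0.0, 1.0), (1.0, 24.0), (24.0, 9.0), (1.0, 0.0),
          (5.0, 0.0), (1.0, 24.0), (1.0, 0.0)]
blocks = split_every_nth_marker(sample, every=2)
# [[(0.0, 1.0), (1.0, 24.0), (24.0, 9.0)],
#  [(1.0, 0.0), (5.0, 0.0), (1.0, 24.0)],
#  [(1.0, 0.0)]]
```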


#50176

From	Joshua Landau <joshua.landau.ws@gmail.com>
Date	2013-07-08 22:27 +0100
Message-ID	<mailman.4402.1373318866.3114.python-list@python.org>
In reply to	#50170

On 8 July 2013 22:24, Joshua Landau <joshua.landau.ws@gmail.com> wrote:
>         if count == 60:

Obviously this should be:

if count == length:
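Combining the original code with this correction gives the sketch below. Two further changes are additions, not part of the posts: chunked() takes an assumed `length` parameter, and its loop stops once the input runs out (the posted `while True` version keeps yielding empty chunks forever after the data is exhausted). As in the original, the `length`-th marker tuple is the last item of its chunk.

```python
from collections import deque
from itertools import chain

# Consume an iterator completely without storing anything
exhaust_iterable = deque(maxlen=0).extend

def chunk_of(data, *, length):
    """Yield items up to and including the `length`-th marker tuple."""
    count = 0
    for datum in data:
        count += datum[0] == 1  # True counts as 1; matches 1 and 1.0
        yield datum
        if count == length:
            break

def chunked(data, length=20):
    """Yield successive lazy chunks; stops when the input is exhausted."""
    data = iter(data)
    for first in data:  # ends cleanly when nothing is left
        chunk = chunk_of(chain([first], data), length=length)
        yield chunk
        exhaust_iterable(chunk)  # skip whatever the caller did not consume

trials = [(1.0, 0.0), (2.0, 3.0), (1.0, 5.0),
          (4.0, 4.0), (1.0, 2.0), (6.0, 1.0)]
blocks = [list(chunk) for chunk in chunked(trials, length=2)]
# [[(1.0, 0.0), (2.0, 3.0), (1.0, 5.0)], [(4.0, 4.0), (1.0, 2.0), (6.0, 1.0)]]
```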

