Re: A gnarly little python loop

Started by	Cameron Simpson <cs@zip.com.au>
First post	2012-11-11 19:48 +1100
Last post	2012-11-11 14:23 -0500
Articles	11 — 5 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: A gnarly little python loop Cameron Simpson <cs@zip.com.au> - 2012-11-11 19:48 +1100
    Re: A gnarly little python loop Paul Rubin <no.email@nospam.invalid> - 2012-11-11 01:09 -0800
      Re: A gnarly little python loop Peter Otten <__peter__@web.de> - 2012-11-11 10:54 +0100
        Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 09:16 -0800
        Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 09:16 -0800
      Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 09:29 -0800
        Re: A gnarly little python loop Peter Otten <__peter__@web.de> - 2012-11-11 19:34 +0100
          Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 11:16 -0800
            Re: A gnarly little python loop Cameron Simpson <cs@zip.com.au> - 2012-11-12 11:43 +1100
              Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 17:38 -0800
          Re: A gnarly little python loop Roy Smith <roy@panix.com> - 2012-11-11 14:23 -0500

#33122 — Re: A gnarly little python loop

From	Cameron Simpson <cs@zip.com.au>
Date	2012-11-11 19:48 +1100
Subject	Re: A gnarly little python loop
Message-ID	<mailman.3555.1352623728.27098.python-list@python.org>

On 11Nov2012 08:56, Stefan Behnel <stefan_ml@behnel.de> wrote:
| Steve Howell, 11.11.2012 04:03:
| > On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
| >>     page = 1
| >>     while 1:
| >>         r = api.GetSearch(term="foo", page=page)
| >>         if not r:
| >>             break
| >>         for tweet in r:
| >>             process(tweet)
| >>         page += 1
| >>
| >> It works, but it seems excessively fidgety.  Is there some cleaner way
| >> to refactor this?
| > 
| > I think your code is perfectly readable and clean, but you can flatten
| > it like so:
| > 
| >     def get_tweets(term, get_page):
| >         page_nums = itertools.count(1)
| >         pages = itertools.imap(api.getSearch, page_nums)
| >         valid_pages = itertools.takewhile(bool, pages)
| >         tweets = itertools.chain.from_iterable(valid_pages)
| >         return tweets
| 
| I'd prefer the original code ten times over this inaccessible beast.

Me too.
-- 
Cameron Simpson <cs@zip.com.au>

In an insane society, the sane man must appear insane.
        - Keith A. Schauer <keith@balrog.dseg.ti.com>

#33123

From	Paul Rubin <no.email@nospam.invalid>
Date	2012-11-11 01:09 -0800
Message-ID	<7x4nkwzesu.fsf@ruckus.brouhaha.com>
In reply to	#33122

Cameron Simpson <cs@zip.com.au> writes:
> | I'd prefer the original code ten times over this inaccessible beast.
> Me too.

Me, I like the itertools version better.  There's one chunk of data
that goes through a succession of transforms each of which
is very straightforward.

#33125

From	Peter Otten <__peter__@web.de>
Date	2012-11-11 10:54 +0100
Message-ID	<mailman.3557.1352627686.27098.python-list@python.org>
In reply to	#33123

Paul Rubin wrote:

> Cameron Simpson <cs@zip.com.au> writes:
>> | I'd prefer the original code ten times over this inaccessible beast.
>> Me too.
> 
> Me, I like the itertools version better.  There's one chunk of data
> that goes through a succession of transforms each of which
> is very straightforward.

[Steve Howell]
>     def get_tweets(term, get_page):
>         page_nums = itertools.count(1)
>         pages = itertools.imap(api.getSearch, page_nums)
>         valid_pages = itertools.takewhile(bool, pages)
>         tweets = itertools.chain.from_iterable(valid_pages)
>         return tweets
 

But did you spot the bug(s)?
My itertools-based version would look like this

    def get_tweets(term):
        pages = (api.GetSearch(term, pageno)
                 for pageno in itertools.count(1))
        for page in itertools.takewhile(bool, pages):
            yield from page

but I can understand that it's not everybody's cup of tea.
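
For anyone who wants to run the generator version, here is a minimal self-contained sketch. `FakeApi` is a hypothetical stand-in for the real client (it is not part of the post above), the `api` object is passed in as an argument rather than read from a global, and `yield from` assumes Python 3.3 or later. As for the bug(s) hinted at, the quoted version appears to ignore its `get_page` parameter, never passes `term` to the search call, and spells the method `getSearch` where Roy's loop uses `GetSearch`.

```python
import itertools


class FakeApi:
    """Hypothetical stand-in for the real client; returns [] past the last page."""

    def __init__(self, pages):
        self.pages = pages
        self.calls = 0  # how many times GetSearch was invoked

    def GetSearch(self, term, page):
        self.calls += 1
        if page > len(self.pages):
            return []
        return self.pages[page - 1]


def get_tweets(api, term):
    # Lazily request page 1, 2, 3, ...; nothing is fetched until iteration starts.
    pages = (api.GetSearch(term, pageno) for pageno in itertools.count(1))
    # Stop at the first falsy page (an empty list or None).
    for page in itertools.takewhile(bool, pages):
        yield from page


api = FakeApi([["a", "b", "c"], ["d", "e"]])
tweets = list(get_tweets(api, "foo"))
print(tweets)     # ['a', 'b', 'c', 'd', 'e']
print(api.calls)  # 3: two real pages plus the empty page that ends the loop
```

The call counter shows that the search is only invoked once past the last real page, which matches the behaviour of the original `while` loop.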

#33132

From	Steve Howell <showell30@yahoo.com>
Date	2012-11-11 09:16 -0800
Message-ID	<c1f54b2d-0c2c-48a2-9185-4478ebd40e2f@googlegroups.com>
In reply to	#33125

On Sunday, November 11, 2012 1:54:46 AM UTC-8, Peter Otten wrote:
> Paul Rubin wrote:
>
> > Cameron Simpson <cs@zip.com.au> writes:
> >> | I'd prefer the original code ten times over this inaccessible beast.
> >> Me too.
> >
> > Me, I like the itertools version better.  There's one chunk of data
> > that goes through a succession of transforms each of which
> > is very straightforward.
>
> [Steve Howell]
> >     def get_tweets(term, get_page):
> >         page_nums = itertools.count(1)
> >         pages = itertools.imap(api.getSearch, page_nums)
> >         valid_pages = itertools.takewhile(bool, pages)
> >         tweets = itertools.chain.from_iterable(valid_pages)
> >         return tweets
>
> But did you spot the bug(s)?

My first version was sketching out the technique, and I don't have handy access to the API.

Here is an improved version:

    import itertools

    def get_tweets(term):
        def get_page(page):
            return getSearch(term, page)
        page_nums = itertools.count(1)
        pages = itertools.imap(get_page, page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets

    for tweet in get_tweets("foo"):
        process(tweet)

This is what I used to test it:


    def getSearch(term = "foo", page = 1):
        # simulate api for testing
        if page < 5:
            return [
                'page %d, tweet A for term %s' % (page, term),
                'page %d, tweet B for term %s' % (page, term),
            ]
        else:
            return None

    def process(tweet):
        print tweet
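
One detail worth noting: this mock signals the end with `None`, whereas an API might just as well return an empty list. The small sketch below (the sample page lists are made up for illustration) suggests that `takewhile(bool, ...)` treats both the same way, since both are falsy. The flip side is that an empty page in the middle of the results would also end the iteration early.

```python
from itertools import takewhile

# Hypothetical page sequences: one ends with None, the other with [].
pages_ending_in_none = [["a"], ["b"], None, ["never reached"]]
pages_ending_in_empty = [["a"], ["b"], [], ["never reached"]]

# bool(None) and bool([]) are both False, so takewhile stops at either.
kept_none = list(takewhile(bool, pages_ending_in_none))
kept_empty = list(takewhile(bool, pages_ending_in_empty))

print(kept_none)   # [['a'], ['b']]
print(kept_empty)  # [['a'], ['b']]
```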

#33134

From	Steve Howell <showell30@yahoo.com>
Date	2012-11-11 09:29 -0800
Message-ID	<8be50a3e-0ba6-439f-b445-7dedeacdc1c7@lg12g2000pbb.googlegroups.com>
In reply to	#33123

On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> Cameron Simpson <c...@zip.com.au> writes:
> > | I'd prefer the original code ten times over this inaccessible beast.
> > Me too.
>
> Me, I like the itertools version better.  There's one chunk of data
> that goes through a succession of transforms each of which
> is very straightforward.

Thanks, Paul.

Even though I supplied the "inaccessible" itertools version, I can
understand why folks find it inaccessible.  As I said to the OP, there
was nothing wrong with the original imperative approach; I was simply
providing an alternative.

It took me a while to appreciate itertools, but the metaphor that
resonates with me is a Unix pipeline.  It's just a metaphor, so folks
shouldn't be too literal, but the idea here is this:

  page_nums -> pages -> valid_pages -> tweets

The transforms are this:

  page_nums -> pages: call API via imap
  pages -> valid_pages: take while true
  valid_pages -> tweets: use chain.from_iterable to flatten results

Here's the code again for context:

    def get_tweets(term):
        def get_page(page):
            return getSearch(term, page)
        page_nums = itertools.count(1)
        pages = itertools.imap(get_page, page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets
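
As a rough illustration of the pipeline idea, here is a sketch ported to Python 3, where `itertools.imap` no longer exists and the built-in `map` is lazy instead. It reuses the `getSearch` mock posted earlier in the thread; the `islice` step is an addition for demonstration only, showing that items can be pulled through the pipeline a few at a time.

```python
import itertools


def getSearch(term="foo", page=1):
    # Same idea as the mock posted earlier: four pages, then None.
    if page < 5:
        return [
            'page %d, tweet A for term %s' % (page, term),
            'page %d, tweet B for term %s' % (page, term),
        ]
    return None


def get_tweets(term):
    def get_page(page):
        return getSearch(term, page)
    page_nums = itertools.count(1)                     # 1, 2, 3, ...
    pages = map(get_page, page_nums)                   # lazy in Python 3
    valid_pages = itertools.takewhile(bool, pages)     # stop at None or []
    return itertools.chain.from_iterable(valid_pages)  # flatten the pages


tweets = get_tweets("foo")
first_three = list(itertools.islice(tweets, 3))  # pulls pages 1 and 2 only
rest = list(tweets)                              # drains whatever is left
print(first_three)
print(len(first_three) + len(rest))  # 8 tweets in total (4 pages x 2)
```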

#33139

From	Peter Otten <__peter__@web.de>
Date	2012-11-11 19:34 +0100
Message-ID	<mailman.3562.1352658857.27098.python-list@python.org>
In reply to	#33134

Steve Howell wrote:

> On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
>> Cameron Simpson <c...@zip.com.au> writes:
>> > | I'd prefer the original code ten times over this inaccessible beast.
>> > Me too.
>>
>> Me, I like the itertools version better.  There's one chunk of data
>> that goes through a succession of transforms each of which
>> is very straightforward.
> 
> Thanks, Paul.
> 
> Even though I supplied the "inaccessible" itertools version, I can
> understand why folks find it inaccessible.  As I said to the OP, there
> was nothing wrong with the original imperative approach; I was simply
> providing an alternative.
> 
> It took me a while to appreciate itertools, but the metaphor that
> resonates with me is a Unix pipeline.  It's just a metaphor, so folks
> shouldn't be too literal, but the idea here is this:
> 
>   page_nums -> pages -> valid_pages -> tweets
> 
> The transforms are this:
> 
>   page_nums -> pages: call API via imap
>   pages -> valid_pages: take while true
>   valid_pages -> tweets: use chain.from_iterable to flatten results
> 
> Here's the code again for context:
> 
>     def get_tweets(term):
>         def get_page(page):
>             return getSearch(term, page)
>         page_nums = itertools.count(1)
>         pages = itertools.imap(get_page, page_nums)
>         valid_pages = itertools.takewhile(bool, pages)
>         tweets = itertools.chain.from_iterable(valid_pages)
>         return tweets
> 

Actually you supplied the "accessible" itertools version. For reference, 
here's the inaccessible version:

class api:
    """Twitter search API mock-up"""
    pages = [
        ["a", "b", "c"],
        ["d", "e"],
        ]
    @staticmethod
    def GetSearch(term, page):
        assert term == "foo"
        assert page >= 1
        if page > len(api.pages):
            return []
        return api.pages[page-1]

from collections import deque
from functools import partial
from itertools import chain, count, imap, takewhile

def process(tweet):
    print tweet

term = "foo"

deque(
    imap(
        process,
        chain.from_iterable(
            takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
    maxlen=0)

;)

#33140

From	Steve Howell <showell30@yahoo.com>
Date	2012-11-11 11:16 -0800
Message-ID	<3b0b8e3b-6f0a-4337-89ab-235e938952b2@y5g2000pbi.googlegroups.com>
In reply to	#33139

On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
> Steve Howell wrote:
> > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> >> Cameron Simpson <c...@zip.com.au> writes:
> >> > | I'd prefer the original code ten times over this inaccessible beast.
> >> > Me too.
>
> >> Me, I like the itertools version better.  There's one chunk of data
> >> that goes through a succession of transforms each of which
> >> is very straightforward.
>
> > Thanks, Paul.
>
> > Even though I supplied the "inaccessible" itertools version, I can
> > understand why folks find it inaccessible.  As I said to the OP, there
> > was nothing wrong with the original imperative approach; I was simply
> > providing an alternative.
>
> > It took me a while to appreciate itertools, but the metaphor that
> > resonates with me is a Unix pipeline.  It's just a metaphor, so folks
> > shouldn't be too literal, but the idea here is this:
>
> >   page_nums -> pages -> valid_pages -> tweets
>
> > The transforms are this:
>
> >   page_nums -> pages: call API via imap
> >   pages -> valid_pages: take while true
> >   valid_pages -> tweets: use chain.from_iterable to flatten results
>
> > Here's the code again for context:
>
> >     def get_tweets(term):
> >         def get_page(page):
> >             return getSearch(term, page)
> >         page_nums = itertools.count(1)
> >         pages = itertools.imap(get_page, page_nums)
> >         valid_pages = itertools.takewhile(bool, pages)
> >         tweets = itertools.chain.from_iterable(valid_pages)
> >         return tweets
>
> Actually you supplied the "accessible" itertools version. For reference,
> here's the inaccessible version:
>
> class api:
>     """Twitter search API mock-up"""
>     pages = [
>         ["a", "b", "c"],
>         ["d", "e"],
>         ]
>     @staticmethod
>     def GetSearch(term, page):
>         assert term == "foo"
>         assert page >= 1
>         if page > len(api.pages):
>             return []
>         return api.pages[page-1]
>
> from collections import deque
> from functools import partial
> from itertools import chain, count, imap, takewhile
>
> def process(tweet):
>     print tweet
>
> term = "foo"
>
> deque(
>     imap(
>         process,
>         chain.from_iterable(
>             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
>     maxlen=0)
>
> ;)

I know Peter's version is tongue in cheek, but I do think that it has
a certain expressive power, and it highlights three mind-expanding
Python modules.

Here's a re-flattened take on Peter's version ("Flat is better than
nested." -- PEP 20):

    term = "foo"
    search = partial(api.GetSearch, term)
    nums = count(1)
    paged_tweets = imap(search, nums)
    paged_tweets = takewhile(bool, paged_tweets)
    tweets = chain.from_iterable(paged_tweets)
    processed_tweets = imap(process, tweets)
    deque(processed_tweets, maxlen=0)

The use of deque to exhaust an iterator is slightly overboard IMHO,
but all the other lines of code can be fairly easily understood once
you read the docs.

    partial: http://docs.python.org/2/library/functools.html
    count, imap, takewhile, chain.from_iterable: http://docs.python.org/2/library/itertools.html
    deque: http://docs.python.org/2/library/collections.html
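
For readers puzzled by the `deque(..., maxlen=0)` trick, a small sketch follows. The `consume` name is borrowed from the recipes section of the itertools documentation; the `seen` list and the sample data are made up for illustration, and the code is written for Python 3, where `map` is lazy like `imap`.

```python
from collections import deque


def consume(iterator):
    """Run an iterator to exhaustion, discarding every item."""
    # A deque with maxlen=0 stores nothing, so memory use stays constant.
    deque(iterator, maxlen=0)


seen = []
# map() is lazy, so seen.append only runs once something drains the iterator.
consume(map(seen.append, ["a", "b", "c"]))
print(seen)  # ['a', 'b', 'c']
```

A plain `for` loop does the same job and is arguably easier to read when the only goal is the side effect.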

#33156

From	Cameron Simpson <cs@zip.com.au>
Date	2012-11-12 11:43 +1100
Message-ID	<mailman.3569.1352681039.27098.python-list@python.org>
In reply to	#33140

On 11Nov2012 11:16, Steve Howell <showell30@yahoo.com> wrote:
| On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
| > Steve Howell wrote:
| > > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
| > >> Cameron Simpson <c...@zip.com.au> writes:
| > >> > | I'd prefer the original code ten times over this inaccessible beast.
| > >> > Me too.
| >
| > >> Me, I like the itertools version better.  There's one chunk of data
| > >> that goes through a succession of transforms each of which
| > >> is very straightforward.
| >
| > > Thanks, Paul.
| >
| > > Even though I supplied the "inaccessible" itertools version, I can
| > > understand why folks find it inaccessible.  As I said to the OP, there
| > > was nothing wrong with the original imperative approach; I was simply
| > > providing an alternative.
| >
| > > It took me a while to appreciate itertools, but the metaphor that
| > > resonates with me is a Unix pipeline.
[...]
| > Actually you supplied the "accessible" itertools version. For reference,
| > here's the inaccessible version:
[...]
| I know Peter's version is tongue in cheek, but I do think that it has
| a certain expressive power, and it highlights three mind-expanding
| Python modules.
| Here's a re-flattened take on Peter's version ("Flat is better than
| nested." -- PEP 20):
[...]

Ok, who's going to quiz the OP on his/her uptake of these techniques...
-- 
Cameron Simpson <cs@zip.com.au>

It's hard to make a man understand something when his livelihood depends
on him not understanding it. - Upton Sinclair

#33164

From	Steve Howell <showell30@yahoo.com>
Date	2012-11-11 17:38 -0800
Message-ID	<ffba84a4-79aa-48cc-b0ae-de1180dbb099@q5g2000pbk.googlegroups.com>
In reply to	#33156

On Nov 11, 4:44 pm, Cameron Simpson <c...@zip.com.au> wrote:
> On 11Nov2012 11:16, Steve Howell <showel...@yahoo.com> wrote:
> | On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
> | > Steve Howell wrote:
> | > > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> | > >> Cameron Simpson <c...@zip.com.au> writes:
> | > >> > | I'd prefer the original code ten times over this inaccessible beast.
> | > >> > Me too.
> | >
> | > >> Me, I like the itertools version better.  There's one chunk of data
> | > >> that goes through a succession of transforms each of which
> | > >> is very straightforward.
> | >
> | > > Thanks, Paul.
> | >
> | > > Even though I supplied the "inaccessible" itertools version, I can
> | > > understand why folks find it inaccessible.  As I said to the OP, there
> | > > was nothing wrong with the original imperative approach; I was simply
> | > > providing an alternative.
> | >
> | > > It took me a while to appreciate itertools, but the metaphor that
> | > > resonates with me is a Unix pipeline.
> [...]
> | > Actually you supplied the "accessible" itertools version. For reference,
> | > here's the inaccessible version:
> [...]
> | I know Peter's version is tongue in cheek, but I do think that it has
> | a certain expressive power, and it highlights three mind-expanding
> | Python modules.
> | Here's a re-flattened take on Peter's version ("Flat is better than
> | nested." -- PEP 20):
> [...]
>
> Ok, who's going to quiz the OP on his/her uptake of these techniques...

Cameron, with all due respect, I think you're missing the point.

Roy posted this code:

    page = 1
    while 1:
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        for tweet in r:
            process(tweet)
        page += 1

In his own words, he described the loop as "gnarly" and the overall
code as "fidgety."

One way to eliminate the "while", the "if", and the "break" statements
is to use higher level constructs that are shipped with all modern
versions of Python, and which are well documented and well tested (and
fast, I might add):

    search = partial(api.GetSearch, "foo")
    paged_tweets = imap(search, count(1))
    paged_tweets = takewhile(bool, paged_tweets)
    tweets = chain.from_iterable(paged_tweets)
    for tweet in tweets:
        process(tweet)

The moral of the story is that you can avoid brittle loops by relying
on a well-tested library to work at a higher level of abstraction.

For this particular use case, the imperative version is fine, but for
more complex use cases, the loops are only gonna get more gnarly and
fidgety.
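
One possible middle ground, sketched here under assumptions: keep the imperative loop but hide it once inside a reusable generator, so callers only ever see a flat stream of items. The names `iter_pages` and `fake_search` are hypothetical and do not come from the thread; `fake_search` merely stands in for `api.GetSearch`. Python 3.3+ is assumed for `yield from`.

```python
from functools import partial


def iter_pages(fetch_page, start=1):
    """Yield items from fetch_page(start), fetch_page(start + 1), ...

    Stops at the first falsy page (None or an empty list).
    """
    page = start
    while True:
        items = fetch_page(page)
        if not items:
            return
        yield from items
        page += 1


def fake_search(term, page):
    # Hypothetical stand-in for api.GetSearch: two pages of results.
    data = {1: [term + "-1a", term + "-1b"], 2: [term + "-2a"]}
    return data.get(page, [])


results = list(iter_pages(partial(fake_search, "foo")))
print(results)  # ['foo-1a', 'foo-1b', 'foo-2a']
```

The `while`, `if`, and paging counter still exist, but they live in one tested helper instead of being repeated at every call site.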

#33141

From	Roy Smith <roy@panix.com>
Date	2012-11-11 14:23 -0500
Message-ID	<roy-DA7578.14234611112012@news.panix.com>
In reply to	#33139

In article <mailman.3562.1352658857.27098.python-list@python.org>,
 Peter Otten <__peter__@web.de> wrote:

> deque(
>     imap(
>         process,
>         chain.from_iterable(
>             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
>     maxlen=0)
> 
> ;)

If I wanted STL, I would still be writing C++ :-)
