Re: A gnarly little python loop

Started by: Cameron Simpson <cs@zip.com.au>
First post: 2012-11-11 19:48 +1100
Last post: 2012-11-11 14:23 -0500
Articles: 11 (5 participants)

This discussion began before the indexed window, so earlier articles aren't shown. The article labeled "Started by" is the oldest one visible, not the original post.


Contents

  Re: A gnarly little python loop Cameron Simpson <cs@zip.com.au> - 2012-11-11 19:48 +1100
    Re: A gnarly little python loop Paul Rubin <no.email@nospam.invalid> - 2012-11-11 01:09 -0800
      Re: A gnarly little python loop Peter Otten <__peter__@web.de> - 2012-11-11 10:54 +0100
        Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 09:16 -0800
      Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 09:29 -0800
        Re: A gnarly little python loop Peter Otten <__peter__@web.de> - 2012-11-11 19:34 +0100
          Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 11:16 -0800
            Re: A gnarly little python loop Cameron Simpson <cs@zip.com.au> - 2012-11-12 11:43 +1100
              Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 17:38 -0800
          Re: A gnarly little python loop Roy Smith <roy@panix.com> - 2012-11-11 14:23 -0500

#33122 — Re: A gnarly little python loop

From: Cameron Simpson <cs@zip.com.au>
Date: 2012-11-11 19:48 +1100
Subject: Re: A gnarly little python loop
Message-ID: <mailman.3555.1352623728.27098.python-list@python.org>

On 11Nov2012 08:56, Stefan Behnel <stefan_ml@behnel.de> wrote:
| Steve Howell, 11.11.2012 04:03:
| > On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
| >>     page = 1
| >>     while 1:
| >>         r = api.GetSearch(term="foo", page=page)
| >>         if not r:
| >>             break
| >>         for tweet in r:
| >>             process(tweet)
| >>         page += 1
| >>
| >> It works, but it seems excessively fidgety.  Is there some cleaner way
| >> to refactor this?
| > 
| > I think your code is perfectly readable and clean, but you can flatten
| > it like so:
| > 
| >     def get_tweets(term, get_page):
| >         page_nums = itertools.count(1)
| >         pages = itertools.imap(api.getSearch, page_nums)
| >         valid_pages = itertools.takewhile(bool, pages)
| >         tweets = itertools.chain.from_iterable(valid_pages)
| >         return tweets
| 
| I'd prefer the original code ten times over this inaccessible beast.

Me too.
-- 
Cameron Simpson <cs@zip.com.au>

In an insane society, the sane man must appear insane.
        - Keith A. Schauer <keith@balrog.dseg.ti.com>


#33123

From: Paul Rubin <no.email@nospam.invalid>
Date: 2012-11-11 01:09 -0800
Message-ID: <7x4nkwzesu.fsf@ruckus.brouhaha.com>
In reply to: #33122

Cameron Simpson <cs@zip.com.au> writes:
> | I'd prefer the original code ten times over this inaccessible beast.
> Me too.

Me, I like the itertools version better.  There's one chunk of data
that goes through a succession of transforms each of which
is very straightforward.


#33125

From: Peter Otten <__peter__@web.de>
Date: 2012-11-11 10:54 +0100
Message-ID: <mailman.3557.1352627686.27098.python-list@python.org>
In reply to: #33123

Paul Rubin wrote:

> Cameron Simpson <cs@zip.com.au> writes:
>> | I'd prefer the original code ten times over this inaccessible beast.
>> Me too.
> 
> Me, I like the itertools version better.  There's one chunk of data
> that goes through a succession of transforms each of which
> is very straightforward.

[Steve Howell]
>     def get_tweets(term, get_page):
>         page_nums = itertools.count(1)
>         pages = itertools.imap(api.getSearch, page_nums)
>         valid_pages = itertools.takewhile(bool, pages)
>         tweets = itertools.chain.from_iterable(valid_pages)
>         return tweets
 

But did you spot the bug(s)?
My itertools-based version would look like this

    def get_tweets(term):
        pages = (api.GetSearch(term, pageno)
                 for pageno in itertools.count(1))
        for page in itertools.takewhile(bool, pages):
            yield from page

but I can understand that it's not everybody's cup of tea.
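
For anyone trying this on Python 3.3 or later, the following is a minimal, self-contained sketch of the generator version above. `FakeApi` and its canned pages are hypothetical stand-ins for the real Twitter client, which is not shown in the thread; only the `GetSearch(term, page)` call shape is taken from the quoted loop, and passing `api` in as a parameter is a change made here so the sketch has no globals. It also hints at the problems in the first itertools version: that one takes `term` and `get_page` but uses neither, spells the method `getSearch` rather than `GetSearch`, and `imap(api.getSearch, page_nums)` would call the method with the page number as its only argument.

```python
import itertools


class FakeApi:
    """Hypothetical stand-in for the Twitter client used in the thread."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = []  # record of (term, page) requests, for inspection

    def GetSearch(self, term, page):
        self.calls.append((term, page))
        # Pages are numbered from 1; past the end, return an empty list.
        if page > len(self._pages):
            return []
        return self._pages[page - 1]


def get_tweets(api, term):
    """Yield tweets page by page until the API returns an empty page."""
    pages = (api.GetSearch(term, pageno) for pageno in itertools.count(1))
    for page in itertools.takewhile(bool, pages):
        yield from page


api = FakeApi([["a", "b", "c"], ["d", "e"]])
tweets = list(get_tweets(api, "foo"))
print(tweets)
```

The `calls` list is there only so the requests can be inspected afterwards: the empty third page is fetched once, and that is what ends the iteration.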


#33132

From: Steve Howell <showell30@yahoo.com>
Date: 2012-11-11 09:16 -0800
Message-ID: <c1f54b2d-0c2c-48a2-9185-4478ebd40e2f@googlegroups.com>
In reply to: #33125

On Sunday, November 11, 2012 1:54:46 AM UTC-8, Peter Otten wrote:
> Paul Rubin wrote:
>
> > Cameron Simpson <cs@zip.com.au> writes:
> >> | I'd prefer the original code ten times over this inaccessible beast.
> >> Me too.
> >
> > Me, I like the itertools version better.  There's one chunk of data
> > that goes through a succession of transforms each of which
> > is very straightforward.
>
> [Steve Howell]
> >     def get_tweets(term, get_page):
> >         page_nums = itertools.count(1)
> >         pages = itertools.imap(api.getSearch, page_nums)
> >         valid_pages = itertools.takewhile(bool, pages)
> >         tweets = itertools.chain.from_iterable(valid_pages)
> >         return tweets
>
> But did you spot the bug(s)?

My first version was sketching out the technique, and I don't have handy access to the API.

Here is an improved version:

    def get_tweets(term):
        def get_page(page):
            return getSearch(term, page)
        page_nums = itertools.count(1)
        pages = itertools.imap(get_page, page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets

    for tweet in get_tweets("foo"):
        process(tweet)

This is what I used to test it:


    def getSearch(term = "foo", page = 1):
        # simulate api for testing
        if page < 5:
            return [
                'page %d, tweet A for term %s' % (page, term),
                'page %d, tweet B for term %s' % (page, term),
            ]
        else:
            return None

    def process(tweet):
        print tweet


#33134

From: Steve Howell <showell30@yahoo.com>
Date: 2012-11-11 09:29 -0800
Message-ID: <8be50a3e-0ba6-439f-b445-7dedeacdc1c7@lg12g2000pbb.googlegroups.com>
In reply to: #33123

On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> Cameron Simpson <c...@zip.com.au> writes:
> > | I'd prefer the original code ten times over this inaccessible beast.
> > Me too.
>
> Me, I like the itertools version better.  There's one chunk of data
> that goes through a succession of transforms each of which
> is very straightforward.

Thanks, Paul.

Even though I supplied the "inaccessible" itertools version, I can
understand why folks find it inaccessible.  As I said to the OP, there
was nothing wrong with the original imperative approach; I was simply
providing an alternative.

It took me a while to appreciate itertools, but the metaphor that
resonates with me is a Unix pipeline.  It's just a metaphor, so folks
shouldn't be too literal, but the idea here is this:

  page_nums -> pages -> valid_pages -> tweets

The transforms are this:

  page_nums -> pages: call API via imap
  pages -> valid_pages: take while true
  valid_pages -> tweets: use chain.from_iterable to flatten results

Here's the code again for context:

    def get_tweets(term):
        def get_page(page):
            return getSearch(term, page)
        page_nums = itertools.count(1)
        pages = itertools.imap(get_page, page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets
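
A rough Python 3 rendering of the same pipeline, for readers without Python 2 at hand: `itertools.imap` no longer exists there, and the built-in `map` is lazy instead. The `get_search` function below is a made-up stand-in for the API (loosely modelled on the test stub posted earlier in the thread), and the `requested` list exists only to show that each stage pulls pages on demand rather than up front.

```python
import itertools

requested = []  # page numbers actually fetched, to make the laziness visible


def get_search(term, page):
    """Hypothetical API stub: four pages of two tweets each, then None."""
    requested.append(page)
    if page < 5:
        return ["page %d, tweet %s for term %s" % (page, letter, term)
                for letter in "AB"]
    return None


def get_tweets(term):
    def get_page(page):
        return get_search(term, page)

    page_nums = itertools.count(1)                       # 1, 2, 3, ...
    pages = map(get_page, page_nums)                     # call the API lazily
    valid_pages = itertools.takewhile(bool, pages)       # stop at first falsy page
    tweets = itertools.chain.from_iterable(valid_pages)  # flatten pages
    return tweets


tweets = get_tweets("foo")
fetched_before = list(requested)        # building the pipeline fetches nothing
first_three = list(itertools.islice(tweets, 3))
fetched_after_three = list(requested)   # only the pages needed so far
remaining = list(tweets)                # drain the rest
```

Because `takewhile(bool, ...)` stops on any falsy page, it does not matter whether the API signals the end with `None` or with an empty list.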


#33139

From: Peter Otten <__peter__@web.de>
Date: 2012-11-11 19:34 +0100
Message-ID: <mailman.3562.1352658857.27098.python-list@python.org>
In reply to: #33134

Steve Howell wrote:

> On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
>> Cameron Simpson <c...@zip.com.au> writes:
>> > | I'd prefer the original code ten times over this inaccessible beast.
>> > Me too.
>>
>> Me, I like the itertools version better.  There's one chunk of data
>> that goes through a succession of transforms each of which
>> is very straightforward.
> 
> Thanks, Paul.
> 
> Even though I supplied the "inaccessible" itertools version, I can
> understand why folks find it inaccessible.  As I said to the OP, there
> was nothing wrong with the original imperative approach; I was simply
> providing an alternative.
> 
> It took me a while to appreciate itertools, but the metaphor that
> resonates with me is a Unix pipeline.  It's just a metaphor, so folks
> shouldn't be too literal, but the idea here is this:
> 
>   page_nums -> pages -> valid_pages -> tweets
> 
> The transforms are this:
> 
>   page_nums -> pages: call API via imap
>   pages -> valid_pages: take while true
>   valid_pages -> tweets: use chain.from_iterable to flatten results
> 
> Here's the code again for context:
> 
>     def get_tweets(term):
>         def get_page(page):
>             return getSearch(term, page)
>         page_nums = itertools.count(1)
>         pages = itertools.imap(get_page, page_nums)
>         valid_pages = itertools.takewhile(bool, pages)
>         tweets = itertools.chain.from_iterable(valid_pages)
>         return tweets
> 

Actually you supplied the "accessible" itertools version. For reference, 
here's the inaccessible version:

class api:
    """Twitter search API mock-up"""
    pages = [
        ["a", "b", "c"],
        ["d", "e"],
        ]
    @staticmethod
    def GetSearch(term, page):
        assert term == "foo"
        assert page >= 1
        if page > len(api.pages):
            return []
        return api.pages[page-1]

from collections import deque
from functools import partial
from itertools import chain, count, imap, takewhile

def process(tweet):
    print tweet

term = "foo"

deque(
    imap(
        process,
        chain.from_iterable(
            takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
    maxlen=0)

;)


#33140

From: Steve Howell <showell30@yahoo.com>
Date: 2012-11-11 11:16 -0800
Message-ID: <3b0b8e3b-6f0a-4337-89ab-235e938952b2@y5g2000pbi.googlegroups.com>
In reply to: #33139

On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
> Steve Howell wrote:
> > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> >> Cameron Simpson <c...@zip.com.au> writes:
> >> > | I'd prefer the original code ten times over this inaccessible beast.
> >> > Me too.
>
> >> Me, I like the itertools version better.  There's one chunk of data
> >> that goes through a succession of transforms each of which
> >> is very straightforward.
>
> > Thanks, Paul.
>
> > Even though I supplied the "inaccessible" itertools version, I can
> > understand why folks find it inaccessible.  As I said to the OP, there
> > was nothing wrong with the original imperative approach; I was simply
> > providing an alternative.
>
> > It took me a while to appreciate itertools, but the metaphor that
> > resonates with me is a Unix pipeline.  It's just a metaphor, so folks
> > shouldn't be too literal, but the idea here is this:
>
> >   page_nums -> pages -> valid_pages -> tweets
>
> > The transforms are this:
>
> >   page_nums -> pages: call API via imap
> >   pages -> valid_pages: take while true
> >   valid_pages -> tweets: use chain.from_iterable to flatten results
>
> > Here's the code again for context:
>
> >     def get_tweets(term):
> >         def get_page(page):
> >             return getSearch(term, page)
> >         page_nums = itertools.count(1)
> >         pages = itertools.imap(get_page, page_nums)
> >         valid_pages = itertools.takewhile(bool, pages)
> >         tweets = itertools.chain.from_iterable(valid_pages)
> >         return tweets
>
> Actually you supplied the "accessible" itertools version. For reference,
> here's the inaccessible version:
>
> class api:
>     """Twitter search API mock-up"""
>     pages = [
>         ["a", "b", "c"],
>         ["d", "e"],
>         ]
>     @staticmethod
>     def GetSearch(term, page):
>         assert term == "foo"
>         assert page >= 1
>         if page > len(api.pages):
>             return []
>         return api.pages[page-1]
>
> from collections import deque
> from functools import partial
> from itertools import chain, count, imap, takewhile
>
> def process(tweet):
>     print tweet
>
> term = "foo"
>
> deque(
>     imap(
>         process,
>         chain.from_iterable(
>             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
>     maxlen=0)
>
> ;)

I know Peter's version is tongue in cheek, but I do think that it has
a certain expressive power, and it highlights three mind-expanding
Python modules.

Here's a re-flattened take on Peter's version ("Flat is better than
nested." -- PEP 20):

    term = "foo"
    search = partial(api.GetSearch, term)
    nums = count(1)
    paged_tweets = imap(search, nums)
    paged_tweets = takewhile(bool, paged_tweets)
    tweets = chain.from_iterable(paged_tweets)
    processed_tweets = imap(process, tweets)
    deque(processed_tweets, maxlen=0)

The use of deque to exhaust an iterator is slightly overboard IMHO,
but all the other lines of code can be fairly easily understood once
you read the docs.

    partial: http://docs.python.org/2/library/functools.html
    count, imap, takewhile, chain.from_iterable:
        http://docs.python.org/2/library/itertools.html
    deque: http://docs.python.org/2/library/collections.html
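
On the `deque(..., maxlen=0)` idiom: a zero-length deque discards every item it is fed, so constructing one from an iterator simply runs the iterator to completion for its side effects. A small Python 3 sketch, with a made-up `process` that records what it sees, next to the plain loop that does the same job:

```python
from collections import deque

seen = []


def process(tweet):
    """Hypothetical side-effecting handler; here it just records the tweet."""
    seen.append(tweet)


tweets = ["a", "b", "c"]

# map() is lazy in Python 3, so nothing happens until something consumes it.
pending = map(process, tweets)
calls_before = len(seen)

# A deque with maxlen=0 keeps nothing, but still iterates over its input.
leftover = deque(pending, maxlen=0)
calls_after = len(seen)

# The plain loop is equivalent and arguably clearer.
seen_by_loop = []
for tweet in tweets:
    seen_by_loop.append(tweet)
```

The deque itself ends up empty; the only observable effect is that `process` has been called once per tweet.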


#33156

From: Cameron Simpson <cs@zip.com.au>
Date: 2012-11-12 11:43 +1100
Message-ID: <mailman.3569.1352681039.27098.python-list@python.org>
In reply to: #33140

On 11Nov2012 11:16, Steve Howell <showell30@yahoo.com> wrote:
| On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
| > Steve Howell wrote:
| > > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
| > >> Cameron Simpson <c...@zip.com.au> writes:
| > >> > | I'd prefer the original code ten times over this inaccessible beast.
| > >> > Me too.
| >
| > >> Me, I like the itertools version better.  There's one chunk of data
| > >> that goes through a succession of transforms each of which
| > >> is very straightforward.
| >
| > > Thanks, Paul.
| >
| > > Even though I supplied the "inaccessible" itertools version, I can
| > > understand why folks find it inaccessible.  As I said to the OP, there
| > > was nothing wrong with the original imperative approach; I was simply
| > > providing an alternative.
| >
| > > It took me a while to appreciate itertools, but the metaphor that
| > > resonates with me is a Unix pipeline.
[...]
| > Actually you supplied the "accessible" itertools version. For reference,
| > here's the inaccessible version:
[...]
| I know Peter's version is tongue in cheek, but I do think that it has
| a certain expressive power, and it highlights three mind-expanding
| Python modules.
| Here's a re-flattened take on Peter's version ("Flat is better than
| nested." -- PEP 20):
[...]

Ok, who's going to quiz the OP on his/her uptake of these techniques...
-- 
Cameron Simpson <cs@zip.com.au>

It's hard to make a man understand something when his livelihood depends
on him not understanding it. - Upton Sinclair


#33164

From: Steve Howell <showell30@yahoo.com>
Date: 2012-11-11 17:38 -0800
Message-ID: <ffba84a4-79aa-48cc-b0ae-de1180dbb099@q5g2000pbk.googlegroups.com>
In reply to: #33156

On Nov 11, 4:44 pm, Cameron Simpson <c...@zip.com.au> wrote:
> On 11Nov2012 11:16, Steve Howell <showel...@yahoo.com> wrote:
> | On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
> | > Steve Howell wrote:
> | > > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> | > >> Cameron Simpson <c...@zip.com.au> writes:
> | > >> > | I'd prefer the original code ten times over this inaccessible beast.
> | > >> > Me too.
> | >
> | > >> Me, I like the itertools version better.  There's one chunk of data
> | > >> that goes through a succession of transforms each of which
> | > >> is very straightforward.
> | >
> | > > Thanks, Paul.
> | >
> | > > Even though I supplied the "inaccessible" itertools version, I can
> | > > understand why folks find it inaccessible.  As I said to the OP, there
> | > > was nothing wrong with the original imperative approach; I was simply
> | > > providing an alternative.
> | >
> | > > It took me a while to appreciate itertools, but the metaphor that
> | > > resonates with me is a Unix pipeline.
> [...]
> | > Actually you supplied the "accessible" itertools version. For reference,
> | > here's the inaccessible version:
> [...]
> | I know Peter's version is tongue in cheek, but I do think that it has
> | a certain expressive power, and it highlights three mind-expanding
> | Python modules.
> | Here's a re-flattened take on Peter's version ("Flat is better than
> | nested." -- PEP 20):
> [...]
>
> Ok, who's going to quiz the OP on his/her uptake of these techniques...

Cameron, with all due respect, I think you're missing the point.

Roy posted this code:

    page = 1
    while 1:
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        for tweet in r:
            process(tweet)
        page += 1

In his own words, he described the loop as "gnarly" and the overall
code as "fidgety."

One way to eliminate the "while", the "if", and the "break" statements
is to use higher level constructs that are shipped with all modern
versions of Python, and which are well documented and well tested (and
fast, I might add):

    search = partial(api.GetSearch, "foo")
    paged_tweets = imap(search, count(1))
    paged_tweets = takewhile(bool, paged_tweets)
    tweets = chain.from_iterable(paged_tweets)
    for tweet in tweets:
        process(tweet)

The moral of the story is that you can avoid brittle loops by relying
on a well-tested library to work at a higher level of abstraction.

For this particular use case, the imperative version is fine, but for
more complex use cases, the loops are only gonna get more gnarly and
fidgety.


#33141

From: Roy Smith <roy@panix.com>
Date: 2012-11-11 14:23 -0500
Message-ID: <roy-DA7578.14234611112012@news.panix.com>
In reply to: #33139

In article <mailman.3562.1352658857.27098.python-list@python.org>,
 Peter Otten <__peter__@web.de> wrote:
 
> deque(
>     imap(
>         process,
>         chain.from_iterable(
>             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
>     maxlen=0)
> 
> ;)

If I wanted STL, I would still be writing C++ :-)
