| Started by | Cameron Simpson <cs@zip.com.au> |
|---|---|
| First post | 2012-11-11 19:48 +1100 |
| Last post | 2012-11-11 14:23 -0500 |
| Articles | 11 — 5 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: A gnarly little python loop Cameron Simpson <cs@zip.com.au> - 2012-11-11 19:48 +1100
Re: A gnarly little python loop Paul Rubin <no.email@nospam.invalid> - 2012-11-11 01:09 -0800
Re: A gnarly little python loop Peter Otten <__peter__@web.de> - 2012-11-11 10:54 +0100
Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 09:16 -0800
Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 09:16 -0800
Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 09:29 -0800
Re: A gnarly little python loop Peter Otten <__peter__@web.de> - 2012-11-11 19:34 +0100
Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 11:16 -0800
Re: A gnarly little python loop Cameron Simpson <cs@zip.com.au> - 2012-11-12 11:43 +1100
Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-11 17:38 -0800
Re: A gnarly little python loop Roy Smith <roy@panix.com> - 2012-11-11 14:23 -0500
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2012-11-11 19:48 +1100 |
| Subject | Re: A gnarly little python loop |
| Message-ID | <mailman.3555.1352623728.27098.python-list@python.org> |
On 11Nov2012 08:56, Stefan Behnel <stefan_ml@behnel.de> wrote:
| Steve Howell, 11.11.2012 04:03:
| > On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
| >> page = 1
| >> while 1:
| >>     r = api.GetSearch(term="foo", page=page)
| >>     if not r:
| >>         break
| >>     for tweet in r:
| >>         process(tweet)
| >>     page += 1
| >>
| >> It works, but it seems excessively fidgety. Is there some cleaner way
| >> to refactor this?
| >
| > I think your code is perfectly readable and clean, but you can flatten
| > it like so:
| >
| > def get_tweets(term, get_page):
| >     page_nums = itertools.count(1)
| >     pages = itertools.imap(api.getSearch, page_nums)
| >     valid_pages = itertools.takewhile(bool, pages)
| >     tweets = itertools.chain.from_iterable(valid_pages)
| >     return tweets
|
| I'd prefer the original code ten times over this inaccessible beast.
Me too.
--
Cameron Simpson <cs@zip.com.au>
In an insane society, the sane man must appear insane.
- Keith A. Schauer <keith@balrog.dseg.ti.com>
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2012-11-11 01:09 -0800 |
| Message-ID | <7x4nkwzesu.fsf@ruckus.brouhaha.com> |
| In reply to | #33122 |
Cameron Simpson <cs@zip.com.au> writes:
> | I'd prefer the original code ten times over this inaccessible beast.
> Me too.
Me, I like the itertools version better. There's one chunk of data
that goes through a succession of transforms each of which
is very straightforward.
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-11-11 10:54 +0100 |
| Message-ID | <mailman.3557.1352627686.27098.python-list@python.org> |
| In reply to | #33123 |
Paul Rubin wrote:
> Cameron Simpson <cs@zip.com.au> writes:
>> | I'd prefer the original code ten times over this inaccessible beast.
>> Me too.
>
> Me, I like the itertools version better. There's one chunk of data
> that goes through a succession of transforms each of which
> is very straightforward.
[Steve Howell]
> def get_tweets(term, get_page):
>     page_nums = itertools.count(1)
>     pages = itertools.imap(api.getSearch, page_nums)
>     valid_pages = itertools.takewhile(bool, pages)
>     tweets = itertools.chain.from_iterable(valid_pages)
>     return tweets
But did you spot the bug(s)?
My itertools-based version would look like this
    def get_tweets(term):
        pages = (api.GetSearch(term, pageno)
                 for pageno in itertools.count(1))
        for page in itertools.takewhile(bool, pages):
            yield from page
but I can understand that it's not everybody's cup of tea.
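The generator version above can be run as a minimal Python 3 sketch (`yield from` needs Python 3.3 or later). The `api` class here is a hypothetical stand-in with canned pages, not the real Twitter client; it assumes, as the original loop does, that an empty or `None` page marks the end of the results.

```python
import itertools


class api:
    """Hypothetical stand-in for the search API (canned data, not the real client)."""
    pages = [["a", "b", "c"], ["d", "e"]]

    @staticmethod
    def GetSearch(term, page):
        # Return the requested 1-based page, or an empty list past the end.
        if page > len(api.pages):
            return []
        return api.pages[page - 1]


def get_tweets(term):
    # Lazily request page 1, 2, 3, ...; nothing is fetched until iteration starts.
    pages = (api.GetSearch(term, pageno) for pageno in itertools.count(1))
    # Stop at the first falsy page, mirroring "if not r: break".
    for page in itertools.takewhile(bool, pages):
        yield from page


tweets = list(get_tweets("foo"))
```

Because `get_tweets` is a generator, the caller still writes a plain `for tweet in get_tweets("foo")` loop; only the paging logic moves out of sight.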
| From | Steve Howell <showell30@yahoo.com> |
|---|---|
| Date | 2012-11-11 09:16 -0800 |
| Message-ID | <c1f54b2d-0c2c-48a2-9185-4478ebd40e2f@googlegroups.com> |
| In reply to | #33125 |
On Sunday, November 11, 2012 1:54:46 AM UTC-8, Peter Otten wrote:
> Paul Rubin wrote:
>
> > Cameron Simpson <cs@zip.com.au> writes:
> >> | I'd prefer the original code ten times over this inaccessible beast.
> >> Me too.
> >
> > Me, I like the itertools version better. There's one chunk of data
> > that goes through a succession of transforms each of which
> > is very straightforward.
>
> [Steve Howell]
> > def get_tweets(term, get_page):
> >     page_nums = itertools.count(1)
> >     pages = itertools.imap(api.getSearch, page_nums)
> >     valid_pages = itertools.takewhile(bool, pages)
> >     tweets = itertools.chain.from_iterable(valid_pages)
> >     return tweets
>
> But did you spot the bug(s)?
>
My first version was sketching out the technique, and I don't have handy access to the API.
Here is an improved version:
    import itertools

    def get_tweets(term):
        def get_page(page):
            return getSearch(term, page)
        page_nums = itertools.count(1)
        pages = itertools.imap(get_page, page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets

    for tweet in get_tweets("foo"):
        process(tweet)
This is what I used to test it:
    def getSearch(term = "foo", page = 1):
        # simulate api for testing
        if page < 5:
            return [
                'page %d, tweet A for term %s' % (page, term),
                'page %d, tweet B for term %s' % (page, term),
            ]
        else:
            return None

    def process(tweet):
        print tweet
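The code in this post is Python 2 (`itertools.imap`, the `print` statement). A rough Python 3 port, as a sketch: the built-in `map` is already lazy there, so it takes the place of `imap`. The simulated `getSearch` is the one from the post; collecting the results into a list named `results` is only for illustration.

```python
import itertools


def getSearch(term="foo", page=1):
    # Simulated API from the post: four pages of two tweets each, then None.
    if page < 5:
        return [
            'page %d, tweet A for term %s' % (page, term),
            'page %d, tweet B for term %s' % (page, term),
        ]
    return None


def get_tweets(term):
    def get_page(page):
        return getSearch(term, page)
    page_nums = itertools.count(1)
    pages = map(get_page, page_nums)                # lazy in Python 3, like imap
    valid_pages = itertools.takewhile(bool, pages)  # stops when a page is None
    return itertools.chain.from_iterable(valid_pages)


results = list(get_tweets("foo"))
```

In Python 3 the `process` helper would call `print(tweet)` rather than use the `print` statement.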
| From | Steve Howell <showell30@yahoo.com> |
|---|---|
| Date | 2012-11-11 09:29 -0800 |
| Message-ID | <8be50a3e-0ba6-439f-b445-7dedeacdc1c7@lg12g2000pbb.googlegroups.com> |
| In reply to | #33123 |
On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> Cameron Simpson <c...@zip.com.au> writes:
> > | I'd prefer the original code ten times over this inaccessible beast.
> > Me too.
>
> Me, I like the itertools version better. There's one chunk of data
> that goes through a succession of transforms each of which
> is very straightforward.
Thanks, Paul.
Even though I supplied the "inaccessible" itertools version, I can
understand why folks find it inaccessible. As I said to the OP, there
was nothing wrong with the original imperative approach; I was simply
providing an alternative.
It took me a while to appreciate itertools, but the metaphor that
resonates with me is a Unix pipeline. It's just a metaphor, so folks
shouldn't be too literal, but the idea here is this:
page_nums -> pages -> valid_pages -> tweets
The transforms are this:
page_nums -> pages: call API via imap
pages -> valid_pages: take while true
valid_pages -> tweets: use chain.from_iterable to flatten results
Here's the code again for context:
    def get_tweets(term):
        def get_page(page):
            return getSearch(term, page)
        page_nums = itertools.count(1)
        pages = itertools.imap(get_page, page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets
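One property of the pipeline that the Unix metaphor captures is laziness: building the stages fetches nothing, and pages are requested only as the final iterator is consumed. A small Python 3 sketch of that, using a hypothetical `get_page` that records its calls (the page contents and the `calls` list are made up for illustration):

```python
import itertools

calls = []  # records which page numbers were actually requested


def get_page(page):
    # Hypothetical data source: three pages of two items, then an empty page.
    calls.append(page)
    return ["p%d-t%d" % (page, i) for i in (1, 2)] if page <= 3 else []


page_nums = itertools.count(1)                       # 1, 2, 3, ...
pages = map(get_page, page_nums)                     # page_nums -> pages
valid_pages = itertools.takewhile(bool, pages)       # pages -> valid_pages
tweets = itertools.chain.from_iterable(valid_pages)  # valid_pages -> tweets

calls_before = list(calls)  # still empty: no stage has run yet
result = list(tweets)       # consuming the last stage drives the whole pipeline
```

After the final line, `calls` holds pages 1 through 4: the empty fourth page has to be fetched for `takewhile` to know where to stop, just as the imperative loop makes one last request before its `break`.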
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-11-11 19:34 +0100 |
| Message-ID | <mailman.3562.1352658857.27098.python-list@python.org> |
| In reply to | #33134 |
Steve Howell wrote:
> On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
>> Cameron Simpson <c...@zip.com.au> writes:
>> > | I'd prefer the original code ten times over this inaccessible beast.
>> > Me too.
>>
>> Me, I like the itertools version better. There's one chunk of data
>> that goes through a succession of transforms each of which
>> is very straightforward.
>
> Thanks, Paul.
>
> Even though I supplied the "inaccessible" itertools version, I can
> understand why folks find it inaccessible. As I said to the OP, there
> was nothing wrong with the original imperative approach; I was simply
> providing an alternative.
>
> It took me a while to appreciate itertools, but the metaphor that
> resonates with me is a Unix pipeline. It's just a metaphor, so folks
> shouldn't be too literal, but the idea here is this:
>
> page_nums -> pages -> valid_pages -> tweets
>
> The transforms are this:
>
> page_nums -> pages: call API via imap
> pages -> valid_pages: take while true
> valid_pages -> tweets: use chain.from_iterable to flatten results
>
> Here's the code again for context:
>
> def get_tweets(term):
>     def get_page(page):
>         return getSearch(term, page)
>     page_nums = itertools.count(1)
>     pages = itertools.imap(get_page, page_nums)
>     valid_pages = itertools.takewhile(bool, pages)
>     tweets = itertools.chain.from_iterable(valid_pages)
>     return tweets
>
Actually you supplied the "accessible" itertools version. For reference,
here's the inaccessible version:
    class api:
        """Twitter search API mock-up"""
        pages = [
            ["a", "b", "c"],
            ["d", "e"],
        ]
        @staticmethod
        def GetSearch(term, page):
            assert term == "foo"
            assert page >= 1
            if page > len(api.pages):
                return []
            return api.pages[page-1]

    from collections import deque
    from functools import partial
    from itertools import chain, count, imap, takewhile

    def process(tweet):
        print tweet

    term = "foo"

    deque(
        imap(
            process,
            chain.from_iterable(
                takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
        maxlen=0)
;)
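The `deque(..., maxlen=0)` call is there only to drive the lazy iterator to exhaustion: a zero-length deque discards every item it is fed, so `process` runs for each tweet and nothing is stored. The itertools documentation uses the same idiom in its `consume` recipe. A minimal Python 3 sketch, where `consume` and the `seen` list are illustrative names rather than part of the post:

```python
from collections import deque


def consume(iterator):
    # A deque with maxlen=0 keeps no items, so this just runs the iterator dry.
    deque(iterator, maxlen=0)


seen = []
# map is lazy in Python 3; without consume(), seen.append would never be called.
consume(map(seen.append, ["a", "b", "c"]))
```

An ordinary `for` loop does the same job more readably, which is part of the joke in the post above.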
| From | Steve Howell <showell30@yahoo.com> |
|---|---|
| Date | 2012-11-11 11:16 -0800 |
| Message-ID | <3b0b8e3b-6f0a-4337-89ab-235e938952b2@y5g2000pbi.googlegroups.com> |
| In reply to | #33139 |
On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
> Steve Howell wrote:
> > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> >> Cameron Simpson <c...@zip.com.au> writes:
> >> > | I'd prefer the original code ten times over this inaccessible beast.
> >> > Me too.
>
> >> Me, I like the itertools version better. There's one chunk of data
> >> that goes through a succession of transforms each of which
> >> is very straightforward.
>
> > Thanks, Paul.
>
> > Even though I supplied the "inaccessible" itertools version, I can
> > understand why folks find it inaccessible. As I said to the OP, there
> > was nothing wrong with the original imperative approach; I was simply
> > providing an alternative.
>
> > It took me a while to appreciate itertools, but the metaphor that
> > resonates with me is a Unix pipeline. It's just a metaphor, so folks
> > shouldn't be too literal, but the idea here is this:
>
> > page_nums -> pages -> valid_pages -> tweets
>
> > The transforms are this:
>
> > page_nums -> pages: call API via imap
> > pages -> valid_pages: take while true
> > valid_pages -> tweets: use chain.from_iterable to flatten results
>
> > Here's the code again for context:
>
> > def get_tweets(term):
> >     def get_page(page):
> >         return getSearch(term, page)
> >     page_nums = itertools.count(1)
> >     pages = itertools.imap(get_page, page_nums)
> >     valid_pages = itertools.takewhile(bool, pages)
> >     tweets = itertools.chain.from_iterable(valid_pages)
> >     return tweets
>
> Actually you supplied the "accessible" itertools version. For reference,
> here's the inaccessible version:
>
> class api:
>     """Twitter search API mock-up"""
>     pages = [
>         ["a", "b", "c"],
>         ["d", "e"],
>     ]
>     @staticmethod
>     def GetSearch(term, page):
>         assert term == "foo"
>         assert page >= 1
>         if page > len(api.pages):
>             return []
>         return api.pages[page-1]
>
> from collections import deque
> from functools import partial
> from itertools import chain, count, imap, takewhile
>
> def process(tweet):
>     print tweet
>
> term = "foo"
>
> deque(
>     imap(
>         process,
>         chain.from_iterable(
>             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
>     maxlen=0)
>
> ;)
I know Peter's version is tongue in cheek, but I do think that it has
a certain expressive power, and it highlights three mind-expanding
Python modules.
Here's a re-flattened take on Peter's version ("Flat is better than
nested." -- PEP 20):
term = "foo"
search = partial(api.GetSearch, term)
nums = count(1)
paged_tweets = imap(search, nums)
paged_tweets = takewhile(bool, paged_tweets)
tweets = chain.from_iterable(paged_tweets)
processed_tweets = imap(process, tweets)
deque(processed_tweets, maxlen=0)
The use of deque to exhaust an iterator is slightly overboard IMHO,
but all the other lines of code can be fairly easily understood once
you read the docs.
partial: http://docs.python.org/2/library/functools.html
count, imap, takewhile, chain.from_iterable:
http://docs.python.org/2/library/itertools.html
deque: http://docs.python.org/2/library/collections.html
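For readers new to `functools.partial`: it pre-binds leading arguments, so `partial(api.GetSearch, term)` yields a one-argument callable that takes only the page number, which is the shape `imap` needs. A small sketch with a hypothetical `get_search` function standing in for `api.GetSearch`:

```python
from functools import partial


def get_search(term, page):
    # Hypothetical stand-in for api.GetSearch; returns a label instead of tweets.
    return "%s:%d" % (term, page)


search = partial(get_search, "foo")                # binds term="foo"
equivalent = lambda page: get_search("foo", page)  # the same thing spelled out

third_page = search(3)
```

The nested `get_page` helper in the earlier `get_tweets` version plays the same role as `partial` here.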
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2012-11-12 11:43 +1100 |
| Message-ID | <mailman.3569.1352681039.27098.python-list@python.org> |
| In reply to | #33140 |
On 11Nov2012 11:16, Steve Howell <showell30@yahoo.com> wrote:
| On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
| > Steve Howell wrote:
| > > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
| > >> Cameron Simpson <c...@zip.com.au> writes:
| > >> > | I'd prefer the original code ten times over this inaccessible beast.
| > >> > Me too.
| >
| > >> Me, I like the itertools version better. There's one chunk of data
| > >> that goes through a succession of transforms each of which
| > >> is very straightforward.
| >
| > > Thanks, Paul.
| >
| > > Even though I supplied the "inaccessible" itertools version, I can
| > > understand why folks find it inaccessible. As I said to the OP, there
| > > was nothing wrong with the original imperative approach; I was simply
| > > providing an alternative.
| >
| > > It took me a while to appreciate itertools, but the metaphor that
| > > resonates with me is a Unix pipeline.
[...]
| > Actually you supplied the "accessible" itertools version. For reference,
| > here's the inaccessible version:
[...]
| I know Peter's version is tongue in cheek, but I do think that it has
| a certain expressive power, and it highlights three mind-expanding
| Python modules.
| Here's a re-flattened take on Peter's version ("Flat is better than
| nested." -- PEP 20):
[...]
Ok, who's going to quiz the OP on his/her uptake of these techniques...
--
Cameron Simpson <cs@zip.com.au>
It's hard to make a man understand something when his livelihood depends
on him not understanding it. - Upton Sinclair
| From | Steve Howell <showell30@yahoo.com> |
|---|---|
| Date | 2012-11-11 17:38 -0800 |
| Message-ID | <ffba84a4-79aa-48cc-b0ae-de1180dbb099@q5g2000pbk.googlegroups.com> |
| In reply to | #33156 |
On Nov 11, 4:44 pm, Cameron Simpson <c...@zip.com.au> wrote:
> On 11Nov2012 11:16, Steve Howell <showel...@yahoo.com> wrote:
> | On Nov 11, 10:34 am, Peter Otten <__pete...@web.de> wrote:
> | > Steve Howell wrote:
> | > > On Nov 11, 1:09 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> | > >> Cameron Simpson <c...@zip.com.au> writes:
> | > >> > | I'd prefer the original code ten times over this inaccessible beast.
> | > >> > Me too.
> | >
> | > >> Me, I like the itertools version better. There's one chunk of data
> | > >> that goes through a succession of transforms each of which
> | > >> is very straightforward.
> | >
> | > > Thanks, Paul.
> | >
> | > > Even though I supplied the "inaccessible" itertools version, I can
> | > > understand why folks find it inaccessible. As I said to the OP, there
> | > > was nothing wrong with the original imperative approach; I was simply
> | > > providing an alternative.
> | >
> | > > It took me a while to appreciate itertools, but the metaphor that
> | > > resonates with me is a Unix pipeline.
> [...]
> | > Actually you supplied the "accessible" itertools version. For reference,
> | > here's the inaccessible version:
> [...]
> | I know Peter's version is tongue in cheek, but I do think that it has
> | a certain expressive power, and it highlights three mind-expanding
> | Python modules.
> | Here's a re-flattened take on Peter's version ("Flat is better than
> | nested." -- PEP 20):
> [...]
>
> Ok, who's going to quiz the OP on his/her uptake of these techniques...
Cameron, with all due respect, I think you're missing the point.
Roy posted this code:
    page = 1
    while 1:
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        for tweet in r:
            process(tweet)
        page += 1
In his own words, he described the loop as "gnarly" and the overall
code as "fidgety."
One way to eliminate the "while", the "if", and the "break" statements
is to use higher level constructs that are shipped with all modern
versions of Python, and which are well documented and well tested (and
fast, I might add):
    search = partial(api.GetSearch, "foo")
    paged_tweets = imap(search, count(1))
    paged_tweets = takewhile(bool, paged_tweets)
    tweets = chain.from_iterable(paged_tweets)
    for tweet in tweets:
        process(tweet)
The moral of the story is that you can avoid brittle loops by relying
on a well-tested library to work at a higher level of abstraction.
For this particular use case, the imperative version is fine, but for
more complex use cases, the loops are only gonna get more gnarly and
fidgety.
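One way to read the point about abstraction is that the paging logic can be written once and reused for any page-numbered source. A Python 3 sketch under that assumption; `iter_items`, `fetch_page`, and the `data` dictionary are hypothetical names, not part of any API discussed in the thread:

```python
import itertools


def iter_items(fetch_page, first_page=1):
    """Yield items from fetch_page(n) for n = first_page, first_page + 1, ...

    Stops at the first falsy page (empty list or None).
    """
    pages = map(fetch_page, itertools.count(first_page))
    return itertools.chain.from_iterable(itertools.takewhile(bool, pages))


# Hypothetical source: two pages of data; dict.get returns None for page 3.
data = {1: ["a", "b"], 2: ["c"]}
items = list(iter_items(data.get))
```

The caveat is the same as for the original loop: a source that can legitimately return an empty page in the middle of its results would be cut short, and would need a different stop condition.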
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-11-11 14:23 -0500 |
| Message-ID | <roy-DA7578.14234611112012@news.panix.com> |
| In reply to | #33139 |
In article <mailman.3562.1352658857.27098.python-list@python.org>,
Peter Otten <__peter__@web.de> wrote:
> deque(
>     imap(
>         process,
>         chain.from_iterable(
>             takewhile(bool, imap(partial(api.GetSearch, term), count(1))))),
>     maxlen=0)
>
> ;)
If I wanted STL, I would still be writing C++ :-)