Groups > comp.lang.python > #33107 > unrolled thread
| Started by | Roy Smith <roy@panix.com> |
|---|---|
| First post | 2012-11-10 17:58 -0500 |
| Last post | 2012-11-12 20:14 -0800 |
| Articles | 10 — 8 participants |
A gnarly little python loop Roy Smith <roy@panix.com> - 2012-11-10 17:58 -0500
Re: A gnarly little python loop Ian Kelly <ian.g.kelly@gmail.com> - 2012-11-10 16:17 -0700
Re: A gnarly little python loop Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-11 00:23 +0000
Re: A gnarly little python loop Steve Howell <showell@domaintools.com> - 2012-11-10 19:03 -0800
Re: A gnarly little python loop Stefan Behnel <stefan_ml@behnel.de> - 2012-11-11 08:56 +0100
Re: A gnarly little python loop rusi <rustompmody@gmail.com> - 2012-11-11 23:09 -0800
Re: A gnarly little python loop rusi <rustompmody@gmail.com> - 2012-11-12 07:21 -0800
Re: A gnarly little python loop Peter Otten <__peter__@web.de> - 2012-11-12 16:49 +0100
Re: A gnarly little python loop Steve Howell <showell30@yahoo.com> - 2012-11-12 08:09 -0800
Re: A gnarly little python loop rusi <rustompmody@gmail.com> - 2012-11-12 20:14 -0800
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-11-10 17:58 -0500 |
| Subject | A gnarly little python loop |
| Message-ID | <roy-9EBEAD.17581410112012@news.panix.com> |
I'm trying to pull down tweets with one of the many twitter APIs. The
particular one I'm using (python-twitter), has a call:
data = api.GetSearch(term="foo", page=page)
The way it works, you start with page=1. It returns a list of tweets.
If the list is empty, there are no more tweets. If the list is not
empty, you can try to get more tweets by asking for page=2, page=3, etc.
I've got:
    page = 1
    while 1:
        r = api.GetSearch(term="foo", page=page)
        if not r:
            break
        for tweet in r:
            process(tweet)
        page += 1
It works, but it seems excessively fidgety. Is there some cleaner way
to refactor this?
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-11-10 16:17 -0700 |
| Message-ID | <mailman.3546.1352589460.27098.python-list@python.org> |
| In reply to | #33107 |
On Sat, Nov 10, 2012 at 3:58 PM, Roy Smith <roy@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter), has a call:
>
> data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
> page = 1
> while 1:
>     r = api.GetSearch(term="foo", page=page)
>     if not r:
>         break
>     for tweet in r:
>         process(tweet)
>     page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?
I'd do something like this:
    import itertools
    def get_tweets(term):
        # Yield tweets one at a time; stop at the first empty page.
        for page in itertools.count(1):
            r = api.GetSearch(term=term, page=page)
            if not r:
                break
            for tweet in r:
                yield tweet
    for tweet in get_tweets("foo"):
        process(tweet)
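To try the generator approach without Twitter credentials, here is a minimal Python 3 sketch. `FakeApi` is a hypothetical stand-in invented for illustration (it is not part of python-twitter); it only mimics the behaviour described in the question: a non-empty list for each existing page, an empty list afterwards.

```python
import itertools


class FakeApi:
    """Hypothetical stand-in for the real client; serves canned pages."""

    def __init__(self, pages):
        self._pages = pages

    def GetSearch(self, term, page):
        # Pages are 1-based; anything past the last page is an empty list.
        if 1 <= page <= len(self._pages):
            return self._pages[page - 1]
        return []


def get_tweets(api, term):
    """Yield tweets one at a time until the API returns an empty page."""
    for page in itertools.count(1):
        results = api.GetSearch(term=term, page=page)
        if not results:
            break
        yield from results


api = FakeApi([["t1", "t2"], ["t3"], ["t4", "t5"]])
tweets = list(get_tweets(api, "foo"))
print(tweets)
```

Passing `api` in as an argument, rather than relying on a global, is a choice made here only so the sketch can be exercised with fake data.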
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-11-11 00:23 +0000 |
| Message-ID | <509eefeb$0$29980$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #33107 |
On Sat, 10 Nov 2012 17:58:14 -0500, Roy Smith wrote:
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
> page = 1
> while 1:
>     r = api.GetSearch(term="foo", page=page)
>     if not r:
>         break
>     for tweet in r:
>         process(tweet)
>     page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?
Seems clean enough to me. It does exactly what you need: loop until there
are no more tweets, process each tweet.
If you're allergic to nested loops, move the inner for-loop into a
function. Also you could get rid of the "if not r: break".
    page = 1
    r = ["placeholder"]
    while r:
        r = api.GetSearch(term="foo", page=page)
        process_all(r)  # does nothing if r is empty
        page += 1
Another way would be to use a for loop for the outer loop.
    import sys
    # xrange and sys.maxint are Python 2 names
    for page in xrange(1, sys.maxint):
        r = api.GetSearch(term="foo", page=page)
        if not r: break
        process_all(r)
--
Steven
| From | Steve Howell <showell@domaintools.com> |
|---|---|
| Date | 2012-11-10 19:03 -0800 |
| Message-ID | <5a260a79-818d-47a8-9404-37b014587730@px4g2000pbc.googlegroups.com> |
| In reply to | #33107 |
On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter), has a call:
>
> data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
> page = 1
> while 1:
>     r = api.GetSearch(term="foo", page=page)
>     if not r:
>         break
>     for tweet in r:
>         process(tweet)
>     page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?
I think your code is perfectly readable and clean, but you can flatten
it like so:
    import itertools
    def get_tweets(term, get_page):
        # get_page is the page fetcher, e.g. api.GetSearch
        page_nums = itertools.count(1)
        pages = itertools.imap(lambda page: get_page(term=term, page=page), page_nums)
        valid_pages = itertools.takewhile(bool, pages)
        tweets = itertools.chain.from_iterable(valid_pages)
        return tweets
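The same lazy pipeline can be sketched in Python 3, where the built-in `map` is already lazy and `itertools.imap` no longer exists. `fake_get_page` and the `PAGES` data are made up for illustration and stand in for the real page fetcher.

```python
import itertools

PAGES = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}  # made-up data


def fake_get_page(term, page):
    """Hypothetical page fetcher: returns [] once the pages run out."""
    return PAGES.get(page, [])


def get_tweets(term, get_page):
    page_nums = itertools.count(1)
    # Python 3's built-in map is lazy, like itertools.imap in Python 2.
    pages = map(lambda n: get_page(term, n), page_nums)
    # takewhile stops at the first falsy (empty) page.
    valid_pages = itertools.takewhile(bool, pages)
    return itertools.chain.from_iterable(valid_pages)


flat = list(get_tweets("foo", fake_get_page))
print(flat)
```

Nothing is fetched until the result is iterated, and the first empty page ends the iteration.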
| From | Stefan Behnel <stefan_ml@behnel.de> |
|---|---|
| Date | 2012-11-11 08:56 +0100 |
| Message-ID | <mailman.3554.1352620798.27098.python-list@python.org> |
| In reply to | #33116 |
Steve Howell, 11.11.2012 04:03:
> On Nov 10, 2:58 pm, Roy Smith <r...@panix.com> wrote:
>> I'm trying to pull down tweets with one of the many twitter APIs. The
>> particular one I'm using (python-twitter), has a call:
>>
>> data = api.GetSearch(term="foo", page=page)
>>
>> The way it works, you start with page=1. It returns a list of tweets.
>> If the list is empty, there are no more tweets. If the list is not
>> empty, you can try to get more tweets by asking for page=2, page=3, etc.
>> I've got:
>>
>> page = 1
>> while 1:
>>     r = api.GetSearch(term="foo", page=page)
>>     if not r:
>>         break
>>     for tweet in r:
>>         process(tweet)
>>     page += 1
>>
>> It works, but it seems excessively fidgety. Is there some cleaner way
>> to refactor this?
>
> I think your code is perfectly readable and clean, but you can flatten
> it like so:
>
> def get_tweets(term, get_page):
>     page_nums = itertools.count(1)
>     pages = itertools.imap(api.getSearch, page_nums)
>     valid_pages = itertools.takewhile(bool, pages)
>     tweets = itertools.chain.from_iterable(valid_pages)
>     return tweets
I'd prefer the original code ten times over this inaccessible beast.
Stefan
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2012-11-11 23:09 -0800 |
| Message-ID | <a61c52b7-eb49-45a5-a4b4-8e4c6b4acaf1@v9g2000pbi.googlegroups.com> |
| In reply to | #33107 |
On Nov 11, 3:58 am, Roy Smith <r...@panix.com> wrote:
> I'm trying to pull down tweets with one of the many twitter APIs. The
> particular one I'm using (python-twitter), has a call:
>
> data = api.GetSearch(term="foo", page=page)
>
> The way it works, you start with page=1. It returns a list of tweets.
> If the list is empty, there are no more tweets. If the list is not
> empty, you can try to get more tweets by asking for page=2, page=3, etc.
> I've got:
>
> page = 1
> while 1:
>     r = api.GetSearch(term="foo", page=page)
>     if not r:
>         break
>     for tweet in r:
>         process(tweet)
>     page += 1
>
> It works, but it seems excessively fidgety. Is there some cleaner way
> to refactor this?
This is a classic problem -- structure clash of parallel loops -- and
Steve Howell has given the classic solution using the fact that
generators in python simulate/implement lazy lists.
As David Beazley http://www.dabeaz.com/coroutines/ explains,
coroutines are more general than generators and you can use those if
you prefer.
The classic problem used to be stated like this:
There is an input in cards of 80 columns.
It needs to be copied onto printer of 132 columns.
The structure clash arises because after reading 80 chars a new card
has to be read; after printing 132 chars a linefeed has to be given.
To pythonize the problem, let's replace the 80,132 by 3,4, i.e. take the
char-square
abc
def
ghi
and produce
abcd
efgh
i
The important difference (explained nicely by Beazley) is that in
generators the for-loop pulls the generators, in coroutines, the
'generator' pushes the consuming coroutines.
---------------
    from __future__ import print_function
    s = ["abc", "def", "ghi"]
    # Coroutine-infrastructure from pep 342
    def consumer(func):
        def wrapper(*args, **kw):
            gen = func(*args, **kw)
            gen.next()
            return gen
        return wrapper
    @consumer
    def endStage():
        while True:
            for i in range(0, 4):
                print((yield), sep='', end='')
            print("\n", sep='', end='')
    def genStage(s, target):
        for line in s:
            for i in range(0, 3):
                target.send(line[i])
    if __name__ == '__main__':
        genStage(s, endStage())
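A Python 3 variant of the same push-style pipeline might look like the sketch below. It differs from the code above in three ways that are choices made here, not part of the original: `next(gen)` replaces `gen.next()`, rows are collected into a list instead of printed so the result can be inspected, and the producer calls `close()` so the partial last row is flushed.

```python
def consumer(func):
    """Prime a generator-based coroutine so it is ready for send()."""
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)  # Python 3 spelling of gen.next()
        return gen
    return wrapper


@consumer
def end_stage(width, out):
    """Group received characters into rows of `width`, appended to `out`."""
    row = []
    try:
        while True:
            row.append((yield))
            if len(row) == width:
                out.append("".join(row))
                row = []
    except GeneratorExit:
        # close() was called: flush the partial last row, if any.
        if row:
            out.append("".join(row))


def gen_stage(lines, target):
    """Push every character of every input line into the consumer."""
    for line in lines:
        for ch in line:
            target.send(ch)
    target.close()


rows = []
gen_stage(["abc", "def", "ghi"], end_stage(4, rows))
print(rows)
```

The producer loops over 3-character lines and the consumer over 4-character rows, so neither loop has to know about the other's structure.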
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2012-11-12 07:21 -0800 |
| Message-ID | <bf637b4d-c257-48af-bb3f-b5438f341c67@nl3g2000pbc.googlegroups.com> |
| In reply to | #33167 |
On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> This is a classic problem -- structure clash of parallel loops
<rest snipped>
Sorry wrong solution :D
The fidgetiness is entirely due to python not allowing C-style loops
like these:
    while ((c = getchar()) != EOF) { ... }
Putting it into coroutine form, it becomes something like the
following [Untested since I don't have the API]. Clearly the
fidgetiness is there as before and now with extra coroutine plumbing.
    def genStage(term, target):
        page = 1
        while 1:
            r = api.GetSearch(term=term, page=page)
            if not r: break
            for tweet in r: target.send(tweet)
            page += 1
    @consumer
    def endStage():
        while True: process((yield))
    if __name__ == '__main__':
        genStage("foo", endStage())
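For what it's worth, Python 3.8 and later do allow an assignment inside a loop condition through the `:=` operator (PEP 572), which did not exist when this thread was written. A minimal sketch, with `get_page` and `PAGES` as made-up stand-ins for `api.GetSearch` and its results:

```python
PAGES = {1: ["a", "b"], 2: ["c"]}  # made-up data


def get_page(term, page):
    """Hypothetical stand-in for api.GetSearch: [] when pages run out."""
    return PAGES.get(page, [])


processed = []
page = 1
# The assignment expression fetches a page and tests it in one step.
while (r := get_page("foo", page)):
    for tweet in r:
        processed.append(tweet)
    page += 1

print(processed)
```

The page counter still has to be maintained by hand, so this removes the `if not r: break` but not the rest of the bookkeeping.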
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2012-11-12 16:49 +0100 |
| Message-ID | <mailman.3588.1352735375.27098.python-list@python.org> |
| In reply to | #33185 |
rusi wrote:
> The fidgetiness is entirely due to python not allowing C-style loops
> like these:
> >>> while ((c = getchar()) != EOF) { ... }
    for c in iter(getchar, EOF):
        ...
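The two-argument form of `iter` calls its first argument repeatedly and stops as soon as the return value equals the sentinel. A small self-contained sketch, using an in-memory text stream in place of `getchar` (the stream and its contents are assumptions for illustration):

```python
import io
from functools import partial

stream = io.StringIO("abc")
# read(1) returns '' at end of input, which serves as the sentinel.
chars = list(iter(partial(stream.read, 1), ""))
print(chars)
```

The sentinel itself is never yielded; iteration simply ends when it is seen.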
> Clearly the fidgetiness is there as before and now with extra coroutine
> plumbing
Hmm, very funny...
| From | Steve Howell <showell30@yahoo.com> |
|---|---|
| Date | 2012-11-12 08:09 -0800 |
| Message-ID | <c51bc296-f300-4f83-ac12-3f31217ba8fb@n2g2000pbp.googlegroups.com> |
| In reply to | #33185 |
On Nov 12, 7:21 am, rusi <rustompm...@gmail.com> wrote:
> On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> > This is a classic problem -- structure clash of parallel loops
>
> <rest snipped>
>
> Sorry wrong solution :D
>
> The fidgetiness is entirely due to python not allowing C-style loops
> like these:
>
> >> while ((c = getchar()) != EOF) { ... }
> [...]
There are actually three fidgety things going on:
1. The API is 1-based instead of 0-based.
2. You don't know the number of pages in advance.
3. You want to process tweets, not pages of tweets.
Here's yet another take on the problem:
    from itertools import chain, count
    # wrap fidgety 1-based api
    def search(i):
        return api.GetSearch("foo", i+1)
    paged_tweets = (search(i) for i in count())
    # handle sentinel
    paged_tweets = iter(paged_tweets.next, [])
    # flatten pages
    tweets = chain.from_iterable(paged_tweets)
    for tweet in tweets:
        process(tweet)
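In Python 3 generators no longer have a `.next` method, so the sentinel step needs `__next__` instead. A sketch of the same three steps, with `search` and `PAGES` as made-up stand-ins for the real API wrapper and its data:

```python
from itertools import chain, count

PAGES = [["a", "b"], ["c"], ["d"]]  # made-up data


def search(i):
    """Hypothetical 0-based wrapper; returns [] past the last page."""
    return PAGES[i] if i < len(PAGES) else []


# Lazily request page 0, 1, 2, ...
paged = (search(i) for i in count())
# Stop at the first empty page; Python 3 generators expose __next__.
paged = iter(paged.__next__, [])
# Flatten pages into a single stream of tweets.
tweets = list(chain.from_iterable(paged))
print(tweets)
```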
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2012-11-12 20:14 -0800 |
| Message-ID | <bd59d069-1041-40b7-a8eb-30c2d9595467@v9g2000pbi.googlegroups.com> |
| In reply to | #33188 |
On Nov 12, 9:09 pm, Steve Howell <showel...@yahoo.com> wrote:
> On Nov 12, 7:21 am, rusi <rustompm...@gmail.com> wrote:
>
> > On Nov 12, 12:09 pm, rusi <rustompm...@gmail.com> wrote:
> > > This is a classic problem -- structure clash of parallel loops
>
> > <rest snipped>
>
> > Sorry wrong solution :D
>
> > The fidgetiness is entirely due to python not allowing C-style loops
> > like these:
>
> > >> while ((c = getchar()) != EOF) { ... }
> > [...]
>
> There are actually three fidgety things going on:
>
> 1. The API is 1-based instead of 0-based.
> 2. You don't know the number of pages in advance.
> 3. You want to process tweets, not pages of tweets.
>
> Here's yet another take on the problem:
>
> # wrap fidgety 1-based api
> def search(i):
>     return api.GetSearch("foo", i+1)
>
> paged_tweets = (search(i) for i in count())
>
> # handle sentinel
> paged_tweets = iter(paged_tweets.next, [])
>
> # flatten pages
> tweets = chain.from_iterable(paged_tweets)
> for tweet in tweets:
>     process(tweet)
[Steve Howell]
Nice on the whole -- thanks
Could not the 1-based-ness be dealt with by using count(1)?
i.e. use
    paged_tweets = (api.GetSearch("foo", i) for i in count(1))
[Peter]
> >>> while ((c = getchar()) != EOF) { ... }
    for c in iter(getchar, EOF):
        ...
Thanks. Learnt something