


Nested iteration?

Started by: roy@panix.com (Roy Smith)
First post: 2013-04-23 11:40 -0400
Last post: 2013-04-23 23:42 +0100
Articles: 15 (8 participants)



Contents

  Nested iteration? roy@panix.com (Roy Smith) - 2013-04-23 11:40 -0400
    Re: Nested iteration? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 17:05 +0100
    Re: Nested iteration? Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-23 10:05 -0600
    Re: Nested iteration? Peter Otten <__peter__@web.de> - 2013-04-23 18:15 +0200
    Re: Nested iteration? Chris Angelico <rosuav@gmail.com> - 2013-04-24 02:21 +1000
    Re: Nested iteration? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 16:35 +0000
    Re: Nested iteration? Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-23 10:30 -0600
    Re: Nested iteration? Ian Kelly <ian.g.kelly@gmail.com> - 2013-04-23 10:39 -0600
    Re: Nested iteration? Chris Angelico <rosuav@gmail.com> - 2013-04-24 02:42 +1000
      Re: Nested iteration? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-23 16:53 +0000
    Re: Nested iteration? Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-23 16:49 -0400
    Re: Nested iteration? Joshua Landau <joshua.landau.ws@gmail.com> - 2013-04-23 22:14 +0100
    Re: Nested iteration? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 22:29 +0100
    Re: Nested iteration? Joshua Landau <joshua.landau.ws@gmail.com> - 2013-04-23 22:41 +0100
    Re: Nested iteration? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 23:42 +0100

#44190 — Nested iteration?

From: roy@panix.com (Roy Smith)
Date: 2013-04-23 11:40 -0400
Subject: Nested iteration?
Message-ID: <kl6a1f$k2l$1@panix2.panix.com>

In reviewing somebody else's code today, I found the following
construct (eliding some details):

    f = open(filename)
    for line in f:
        if re.search(pattern1, line):
            outer_line = f.next()
            for inner_line in f:
                if re.search(pattern2, inner_line):
                    inner_line = f.next()

Somewhat to my surprise, the code worked.  I didn't know it was legal
to do nested iterations over the same iterable (not to mention mixing
calls to next() with for-loops).  Is this guaranteed to work in all
situations?



#44193

From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: 2013-04-23 17:05 +0100
Message-ID: <mailman.980.1366733143.3114.python-list@python.org>
In reply to: #44190

On 23 April 2013 16:40, Roy Smith <roy@panix.com> wrote:
> In reviewing somebody else's code today, I found the following
> construct (eliding some details):
>
>     f = open(filename)
>     for line in f:
>         if re.search(pattern1, line):
>             outer_line = f.next()
>             for inner_line in f:
>                 if re.search(pattern2, inner_line):
>                     inner_line = f.next()
>
> Somewhat to my surprise, the code worked.  I didn't know it was legal
> to do nested iterations over the same iterable (not to mention mixing
> calls to next() with for-loops).  Is this guaranteed to work in all
> situations?

For Python 3 you'd need next(f) instead of f.next(). Otherwise, yes,
this works just fine with any non-restarting iterator (i.e. so that
__iter__ just returns self rather than a new iterator).

I recently posted in another thread about why it's a bad idea to call
next() without catching StopIteration though. I wouldn't accept the
code above for that reason.
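
A minimal sketch of what a guarded version of the construct might look like in Python 3. The helper name `line_after_match` and the sample data are made up for illustration and are not from the code under review:

```python
import re

def line_after_match(lines, pattern):
    """Return the line following the first match of pattern, or None."""
    it = iter(lines)
    for line in it:
        if re.search(pattern, line):
            try:
                return next(it)      # Python 3 spelling of it.next()
            except StopIteration:
                return None          # the match was on the last line
    return None                      # no line matched at all

sample = ["header", "START", "payload", "END"]
print(line_after_match(sample, r"START"))  # payload
print(line_after_match(sample, r"END"))    # None
```

Here the StopIteration is handled at the point where it can occur, so it cannot leak into whatever code happens to be calling the function.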


Oscar



#44194

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2013-04-23 10:05 -0600
Message-ID: <mailman.981.1366733175.3114.python-list@python.org>
In reply to: #44190

On Tue, Apr 23, 2013 at 9:40 AM, Roy Smith <roy@panix.com> wrote:
> In reviewing somebody else's code today, I found the following
> construct (eliding some details):
>
>     f = open(filename)
>     for line in f:
>         if re.search(pattern1, line):
>             outer_line = f.next()
>             for inner_line in f:
>                 if re.search(pattern2, inner_line):
>                     inner_line = f.next()
>
> Somewhat to my surprise, the code worked.  I didn't know it was legal
> to do nested iterations over the same iterable (not to mention mixing
> calls to next() with for-loops).  Is this guaranteed to work in all
> situations?

Yes, although the results will differ depending on whether the
iterable stores its iteration state on itself (like a file object) or
in a separate iterator (like a list).  In the latter case, you would
simply have two independent simultaneous iterations of the same object.
You can still get the file-like shared behavior in the latter case,
though, by getting an iterator from the object and explicitly looping
over that same iterator, like so:

i = iter(range(10))
for x in i:
    if x % 4 == 1:
        for y in i:
            if y % 4 == 3:
                print("%d + %d = %d" % (x, y, x+y))
                break
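
A quick way to tell the two cases apart, as a sketch for Python 3: an iterator's `__iter__` returns the object itself, while a container such as a list hands out a fresh iterator on each call. The variable names are illustrative only.

```python
data = [1, 2, 3]
it = iter(data)

# An iterator returns itself, so nested loops over it share one position.
shares_state = iter(it) is it

# A list builds a new iterator per call, so nested loops are independent.
independent = iter(data) is not iter(data)

print(shares_state, independent)  # True True
```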



#44195

From: Peter Otten <__peter__@web.de>
Date: 2013-04-23 18:15 +0200
Message-ID: <mailman.982.1366733742.3114.python-list@python.org>
In reply to: #44190

Roy Smith wrote:

> In reviewing somebody else's code today, I found the following
> construct (eliding some details):
> 
>     f = open(filename)
>     for line in f:
>         if re.search(pattern1, line):
>             outer_line = f.next()
>             for inner_line in f:
>                 if re.search(pattern2, inner_line):
>                     inner_line = f.next()
> 
> Somewhat to my surprise, the code worked.  I didn't know it was legal
> to do nested iterations over the same iterable (not to mention mixing
> calls to next() with for-loops).  Is this guaranteed to work in all
> situations?

That depends on what you mean by "all". A well-behaved iterator like 
Python's file object allows mixing of for loops and next(...) calls, but 
stupid people who deserve to burn in hell sometimes do

class MyIterable:
    def __iter__(self):
         reset_internal_counter()
         return self


with the consequence that every for loop implicitly resets the iterator's 
state.
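
To make the failure mode concrete, here is a runnable sketch of such a self-resetting iterator in Python 3. The class name `ResettingIterable` and its attributes are hypothetical, chosen only to illustrate the point:

```python
class ResettingIterable:
    """Hypothetical iterator whose __iter__ rewinds it (the anti-pattern)."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        self._pos = 0          # every for statement silently rewinds
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        value = self._items[self._pos]
        self._pos += 1
        return value

seen = []
it = ResettingIterable("abc")
for outer in it:
    for inner in it:           # iter(it) resets the position to 0 here
        seen.append((outer, inner))

print(seen)  # [('a', 'a'), ('a', 'b'), ('a', 'c')]
```

With a well-behaved iterator such as `iter("abc")` the inner loop would start at 'b' instead of rewinding to 'a'.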



#44196

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-04-24 02:21 +1000
Message-ID: <mailman.983.1366734096.3114.python-list@python.org>
In reply to: #44190

On Wed, Apr 24, 2013 at 1:40 AM, Roy Smith <roy@panix.com> wrote:
> In reviewing somebody else's code today, I found the following
> construct (eliding some details):
>
>     f = open(filename)
>     for line in f:
>         if re.search(pattern1, line):
>             outer_line = f.next()
>             for inner_line in f:
>                 if re.search(pattern2, inner_line):
>                     inner_line = f.next()
>
> Somewhat to my surprise, the code worked.  I didn't know it was legal
> to do nested iterations over the same iterable (not to mention mixing
> calls to next() with for-loops).  Is this guaranteed to work in all
> situations?

The definition of the for loop is sufficiently simple that this is
safe, with the caveat already mentioned (that __iter__ is just
returning self). And calling next() inside the loop will simply
terminate the loop if there's nothing there, so I'd not have a problem
with code like that - for instance, if I wanted to iterate over pairs
of lines, I'd happily do this:

for line1 in f:
  line2=next(f)
  print(line2)
  print(line1)

That'll happily swap pairs, ignoring any stray line at the end of the
file. Why bother catching StopIteration just to break?

ChrisA



#44198

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-04-23 16:35 +0000
Message-ID: <5176b83e$0$29977$c3e8da3$5496439d@news.astraweb.com>
In reply to: #44190

On Tue, 23 Apr 2013 11:40:31 -0400, Roy Smith wrote:

> In reviewing somebody else's code today, I found the following construct
> (eliding some details):
> 
>     f = open(filename)
>     for line in f:
>         if re.search(pattern1, line):
>             outer_line = f.next()
>             for inner_line in f:
>                 if re.search(pattern2, inner_line):
>                     inner_line = f.next()
> 
> Somewhat to my surprise, the code worked.  I didn't know it was legal to
> do nested iterations over the same iterable (not to mention mixing calls
> to next() with for-loops).  Is this guaranteed to work in all
> situations?


In "all" situations? No of course not, this is Python, you can write 
nasty code that explodes the *second* time you iterate over it, but not 
the first.

class Demo:
    flag = False
    def __iter__(self):
        if self.flag:
            raise RuntimeError("don't do that!")
        self.flag = True
        return iter([1, 2, 3])


But under normal circumstances with normal iterables, yes, it's fine. If 
the object is a sequence, like lists or strings, each for-loop is 
independent of the others:

py> s = "ab"
py> for c in s:
...     for k in s:
...             print c, k
...
a a
a b
b a
b b


If the object is an iterator, the loops share its state, and each step
of either loop consumes a single value:

py> it = iter("abcd")
py> for c in it:
...     for k in it:
...             print c, k
...
a b
a c
a d


Each time you call next(), a single value is consumed. It doesn't matter 
whether you have one for-loop calling next() behind the scenes, or ten 
loops, or you call next() yourself, the same rule applies.
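
The same rule can be seen when a for-loop and an explicit next() draw from one iterator. This is a small Python 3 sketch with made-up data; the two-argument form of next() is used so that an odd number of items does not raise:

```python
it = iter("abcde")
pairs = []
for c in it:                # the for statement calls next(it) behind the scenes
    k = next(it, None)      # an explicit call draws from the same stream
    pairs.append((c, k))

print(pairs)  # [('a', 'b'), ('c', 'd'), ('e', None)]
```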


-- 
Steven



#44199

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2013-04-23 10:30 -0600
Message-ID: <mailman.984.1366735164.3114.python-list@python.org>
In reply to: #44190

On Tue, Apr 23, 2013 at 10:21 AM, Chris Angelico <rosuav@gmail.com> wrote:
> The definition of the for loop is sufficiently simple that this is
> safe, with the caveat already mentioned (that __iter__ is just
> returning self). And calling next() inside the loop will simply
> terminate the loop if there's nothing there, so I'd not have a problem
> with code like that - for instance, if I wanted to iterate over pairs
> of lines, I'd happily do this:
>
> for line1 in f:
>   line2=next(f)
>   print(line2)
>   print(line1)
>
> That'll happily swap pairs, ignoring any stray line at the end of the
> file. Why bother catching StopIteration just to break?

The next() there will *not* "simply terminate the loop" if it raises a
StopIteration; for loops do not catch StopIteration exceptions that
are raised from the body of the loop.  The StopIteration will continue
to propagate until it is caught or it reaches the sys.excepthook.  In
unusual circumstances, it is even possible that it could cause some
*other* loop higher in the stack to break (i.e. if the current code is
being run as a result of the next() method being called by the looping
construct).
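
A short demonstration of the point, as a Python 3 sketch. The function name `swap_pairs` is hypothetical and simply wraps the quoted loop so the escaping exception can be observed:

```python
def swap_pairs(lines):
    out = []
    it = iter(lines)
    for line1 in it:
        line2 = next(it)    # raises StopIteration on a stray last line
        out.extend([line2, line1])
    return out

try:
    swap_pairs(["a", "b", "c"])
except StopIteration:
    escaped = True          # the for statement did not swallow it
else:
    escaped = False

print(escaped)  # True
```

The for statement only handles the StopIteration raised by its own implicit next() call, not one raised in the loop body.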



#44200

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2013-04-23 10:39 -0600
Message-ID: <mailman.985.1366735209.3114.python-list@python.org>
In reply to: #44190

On Tue, Apr 23, 2013 at 10:30 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Tue, Apr 23, 2013 at 10:21 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> The definition of the for loop is sufficiently simple that this is
>> safe, with the caveat already mentioned (that __iter__ is just
>> returning self). And calling next() inside the loop will simply
>> terminate the loop if there's nothing there, so I'd not have a problem
>> with code like that - for instance, if I wanted to iterate over pairs
>> of lines, I'd happily do this:
>>
>> for line1 in f:
>>   line2=next(f)
>>   print(line2)
>>   print(line1)
>>
>> That'll happily swap pairs, ignoring any stray line at the end of the
>> file. Why bother catching StopIteration just to break?
>
> The next() there will *not* "simply terminate the loop" if it raises a
> StopIteration; for loops do not catch StopIteration exceptions that
> are raised from the body of the loop.  The StopIteration will continue
> to propagate until it is caught or it reaches the sys.excepthook.  In
> unusual circumstances, it is even possible that it could cause some
> *other* loop higher in the stack to break (i.e. if the current code is
> being run as a result of the next() method being called by the looping
> construct).

To expand on this, the prevailing wisdom here is that calls to next()
should always be guarded with a StopIteration exception handler.  The
one exception to this is when the next() call is inside the body of a
generator function, and the exception handler would cause the
generator to exit anyway; in that case there is little difference
between "except StopIteration: return" and letting the StopIteration
propagate to the generator object.
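
Worth noting for later Python versions: since PEP 479 (on by default from Python 3.7), a StopIteration that escapes inside a generator body is turned into a RuntimeError, so the guarded form is the safe one there too. A minimal sketch, with hypothetical function names:

```python
def pairs_leaky(iterable):
    it = iter(iterable)
    for first in it:
        yield first, next(it)       # StopIteration leaks on odd length

def pairs_guarded(iterable):
    it = iter(iterable)
    for first in it:
        try:
            second = next(it)
        except StopIteration:
            return                  # end the generator explicitly
        yield first, second

print(list(pairs_guarded("abc")))   # [('a', 'b')]

try:
    list(pairs_leaky("abc"))
except RuntimeError as exc:         # Python 3.7+ behaviour under PEP 479
    print("leaky version failed:", exc)
```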



#44201

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-04-24 02:42 +1000
Message-ID: <mailman.986.1366735371.3114.python-list@python.org>
In reply to: #44190

On Wed, Apr 24, 2013 at 2:30 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Tue, Apr 23, 2013 at 10:21 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> The definition of the for loop is sufficiently simple that this is
>> safe, with the caveat already mentioned (that __iter__ is just
>> returning self). And calling next() inside the loop will simply
>> terminate the loop if there's nothing there, so I'd not have a problem
>> with code like that - for instance, if I wanted to iterate over pairs
>> of lines, I'd happily do this:
>>
>> for line1 in f:
>>   line2=next(f)
>>   print(line2)
>>   print(line1)
>>
>> That'll happily swap pairs, ignoring any stray line at the end of the
>> file. Why bother catching StopIteration just to break?
>
> The next() there will *not* "simply terminate the loop" if it raises a
> StopIteration; for loops do not catch StopIteration exceptions that
> are raised from the body of the loop.  The StopIteration will continue
> to propagate until it is caught or it reaches the sys.excepthook.  In
> unusual circumstances, it is even possible that it could cause some
> *other* loop higher in the stack to break (i.e. if the current code is
> being run as a result of the next() method being called by the looping
> construct).

Ah, whoops, my bad. This is what I get for not checking. I know I've
done weird stuff with for loops before, but I guess it was fiddling
inside the top of it, not in its body.

I love this list. If I make a mistake, it's sure to be caught by
someone else. The record is guaranteed to be set straight. Thanks Ian!

ChrisA



#44202

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-04-23 16:53 +0000
Message-ID: <5176bc85$0$29977$c3e8da3$5496439d@news.astraweb.com>
In reply to: #44201

On Wed, 24 Apr 2013 02:42:41 +1000, Chris Angelico wrote:

> I love this list. If I make a mistake, it's sure to be caught by someone
> else.

No it's not!


Are-you-here-for-the-five-minute-argument-or-the-full-ten-minutes-ly y'rs,


-- 
Steven



#44214

From: Terry Jan Reedy <tjreedy@udel.edu>
Date: 2013-04-23 16:49 -0400
Message-ID: <mailman.993.1366750170.3114.python-list@python.org>
In reply to: #44190

On 4/23/2013 11:40 AM, Roy Smith wrote:
> In reviewing somebody else's code today, I found the following
> construct (eliding some details):
>
>      f = open(filename)
>      for line in f:
>          if re.search(pattern1, line):
>              outer_line = f.next()
>              for inner_line in f:
>                  if re.search(pattern2, inner_line):
>                      inner_line = f.next()

Did you possibly elide a 'break' after the inner_line assignment?

> Somewhat to my surprise, the code worked.

Without a break, the inner loop will continue iterating through the rest
of the file (billions of lines?) looking for pattern2 and re-binding
inner_line if there is another line or raising StopIteration if there is
not. Does this really constitute 'working'?

This is quite aside from the issue of what one wants if there is no
pattern1, or if there is no line after the first match (probably not
StopIteration), or if there is no pattern2.

> I didn't know it was legal to do nested iterations over the same iterable

Yes, but the effect is quite different for iterators (start where the 
outer iteration left off) and non-iterators (restart at the beginning).

r = range(2)
for i in r:
     for j in r:
         print(i,j)
# this is a common idiom to get all pairs
0 0
0 1
1 0
1 1

ri= iter(range(3))
for i in ri:
     for j in ri:
         print(i,j)
# this is somewhat deceptive as the outer loop executes just once
0 1
0 2

I personally would add a 'break' after 'outer_line = next(f)', since the
first loop is effectively done anyway at that point, and dedent the
second for statement. I find the following clearer:

ri= iter(range(3))
for i in ri:
     break
for j in ri:
     print(i,j)
# this makes it clear that the first loop executes just once
0 1
0 2

I would only nest if the inner loop could terminate without exhausting 
the iterator and I wanted the outer loop to then resume.
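
A sketch of the case where nesting does make sense: the inner loop breaks before exhausting the iterator, and the outer loop resumes from that point. The BEGIN/END markers and sample data are invented for illustration:

```python
lines = iter(["BEGIN", "a", "b", "END", "noise", "BEGIN", "c", "END"])
blocks = []
for line in lines:
    if line == "BEGIN":
        block = []
        for inner in lines:     # continues where the outer loop left off
            if inner == "END":
                break           # leaves the remaining lines for the outer loop
            block.append(inner)
        blocks.append(block)

print(blocks)  # [['a', 'b'], ['c']]
```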

__
Terry Jan Reedy



#44215

From: Joshua Landau <joshua.landau.ws@gmail.com>
Date: 2013-04-23 22:14 +0100
Message-ID: <mailman.994.1366751696.3114.python-list@python.org>
In reply to: #44190


On 23 April 2013 21:49, Terry Jan Reedy <tjreedy@udel.edu> wrote:

> ri= iter(range(3))
> for i in ri:
>     for j in ri:
>         print(i,j)
> # this is somewhat deceptive as the outer loop executes just once
> 0 1
> 0 2
>
> I personally would add a 'break' after 'outer_line = next(f)', since the
> first loop is effectively done anyway at that point, and dedent the second
> for statement. I find the following clearer
>
> ri= iter(range(3))
> for i in ri:
>     break
> for j in ri:
>     print(i,j)
> # this makes it clear that the first loop executes just once
> 0 1
> 0 2
>
> I would only nest if the inner loop could terminate without exhausting the
> iterator and I wanted the outer loop to then resume.
>

Surely a normal programmer would think "next(ri, None)" rather than a loop
that just breaks.
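
For reference, a sketch of that spelling in Python 3, using the two-argument form of next() so that an empty iterator yields the default instead of raising:

```python
ri = iter(range(3))
i = next(ri, None)              # None only if the iterator was already empty
rest = [(i, j) for j in ri]     # the loop picks up after the first item
print(rest)                     # [(0, 1), (0, 2)]

nothing = next(iter([]), None)  # no StopIteration, just the default
print(nothing)                  # None
```

If None is a legitimate value in the data, a dedicated sentinel object would be a safer default.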



#44217

From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: 2013-04-23 22:29 +0100
Message-ID: <mailman.996.1366752575.3114.python-list@python.org>
In reply to: #44190

On 23 April 2013 17:30, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Tue, Apr 23, 2013 at 10:21 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> The definition of the for loop is sufficiently simple that this is
>> safe, with the caveat already mentioned (that __iter__ is just
>> returning self). And calling next() inside the loop will simply
>> terminate the loop if there's nothing there, so I'd not have a problem
>> with code like that - for instance, if I wanted to iterate over pairs
>> of lines, I'd happily do this:
>>
>> for line1 in f:
>>   line2=next(f)
>>   print(line2)
>>   print(line1)
>>
>> That'll happily swap pairs, ignoring any stray line at the end of the
>> file. Why bother catching StopIteration just to break?
>
> The next() there will *not* "simply terminate the loop" if it raises a
> StopIteration; for loops do not catch StopIteration exceptions that
> are raised from the body of the loop.  The StopIteration will continue
> to propagate until it is caught or it reaches the sys.excepthook.  In
> unusual circumstances, it is even possible that it could cause some
> *other* loop higher in the stack to break (i.e. if the current code is
> being run as a result of the next() method being called by the looping
> construct).

I don't find that the circumstances are unusual. Pretty much any time
one of the functions in the call stack is a generator this problem
will occur if StopIteration propagates.

I just thought I'd add that Python 3 has a convenient way to avoid
this problem with next() which is to use the starred unpacking syntax:

>>> numbers = [1, 2, 3, 4]
>>> first, *numbers = numbers
>>> first
1
>>> for x in numbers:
...     print(x)
...
2
3
4
>>> first, *numbers = []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 0 values to unpack

Since we get a ValueError instead of a StopIteration we don't have the
described problem.


Oscar



#44221

From: Joshua Landau <joshua.landau.ws@gmail.com>
Date: 2013-04-23 22:41 +0100
Message-ID: <mailman.997.1366753308.3114.python-list@python.org>
In reply to: #44190


On 23 April 2013 22:29, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:

> I just thought I'd add that Python 3 has a convenient way to avoid
> this problem with next() which is to use the starred unpacking syntax:
>
> >>> numbers = [1, 2, 3, 4]
> >>> first, *numbers = numbers
>

That creates a new list every time. You'll not want that over
try-next-except if you're doing this in a loop, and in addition (if you
were talking in context) your method will exhaust the iterator in the outer
loop.
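
Both effects are easy to check in Python 3. A small sketch with made-up data:

```python
it = iter([1, 2, 3, 4])
first, *rest = it

print(first, rest)              # 1 [2, 3, 4]
print(type(rest).__name__)      # list: the remainder was copied eagerly
print(next(it, "exhausted"))    # exhausted: nothing is left for an outer loop
```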



#44230

From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: 2013-04-23 23:42 +0100
Message-ID: <mailman.1004.1366756982.3114.python-list@python.org>
In reply to: #44190

On 23 April 2013 22:41, Joshua Landau <joshua.landau.ws@gmail.com> wrote:
> On 23 April 2013 22:29, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
>>
>> I just thought I'd add that Python 3 has a convenient way to avoid
>> this problem with next() which is to use the starred unpacking syntax:
>>
>> >>> numbers = [1, 2, 3, 4]
>> >>> first, *numbers = numbers
>
>
> That creates a new list every time. You'll not want that over
> try-next-except if you're doing this in a loop, and in addition (if you were
> talking in context) your method will exhaust the iterator in the outer loop.

Oh, you're right. I'm not using Python 3 yet and I assumed without
checking that it would be giving me an iterator rather than unpacking
everything into a list.

Then the best I can think of is a helper function:

>>> def unpack(iterable, count):
...   iterator = iter(iterable)
...   for n in range(count):
...     yield next(iterator)
...   yield iterator
...
>>> numbers = [1, 2, 3, 4]
>>> first, numbers = unpack(numbers, 1)
>>> first
1
>>> numbers
<list_iterator object at 0x24e1590>
>>> list(numbers)
[2, 3, 4]
>>> first, numbers = unpack([], 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 0 values to unpack
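
On Python 3.7 and later, the helper as written should raise RuntimeError rather than ValueError for the empty case, because PEP 479 converts a StopIteration escaping a generator body. A possible adaptation, offered as a sketch rather than as the original code, catches the StopIteration and ends the generator early so that the caller's unpacking fails with ValueError:

```python
def unpack(iterable, count):
    iterator = iter(iterable)
    for _ in range(count):
        try:
            yield next(iterator)
        except StopIteration:
            return              # stop early; the caller's unpacking then fails
    yield iterator              # hand back the rest without copying it

first, rest = unpack([1, 2, 3, 4], 1)
print(first, list(rest))        # 1 [2, 3, 4]

try:
    first, rest = unpack([], 1)
except ValueError as exc:
    print("ValueError:", exc)
```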


Oscar


