when an iterable object is exhausted or not

Started by	Franck Ditter <franck@ditter.org>
First post	2012-08-04 21:20 +0200
Last post	2012-08-05 02:13 +0000
Articles	7 — 6 participants

  when an iterable object is exhausted or not Franck Ditter <franck@ditter.org> - 2012-08-04 21:20 +0200
    Re: when an iterable object is exhausted or not Tim Roberts <timr@probo.com> - 2012-08-04 12:44 -0700
      Re: when an iterable object is exhausted or not Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-05 00:47 +0000
    Re: when an iterable object is exhausted or not MRAB <python@mrabarnett.plus.com> - 2012-08-04 21:11 +0100
    Re: when an iterable object is exhausted or not Tim Chase <python.list@tim.thechases.com> - 2012-08-04 15:24 -0500
    Re: when an iterable object is exhausted or not Terry Reedy <tjreedy@udel.edu> - 2012-08-04 17:04 -0400
    Re: when an iterable object is exhausted or not Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-05 02:13 +0000

#26500 — when an iterable object is exhausted or not

From	Franck Ditter <franck@ditter.org>
Date	2012-08-04 21:20 +0200
Subject	when an iterable object is exhausted or not
Message-ID	<franck-DC86E6.21203604082012@news.free.fr>

Two similar iterable objects, but with different behavior:

$$$ i = range(2,5)
$$$ for x in i : print(x,end=' ')

2 3 4 
$$$ for x in i : print(x,end=' ')        # i is not exhausted   

2 3 4 

--------- Compare with :

$$$ i = filter(lambda c : c.isdigit(), 'a1b2c3')
$$$ for x in i : print(x,end=' ')

1 2 3 
$$$ for x in i : print(x,end=' ')        # i is exhausted

$$$ 

IMHO, this should not happen in Py3k.
What is the rationale for this (bad?) design, which forces the programmer
to memorize which one is exhaustible and which one is not?

      franck

#26502

From	Tim Roberts <timr@probo.com>
Date	2012-08-04 12:44 -0700
Message-ID	<fquq18989mkh0drbtveulff3n0dkbnhjrh@4ax.com>
In reply to	#26500

Franck Ditter <franck@ditter.org> wrote:
>
>Two similar iterable objects but with a different behavior :
>
>$$$ i = range(2,5)
>$$$ for x in i : print(x,end=' ')
>
>2 3 4 
>$$$ for x in i : print(x,end=' ')        # i is not exhausted   
>
>2 3 4 
>
>--------- Compare with :
>
>$$$ i = filter(lambda c : c.isdigit(), 'a1b2c3')
>$$$ for x in i : print(x,end=' ')
>
>1 2 3 
>$$$ for x in i : print(x,end=' ')        # i is exhausted
>
>$$$ 
>
>IMHO, this should not happen in Py3k.

It's interesting that it DOESN'T happen in Python 2.  The first "i" is of
type list, the second "i" is of type string, and both are restartable.

What's the type of "i" in the second case in Python 3?
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

#26515

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-08-05 00:47 +0000
Message-ID	<501dc286$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to	#26502

On Sat, 04 Aug 2012 12:44:07 -0700, Tim Roberts wrote:

>>$$$ i = filter(lambda c : c.isdigit(), 'a1b2c3') 
>>$$$ for x in i : print(x,end=' ')
>>1 2 3
>>$$$ for x in i : print(x,end=' ')        # i is exhausted
>>$$$
>>
>>IMHO, this should not happen in Py3k.
> 
> It's interesting that it DOESN'T happen in Python 2.  The first "i" is
> of type list, the second "i" is of type string, and both are
> restartable.
>
> What's the type of "i" in the second case in Python 3?

In Python 3, filter returns a lazy iterator, a "filter object". It 
generates items on demand.

In Python 2, filter is eager, not lazy, and generates items all up-front. 
If the input is a string, it generates a string; if the input is a tuple, 
it generates a tuple; otherwise it generates a list.
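
A minimal sketch of the lazy behavior described above, assuming Python 3
(the variable names are only illustrative). The filter object can be
consumed once; materializing it gives back something restartable, roughly
what the eager Python 2 filter returned for a string input.

```python
# Python 3: filter() returns a one-shot iterator.
digits = filter(lambda c: c.isdigit(), 'a1b2c3')
first_pass = list(digits)    # consumes the iterator
second_pass = list(digits)   # nothing left to produce

# Materialize the result to get a restartable object
# (a str here, as Python 2's filter() produced for str input).
saved = ''.join(filter(lambda c: c.isdigit(), 'a1b2c3'))

print(first_pass)   # ['1', '2', '3']
print(second_pass)  # []
print(saved)        # 123
```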

-- 
Steven

#26504

From	MRAB <python@mrabarnett.plus.com>
Date	2012-08-04 21:11 +0100
Message-ID	<mailman.2947.1344111131.4697.python-list@python.org>
In reply to	#26500

On 04/08/2012 20:20, Franck Ditter wrote:
> Two similar iterable objects but with a different behavior :
>
> $$$ i = range(2,5)
> $$$ for x in i : print(x,end=' ')
>
> 2 3 4
> $$$ for x in i : print(x,end=' ')        # i is not exhausted
>
> 2 3 4
>
> --------- Compare with :
>
> $$$ i = filter(lambda c : c.isdigit(), 'a1b2c3')
> $$$ for x in i : print(x,end=' ')
>
> 1 2 3
> $$$ for x in i : print(x,end=' ')        # i is exhausted
>
> $$$
>
> IMHO, this should not happen in Py3k.
> What is the rationale of this (bad ?) design, which forces the programmer
> to memorize which one is exhaustable and which one is not ?...
>
'range' returns a 'range' object:

 >>> i = range(2,5)
 >>> i
range(2, 5)

The 'for' loop passes it to 'iter' to get an iterator:

 >>> iter(i)
<range_iterator object at 0x0135DB30>

More importantly:

 >>> iter(i) is i
False

In other words, when asked for an iterator, the 'range' object always
creates a new one.


On the other hand, 'filter' returns a 'filter' object:

 >>> i = filter(lambda c : c.isdigit(), 'a1b2c3')
 >>> i
<filter object at 0x01360B30>

The 'for' loop passes it to 'iter' to get an iterator:

 >>> iter(i)
<filter object at 0x01360B30>

More importantly:

 >>> iter(i) is i
True

In other words, the 'filter' object returns _itself_ as the iterator.
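
The same distinction can be reproduced with two small classes. This is
only a sketch: Countdown and CountdownIterator are hypothetical names
invented for illustration, not how range or filter are implemented.

```python
class Countdown:
    """Re-iterable: __iter__ builds a fresh iterator each time (like range)."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """One-shot: __iter__ returns self (like a filter object)."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


c = Countdown(3)
print(iter(c) is c)          # False
print(list(c), list(c))      # [3, 2, 1] [3, 2, 1]

it = iter(c)
print(iter(it) is it)        # True
print(list(it), list(it))    # [3, 2, 1] []
```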

#26505

From	Tim Chase <python.list@tim.thechases.com>
Date	2012-08-04 15:24 -0500
Message-ID	<mailman.2948.1344111770.4697.python-list@python.org>
In reply to	#26500

On 08/04/12 14:20, Franck Ditter wrote:
> Two similar iterable objects but with a different behavior :
> 
> $$$ i = range(2,5)
> $$$ for x in i : print(x,end=' ')
> 
> 2 3 4 
> $$$ for x in i : print(x,end=' ')        # i is not exhausted   
> 
> 2 3 4 
> 
> --------- Compare with :
> 
> $$$ i = filter(lambda c : c.isdigit(), 'a1b2c3')
> $$$ for x in i : print(x,end=' ')
> 
> 1 2 3 
> $$$ for x in i : print(x,end=' ')        # i is exhausted
> 
> $$$ 
> 
> IMHO, this should not happen in Py3k.
> What is the rationale of this (bad ?) design, which forces the programmer
> to memorize which one is exhaustable and which one is not ?...

I can't speak to the rationale, but it seems that a range() object
has some extra features that a normal iter doesn't:

  >>> i = iter(range(2,5))
  >>> for x in i: print (x, end=' ')
  ...
  2 3 4 >>> for x in i: print (x, end=' ')
  ...

(your 2nd behavior, and what I'd expect).

So my guess would be that the "for {var} in {thing}" triggers a
re-calling of range.__iter__ since it's not an iterator to begin with.
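
That guess can be checked with a rough hand-written equivalent of the
for statement. The helper for_each below is hypothetical and only
approximates what the interpreter does: it calls iter() on the object
once per loop, then calls next() until StopIteration.

```python
def for_each(iterable, action):
    """Rough equivalent of: for x in iterable: action(x)"""
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        action(item)


r = range(2, 5)
seen = []
for_each(r, seen.append)
for_each(r, seen.append)       # iter(r) builds a fresh range_iterator
print(seen)                    # [2, 3, 4, 2, 3, 4]

it = iter(r)
seen_it = []
for_each(it, seen_it.append)
for_each(it, seen_it.append)   # iter(it) is it, which is already exhausted
print(seen_it)                 # [2, 3, 4]
```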

-tkc

#26509

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-08-04 17:04 -0400
Message-ID	<mailman.2950.1344114320.4697.python-list@python.org>
In reply to	#26500

On 8/4/2012 4:24 PM, Tim Chase wrote:
> On 08/04/12 14:20, Franck Ditter wrote:
>> Two similar iterable objects but with a different behavior :
>>
>> $$$ i = range(2,5)
>> $$$ for x in i : print(x,end=' ')
>>
>> 2 3 4
>> $$$ for x in i : print(x,end=' ')        # i is not exhausted
>>
>> 2 3 4
>>
>> --------- Compare with :
>>
>> $$$ i = filter(lambda c : c.isdigit(), 'a1b2c3')
>> $$$ for x in i : print(x,end=' ')
>>
>> 1 2 3
>> $$$ for x in i : print(x,end=' ')        # i is exhausted
>>
>> $$$
>>
>> IMHO, this should not happen in Py3k.
>> What is the rationale of this (bad ?) design, which forces the programmer
>> to memorize which one is exhaustable and which one is not ?...
>
> I can't speak to the rationale, but it seems that a range() object
> has some extra features that a normal iter doesn't:
>
>    >>> i = iter(range(2,5))
>    >>> for x in i: print (x, end=' ')
>    ...
>    2 3 4 >>> for x in i: print (x, end=' ')
>    ...
>
> (your 2nd behavior, and what I'd expect).
>
> So my guess would be that the "for {var} in {thing}" triggers a
> re-calling of range.__iter__ since it's not an iterator to begin with.

range produces a re-iterable range object because it can. The result is 
self-contained with 3 data attributes, so it can create rangeiterators 
on demand.

filter, on the other hand, depends on an external iterable and it cannot 
depend on that external object being re-iterable. So even if we 
programmed filter() to produce a filter object that produced 
filteriterators, the latter would often only work the first time. Also, 
if you want the filtered collection more than once, you should just save 
it. On the other hand, reproducing counts with a rangeiterator is nearly 
as fast as looking them up in a saved list, and much more memory efficient.
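
A short sketch of both points, assuming Python 3. The generator
numbers() is a made-up example of an external iterable that is not
re-iterable, so a second filter over it has nothing left to read;
saving the first result in a list avoids the problem.

```python
def numbers():
    """A generator: an input that can only be consumed once."""
    yield from (1, 2, 3, 4)


source = numbers()
evens = list(filter(lambda n: n % 2 == 0, source))   # save the filtered result
again = list(filter(lambda n: n % 2 == 0, source))   # source is already used up

print(evens)   # [2, 4]
print(again)   # []
print(evens)   # [2, 4]  (the saved list can be reused freely)
```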

-- 
Terry Jan Reedy

#26519

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-08-05 02:13 +0000
Message-ID	<501dd6bd$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to	#26500

On Sat, 04 Aug 2012 21:20:36 +0200, Franck Ditter wrote:

> Two similar iterable objects but with a different behavior :
[...]
> IMHO, this should not happen in Py3k. What is the rationale of this (bad
> ?) design, which forces the programmer to memorize which one is
> exhaustable and which one is not ?...

What makes you say that they are "similar" iterable objects? Except that 
they are both iterable, they are very different. You might as well say 
that lists and dicts are "similar iterable objects".

filter objects are iterators, and so obey the intentionally simple 
iterator protocol. range objects are iterable but not iterators, and do 
not obey the iterator protocol.

py> it = filter(lambda x: x, set('abc'))
py> iter(it) is it
True
py> x = range(1, 15, 2)
py> iter(x) is x
False

filter relies on its input, which it consumes as it goes. Since the input 
may not be restartable, filter cannot be restartable either. For 
simplicity, filter does not try to be "clever" and guess when the input 
argument is restartable. Instead, it simply and consistently behaves the 
same for any iterable input.

range is a specialist iterable data structure that does not consume 
anything. It computes its output lazily, but that is the only similarity 
with iterators. There's no need to limit range objects to the simple 
iterator protocol, since they don't consume their input -- a single 
object can easily compute its output as often as you want. range objects 
are indexable and sliceable:

py> x = range(1, 15, 2)
py> x[4]
9
py> x[2:4]
range(5, 9, 2)

Why artificially make range objects unrestartable just to satisfy 
compatibility with iterators?

The caller already has to remember that range and filter take different 
arguments, do different things, and return different objects. Why is it 
hard to remember that range is restartable and filter is not?
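
For anyone who would rather test than remember, one possible check is
sketched below. The helper name is_one_shot is hypothetical; it relies
on collections.abc.Iterator, which recognizes any object that has both
__iter__ and __next__.

```python
from collections.abc import Iterator


def is_one_shot(obj):
    """True if obj is an iterator, i.e. iterating over it consumes it."""
    return isinstance(obj, Iterator)


print(is_one_shot(range(2, 5)))                     # False
print(is_one_shot(filter(str.isdigit, 'a1b2c3')))   # True
print(is_one_shot([1, 2, 3]))                       # False
print(is_one_shot(iter([1, 2, 3])))                 # True
```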

-- 
Steven
