Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #95181 > unrolled thread

Iterators membership testing

Started byPierre Quentel <pierre.quentel@gmail.com>
First post2015-08-09 02:06 -0700
Last post2015-08-09 08:09 -0500
Articles 12 — 6 participants

Back to article view | Back to comp.lang.python


Contents

  Iterators membership testing Pierre Quentel <pierre.quentel@gmail.com> - 2015-08-09 02:06 -0700
    Re: Iterators membership testing Chris Angelico <rosuav@gmail.com> - 2015-08-09 19:24 +1000
      Re: Iterators membership testing Pierre Quentel <pierre.quentel@gmail.com> - 2015-08-09 02:55 -0700
        Re: Iterators membership testing Chris Angelico <rosuav@gmail.com> - 2015-08-09 20:13 +1000
          Re: Iterators membership testing Pierre Quentel <pierre.quentel@gmail.com> - 2015-08-09 04:00 -0700
            Re: Iterators membership testing Chris Angelico <rosuav@gmail.com> - 2015-08-09 21:10 +1000
        Re: Iterators membership testing Laura Creighton <lac@openend.se> - 2015-08-09 13:30 +0200
        Re: Iterators membership testing Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-08-09 14:49 +0100
    Re: Iterators membership testing Chris Angelico <rosuav@gmail.com> - 2015-08-09 23:11 +1000
    Re: Iterators membership testing Tim Chase <python.list@tim.thechases.com> - 2015-08-09 08:09 -0500
    Re: Iterators membership testing Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-08-09 14:45 +0100
    Re: Iterators membership testing Tim Chase <tim@thechases.com> - 2015-08-09 08:09 -0500

#95181 — Iterators membership testing

FromPierre Quentel <pierre.quentel@gmail.com>
Date2015-08-09 02:06 -0700
SubjectIterators membership testing
Message-ID<88256581-75d4-4f77-81f0-9e3e25baecbc@googlegroups.com>
The documentation at https://docs.python.org/3.5/reference/expressions.html#not-in says :

"For user-defined classes which do not define __contains__() but do define 
__iter__(), x in y is true if some value z with x == z is produced while 
iterating over y. If an exception is raised during the iteration, it is as if 
in raised that exception."

However, if I define a class with

class A:
    
    def __init__(self, x):
        self.x = x
        self.counter = -1
    
    def __iter__(self):
        return self
    
    def __next__(self):
        self.counter += 1
        if self.counter >= self.x:
            raise StopIteration
        return self.counter

and test for membership with

iterator = A(10)

for i in iterator:
    assert i in iterator, '%s not found' %i

I get an assertion error. Setting a trace on __next__ suggests that for 
membership testing, the interpreter consumes the iterator until the searched 
value is found (or until exhaustion), then it resumes iteration at this point.
For instance :

>>> iterator = A(10)
>>> for i in iterator:
...  print(i)
...  assert i+1 in iterator
...
0
2
4
6
8
>>>

If this is correct, should the documentation mention it ?

[toc] | [next] | [standalone]


#95182

FromChris Angelico <rosuav@gmail.com>
Date2015-08-09 19:24 +1000
Message-ID<mailman.2.1439112286.3627.python-list@python.org>
In reply to#95181
On Sun, Aug 9, 2015 at 7:06 PM, Pierre Quentel <pierre.quentel@gmail.com> wrote:
> "For user-defined classes which do not define __contains__() but do define
> __iter__(), x in y is true if some value z with x == z is produced while
> iterating over y. If an exception is raised during the iteration, it is as if
> in raised that exception."
>
> ...
> I get an assertion error. Setting a trace on __next__ suggests that for
> membership testing, the interpreter consumes the iterator until the searched
> value is found (or until exhaustion), then it resumes iteration at this point.

That's exactly right. The only way for the interpreter to handle 'in'
on an iterator is something like this:

def contains(iter, obj):
    for val in iter:
        if val == obj: return True
    return False

That's what the docs describe. So what you have is something like this:

for i in iterator:
    for j in iterator:
        if i == j: break
    else:
        assert False, '%s not found' %i

You're dragging values from the same iterator, so you're consuming it
as part of your membership test. You can do this kind of thing:

>>> 5 in A(10)
True

but if you've already consumed a few values, those won't be in the
iterator any more:

>>> x = A(10)
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> 2 in x
False

This is simply how iterators work. They're very different from
repeatable iterables like lists or range objects, where you _can_ test
for membership that way:

>>> x = [10,20,30]
>>> for i in x: assert i in x
...
>>> x = iter([10,20,30])
>>> for i in x: assert i in x
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

Note that I _could_ create a list that would pass this assertion,
simply by duplicating every value:

>>> x = iter([10,10,20,20,30,30])
>>> for i in x: assert i in x
...

But it's iterating only three times here, and the 'in' check is
consuming the other three values. Once your A(10) has yielded some
value, it will never yield it again, so the assertion can never pass.

Does that explain matters?

ChrisA

[toc] | [prev] | [next] | [standalone]


#95183

FromPierre Quentel <pierre.quentel@gmail.com>
Date2015-08-09 02:55 -0700
Message-ID<b1c0218c-05dd-402b-b2f0-5e0836146f76@googlegroups.com>
In reply to#95182
Le dimanche 9 août 2015 11:25:17 UTC+2, Chris Angelico a écrit :
> On Sun, Aug 9, 2015 at 7:06 PM, Pierre Quentel <pierre.quentel@gmail.com> wrote:
> > "For user-defined classes which do not define __contains__() but do define
> > __iter__(), x in y is true if some value z with x == z is produced while
> > iterating over y. If an exception is raised during the iteration, it is as if
> > in raised that exception."
> >
> > ...
> > I get an assertion error. Setting a trace on __next__ suggests that for
> > membership testing, the interpreter consumes the iterator until the searched
> > value is found (or until exhaustion), then it resumes iteration at this point.
> 
> That's exactly right. The only way for the interpreter to handle 'in'
> on an iterator is something like this:
> 
> def contains(iter, obj):
>     for val in iter:
>         if val == obj: return True
>     return False
> 
> That's what the docs describe. So what you have is something like this:
> 
> for i in iterator:
>     for j in iterator:
>         if i == j: break
>     else:
>         assert False, '%s not found' %i
> 
> You're dragging values from the same iterator, so you're consuming it
> as part of your membership test. You can do this kind of thing:
> 
> >>> 5 in A(10)
> True
> 
> but if you've already consumed a few values, those won't be in the
> iterator any more:
> 
> >>> x = A(10)
> >>> next(x)
> 0
> >>> next(x)
> 1
> >>> next(x)
> 2
> >>> next(x)
> 3
> >>> 2 in x
> False
> 
> This is simply how iterators work. They're very different from
> repeatable iterables like lists or range objects, where you _can_ test
> for membership that way:
> 
> >>> x = [10,20,30]
> >>> for i in x: assert i in x
> ...
> >>> x = iter([10,20,30])
> >>> for i in x: assert i in x
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
> 
> Note that I _could_ create a list that would pass this assertion,
> simply by duplicating every value:
> 
> >>> x = iter([10,10,20,20,30,30])
> >>> for i in x: assert i in x
> ...
> 
> But it's iterating only three times here, and the 'in' check is
> consuming the other three values. Once your A(10) has yielded some
> value, it will never yield it again, so the assertion can never pass.
> 
> Does that explain matters?
> 
> ChrisA

Thanks for the explanation. I understand that an iterator can't test membership any other way, but I'm still worried about how the documentation explains it. Reading it, I naively expected that an iterator which produces the integer 0 such as the one included in my post would say that "0 in iterator" is True, because "some value z with 0 == z is produced while iterating over y".

Shouldn't it be more clear if the documentation said something like : "For user-defined classes which do not define __contains__() but do define __iter__(), x in y consumes the iterator y until some value z with x == z is produced" ?

[toc] | [prev] | [next] | [standalone]


#95184

FromChris Angelico <rosuav@gmail.com>
Date2015-08-09 20:13 +1000
Message-ID<mailman.3.1439115197.3627.python-list@python.org>
In reply to#95183
On Sun, Aug 9, 2015 at 7:55 PM, Pierre Quentel <pierre.quentel@gmail.com> wrote:
> Thanks for the explanation. I understand that an iterator can't test membership any other way, but I'm still worried about how the documentation explains it. Reading it, I naively expected that an iterator which produces the integer 0 such as the one included in my post would say that "0 in iterator" is True, because "some value z with 0 == z is produced while iterating over y".
>
> Shouldn't it be more clear if the documentation said something like : "For user-defined classes which do not define __contains__() but do define __iter__(), x in y consumes the iterator y until some value z with x == z is produced" ?
>

Consuming the iterator is implicit in the description, because the
only way to iterate over an iterator is to call next() on it, which
consumes values. But the description covers all *iterables*, and the
caveat applies only to *iterators*. Compare:

class A:
    def __init__(self, x):
        self.x = x

    def __iter__(self):
        counter = -1
        while counter < self.x:
            counter += 1
            yield counter

This is the same code as you had, except that it's an iterable that
returns a _different_ object, and thus is not itself an iterator (in
this case, A().__iter__() returns a generator object). Note the
similarity of code to your example, and how easy it is to convert your
code to be a generator and make it reiterable. Now let's do that
membership test:

>>> iterable = A(10)
>>> for i in iterable:
...     assert i in iterable
...

No error. We can repeatedly iterate over this generator factory, and
each iteration starts a new instance of the generator, which thus
contains all the same values. Adding a print() will show that every
number is indeed iterated over.

The trap you're seeing here is that iterating over an iterator always
consumes it, but mentally, you're expecting this to be iterating over
a new instance of the same sequence. That's perfectly understandable,
but due to the extreme flexibility of the iterator and iterable
protocols, there's no way for the interpreter to say anything
different. Make your object reiterable, and then it'll behave the way
your brain is expecting... but the docs aren't incorrect here.

ChrisA

[toc] | [prev] | [next] | [standalone]


#95185

FromPierre Quentel <pierre.quentel@gmail.com>
Date2015-08-09 04:00 -0700
Message-ID<09cb3411-be3b-4875-9042-af96c7f63504@googlegroups.com>
In reply to#95184
> The trap you're seeing here is that iterating over an iterator always
> consumes it, but mentally, you're expecting this to be iterating over
> a new instance of the same sequence. 

No, I just tried to apply what I read in the docs :

1. I have y = A(10) which is an instance of a class which does not define __contains__ but does define __iter__

2. The value z = 0 is produced while iterating over y.

3. The sentence "x in y is true if some value z with x == z is produced while iterating over y" lead me to think that "0 in y" would be true.

[toc] | [prev] | [next] | [standalone]


#95186

FromChris Angelico <rosuav@gmail.com>
Date2015-08-09 21:10 +1000
Message-ID<mailman.4.1439118634.3627.python-list@python.org>
In reply to#95185
On Sun, Aug 9, 2015 at 9:00 PM, Pierre Quentel <pierre.quentel@gmail.com> wrote:
>> The trap you're seeing here is that iterating over an iterator always
>> consumes it, but mentally, you're expecting this to be iterating over
>> a new instance of the same sequence.
>
> No, I just tried to apply what I read in the docs :
>
> 1. I have y = A(10) which is an instance of a class which does not define __contains__ but does define __iter__
>
> 2. The value z = 0 is produced while iterating over y.
>
> 3. The sentence "x in y is true if some value z with x == z is produced while iterating over y" lead me to think that "0 in y" would be true.

You're almost right. The significance here is that once you've
partially iterated over y, the value z will not be produced while
iterating over y - so *at that point in time*, y does not contain z.
Iterators (non-infinite ones, at least) shrink over time, so the set
of objects they contain will shrink.

ChrisA

[toc] | [prev] | [next] | [standalone]


#95187

FromLaura Creighton <lac@openend.se>
Date2015-08-09 13:30 +0200
Message-ID<mailman.5.1439119855.3627.python-list@python.org>
In reply to#95183
Maybe add something about this here?
https://docs.python.org/2/tutorial/classes.html#iterators

Laura

[toc] | [prev] | [next] | [standalone]


#95191

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2015-08-09 14:49 +0100
Message-ID<mailman.9.1439128206.3627.python-list@python.org>
In reply to#95183
On 09/08/2015 12:30, Laura Creighton wrote:
> Maybe add something about this here?
> https://docs.python.org/2/tutorial/classes.html#iterators
>
> Laura
>

Better still https://docs.python.org/3/tutorial/classes.html#iterators

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]


#95188

FromChris Angelico <rosuav@gmail.com>
Date2015-08-09 23:11 +1000
Message-ID<mailman.6.1439125888.3627.python-list@python.org>
In reply to#95181
On Sun, Aug 9, 2015 at 11:09 PM, Tim Chase <tim@thechases.com> wrote:
> On 2015-08-09 19:24, Chris Angelico wrote:
>> That's exactly right. The only way for the interpreter to handle
>> 'in' on an iterator is something like this:
>>
>> def contains(iter, obj):
>>     for val in iter:
>>         if val == obj: return True
>>     return False
>
> Which can nicely be written as
>
>   any(i == obj for obj in iter)

Indeed it can, although I'm not sure whether it'd just have been
another step in the explanation :) Whether or not that makes perfect
sense depends on the reader.

ChrisA

[toc] | [prev] | [next] | [standalone]


#95189

FromTim Chase <python.list@tim.thechases.com>
Date2015-08-09 08:09 -0500
Message-ID<mailman.7.1439126856.3627.python-list@python.org>
In reply to#95181
On 2015-08-09 19:24, Chris Angelico wrote:
> That's exactly right. The only way for the interpreter to handle
> 'in' on an iterator is something like this:
> 
> def contains(iter, obj):
>     for val in iter:
>         if val == obj: return True
>     return False

Which can nicely be written as

  any(i == obj for obj in iter)

The addition of any/all initially struck me as a "why?! this is so
easy to write in-line" moment, only to find myself using them all()
the time. :-) The code-intention becomes so much clearer.  Even
back-ported them to 2.4 code that I maintain.

-tkc


[toc] | [prev] | [next] | [standalone]


#95190

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2015-08-09 14:45 +0100
Message-ID<mailman.8.1439127948.3627.python-list@python.org>
In reply to#95181
On 09/08/2015 14:11, Chris Angelico wrote:
> On Sun, Aug 9, 2015 at 11:09 PM, Tim Chase <tim@thechases.com> wrote:
>> On 2015-08-09 19:24, Chris Angelico wrote:
>>> That's exactly right. The only way for the interpreter to handle
>>> 'in' on an iterator is something like this:
>>>
>>> def contains(iter, obj):
>>>      for val in iter:
>>>          if val == obj: return True
>>>      return False
>>
>> Which can nicely be written as
>>
>>    any(i == obj for obj in iter)
>
> Indeed it can, although I'm not sure whether it'd just have been
> another step in the explanation :) Whether or not that makes perfect
> sense depends on the reader.
>
> ChrisA
>

This one probably won't make make sense to most readers.

any("Australian batsmen scored any() runs recently?")

See also 
http://www.cricketcountry.com/photos/photo-england-supporters-sell-hardly-used-australian-bats-314378 
:)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]


#95214

FromTim Chase <tim@thechases.com>
Date2015-08-09 08:09 -0500
Message-ID<mailman.34.1439200820.3627.python-list@python.org>
In reply to#95181
On 2015-08-09 19:24, Chris Angelico wrote:
> That's exactly right. The only way for the interpreter to handle
> 'in' on an iterator is something like this:
> 
> def contains(iter, obj):
>     for val in iter:
>         if val == obj: return True
>     return False

Which can nicely be written as

  any(i == obj for obj in iter)

The addition of any/all initially struck me as a "why?! this is so
easy to write in-line" moment, only to find myself using them all()
the time. :-) The code-intention becomes so much clearer.  Even
back-ported them to 2.4 code that I maintain.

-tkc


[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web