Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #95181 > unrolled thread
| Started by | Pierre Quentel <pierre.quentel@gmail.com> |
|---|---|
| First post | 2015-08-09 02:06 -0700 |
| Last post | 2015-08-09 08:09 -0500 |
| Articles | 12 — 6 participants |
Back to article view | Back to comp.lang.python
Iterators membership testing Pierre Quentel <pierre.quentel@gmail.com> - 2015-08-09 02:06 -0700
Re: Iterators membership testing Chris Angelico <rosuav@gmail.com> - 2015-08-09 19:24 +1000
Re: Iterators membership testing Pierre Quentel <pierre.quentel@gmail.com> - 2015-08-09 02:55 -0700
Re: Iterators membership testing Chris Angelico <rosuav@gmail.com> - 2015-08-09 20:13 +1000
Re: Iterators membership testing Pierre Quentel <pierre.quentel@gmail.com> - 2015-08-09 04:00 -0700
Re: Iterators membership testing Chris Angelico <rosuav@gmail.com> - 2015-08-09 21:10 +1000
Re: Iterators membership testing Laura Creighton <lac@openend.se> - 2015-08-09 13:30 +0200
Re: Iterators membership testing Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-08-09 14:49 +0100
Re: Iterators membership testing Chris Angelico <rosuav@gmail.com> - 2015-08-09 23:11 +1000
Re: Iterators membership testing Tim Chase <python.list@tim.thechases.com> - 2015-08-09 08:09 -0500
Re: Iterators membership testing Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-08-09 14:45 +0100
Re: Iterators membership testing Tim Chase <tim@thechases.com> - 2015-08-09 08:09 -0500
| From | Pierre Quentel <pierre.quentel@gmail.com> |
|---|---|
| Date | 2015-08-09 02:06 -0700 |
| Subject | Iterators membership testing |
| Message-ID | <88256581-75d4-4f77-81f0-9e3e25baecbc@googlegroups.com> |
The documentation at https://docs.python.org/3.5/reference/expressions.html#not-in says :
"For user-defined classes which do not define __contains__() but do define
__iter__(), x in y is true if some value z with x == z is produced while
iterating over y. If an exception is raised during the iteration, it is as if
in raised that exception."
However, if I define a class with
class A:
def __init__(self, x):
self.x = x
self.counter = -1
def __iter__(self):
return self
def __next__(self):
self.counter += 1
if self.counter >= self.x:
raise StopIteration
return self.counter
and test for membership with
iterator = A(10)
for i in iterator:
assert i in iterator, '%s not found' %i
I get an assertion error. Setting a trace on __next__ suggests that for
membership testing, the interpreter consumes the iterator until the searched
value is found (or until exhaustion), then it resumes iteration at this point.
For instance :
>>> iterator = A(10)
>>> for i in iterator:
... print(i)
... assert i+1 in iterator
...
0
2
4
6
8
>>>
If this is correct, should the documentation mention it ?
[toc] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-08-09 19:24 +1000 |
| Message-ID | <mailman.2.1439112286.3627.python-list@python.org> |
| In reply to | #95181 |
On Sun, Aug 9, 2015 at 7:06 PM, Pierre Quentel <pierre.quentel@gmail.com> wrote:
> "For user-defined classes which do not define __contains__() but do define
> __iter__(), x in y is true if some value z with x == z is produced while
> iterating over y. If an exception is raised during the iteration, it is as if
> in raised that exception."
>
> ...
> I get an assertion error. Setting a trace on __next__ suggests that for
> membership testing, the interpreter consumes the iterator until the searched
> value is found (or until exhaustion), then it resumes iteration at this point.
That's exactly right. The only way for the interpreter to handle 'in'
on an iterator is something like this:
def contains(iter, obj):
for val in iter:
if val == obj: return True
return False
That's what the docs describe. So what you have is something like this:
for i in iterator:
for j in iterator:
if i == j: break
else:
assert False, '%s not found' %i
You're dragging values from the same iterator, so you're consuming it
as part of your membership test. You can do this kind of thing:
>>> 5 in A(10)
True
but if you've already consumed a few values, those won't be in the
iterator any more:
>>> x = A(10)
>>> next(x)
0
>>> next(x)
1
>>> next(x)
2
>>> next(x)
3
>>> 2 in x
False
This is simply how iterators work. They're very different from
repeatable iterables like lists or range objects, where you _can_ test
for membership that way:
>>> x = [10,20,30]
>>> for i in x: assert i in x
...
>>> x = iter([10,20,30])
>>> for i in x: assert i in x
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
Note that I _could_ create a list that would pass this assertion,
simply by duplicating every value:
>>> x = iter([10,10,20,20,30,30])
>>> for i in x: assert i in x
...
But it's iterating only three times here, and the 'in' check is
consuming the other three values. Once your A(10) has yielded some
value, it will never yield it again, so the assertion can never pass.
Does that explain matters?
ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Pierre Quentel <pierre.quentel@gmail.com> |
|---|---|
| Date | 2015-08-09 02:55 -0700 |
| Message-ID | <b1c0218c-05dd-402b-b2f0-5e0836146f76@googlegroups.com> |
| In reply to | #95182 |
Le dimanche 9 août 2015 11:25:17 UTC+2, Chris Angelico a écrit : > On Sun, Aug 9, 2015 at 7:06 PM, Pierre Quentel <pierre.quentel@gmail.com> wrote: > > "For user-defined classes which do not define __contains__() but do define > > __iter__(), x in y is true if some value z with x == z is produced while > > iterating over y. If an exception is raised during the iteration, it is as if > > in raised that exception." > > > > ... > > I get an assertion error. Setting a trace on __next__ suggests that for > > membership testing, the interpreter consumes the iterator until the searched > > value is found (or until exhaustion), then it resumes iteration at this point. > > That's exactly right. The only way for the interpreter to handle 'in' > on an iterator is something like this: > > def contains(iter, obj): > for val in iter: > if val == obj: return True > return False > > That's what the docs describe. So what you have is something like this: > > for i in iterator: > for j in iterator: > if i == j: break > else: > assert False, '%s not found' %i > > You're dragging values from the same iterator, so you're consuming it > as part of your membership test. You can do this kind of thing: > > >>> 5 in A(10) > True > > but if you've already consumed a few values, those won't be in the > iterator any more: > > >>> x = A(10) > >>> next(x) > 0 > >>> next(x) > 1 > >>> next(x) > 2 > >>> next(x) > 3 > >>> 2 in x > False > > This is simply how iterators work. They're very different from > repeatable iterables like lists or range objects, where you _can_ test > for membership that way: > > >>> x = [10,20,30] > >>> for i in x: assert i in x > ... > >>> x = iter([10,20,30]) > >>> for i in x: assert i in x > ... > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > AssertionError > > Note that I _could_ create a list that would pass this assertion, > simply by duplicating every value: > > >>> x = iter([10,10,20,20,30,30]) > >>> for i in x: assert i in x > ... > > But it's iterating only three times here, and the 'in' check is > consuming the other three values. Once your A(10) has yielded some > value, it will never yield it again, so the assertion can never pass. > > Does that explain matters? > > ChrisA Thanks for the explanation. I understand that an iterator can't test membership any other way, but I'm still worried about how the documentation explains it. Reading it, I naively expected that an iterator which produces the integer 0 such as the one included in my post would say that "0 in iterator" is True, because "some value z with 0 == z is produced while iterating over y". Shouldn't it be more clear if the documentation said something like : "For user-defined classes which do not define __contains__() but do define __iter__(), x in y consumes the iterator y until some value z with x == z is produced" ?
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-08-09 20:13 +1000 |
| Message-ID | <mailman.3.1439115197.3627.python-list@python.org> |
| In reply to | #95183 |
On Sun, Aug 9, 2015 at 7:55 PM, Pierre Quentel <pierre.quentel@gmail.com> wrote:
> Thanks for the explanation. I understand that an iterator can't test membership any other way, but I'm still worried about how the documentation explains it. Reading it, I naively expected that an iterator which produces the integer 0 such as the one included in my post would say that "0 in iterator" is True, because "some value z with 0 == z is produced while iterating over y".
>
> Shouldn't it be more clear if the documentation said something like : "For user-defined classes which do not define __contains__() but do define __iter__(), x in y consumes the iterator y until some value z with x == z is produced" ?
>
Consuming the iterator is implicit in the description, because the
only way to iterate over an iterator is to call next() on it, which
consumes values. But the description covers all *iterables*, and the
caveat applies only to *iterators*. Compare:
class A:
def __init__(self, x):
self.x = x
def __iter__(self):
counter = -1
while counter < self.x:
counter += 1
yield counter
This is the same code as you had, except that it's an iterable that
returns a _different_ object, and thus is not itself an iterator (in
this case, A().__iter__() returns a generator object). Note the
similarity of code to your example, and how easy it is to convert your
code to be a generator and make it reiterable. Now let's do that
membership test:
>>> iterable = A(10)
>>> for i in iterable:
... assert i in iterable
...
No error. We can repeatedly iterate over this generator factory, and
each iteration starts a new instance of the generator, which thus
contains all the same values. Adding a print() will show that every
number is indeed iterated over.
The trap you're seeing here is that iterating over an iterator always
consumes it, but mentally, you're expecting this to be iterating over
a new instance of the same sequence. That's perfectly understandable,
but due to the extreme flexibility of the iterator and iterable
protocols, there's no way for the interpreter to say anything
different. Make your object reiterable, and then it'll behave the way
your brain is expecting... but the docs aren't incorrect here.
ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Pierre Quentel <pierre.quentel@gmail.com> |
|---|---|
| Date | 2015-08-09 04:00 -0700 |
| Message-ID | <09cb3411-be3b-4875-9042-af96c7f63504@googlegroups.com> |
| In reply to | #95184 |
> The trap you're seeing here is that iterating over an iterator always > consumes it, but mentally, you're expecting this to be iterating over > a new instance of the same sequence. No, I just tried to apply what I read in the docs : 1. I have y = A(10) which is an instance of a class which does not define __contains__ but does define __iter__ 2. The value z = 0 is produced while iterating over y. 3. The sentence "x in y is true if some value z with x == z is produced while iterating over y" lead me to think that "0 in y" would be true.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-08-09 21:10 +1000 |
| Message-ID | <mailman.4.1439118634.3627.python-list@python.org> |
| In reply to | #95185 |
On Sun, Aug 9, 2015 at 9:00 PM, Pierre Quentel <pierre.quentel@gmail.com> wrote: >> The trap you're seeing here is that iterating over an iterator always >> consumes it, but mentally, you're expecting this to be iterating over >> a new instance of the same sequence. > > No, I just tried to apply what I read in the docs : > > 1. I have y = A(10) which is an instance of a class which does not define __contains__ but does define __iter__ > > 2. The value z = 0 is produced while iterating over y. > > 3. The sentence "x in y is true if some value z with x == z is produced while iterating over y" lead me to think that "0 in y" would be true. You're almost right. The significance here is that once you've partially iterated over y, the value z will not be produced while iterating over y - so *at that point in time*, y does not contain z. Iterators (non-infinite ones, at least) shrink over time, so the set of objects they contain will shrink. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Laura Creighton <lac@openend.se> |
|---|---|
| Date | 2015-08-09 13:30 +0200 |
| Message-ID | <mailman.5.1439119855.3627.python-list@python.org> |
| In reply to | #95183 |
Maybe add something about this here? https://docs.python.org/2/tutorial/classes.html#iterators Laura
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-08-09 14:49 +0100 |
| Message-ID | <mailman.9.1439128206.3627.python-list@python.org> |
| In reply to | #95183 |
On 09/08/2015 12:30, Laura Creighton wrote: > Maybe add something about this here? > https://docs.python.org/2/tutorial/classes.html#iterators > > Laura > Better still https://docs.python.org/3/tutorial/classes.html#iterators -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-08-09 23:11 +1000 |
| Message-ID | <mailman.6.1439125888.3627.python-list@python.org> |
| In reply to | #95181 |
On Sun, Aug 9, 2015 at 11:09 PM, Tim Chase <tim@thechases.com> wrote: > On 2015-08-09 19:24, Chris Angelico wrote: >> That's exactly right. The only way for the interpreter to handle >> 'in' on an iterator is something like this: >> >> def contains(iter, obj): >> for val in iter: >> if val == obj: return True >> return False > > Which can nicely be written as > > any(i == obj for obj in iter) Indeed it can, although I'm not sure whether it'd just have been another step in the explanation :) Whether or not that makes perfect sense depends on the reader. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-08-09 08:09 -0500 |
| Message-ID | <mailman.7.1439126856.3627.python-list@python.org> |
| In reply to | #95181 |
On 2015-08-09 19:24, Chris Angelico wrote: > That's exactly right. The only way for the interpreter to handle > 'in' on an iterator is something like this: > > def contains(iter, obj): > for val in iter: > if val == obj: return True > return False Which can nicely be written as any(i == obj for obj in iter) The addition of any/all initially struck me as a "why?! this is so easy to write in-line" moment, only to find myself using them all() the time. :-) The code-intention becomes so much clearer. Even back-ported them to 2.4 code that I maintain. -tkc
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-08-09 14:45 +0100 |
| Message-ID | <mailman.8.1439127948.3627.python-list@python.org> |
| In reply to | #95181 |
On 09/08/2015 14:11, Chris Angelico wrote:
> On Sun, Aug 9, 2015 at 11:09 PM, Tim Chase <tim@thechases.com> wrote:
>> On 2015-08-09 19:24, Chris Angelico wrote:
>>> That's exactly right. The only way for the interpreter to handle
>>> 'in' on an iterator is something like this:
>>>
>>> def contains(iter, obj):
>>> for val in iter:
>>> if val == obj: return True
>>> return False
>>
>> Which can nicely be written as
>>
>> any(i == obj for obj in iter)
>
> Indeed it can, although I'm not sure whether it'd just have been
> another step in the explanation :) Whether or not that makes perfect
> sense depends on the reader.
>
> ChrisA
>
This one probably won't make make sense to most readers.
any("Australian batsmen scored any() runs recently?")
See also
http://www.cricketcountry.com/photos/photo-england-supporters-sell-hardly-used-australian-bats-314378
:)
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
[toc] | [prev] | [next] | [standalone]
| From | Tim Chase <tim@thechases.com> |
|---|---|
| Date | 2015-08-09 08:09 -0500 |
| Message-ID | <mailman.34.1439200820.3627.python-list@python.org> |
| In reply to | #95181 |
On 2015-08-09 19:24, Chris Angelico wrote: > That's exactly right. The only way for the interpreter to handle > 'in' on an iterator is something like this: > > def contains(iter, obj): > for val in iter: > if val == obj: return True > return False Which can nicely be written as any(i == obj for obj in iter) The addition of any/all initially struck me as a "why?! this is so easy to write in-line" moment, only to find myself using them all() the time. :-) The code-intention becomes so much clearer. Even back-ported them to 2.4 code that I maintain. -tkc
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web