| Started by | Aaron Brady <castironpi@gmail.com> |
|---|---|
| First post | 2012-08-16 11:00 -0700 |
| Last post | 2012-08-17 10:47 -0700 |
| Articles | 20 on this page of 25 — 9 participants |
set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-16 11:00 -0700
Re: set and dict iteration Dave Angel <d@davea.name> - 2012-08-16 15:49 -0400
Re: set and dict iteration Paul Rubin <no.email@nospam.invalid> - 2012-08-16 14:26 -0700
Re: set and dict iteration Dave Angel <davea@dejaviewphoto.com> - 2012-08-16 19:11 -0400
Re: set and dict iteration Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-17 02:24 +0000
Re: set and dict iteration Paul Rubin <no.email@nospam.invalid> - 2012-08-16 19:30 -0700
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 11:11 -0700
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 11:37 -0700
Re: set and dict iteration Chris Angelico <rosuav@gmail.com> - 2012-08-18 07:57 +1000
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-18 13:29 -0700
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-18 13:29 -0700
Re: set and dict iteration MRAB <python@mrabarnett.plus.com> - 2012-08-18 23:14 +0100
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-18 19:28 -0700
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-23 09:49 -0700
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-23 09:49 -0700
Re: set and dict iteration Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-23 18:11 +0000
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-18 19:28 -0700
Re: set and dict iteration Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-16 17:43 -0600
Re: set and dict iteration Paul Rubin <no.email@nospam.invalid> - 2012-08-16 18:01 -0700
Re: set and dict iteration Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2012-08-17 13:16 +0200
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 11:03 -0700
Re: set and dict iteration Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-16 16:55 -0600
Re: set and dict iteration Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-16 17:07 -0600
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 10:47 -0700
Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 10:47 -0700
| From | Aaron Brady <castironpi@gmail.com> |
|---|---|
| Date | 2012-08-16 11:00 -0700 |
| Subject | set and dict iteration |
| Message-ID | <b8dd3aca-2a87-4124-ad6e-66a8720af99a@googlegroups.com> |
Hello,
I observed an inconsistency in the behavior of 'set' and 'dict' iterators. It is "by design" according to the docs.
'''
http://docs.python.org/dev/library/stdtypes.html#dict-views
iter(dictview). Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
'''
The 'set' has the same behavior. Iteration might also complete successfully.
The inconsistency is, if we remove an element from a set and add another during iteration, the new element might appear later in the iteration, and might not, depending on the hash code; therefore comparing the size of the set between iterations isn't adequate. Example:
http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.py
'''
# py: { 'ver': '3' }
set0= set( ( 1, 2 ) )
iter0= iter( set0 )
print( next( iter0 ) )
set0.add( 3 )
set0.remove( 2 )
print( next( iter0 ) )
print( )
set0= set( ( 6, 7 ) )
iter0= iter( set0 )
print( next( iter0 ) )
set0.add( 8 )
set0.remove( 7 )
print( next( iter0 ) )
'''
Output:
'''
1
3

6
Traceback (most recent call last):
  File [...] line 22, in <module>
    print( next( iter0 ) )
StopIteration
'''
Iteration should behave the same regardless of the contents of the set. Continuing iteration over sets and dicts after a modification isn't defined; it should unconditionally raise an error.
What's going on is that, in the second part, '8' is added before the position of the iterator due to hashing. Once '7' is removed the size is back to its original value, so the size check passes and the iterator reaches the end of the set without ever yielding '8'.
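The size check described above can be observed in isolation. The following is a minimal sketch, assuming CPython, whose set iterator compares the set's current length with the length recorded when the iterator was created. The helper names `advance_after`, `grow`, and `swap` are made up for illustration.

```python
def advance_after(items, mutate):
    """Start iterating a set, apply `mutate`, then try one more step.

    Returns "RuntimeError", "StopIteration", or "value" depending on
    what the second next() call does.
    """
    s = set(items)
    it = iter(s)
    next(it)        # consume the first element
    mutate(s)       # modify the set behind the iterator's back
    try:
        next(it)
    except RuntimeError:
        return "RuntimeError"
    except StopIteration:
        return "StopIteration"
    return "value"


def grow(s):
    s.add(99)                  # length goes 2 -> 3


def swap(s):
    s.add(99)                  # length goes 2 -> 3 ...
    s.remove(min(s - {99}))    # ... and back to 2


print(advance_after((1, 2), grow))   # the length changed, so CPython complains
print(advance_after((1, 2), swap))   # the length is unchanged, so no RuntimeError
```

Which of "value" or "StopIteration" the swap case produces depends on where the elements hash, which is the inconsistency the post describes.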
The inconsistency isn't easily solved. One possibility is to use a timestamp or other serial index in the object and iterators, and compare them on every iteration to determine if a modification has occurred.
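The serial-index idea can be sketched in pure Python. This is an illustration under stated assumptions, not the proposed C patch: `VersionedSet` and its `_version` counter are hypothetical names, and the wrapper only supports `add` and `remove`.

```python
class VersionedSet:
    """Set wrapper that bumps a counter on every structural change."""

    def __init__(self, items=()):
        self._items = set(items)
        self._version = 0

    def add(self, item):
        if item not in self._items:
            self._items.add(item)
            self._version += 1

    def remove(self, item):
        self._items.remove(item)   # KeyError propagates before the bump
        self._version += 1

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Capture the version when iter() is called, not on the first next().
        return self._iterate(self._version)

    def _iterate(self, expected):
        inner = iter(self._items)
        while True:
            # Compare serial numbers before every step, including the last.
            if self._version != expected:
                raise RuntimeError("set modified during iteration")
            try:
                item = next(inner)
            except StopIteration:
                return
            yield item


s = VersionedSet((6, 7))
it = iter(s)
next(it)
s.add(8)
s.remove(7)          # size is back to 2, but the version moved from 0 to 2
try:
    next(it)
except RuntimeError as exc:
    print("detected:", exc)
```

The cost is one integer per set, one per iterator, and one comparison per step; unlike the size check, an add followed by a remove cannot cancel out.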
Another possibility, which the author prefers, is to maintain a secondary collection of the iterators of an object, and invalidate them upon modification. The applicable collection structure is a doubly linked list, informally depicted:
http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
Upon modification, the set traverses its iterators, setting an 'invalid' flag on each; and subsequent calls to any of them raise an 'IterationError'. Adding and removing iterators to and from the secondary list is performed in O( 1 ) time with no penalty.
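The secondary-list scheme can be sketched in pure Python as follows. All names here (`TrackedSet`, `TrackedSetIterator`, `prev_node`, `next_node`) are hypothetical, and `IterationError` is taken from the post and defined as a `RuntimeError` subclass purely as an assumption. Python references are always counted, so unlike the C-level proposal this version keeps an abandoned iterator alive until the set is modified.

```python
class IterationError(RuntimeError):
    """Hypothetical error named in the post; not a built-in exception."""


class TrackedSet:
    def __init__(self, items=()):
        self._items = set(items)
        self._head = None              # head of the list of live iterators

    def __iter__(self):
        return TrackedSetIterator(self)

    def _invalidate_all(self):
        # Walk the secondary list once, flag every iterator, then drop the list.
        node = self._head
        while node is not None:
            following = node.next_node
            node.invalid = True
            node.prev_node = node.next_node = None
            node = following
        self._head = None

    def add(self, item):
        if item not in self._items:
            self._items.add(item)
            self._invalidate_all()

    def remove(self, item):
        self._items.remove(item)
        self._invalidate_all()


class TrackedSetIterator:
    """Iterator that doubles as a node in its owner's doubly linked list."""

    def __init__(self, owner):
        self._owner = owner
        self._inner = iter(owner._items)
        self.invalid = False
        # Link in at the head: O(1).
        self.prev_node = None
        self.next_node = owner._head
        if owner._head is not None:
            owner._head.prev_node = self
        owner._head = self

    def _unlink(self):
        # O(1) removal; harmless if the node is already unlinked.
        if self.prev_node is not None:
            self.prev_node.next_node = self.next_node
        elif self._owner._head is self:
            self._owner._head = self.next_node
        if self.next_node is not None:
            self.next_node.prev_node = self.prev_node
        self.prev_node = self.next_node = None

    def __iter__(self):
        return self

    def __next__(self):
        if self.invalid:
            raise IterationError("set modified during iteration")
        try:
            return next(self._inner)
        except StopIteration:
            self._unlink()             # exhausted iterators leave the list
            raise
```

With this sketch, repeating the `6, 7, 8` example from the post raises `IterationError` on the second `next()` call regardless of where `8` hashes.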
The above example depicted a 'Set'. 'Dicts' have the same anomaly, but the solution is ambiguous, since dict values can be changed meaningfully without altering the structure of the object. In the author's opinion, the dict should not raise an 'IterationError' on value changes, only key changes like the set, but the argument isn't conclusive.
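The distinction between value changes and key changes is already visible in how the built-in dict behaves. A small sketch, assuming CPython's current behavior (the dictionary contents are arbitrary):

```python
counts = {"a": 1, "b": 2, "c": 3}

# Rebinding values for existing keys leaves the key structure untouched,
# so this loop visits every key exactly once.
for key in counts:
    counts[key] *= 10

# Adding a key while iterating changes the size, which CPython detects
# on the next step of the loop.
try:
    for key in counts:
        counts[key + "_copy"] = counts[key]
except RuntimeError as exc:
    error = str(exc)
else:
    error = None

print(counts)
print(error)
```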
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-08-16 15:49 -0400 |
| Message-ID | <mailman.3389.1345146609.4697.python-list@python.org> |
| In reply to | #27194 |
On 08/16/2012 02:00 PM, Aaron Brady wrote:
> Hello,
>
> I observed an inconsistency in the behavior of 'set' and 'dict' iterators. It is "by design" according to the docs.
>
> '''
> http://docs.python.org/dev/library/stdtypes.html#dict-views
>
> iter(dictview). Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
> '''
>
> The 'set' has the same behavior. Iteration might also complete successfully.
>
> The inconsistency is, if we remove an element from a set and add another during iteration, the new element might appear later in the iteration, and might not, depending on the hash code; therefore comparing the size of the set between iterations isn't adequate. Example:
> <SNIP>
>
> Iteration should behave the same regardless of the contents of the set. Continuing iteration over sets and dicts after a modification isn't defined; it should unconditionally raise an error.

Why is it the iterator's job to protect against the user's bug? The doc is clear enough. If you don't change the collection, you won't have a problem.

> <SNIP more details>.

Everything else is implementation defined. Why should an implementation be forced to have ANY extra data structure to detect a static bug in the caller's code?

--
DaveA
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2012-08-16 14:26 -0700 |
| Message-ID | <7xy5le7cli.fsf@ruckus.brouhaha.com> |
| In reply to | #27195 |
Dave Angel <d@davea.name> writes:
> Everything else is implementation defined. Why should an implementation
> be forced to have ANY extra data structure to detect a static bug in the
> caller's code?

For the same reason the interpreter checks for type errors at runtime and raises TypeError, instead of letting the program go into the weeds.
| From | Dave Angel <davea@dejaviewphoto.com> |
|---|---|
| Date | 2012-08-16 19:11 -0400 |
| Message-ID | <mailman.3404.1345158704.4697.python-list@python.org> |
| In reply to | #27202 |
On 08/16/2012 05:26 PM, Paul Rubin wrote:
> Dave Angel <d@davea.name> writes:
>> Everything else is implementation defined. Why should an implementation
>> be forced to have ANY extra data structure to detect a static bug in the
>> caller's code?
> For the same reason the interpreter checks for type errors at runtime
> and raises TypeError, instead of letting the program go into the weeds.

There's an enormous difference between type errors, which affect the low level dispatch, and checking for whether a dict has changed and may have invalidated the iterator. If we were really going to keep track of what iterators are tracking a given dict or set, why stop there? Why not check if another process has changed a file we're iterating through? Or ...
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-08-17 02:24 +0000 |
| Message-ID | <502dab6c$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #27213 |
On Thu, 16 Aug 2012 19:11:19 -0400, Dave Angel wrote:

> On 08/16/2012 05:26 PM, Paul Rubin wrote:
>> Dave Angel <d@davea.name> writes:
>>> Everything else is implementation defined. Why should an
>>> implementation be forced to have ANY extra data structure to detect a
>>> static bug in the caller's code?
>> For the same reason the interpreter checks for type errors at runtime
>> and raises TypeError, instead of letting the program go into the weeds.
>
> There's an enormous difference between type errors, which affect the low
> level dispatch, and checking for whether a dict has changed and may have
> invalidated the iterator. If we were really going to keep track of what
> iterators are tracking a given dict or set, why stop there? Why not
> check if another process has changed a file we're iterating through? Or
> ...

Which is why Python doesn't do it -- because it is (claimed to be) excessively expensive for the benefit that you would get.

Not because it is a matter of principle that data integrity is unimportant. Data integrity *is* important, but in the opinion of the people who wrote these particular data structures, the effort required to guarantee correct iteration in the face of mutation is too expensive for the benefit.

Are they right? I don't know. I know that the list sort method goes to a lot of trouble to prevent code from modifying lists while they are being sorted. During the sort, the list temporarily appears to be empty to anything which attempts to access it. So at least sometimes, the Python developers spend effort to ensure data integrity.

Luckily, Python is open source. If anyone thinks that sets and dicts should include more code protecting against mutation-during-iteration, they are more than welcome to come up with a patch. Don't forget unit and regression tests, and also a set of timing results which show that the slow-down isn't excessive.

--
Steven
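The list-sort behavior mentioned in this post can be observed directly. The sketch below assumes CPython, where it is documented as an implementation detail rather than a language guarantee; `spying_key` and `meddling_key` are illustrative names.

```python
observed_lengths = []


def spying_key(value):
    # Peek at the list while sort() is running.
    observed_lengths.append(len(data))
    return value


data = [3, 1, 2]
data.sort(key=spying_key)
print(data, observed_lengths)


def meddling_key(value):
    data.append(value)     # try to mutate the list mid-sort
    return value


try:
    data.sort(key=meddling_key)
    mutation_detected = False
except ValueError:
    mutation_detected = True
print(mutation_detected)
```

In CPython the key function sees an empty list for the duration of the sort, and the attempted mutation is reported with a `ValueError` once the sort finishes.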
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2012-08-16 19:30 -0700 |
| Message-ID | <7xboiadzcd.fsf@ruckus.brouhaha.com> |
| In reply to | #27218 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> Luckily, Python is open source. If anyone thinks that sets and dicts
> should include more code protecting against mutation-during-iteration,
> they are more than welcome to come up with a patch. Don't forget unit and
> regression tests, and also a set of timing results which show that the
> slow-down isn't excessive.

It could be a debugging option, in which case even a fairly significant slowdown is acceptable.
| From | Aaron Brady <castironpi@gmail.com> |
|---|---|
| Date | 2012-08-17 11:11 -0700 |
| Message-ID | <de5c72da-404f-4569-bcf1-78bdeebf813a@googlegroups.com> |
| In reply to | #27220 |
On Thursday, August 16, 2012 9:30:42 PM UTC-5, Paul Rubin wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> > Luckily, Python is open source. If anyone thinks that sets and dicts
> > should include more code protecting against mutation-during-iteration,
> > they are more than welcome to come up with a patch. Don't forget unit and
> > regression tests, and also a set of timing results which show that the
> > slow-down isn't excessive.
>
> It could be a debugging option, in which case even a fairly significant
> slowdown is acceptable.

Another possibility is to use the 'gc.get_referrers' mechanism to obtain the iterators.

import gc
a= set( ( 0, 1, 2 ) )
b= iter( a )
c= iter( a )
d= iter( a )
print( gc.get_referrers( a ) )

Output:

[<set_iterator object at 0x00C0B9E0>, <set_iterator object at 0x00C0BA08>, <set_iterator object at 0x00C0BA30>, [others] ]

This approach wouldn't be as time-efficient as a dedicated secondary structure, due to the other objects which refer to the set, including variable namespaces.
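The `[others]` entries in that output can be filtered out by type. A minimal sketch, assuming CPython, where live set iterators are tracked by the garbage collector; `SetIterator` and `live_set_iterators` are illustrative names, since the iterator type has no public name.

```python
import gc

SetIterator = type(iter(set()))    # obtain the set_iterator type indirectly


def live_set_iterators(target):
    """Return the set iterators that currently refer to `target`."""
    return [ref for ref in gc.get_referrers(target)
            if isinstance(ref, SetIterator)]


a = {0, 1, 2}
b, c, d = iter(a), iter(a), iter(a)
found = live_set_iterators(a)
print(len(found))
```

As the post notes, `gc.get_referrers` has to scan every tracked object, so this is suited to debugging rather than to a check on every mutation.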
| From | Aaron Brady <castironpi@gmail.com> |
|---|---|
| Date | 2012-08-17 11:37 -0700 |
| Message-ID | <fe95c29c-2289-4e9c-870e-e3c475f13459@googlegroups.com> |
| In reply to | #27218 |
On Thursday, August 16, 2012 9:24:44 PM UTC-5, Steven D'Aprano wrote:
> On Thu, 16 Aug 2012 19:11:19 -0400, Dave Angel wrote:
>
> > On 08/16/2012 05:26 PM, Paul Rubin wrote:
> > > Dave Angel <d@davea.name> writes:
> > > > Everything else is implementation defined. Why should an
> > > > implementation be forced to have ANY extra data structure to detect a
> > > > static bug in the caller's code?
> > > For the same reason the interpreter checks for type errors at runtime
> > > and raises TypeError, instead of letting the program go into the weeds.
> >
> > There's an enormous difference between type errors, which affect the low
> > level dispatch, and checking for whether a dict has changed and may have
> > invalidated the iterator. If we were really going to keep track of what
> > iterators are tracking a given dict or set, why stop there? Why not
> > check if another process has changed a file we're iterating through? Or
> > ...
>
> Which is why Python doesn't do it -- because it is (claimed to be)
> excessively expensive for the benefit that you would get.
>
> Not because it is a matter of principle that data integrity is
> unimportant. Data integrity *is* important, but in the opinion of the
> people who wrote these particular data structures, the effort required to
> guarantee correct iteration in the face of mutation is too expensive for
> the benefit.
>
> Are they right? I don't know. I know that the list sort method goes to a
> lot of trouble to prevent code from modifying lists while they are being
> sorted. During the sort, the list temporarily appears to be empty to
> anything which attempts to access it. So at least sometimes, the Python
> developers spend effort to ensure data integrity.
>
> Luckily, Python is open source. If anyone thinks that sets and dicts
> should include more code protecting against mutation-during-iteration,
> they are more than welcome to come up with a patch. Don't forget unit and
> regression tests, and also a set of timing results which show that the
> slow-down isn't excessive.

I contributed a patch some time ago. It wasn't accepted. However, this thread seems to show a moderately more favorable sentiment than that one.

Is there a problem with hacking on the Beta? Or should I wait for the Release? Does anyone want to help me with the changes? Perhaps P. Rubin could contribute the variation he suggested as well.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-08-18 07:57 +1000 |
| Message-ID | <mailman.3435.1345240665.4697.python-list@python.org> |
| In reply to | #27255 |
On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady <castironpi@gmail.com> wrote:
> Is there a problem with hacking on the Beta?

Nope. Hack on the beta, then when the release arrives, rebase your work onto it. I doubt that anything of this nature will be changed between now and then.

ChrisA
| From | Aaron Brady <castironpi@gmail.com> |
|---|---|
| Date | 2012-08-18 13:29 -0700 |
| Message-ID | <mailman.3473.1345321756.4697.python-list@python.org> |
| In reply to | #27272 |
On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
> On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady <castironpi@gmail.com> wrote:
> > Is there a problem with hacking on the Beta?
>
> Nope. Hack on the beta, then when the release arrives, rebase your
> work onto it. I doubt that anything of this nature will be changed
> between now and then.
>
> ChrisA

Thanks Chris, your post was encouraging.

I have a question about involving the 'tp_clear' field of the types.

http://docs.python.org/dev/c-api/typeobj.html#PyTypeObject.tp_clear

'''
...The tuple type does not implement a tp_clear function, because it’s possible to prove that no reference cycle can be composed entirely of tuples.
'''

I didn't follow the reasoning in the proof; the premise is necessary but IMHO not obviously sufficient. Nevertheless, the earlier diagram contains an overt homogeneous reference cycle.

Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png

In my estimate, the 'tp_traverse' and 'tp_clear' fields of the set don't need to visit the auxiliary collection; the same fields of the iterators don't need to visit the primary set or other iterators; and references in the linked list don't need to be included in the iterators' reference counts.

Can someone who is more familiar with the cycle detector and cycle breaker help prove or disprove the above?
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2012-08-18 23:14 +0100 |
| Message-ID | <mailman.3476.1345328046.4697.python-list@python.org> |
| In reply to | #27333 |
On 18/08/2012 21:29, Aaron Brady wrote:
> On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
>> On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady <castironpi@gmail.com> wrote:
>>> Is there a problem with hacking on the Beta?
>>
>> Nope. Hack on the beta, then when the release arrives, rebase your
>> work onto it. I doubt that anything of this nature will be changed
>> between now and then.
>>
>> ChrisA
>
> Thanks Chris, your post was encouraging.
>
> I have a question about involving the 'tp_clear' field of the types.
>
> http://docs.python.org/dev/c-api/typeobj.html#PyTypeObject.tp_clear
>
> '''
> ...The tuple type does not implement a tp_clear function, because it’s possible to prove that no reference cycle can be composed entirely of tuples.
> '''
>
> I didn't follow the reasoning in the proof; the premise is necessary but IMHO not obviously sufficient. Nevertheless, the earlier diagram contains an overt homogeneous reference cycle.
>
> Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
>
> In my estimate, the 'tp_traverse' and 'tp_clear' fields of the set doesn't need to visit the auxiliary collection; the same fields of the iterators don't need to visit the primary set or other iterators; and references in the linked list don't need to be included in the iterators' reference counts.
>
> Can someone who is more familiar with the cycle detector and cycle breaker, help prove or disprove the above?
>
In simple terms, when you create an immutable object it can contain only references to pre-existing objects, but in order to create a cycle you need to make an object refer to another which is created later, so it's not possible to create a cycle out of immutable objects.

However, using Python's C API it _is_ possible to create such a cycle, by mutating an otherwise-immutable tuple (see PyTuple_SetItem and PyTuple_SET_ITEM).
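From pure Python, a tuple can still take part in a cycle as long as some mutable object closes the loop. The sketch below illustrates that point; `Marker` and `build_cycle` are made-up names, and the weak reference is only there to observe when the cycle is reclaimed.

```python
import gc
import weakref


class Marker:
    """Plain object used only so the cycle's lifetime can be observed."""


def build_cycle():
    marker = Marker()
    inner = [marker]
    t = (inner,)         # the tuple can only point at objects that already exist
    inner.append(t)      # the cycle is closed by mutating the list, not the tuple
    assert t[0][1] is t
    return weakref.ref(marker)


ref = build_cycle()
gc.collect()             # the collector can break the cycle by clearing the list
print(ref() is None)
```

Because the loop always passes through an object that can be cleared, the tuple itself only needs to be traversable for the collector to reclaim the whole cycle.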
| From | Aaron Brady <castironpi@gmail.com> |
|---|---|
| Date | 2012-08-18 19:28 -0700 |
| Message-ID | <mailman.3480.1345343315.4697.python-list@python.org> |
| In reply to | #27335 |
On Saturday, August 18, 2012 5:14:05 PM UTC-5, MRAB wrote:
> On 18/08/2012 21:29, Aaron Brady wrote:
> > On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
> > > On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady <castironpi@gmail.com> wrote:
> > > > Is there a problem with hacking on the Beta?
> > >
> > > Nope. Hack on the beta, then when the release arrives, rebase your
> > > work onto it. I doubt that anything of this nature will be changed
> > > between now and then.
> > >
> > > ChrisA
> >
> > Thanks Chris, your post was encouraging.
> >
> > I have a question about involving the 'tp_clear' field of the types.
> >
> > http://docs.python.org/dev/c-api/typeobj.html#PyTypeObject.tp_clear
> >
> > '''
> > ...The tuple type does not implement a tp_clear function, because it’s possible to prove that no reference cycle can be composed entirely of tuples.
> > '''
> >
> > I didn't follow the reasoning in the proof; the premise is necessary but IMHO not obviously sufficient. Nevertheless, the earlier diagram contains an overt homogeneous reference cycle.
> >
> > Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
> >
> > In my estimate, the 'tp_traverse' and 'tp_clear' fields of the set doesn't need to visit the auxiliary collection; the same fields of the iterators don't need to visit the primary set or other iterators; and references in the linked list don't need to be included in the iterators' reference counts.
> >
> > Can someone who is more familiar with the cycle detector and cycle breaker, help prove or disprove the above?
>
> In simple terms, when you create an immutable object it can contain
> only references to pre-existing objects, but in order to create a cycle
> you need to make an object refer to another which is created later, so
> it's not possible to create a cycle out of immutable objects.
>
> However, using Python's C API it _is_ possible to create such a cycle,
> by mutating an otherwise-immutable tuple (see PyTuple_SetItem and
> PyTuple_SET_ITEM).

Are there any precedents for storing uncounted references to PyObject's?

One apparent problematic case is creating an iterator to a set, then adding it to the set. However the operation is a modification, and causes the iterator to be removed from the secondary list before the set is examined for collection.

Otherwise, the iterator keeps a counted reference to the set, but the set does not keep a counted reference to the iterator, so the iterator will always be freed first. Therefore, the set's secondary list will be empty when the set is freed.

Concurrent addition and deletion of iterators should be disabled, and the iterators should remove themselves from the set's secondary list before they decrement their references to the set.

Please refresh the earlier diagram; counted references are distinguished separately. Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
| From | Aaron Brady <castironpi@gmail.com> |
|---|---|
| Date | 2012-08-23 09:49 -0700 |
| Message-ID | <c7452db1-5b78-4d54-81a1-1c8683631d6e@googlegroups.com> |
| In reply to | #27339 |
On Saturday, August 18, 2012 9:28:32 PM UTC-5, Aaron Brady wrote:
> On Saturday, August 18, 2012 5:14:05 PM UTC-5, MRAB wrote:
> > On 18/08/2012 21:29, Aaron Brady wrote:
> > > On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
> > > > On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady <castironpi@gmail.com> wrote:
> > > > > Is there a problem with hacking on the Beta?
> > > > Nope. Hack on the beta, then when the release arrives, rebase your
> > > > work onto it. I doubt that anything of this nature will be changed
> > > > between now and then.
> > > > ChrisA
> > > Thanks Chris, your post was encouraging.
> > > I have a question about involving the 'tp_clear' field of the types.
> > > http://docs.python.org/dev/c-api/typeobj.html#PyTypeObject.tp_clear
> > > '''
> > > ...The tuple type does not implement a tp_clear function, because it’s possible to prove that no reference cycle can be composed entirely of tuples.
> > > '''
> > > I didn't follow the reasoning in the proof; the premise is necessary but IMHO not obviously sufficient. Nevertheless, the earlier diagram contains an overt homogeneous reference cycle.
> > > Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
> > > In my estimate, the 'tp_traverse' and 'tp_clear' fields of the set doesn't need to visit the auxiliary collection; the same fields of the iterators don't need to visit the primary set or other iterators; and references in the linked list don't need to be included in the iterators' reference counts.
> > > Can someone who is more familiar with the cycle detector and cycle breaker, help prove or disprove the above?
> > In simple terms, when you create an immutable object it can contain
> > only references to pre-existing objects, but in order to create a cycle
> > you need to make an object refer to another which is created later, so
> > it's not possible to create a cycle out of immutable objects.
> > However, using Python's C API it _is_ possible to create such a cycle,
> > by mutating an otherwise-immutable tuple (see PyTuple_SetItem and
> > PyTuple_SET_ITEM).
> Are there any precedents for storing uncounted references to PyObject's?
> One apparent problematic case is creating an iterator to a set, then adding it to the set. However the operation is a modification, and causes the iterator to be removed from the secondary list before the set is examined for collection.
> Otherwise, the iterator keeps a counted reference to the set, but the set does not keep a counted reference to the iterator, so the iterator will always be freed first. Therefore, the set's secondary list will be empty when the set is freed.
> Concurrent addition and deletion of iterators should be disabled, and the iterators should remove themselves from the set's secondary list before they decrement their references to the set.
> Please refresh the earlier diagram; counted references are distinguished separately. Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png

The patch for the above is only 40-60 lines. However it introduces two new concepts.

The first is a "linked list", a classic dynamic data structure, first developed in 1955, cf. http://en.wikipedia.org/wiki/Linked_list . Linked lists are absent in Python, including the standard library and CPython implementation, beyond the weak reference mechanism and garbage collector. The "collections.deque" structure shares some of the linked list interface but uses arrays.

The second is "uncounted references". The uncounted references are references to "set iterators" exclusively, exist only internally to "set" objects, and are invisible to the rest of the program. The reason for the exception is that iterators are unique in the Python Data Model; iterators consist of a single immutable reference, unlike both immutable types such as strings and numbers, as well as container types. Counted references could be used instead, but would be consistently wasted work for the garbage collector, though the benefit to programmers' peace of mind could be significant.

Please share your opinion! Do you agree that the internal list resolves the inconsistency? Do you agree with the strategy? Do you agree that introducing uncounted references is justified, or are counted references preferable?
[toc] | [prev] | [next] | [standalone]
| From | Aaron Brady <castironpi@gmail.com> |
|---|---|
| Date | 2012-08-23 09:49 -0700 |
| Message-ID | <mailman.3721.1345740590.4697.python-list@python.org> |
| In reply to | #27339 |
On Saturday, August 18, 2012 9:28:32 PM UTC-5, Aaron Brady wrote:
> On Saturday, August 18, 2012 5:14:05 PM UTC-5, MRAB wrote:
> > On 18/08/2012 21:29, Aaron Brady wrote:
> > > On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
> > > > On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady <castironpi@gmail.com> wrote:
> > > > > Is there a problem with hacking on the Beta?
> > > >
> > > > Nope. Hack on the beta, then when the release arrives, rebase your
> > > > work onto it. I doubt that anything of this nature will be changed
> > > > between now and then.
> > > >
> > > > ChrisA
> > >
> > > Thanks Chris, your post was encouraging.
> > >
> > > I have a question about involving the 'tp_clear' field of the types.
> > >
> > > http://docs.python.org/dev/c-api/typeobj.html#PyTypeObject.tp_clear
> > >
> > > '''
> > > ...The tuple type does not implement a tp_clear function, because it’s possible to prove that no reference cycle can be composed entirely of tuples.
> > > '''
> > >
> > > I didn't follow the reasoning in the proof; the premise is necessary but IMHO not obviously sufficient. Nevertheless, the earlier diagram contains an overt homogeneous reference cycle.
> > >
> > > Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
> > >
> > > In my estimate, the 'tp_traverse' and 'tp_clear' fields of the set don't need to visit the auxiliary collection; the same fields of the iterators don't need to visit the primary set or other iterators; and references in the linked list don't need to be included in the iterators' reference counts.
> > >
> > > Can someone who is more familiar with the cycle detector and cycle breaker help prove or disprove the above?
> >
> > In simple terms, when you create an immutable object it can contain
> > only references to pre-existing objects, but in order to create a cycle
> > you need to make an object refer to another which is created later, so
> > it's not possible to create a cycle out of immutable objects.
> >
> > However, using Python's C API it _is_ possible to create such a cycle,
> > by mutating an otherwise-immutable tuple (see PyTuple_SetItem and
> > PyTuple_SET_ITEM).
>
> Are there any precedents for storing uncounted references to PyObject's?
>
> One apparent problematic case is creating an iterator to a set, then adding it to the set. However the operation is a modification, and causes the iterator to be removed from the secondary list before the set is examined for collection.
>
> Otherwise, the iterator keeps a counted reference to the set, but the set does not keep a counted reference to the iterator, so the iterator will always be freed first. Therefore, the set's secondary list will be empty when the set is freed.
>
> Concurrent addition and deletion of iterators should be disabled, and the iterators should remove themselves from the set's secondary list before they decrement their references to the set.
>
> Please refresh the earlier diagram; counted references are distinguished separately. Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png

The patch for the above is only 40-60 lines. However it introduces two new concepts.

The first is a "linked list", a classic dynamic data structure, first developed in 1955, cf. http://en.wikipedia.org/wiki/Linked_list . Linked lists are absent in Python, including the standard library and CPython implementation, beyond the weak reference mechanism and garbage collector. The "collections.deque" structure shares some of the linked list interface but uses arrays.

The second is "uncounted references". The uncounted references are references to "set iterators" exclusively, exist only internally to "set" objects, and are invisible to the rest of the program. The reason for the exception is that iterators are unique in the Python Data Model; iterators consist of a single immutable reference, unlike both immutable types such as strings and numbers, as well as container types. Counted references could be used instead, but would be consistently wasted work for the garbage collector, though the benefit to programmers' peace of mind could be significant.

Please share your opinion! Do you agree that the internal list resolves the inconsistency? Do you agree with the strategy? Do you agree that uncounted references are justified to introduce, or are counted references preferable?
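
For readers who want to experiment with the idea without touching C, here is a rough pure-Python sketch of a set that keeps a secondary collection of its live iterators and invalidates them on modification. It is not the patch described above: the names TrackedSet and _TrackedIterator are hypothetical, and weakref.WeakSet stands in for the internal list of uncounted references, which is only an approximation of what the C code would do.

```python
import weakref


class TrackedSet:
    """Hypothetical wrapper: invalidates its live iterators on modification."""

    def __init__(self, items=()):
        self._items = set(items)
        # Weak references only: this registry never keeps an iterator alive,
        # which is the role the "uncounted references" play in the proposal.
        self._iterators = weakref.WeakSet()

    def __iter__(self):
        it = _TrackedIterator(self)
        self._iterators.add(it)
        return it

    def __len__(self):
        return len(self._items)

    def add(self, item):
        if item not in self._items:
            self._items.add(item)
            self._invalidate()

    def discard(self, item):
        if item in self._items:
            self._items.discard(item)
            self._invalidate()

    def _invalidate(self):
        # Snapshot the registry, flag every live iterator, then empty it.
        for it in list(self._iterators):
            it.valid = False
        self._iterators.clear()


class _TrackedIterator:
    def __init__(self, owner):
        self._owner = owner              # counted (strong) reference to the set
        self._inner = iter(owner._items)
        self.valid = True

    def __iter__(self):
        return self

    def __next__(self):
        if not self.valid:
            raise RuntimeError("set changed during iteration")
        return next(self._inner)
```

The iterator holds a strong reference to the set while the set holds only weak references back, so no reference cycle is formed and an abandoned iterator simply drops out of the registry when it is freed. A mutation costs O(number of live iterators), and a change that leaves the size unchanged is still reported.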
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-08-23 18:11 +0000 |
| Message-ID | <50367242$0$6574$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #27745 |
On Thu, 23 Aug 2012 09:49:41 -0700, Aaron Brady wrote:

[...]
> The patch for the above is only 40-60 lines. However it introduces two
> new concepts.
>
> The first is a "linked list", a classic dynamic data structure, first
> developed in 1955, cf. http://en.wikipedia.org/wiki/Linked_list .
> Linked lists are absent in Python

They certainly are not. There's merely no named "linked list" class.

Linked lists are used by collections.ChainMap, tracebacks, xml.dom,
Abstract Syntax Trees, and probably many other places. (Well, technically
some of these are trees rather than lists.)

You can trivially create a linked list:

x = [a, [b, [c, [d, [e, None]]]]]

is equivalent to a singly-linked list with five nodes. Only less
efficient.

> The second is "uncounted references". The uncounted references are
> references to "set iterators" exclusively, exist only internally to
> "set" objects, and are invisible to the rest of the program. The reason
> for the exception is that iterators are unique in the Python Data Model;
> iterators consist of a single immutable reference, unlike both immutable
> types such as strings and numbers, as well as container types. Counted
> references could be used instead, but would be consistently wasted work
> for the garbage collector, though the benefit to programmers' peace of
> mind could be significant.

The usual way to implement "uncounted references" is by using weakrefs.
Why invent yet another form of weakref?

--
Steven
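
To make the nested-list form concrete, here is a small sketch that builds such a structure and walks it node by node. The helper names make_linked and walk are made up for illustration, and single-character strings stand in for the placeholders a to e.

```python
def make_linked(values):
    """Build [v0, [v1, [... [vn, None]]]] from an iterable; None if empty."""
    head = None
    for value in reversed(list(values)):
        head = [value, head]       # each node is a [value, next_node] pair
    return head


def walk(node):
    """Yield each value by following the 'next' slot until it is None."""
    while node is not None:
        value, node = node
        yield value


x = make_linked("abcde")
values = list(walk(x))
```

Each node is a two-element list, so every link costs a full list object, which is the sense in which this form is less efficient than a dedicated node type.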
| From | Aaron Brady <castironpi@gmail.com> |
|---|---|
| Date | 2012-08-18 19:28 -0700 |
| Message-ID | <5325fd9d-e7d8-4549-9e36-aa0553870308@googlegroups.com> |
| In reply to | #27335 |
On Saturday, August 18, 2012 5:14:05 PM UTC-5, MRAB wrote:
> On 18/08/2012 21:29, Aaron Brady wrote:
> > On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
> > > On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady <castironpi@gmail.com> wrote:
> > > > Is there a problem with hacking on the Beta?
> > >
> > > Nope. Hack on the beta, then when the release arrives, rebase your
> > > work onto it. I doubt that anything of this nature will be changed
> > > between now and then.
> > >
> > > ChrisA
> >
> > Thanks Chris, your post was encouraging.
> >
> > I have a question about involving the 'tp_clear' field of the types.
> >
> > http://docs.python.org/dev/c-api/typeobj.html#PyTypeObject.tp_clear
> >
> > '''
> > ...The tuple type does not implement a tp_clear function, because it’s possible to prove that no reference cycle can be composed entirely of tuples.
> > '''
> >
> > I didn't follow the reasoning in the proof; the premise is necessary but IMHO not obviously sufficient. Nevertheless, the earlier diagram contains an overt homogeneous reference cycle.
> >
> > Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
> >
> > In my estimate, the 'tp_traverse' and 'tp_clear' fields of the set don't need to visit the auxiliary collection; the same fields of the iterators don't need to visit the primary set or other iterators; and references in the linked list don't need to be included in the iterators' reference counts.
> >
> > Can someone who is more familiar with the cycle detector and cycle breaker help prove or disprove the above?
>
> In simple terms, when you create an immutable object it can contain
> only references to pre-existing objects, but in order to create a cycle
> you need to make an object refer to another which is created later, so
> it's not possible to create a cycle out of immutable objects.
>
> However, using Python's C API it _is_ possible to create such a cycle,
> by mutating an otherwise-immutable tuple (see PyTuple_SetItem and
> PyTuple_SET_ITEM).

Are there any precedents for storing uncounted references to PyObject's?

One apparent problematic case is creating an iterator to a set, then adding it to the set. However the operation is a modification, and causes the iterator to be removed from the secondary list before the set is examined for collection.

Otherwise, the iterator keeps a counted reference to the set, but the set does not keep a counted reference to the iterator, so the iterator will always be freed first. Therefore, the set's secondary list will be empty when the set is freed.

Concurrent addition and deletion of iterators should be disabled, and the iterators should remove themselves from the set's secondary list before they decrement their references to the set.

Please refresh the earlier diagram; counted references are distinguished separately. Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-08-16 17:43 -0600 |
| Message-ID | <mailman.3405.1345160667.4697.python-list@python.org> |
| In reply to | #27202 |
On Thu, Aug 16, 2012 at 5:11 PM, Dave Angel <davea@dejaviewphoto.com> wrote:
> There's an enormous difference between type errors, which affect the low
> level dispatch, and checking for whether a dict has changed and may have
> invalidated the iterator. If we were really going to keep track of what
> iterators are tracking a given dict or set, why stop there? Why not
> check if another process has changed a file we're iterating through? Or ...

How does this affect low-level dispatch (Python 2.7)?

>>> class Foo(object):
...     def bar(self):
...         return self
...
>>> Foo().bar()
<__main__.Foo object at 0x00CBEAB0>
>>> Foo.bar(Foo())
<__main__.Foo object at 0x00CC9390>
>>> Foo.bar(object())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unbound method bar() must be called with Foo instance as first argument (got object instance instead)

There is no low-level need for this TypeError -- it's purely a case of
not letting the developer shoot himself in the foot. Although to be
honest the interpreter doesn't give quite enough rope (to mix
metaphors) in this case, and I'm glad for the sake of duck typing that
they removed this particular error in Python 3.

With regard to key insertion and deletion while iterating over a dict
or set, though, there is just no good reason to be doing that
(especially as the result is very implementation-specific), and I
wouldn't mind a more complete low-level check against it as long as
it's not too expensive (which is not clearly the case with the current
suggestion at all).
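
For context on what a "more complete" check would add, the sketch below probes the guard that already exists. It assumes CPython 3, where (as far as I know) the set iterator compares only the container's size on each step. The helper raises_on_mutation and the two mutation functions are hypothetical names, and other implementations or versions may behave differently.

```python
def raises_on_mutation(container, mutate):
    """Start iterating, apply mutate(container), then finish the loop.

    Returns True if the interpreter raised RuntimeError, else False.
    """
    it = iter(container)
    next(it)
    mutate(container)
    try:
        for _ in it:
            pass
    except RuntimeError:
        return True
    return False


def grow(d):
    d["new"] = 0      # the dict goes from 3 keys to 4


def swap(s):
    s.discard(2)      # the set shrinks to 2 elements...
    s.add(99)         # ...and is back at 3 before the iterator looks again


size_change_detected = raises_on_mutation({"a": 1, "b": 2, "c": 3}, grow)
same_size_detected = raises_on_mutation({1, 2, 3}, swap)
```

On the CPython builds I am aware of, the first case is caught and the second slips through silently. That second case is the gap the proposals in this thread aim to close.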
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2012-08-16 18:01 -0700 |
| Message-ID | <7xvcgi8h70.fsf@ruckus.brouhaha.com> |
| In reply to | #27214 |
Ian Kelly <ian.g.kelly@gmail.com> writes:
> With regard to key insertion and deletion while iterating over a dict
> or set, though, there is just no good reason to be doing that
> (especially as the result is very implementation-specific), and I
> wouldn't mind a more complete low-level check against it as long as
> it's not too expensive (which is not clearly the case with the current
> suggestion at all).

One possible approach is to freeze the dictionary against modification
while any iterator is open on it. You could keep a count of active
iterators in the dict structure, adjusting it whenever an iterator is
created or closed/destroyed.
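
A minimal pure-Python sketch of the counting idea, for illustration only. FreezingDict and _CountingIterator are hypothetical names, the real change would live in the C dict structure, and this version blocks every assignment (including replacing the value of an existing key) for simplicity.

```python
class FreezingDict:
    """Hypothetical dict wrapper that refuses mutation while an iterator is open."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._open_iterators = 0

    def __iter__(self):
        return _CountingIterator(self)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._check_unfrozen()
        self._data[key] = value

    def __delitem__(self, key):
        self._check_unfrozen()
        del self._data[key]

    def _check_unfrozen(self):
        if self._open_iterators:
            raise RuntimeError(
                "dict is frozen: %d iterator(s) open" % self._open_iterators)


class _CountingIterator:
    def __init__(self, owner):
        self._owner = owner
        self._inner = iter(owner._data)
        self._open = True
        owner._open_iterators += 1       # adjust the count on creation

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._inner)
        except StopIteration:
            self.close()                 # exhausted: release the freeze
            raise

    def close(self):
        if self._open:
            self._open = False
            self._owner._open_iterators -= 1

    def __del__(self):
        self.close()                     # destroyed: release the freeze
```

The cost per mutation is a single integer test. The weak point is an iterator that is neither exhausted nor destroyed: as long as something still refers to it, the dict stays frozen. On CPython, reference counting frees the temporary iterator of a loop that exits early, but that is an implementation detail.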
| From | Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> |
|---|---|
| Date | 2012-08-17 13:16 +0200 |
| Message-ID | <lol1g9-42s.ln1@satorlaser.homedns.org> |
| In reply to | #27216 |
On 17.08.2012 03:01, Paul Rubin wrote:
> Ian Kelly <ian.g.kelly@gmail.com> writes:
>> With regard to key insertion and deletion while iterating over a dict
>> or set, though, there is just no good reason to be doing that
>> (especially as the result is very implementation-specific), and I
>> wouldn't mind a more complete low-level check against it as long as
>> it's not too expensive (which is not clearly the case with the current
>> suggestion at all).
>
> One possible approach is to freeze the dictionary against modification
> while any iterator is open on it. You could keep a count of active
> iterators in the dict structure, adjusting it whenever an iterator is
> created or closed/destroyed.

What if there is an iterator left over from a loop that was terminated
early? That could block access to the sequence even though nothing is
/really/ iterating over it.

I personally prefer a reliable error, at least when __debug__ is set.
Someone suggested a timestamp or a list of active iterators, which both
sound reasonable. The two should be O(1) and O(#iterators) in complexity
on all mutating operations and O(1) on iteration, so they should be
acceptable. With a C implementation it probably boils down to very few
cycles (checking a pointer/incrementing an integer). I can't say if this
is feasible without compromising performance though; at the very least it
requires an additional member in all dicts and iterators.

Uli
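
The "timestamp" variant can be sketched in pure Python as a modification counter that each iterator snapshots when it is created. VersionedSet and _VersionedIterator are hypothetical names and this is an illustration of the idea, not a proposed implementation.

```python
class VersionedSet:
    """Hypothetical set wrapper carrying a modification counter."""

    def __init__(self, items=()):
        self._items = set(items)
        self._version = 0

    def __iter__(self):
        return _VersionedIterator(self)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        if item not in self._items:
            self._items.add(item)
            self._version += 1       # O(1) extra work per real mutation

    def discard(self, item):
        if item in self._items:
            self._items.discard(item)
            self._version += 1


class _VersionedIterator:
    def __init__(self, owner):
        self._owner = owner
        self._expected = owner._version   # snapshot taken at creation
        self._inner = iter(owner._items)

    def __iter__(self):
        return self

    def __next__(self):
        # O(1) check per step: any mutation since creation is an error.
        if self._owner._version != self._expected:
            raise RuntimeError("set changed during iteration")
        return next(self._inner)
```

Unlike the freeze approach, a leftover iterator blocks nothing here: the set can always be modified, and only a stale iterator that is advanced afterwards gets the error. The price is one extra integer in every container and every iterator, as noted above.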