Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #28281

Re: set and dict iteration

Newsgroups comp.lang.python
Date 2012-09-02 10:43 -0700
References (7 earlier) <d4708687-2925-421a-b755-969d6dac731a@googlegroups.com> <mailman.3476.1345328046.4697.python-list@python.org> <mailman.3480.1345343315.4697.python-list@python.org> <c7452db1-5b78-4d54-81a1-1c8683631d6e@googlegroups.com> <mailman.3883.1346095064.4697.python-list@python.org>
Subject Re: set and dict iteration
From Aaron Brady <castironpi@gmail.com>
Message-ID <mailman.94.1346607808.27098.python-list@python.org> (permalink)

Show all headers | View raw


On Monday, August 27, 2012 2:17:45 PM UTC-5, Ian wrote:
> On Thu, Aug 23, 2012 at 10:49 AM, Aaron Brady <castironpi@gmail.com> wrote:
> 
> > The patch for the above is only 40-60 lines.  However it introduces two new concepts.
> 
> 
> 
> Is there a link to the patch?

Please see below.  It grew somewhat during development.

> > The first is a "linked list".
SNIP.
 
> > The second is "uncounted references".  The uncounted references are references to "set iterators" exclusively, exist only internally to "set" objects, and are invisible to the rest of the program.  The reason for the exception is that iterators are unique in the Python Data Model; iterators consist of a single immutable reference, unlike both immutable types such as strings and numbers, as well as container types.  Counted references could be used instead, but would be consistently wasted work for the garbage collector, though the benefit to programmers' peace of mind could be significant.
> 
> >
> 
> > Please share your opinion!  Do you agree that the internal list resolves the inconsistency?  Do you agree with the strategy?  Do you agree that uncounted references are justified to introduce, or are counted references preferable?
> 
> 
> 
> This feature is a hard sell as it is; I think that adding uncounted
> 
> references into the mix is only going to make that worse.  May I
> 
> suggest an alternate approach?  Internally tag each set or dict with a
> 
> "version", which is just a C int.  Every time the hash table is
> 
> modified, increment the version.  When an iterator is created, store
> 
> the current version on the iterator.  When the iterator is advanced,
> 
> check that the iterator version matches the dict/set version.  If
> 
> they're not equal, raise an error.
> 
> 
> 
> This should add less overhead than the linked list without any
> 
> concerns about reference counting.  It does introduce a small bug in
> 
> that an error condition could be "missed", if the version is
> 
> incremented a multiple of 2**32 or 2**64 times between iterations --
> 
> but how often is that really likely to occur?  Bearing in mind that
> 
> this error is meant for debugging and not production error handling,
> 
> you could even make the version a single byte and I'd still be fine
> 
> with that.
> 
> 
> 
> Cheers,
> 
> Ian


Hi Ian,

We could use a Python long object for the version index to prevent overflow.  Combined with P. Rubin's idea to count the number of open iterators, most use cases still wouldn't exceed a single word comparison; we could reset the counter when there weren't any.  Using the linked list collection, modification operations are expensive in rare cases.  Using the version index, iteration is expensive in rare cases.

I was more interested in the linked list for conceptual reasons, so I developed it further.  Changelog, diff file, test suite, and links are below.  The devs should be aware that a competing patch might be developed.  I would be pleased to hear what everybody thinks of it!

Linked list with uncounted references implementation for Win32.

Added:
- 'set_clear_ex' and 'set_clear_internal_ex' methods, differ in invalidation and conditional invalidation behavior and return type..  The 'set.clear()' method and 'tp_clear' type field both called the same method.
- 'set_invalidate_iter_linked' method.  Iterate over the iterators of a set, mark them invalid, and clear the list.
- 'setiter_unlink_internal' method.  Remove the iterator from the set's linked list of iterators.
- 'IterationError', global.
- New fields:
-- PySetObject: setiterobject *iter_linked.  Pointer to the first element of the linked list of the iterators of the set.
-- setiterobject: setiterobject *linked_pred, *linked_succ.  Predecessor and successor nodes in the linked list of iterators of the same set.
-- setiterobject: char ob_valid.  Validation status of the iterator.
- Result is compared with original in 'set_intersection_update' and '_multi' to determine whether to invalidate the list of iterators.  Asymptotic running time is unchanged.
- Pending: add 'tp_clear' field to 'PySetIter_Type'?
- Test script included, 'large numbers' test pending.

6 files changed: { setobject.h, setobject.c, exceptions.c, pyerrors.h, python3.def, python33stub.def }.  Test script 'set_iterator_test.py' new.  Linked list interface and pseudocode 'patch_pseudocode.txt'.
Zip file:
http://home.comcast.net/~castironpi-misc/clpy-0062-set_iterator_patch.zip
Diff file of 3.3.0b2:
http://home.comcast.net/~castironpi-misc/clpy-0062-set_iterator_diff.txt

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-16 11:00 -0700
  Re: set and dict iteration Dave Angel <d@davea.name> - 2012-08-16 15:49 -0400
    Re: set and dict iteration Paul Rubin <no.email@nospam.invalid> - 2012-08-16 14:26 -0700
      Re: set and dict iteration Dave Angel <davea@dejaviewphoto.com> - 2012-08-16 19:11 -0400
        Re: set and dict iteration Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-17 02:24 +0000
          Re: set and dict iteration Paul Rubin <no.email@nospam.invalid> - 2012-08-16 19:30 -0700
            Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 11:11 -0700
          Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 11:37 -0700
            Re: set and dict iteration Chris Angelico <rosuav@gmail.com> - 2012-08-18 07:57 +1000
              Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-18 13:29 -0700
              Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-18 13:29 -0700
                Re: set and dict iteration MRAB <python@mrabarnett.plus.com> - 2012-08-18 23:14 +0100
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-18 19:28 -0700
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-23 09:49 -0700
                Re: set and dict iteration Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-27 13:17 -0600
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-09-02 10:43 -0700
                Re: set and dict iteration Ian Kelly <ian.g.kelly@gmail.com> - 2012-09-03 13:29 -0600
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-09-03 13:04 -0700
                Re: set and dict iteration Dave Angel <d@davea.name> - 2012-09-03 16:27 -0400
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-09-03 17:24 -0700
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-09-03 17:24 -0700
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-09-03 13:04 -0700
                Re: set and dict iteration Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-04 01:26 +0000
                Re: set and dict iteration Dave Angel <d@davea.name> - 2012-09-03 21:50 -0400
                Re: set and dict iteration Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-09-04 01:59 +0000
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-09-08 08:42 -0700
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-09-02 10:43 -0700
                Re: set and dict iteration Serhiy Storchaka <storchaka@gmail.com> - 2012-09-04 00:46 +0300
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-23 09:49 -0700
                Re: set and dict iteration Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-23 18:11 +0000
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-27 11:10 -0700
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-09-08 15:07 -0700
                Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-18 19:28 -0700
                Re: set and dict iteration Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2012-09-08 22:06 +0200
                Re: set and dict iteration Hans Mulder <hansmu@xs4all.nl> - 2012-09-08 23:18 +0200
                Re: set and dict iteration MRAB <python@mrabarnett.plus.com> - 2012-09-08 22:22 +0100
      Re: set and dict iteration Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-16 17:43 -0600
        Re: set and dict iteration Paul Rubin <no.email@nospam.invalid> - 2012-08-16 18:01 -0700
          Re: set and dict iteration Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2012-08-17 13:16 +0200
          Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 11:03 -0700
          Re: set and dict iteration 88888 Dihedral <dihedral88888@googlemail.com> - 2012-09-10 13:14 -0700
  Re: set and dict iteration Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-16 16:55 -0600
  Re: set and dict iteration Ian Kelly <ian.g.kelly@gmail.com> - 2012-08-16 17:07 -0600
    Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 10:47 -0700
    Re: set and dict iteration Aaron Brady <castironpi@gmail.com> - 2012-08-17 10:47 -0700

csiph-web