Path: csiph.com!usenet.pasdenom.info!news.albasani.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'exception': 0.03; 'debugging': 0.05; 'diff': 0.05; 'method.': 0.05; 'lines.': 0.07; 'matches': 0.07; 'objects,': 0.07; 'python': 0.09; 'any.': 0.09; 'created,': 0.09; 'dict': 0.09; 'garbage': 0.09; 'handling,': 0.09; 'immutable': 0.09; 'internally': 0.09; 'invalidate': 0.09; 'iterate': 0.09; 'methods,': 0.09; 'predecessor': 0.09; 'subject:set': 0.09; 'to:addr:comp.lang.python': 0.09; 'bug': 0.10; 'cc:addr:python-list': 0.10; "wouldn't": 0.11; 'suggest': 0.11; 'aug': 0.13; 'index': 0.13; 'cases': 0.15; 'file,': 0.15; 'modification': 0.15; '"set': 0.16; '10:49': 0.16; 'added:': 0.16; 'advanced,': 0.16; 'conditional': 0.16; 'devs': 0.16; 'fields:': 0.16; 'included,': 0.16; 'increment': 0.16; 'incremented': 0.16; 'int.': 0.16; 'introduces': 0.16; 'invisible': 0.16; 'is;': 0.16; 'iteration': 0.16; 'iterator': 0.16; 'iterator.': 0.16; 'iterators': 0.16; 'iterators,': 0.16; 'justified': 0.16; 'nodes': 0.16; 'set,': 0.16; 'successor': 0.16; 'worse.': 0.16; 'wrote:': 0.17; 'alternate': 0.17; 'byte': 0.17; 'char': 0.17; 'differ': 0.17; 'element': 0.17; 'pointer': 0.17; 'thu,': 0.17; 'version.': 0.17; 'error.': 0.21; 'meant': 0.21; 'file:': 0.22; "i'd": 0.22; 'cheers,': 0.23; 'cc:2**0': 0.23; 'monday,': 0.23; 'references': 0.23; 'patch': 0.24; 'raise': 0.24; 'idea': 0.24; 'second': 0.24; 'feature': 0.24; 'script': 0.24; 'cc:addr:python.org': 0.25; 'header:In-Reply-To:1': 0.25; 'header:User-Agent:1': 0.26; 'am,': 0.27; 'prevent': 0.27; 'mix': 0.27; 'set.': 0.27; 'interface': 0.27; 'rest': 0.28; 'fine': 0.28; 'cases.': 0.29; 'container': 0.29; 'hash': 0.29; 'index,': 0.29; 'modified,': 0.29; 'overhead': 0.29; 'thinks': 0.29; 'types.': 0.29; 'reset': 0.29; "skip:' 10": 0.30; 'that.': 0.30; 'unlike': 0.30; 'error': 0.30; 'file': 0.32; "skip:' 20": 0.32; 'running': 0.32; 'could': 0.32; 'consist': 0.33; 'instead,': 0.33; 'url:home': 0.33; 'likely': 0.33; 'operations': 0.33; 'version': 0.34; 'agree': 0.34; 'received:google.com': 0.34; 'list': 0.35; 'clear': 0.35; 'compared': 0.35; 'exist': 0.35; 'open': 0.35; 'table': 0.35; 'received:209.85': 0.35; 'there': 0.35; 'list.': 0.35; 'add': 0.36; 'really': 0.36; 'but': 0.36; 'development.': 0.36; 'expensive': 0.36; 'method': 0.36; 'test': 0.36; 'should': 0.36; 'does': 0.37; 'two': 0.37; 'times': 0.63; 'hear': 0.63; 'more': 0.63; 'behavior': 0.64; 'it!': 0.64; 'concerns': 0.65; 'exceed': 0.65; 'reasons,': 0.65; 'august': 0.66; 'pleased': 0.66; 'below.': 0.68; 'everybody': 0.69; 'benefit': 0.70; 'introduce': 0.80; 'counting.': 0.84; 'developed.': 0.84; 'ian,': 0.84; 'url:zip': 0.84; 'grew': 0.91 Newsgroups: comp.lang.python Date: Sun, 2 Sep 2012 10:43:20 -0700 (PDT) In-Reply-To: Complaints-To: groups-abuse@google.com Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=67.184.78.189; posting-account=MQ3pigoAAACeFzUFjVAePnOjOJMNlvq9 References: <7xy5le7cli.fsf@ruckus.brouhaha.com> <502dab6c$0$29978$c3e8da3$5496439d@news.astraweb.com> User-Agent: G2/1.0 X-Google-Web-Client: true X-Google-IP: 67.184.78.189 MIME-Version: 1.0 Subject: Re: set and dict iteration From: Aaron Brady To: comp.lang.python@googlegroups.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Python X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Message-ID: Lines: 120 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1346607808 news.xs4all.nl 6843 [2001:888:2000:d::a6]:48646 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:28281 On Monday, August 27, 2012 2:17:45 PM UTC-5, Ian wrote: > On Thu, Aug 23, 2012 at 10:49 AM, Aaron Brady wrot= e: >=20 > > The patch for the above is only 40-60 lines. However it introduces two= new concepts. >=20 >=20 >=20 > Is there a link to the patch? Please see below. It grew somewhat during development. > > The first is a "linked list". SNIP. =20 > > The second is "uncounted references". The uncounted references are ref= erences to "set iterators" exclusively, exist only internally to "set" obje= cts, and are invisible to the rest of the program. The reason for the exce= ption is that iterators are unique in the Python Data Model; iterators cons= ist of a single immutable reference, unlike both immutable types such as st= rings and numbers, as well as container types. Counted references could be= used instead, but would be consistently wasted work for the garbage collec= tor, though the benefit to programmers' peace of mind could be significant. >=20 > > >=20 > > Please share your opinion! Do you agree that the internal list resolve= s the inconsistency? Do you agree with the strategy? Do you agree that un= counted references are justified to introduce, or are counted references pr= eferable? >=20 >=20 >=20 > This feature is a hard sell as it is; I think that adding uncounted >=20 > references into the mix is only going to make that worse. May I >=20 > suggest an alternate approach? Internally tag each set or dict with a >=20 > "version", which is just a C int. Every time the hash table is >=20 > modified, increment the version. When an iterator is created, store >=20 > the current version on the iterator. When the iterator is advanced, >=20 > check that the iterator version matches the dict/set version. If >=20 > they're not equal, raise an error. >=20 >=20 >=20 > This should add less overhead than the linked list without any >=20 > concerns about reference counting. It does introduce a small bug in >=20 > that an error condition could be "missed", if the version is >=20 > incremented a multiple of 2**32 or 2**64 times between iterations -- >=20 > but how often is that really likely to occur? Bearing in mind that >=20 > this error is meant for debugging and not production error handling, >=20 > you could even make the version a single byte and I'd still be fine >=20 > with that. >=20 >=20 >=20 > Cheers, >=20 > Ian Hi Ian, We could use a Python long object for the version index to prevent overflow= . Combined with P. Rubin's idea to count the number of open iterators, mos= t use cases still wouldn't exceed a single word comparison; we could reset = the counter when there weren't any. Using the linked list collection, modi= fication operations are expensive in rare cases. Using the version index, = iteration is expensive in rare cases. I was more interested in the linked list for conceptual reasons, so I devel= oped it further. Changelog, diff file, test suite, and links are below. T= he devs should be aware that a competing patch might be developed. I would= be pleased to hear what everybody thinks of it! Linked list with uncounted references implementation for Win32. Added: - 'set_clear_ex' and 'set_clear_internal_ex' methods, differ in invalidatio= n and conditional invalidation behavior and return type.. The 'set.clear()= ' method and 'tp_clear' type field both called the same method. - 'set_invalidate_iter_linked' method. Iterate over the iterators of a set= , mark them invalid, and clear the list. - 'setiter_unlink_internal' method. Remove the iterator from the set's lin= ked list of iterators. - 'IterationError', global. - New fields: -- PySetObject: setiterobject *iter_linked. Pointer to the first element o= f the linked list of the iterators of the set. -- setiterobject: setiterobject *linked_pred, *linked_succ. Predecessor an= d successor nodes in the linked list of iterators of the same set. -- setiterobject: char ob_valid. Validation status of the iterator. - Result is compared with original in 'set_intersection_update' and '_multi= ' to determine whether to invalidate the list of iterators. Asymptotic run= ning time is unchanged. - Pending: add 'tp_clear' field to 'PySetIter_Type'? - Test script included, 'large numbers' test pending. 6 files changed: { setobject.h, setobject.c, exceptions.c, pyerrors.h, pyth= on3.def, python33stub.def }. Test script 'set_iterator_test.py' new. Link= ed list interface and pseudocode 'patch_pseudocode.txt'. Zip file: http://home.comcast.net/~castironpi-misc/clpy-0062-set_iterator_patch.zip Diff file of 3.3.0b2: http://home.comcast.net/~castironpi-misc/clpy-0062-set_iterator_diff.txt