Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'python,': 0.02; 'exception': 0.03; 'cpython': 0.05; 'debugging': 0.05; 'lines.': 0.07; 'matches': 0.07; 'objects,': 0.07; 'python': 0.09; 'absent': 0.09; 'created,': 0.09; 'dict': 0.09; 'garbage': 0.09; 'handling,': 0.09; 'immutable': 0.09; 'internally': 0.09; 'received:mail-lpp01m010-f46.google.com': 0.09; 'structure,': 0.09; 'subject:set': 0.09; 'weak': 0.09; 'bug': 0.10; 'suggest': 0.11; 'aug': 0.13; 'library': 0.15; '"set': 0.16; '10:49': 0.16; 'advanced,': 0.16; 'arrays.': 0.16; 'increment': 0.16; 'incremented': 0.16; 'int.': 0.16; 'introduces': 0.16; 'invisible': 0.16; 'is;': 0.16; 'iterator': 0.16; 'iterator.': 0.16; 'iterators': 0.16; 'justified': 0.16; 'worse.': 0.16; 'wrote:': 0.17; 'alternate': 0.17; 'byte': 0.17; 'mechanism': 0.17; 'thu,': 0.17; 'version.': 0.17; 'error.': 0.21; 'meant': 0.21; "i'd": 0.22; 'cheers,': 0.23; 'references': 0.23; 'patch': 0.24; 'raise': 0.24; 'second': 0.24; 'feature': 0.24; 'header:In- Reply-To:1': 0.25; 'url:wiki': 0.26; 'am,': 0.27; 'mix': 0.27; 'message-id:@mail.gmail.com': 0.27; 'interface': 0.27; 'rest': 0.28; 'fine': 0.28; 'container': 0.29; 'hash': 0.29; 'modified,': 0.29; 'overhead': 0.29; 'types.': 0.29; 'url:wikipedia': 0.29; 'that.': 0.30; 'received:209.85.215.46': 0.30; 'unlike': 0.30; 'error': 0.30; 'lists': 0.31; 'structure': 0.32; 'could': 0.32; 'consist': 0.33; 'instead,': 0.33; 'to:addr:python-list': 0.33; 'likely': 0.33; 'version': 0.34; 'agree': 0.34; 'received:google.com': 0.34; 'list': 0.35; 'exist': 0.35; 'table': 0.35; 'received:209.85': 0.35; 'there': 0.35; 'add': 0.36; 'really': 0.36; 'but': 0.36; 'url:org': 0.36; 'should': 0.36; 'beyond': 0.37; 'does': 0.37; 'two': 0.37; 'uses': 0.37; 'received:209': 0.37; 'data': 0.37; 'subject:: ': 0.38; 'store': 0.38; 'some': 0.38; 'url:en': 0.38; 'to:addr:python.org': 0.39; 'skip:" 10': 0.40; 'header:Received:5': 0.40; 'think': 0.40; 'your': 0.60; 'link': 0.60; 'share': 0.61; 'first': 0.61; 'developed': 0.62; 'peace': 0.62; 'between': 0.63; 'production': 0.63; 'times': 0.63; 'concerns': 0.65; 'benefit': 0.70; 'introduce': 0.80; 'counting.': 0.84; 'to:name:python': 0.84; 'shares': 0.91 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; bh=ijoCRsVlq+zcufSIblbOOBOVwH1aC0c5lnwH347cLrY=; b=hx5NybZBGzLI8XQsJ9IypUoJlN30YIhOxTKbmkQ4HQL9AfOQAUkOpjSeDyWnojITCO msDUpFjKZRkU2Z1buaNI+iuxsbw15yQ0ul4oTi4hlaqR4m2d+3TLP6lejkHs5mE03eHg prHvw02kda+cJNjkxyDP8CkOjn+59IZr0QoJQyH5g3fJQuf54MMW52wXfouShwYyYXSR 1RfQBxFDf9/iRSDdLbKOgh7mfMmh0pR4fOCLbzRaGdh3hgrdr1pXDkie14brlkNY91YJ RArxmqCFY/b2igf8fcT5dZO3vk5+hLhPWQGTga6FVruPQyZPWFOF5VfaKFOEi+W6L6RN AMNA== MIME-Version: 1.0 In-Reply-To: References: <7xy5le7cli.fsf@ruckus.brouhaha.com> <502dab6c$0$29978$c3e8da3$5496439d@news.astraweb.com> From: Ian Kelly Date: Mon, 27 Aug 2012 13:17:12 -0600 Subject: Re: set and dict iteration To: Python Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 47 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1346095064 news.xs4all.nl 6883 [2001:888:2000:d::a6]:57307 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:27996 On Thu, Aug 23, 2012 at 10:49 AM, Aaron Brady wrote: > The patch for the above is only 40-60 lines. However it introduces two n= ew concepts. Is there a link to the patch? > The first is a "linked list", a classic dynamic data structure, first dev= eloped in 1955, cf. http://en.wikipedia.org/wiki/Linked_list . Linked list= s are absent in Python, including the standard library and CPython implemen= tation, beyond the weak reference mechanism and garbage collector. The "co= llections.deque" structure shares some of the linked list interface but use= s arrays. > > The second is "uncounted references". The uncounted references are refer= ences to "set iterators" exclusively, exist only internally to "set" object= s, and are invisible to the rest of the program. The reason for the except= ion is that iterators are unique in the Python Data Model; iterators consis= t of a single immutable reference, unlike both immutable types such as stri= ngs and numbers, as well as container types. Counted references could be u= sed instead, but would be consistently wasted work for the garbage collecto= r, though the benefit to programmers' peace of mind could be significant. > > Please share your opinion! Do you agree that the internal list resolves = the inconsistency? Do you agree with the strategy? Do you agree that unco= unted references are justified to introduce, or are counted references pref= erable? This feature is a hard sell as it is; I think that adding uncounted references into the mix is only going to make that worse. May I suggest an alternate approach? Internally tag each set or dict with a "version", which is just a C int. Every time the hash table is modified, increment the version. When an iterator is created, store the current version on the iterator. When the iterator is advanced, check that the iterator version matches the dict/set version. If they're not equal, raise an error. This should add less overhead than the linked list without any concerns about reference counting. It does introduce a small bug in that an error condition could be "missed", if the version is incremented a multiple of 2**32 or 2**64 times between iterations -- but how often is that really likely to occur? Bearing in mind that this error is meant for debugging and not production error handling, you could even make the version a single byte and I'd still be fine with that. Cheers, Ian