From: Dave Angel
Date: Thu, 09 Aug 2012 19:38:13 -0400
To: Andrew Cooper
Cc: python-list@python.org
Subject: Re: save dictionary to a file without brackets.
Newsgroups: comp.lang.python

On 08/09/2012 06:54 PM, Andrew Cooper wrote:
> On 09/08/2012 23:26, Dave Angel wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> On 09/08/2012 22:34, Roman Vashkevich wrote:
>>>> Actually, they are different.
>>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>>>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>>>
>>> Slightly off topic, but looking up a value in a dictionary is actually
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>>>
>>> ~Andrew
>> I'm glad you're wrong for CPython's dictionaries. The only time the
>> lookup would degenerate to O(n) would be if the hash table had only one
>> slot. CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>>
>> Where have you seen dictionaries so poorly implemented?
>>
> Different n, which I should have made more clear. I was using it for
> consistency with O() notation. My statement was O(n) where n is the
> number of hash collisions.

That's a little like doing a survey, and reporting the results as
showing that 100% of the women hit their husbands, among the population
of women who hit their husbands.

In your original message, you already stated the assumption that a
proper hash algorithm would be chosen, then went on to apparently claim
that large datasets would still have an order n problem. That last is
what I was challenging. The rest of your message here refers to client
code, not the system.

> The choice of hash algorithm (or several depending on the
> implementation) should specifically be chosen to reduce collisions to
> aid in efficient space utilisation and lookup times, but any
> implementation must allow for collisions. There are certainly runtime
> methods of improving efficiency using amortized operations.
>
> As for poor implementations,
>
> class Foo(object):
>
>     ...
>
>     def __hash__(self):
>         return 0
>
> I seriously found that in some older code I had the misfortune of
> reading. It didn't remain in that state for long.
>
> ~Andrew

--
DaveA
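
To make the collision point in this exchange concrete, here is a minimal sketch that counts equality comparisons instead of timing anything. The names Key, CollidingKey and count_comparisons are invented for illustration and do not come from the thread; CollidingKey simply mimics the constant `__hash__` from the Foo example. The counts reflect how CPython's dict probes colliding entries, so other implementations may behave differently.

```python
class Key:
    """Hashable wrapper that counts how often __eq__ is invoked."""

    eq_calls = 0  # shared counter, reset by count_comparisons()

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Small non-negative ints hash to distinct values in CPython.
        return hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


class CollidingKey(Key):
    """Same key, but every instance hashes to 0 (hypothetical stand-in for Foo)."""

    def __hash__(self):
        return 0


def count_comparisons(key_type, n):
    """Build a dict of n keys, look each one up, and return the __eq__ call count."""
    Key.eq_calls = 0
    keys = [key_type(i) for i in range(n)]
    table = {key: key.value for key in keys}
    for key in keys:
        assert table[key] == key.value
    return Key.eq_calls


# Columns: n, comparisons with distinct hashes, comparisons with a constant hash.
for n in (100, 200, 400):
    print(n, count_comparisons(Key, n), count_comparisons(CollidingKey, n))
```

With distinct hashes the comparison count should stay negligible as n grows, while with the constant hash each insert or lookup has to walk past the other colliding keys, so the total grows much faster than n. That is consistent with both sides of the thread: the per-lookup cost is linear in the number of colliding keys, and that number stays small unless client code supplies a degenerate `__hash__`.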