Date: Thu, 09 Aug 2012 19:38:13 -0400
From: Dave Angel <d@davea.name>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120714 Thunderbird/14.0
MIME-Version: 1.0
To: Andrew Cooper <amc96@cam.ac.uk>
Subject: Re: save dictionary to a file without brackets.
References: <930ab3d8-4ab9-446d-9970-ee811eb70a44@googlegroups.com> <F1B463BB-19A6-4DB1-99B3-929CCBFB5920@gmail.com> <50241F14.2060209@tim.thechases.com> <36EA3847-6713-4C12-B47B-9B5E10325F00@gmail.com> <502429C3.5000600@tim.thechases.com> <mailman.3120.1344548050.4697.python-list@python.org> <MsWUr.1183213$%k.489563@fx20.am4> <mailman.3129.1344551238.4697.python-list@python.org> <ucXUr.1030527$2z2.380746@fx19.am4>
In-Reply-To: <ucXUr.1030527$2z2.380746@fx19.am4>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: python-list@python.org
Precedence: list
Reply-To: d@davea.name
Newsgroups: comp.lang.python
Message-ID: <mailman.3137.1344555521.4697.python-list@python.org>

On 08/09/2012 06:54 PM, Andrew Cooper wrote:
> On 09/08/2012 23:26, Dave Angel wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> On 09/08/2012 22:34, Roman Vashkevich wrote:
>>>> Actually, they are different.
>>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>>>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>>>
>>> Slightly off topic, but looking up a value in a dictionary is actually
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>>>
>>> ~Andrew
>> I'm glad you're wrong for CPython's dictionaries.  The only time the
>> lookup would degenerate to O(n) would be if the hash table had only one
>> slot.  CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>>
>> Where have you seen dictionaries so poorly implemented?
>>
> Different n, which I should have made more clear.  I was using it for
> consistency with O() notation.  My statement was O(n) where n is the
> number of hash collisions.
That's a little like doing a survey, and reporting the results as
showing that 100% of the women hit their husbands, among the population
of women who hit their husbands.

In your original message, you already stated the assumption that a
proper hash algorithm would be chosen, then went on to apparently claim
that large datasets would still have an O(n) problem.  That last claim is
what I was challenging.
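To make that concrete, here is a toy model. It is not CPython's actual code. `ToyHashSet` and `average_probes` are hypothetical names. The table uses linear probing and grows fourfold once more than 2/3 of its slots are in use. Those two numbers are assumptions chosen to roughly resemble CPython, and the probe sequence is deliberately simpler than CPython's. With a reasonable hash, the average number of slots inspected per successful lookup should stay small and roughly flat as the table grows from a thousand to a hundred thousand keys.

```python
import random


class ToyHashSet:
    """Hypothetical open-addressing hash set, for illustration only.

    Linear probing; grows 4x once more than 2/3 of the slots are used.
    CPython's real dict uses a different probe sequence.
    """

    def __init__(self):
        self.slots = [None] * 8
        self.used = 0

    def _slot_sequence(self, key):
        # Start at hash(key) modulo the table size, then step one slot at a time.
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)

    def _insert(self, key):
        for i in self._slot_sequence(key):
            if self.slots[i] is None:
                self.slots[i] = key
                self.used += 1
                return
            if self.slots[i] == key:
                return  # already present

    def add(self, key):
        # Resize before the load factor would exceed the assumed 2/3 limit,
        # so the table always keeps plenty of empty slots.
        if (self.used + 1) * 3 > len(self.slots) * 2:
            old_keys = [k for k in self.slots if k is not None]
            self.slots = [None] * (len(self.slots) * 4)
            self.used = 0
            for k in old_keys:
                self._insert(k)
        self._insert(key)

    def probes(self, key):
        """Number of slots a lookup for key inspects before it stops."""
        count = 0
        for i in self._slot_sequence(key):
            count += 1
            if self.slots[i] is None or self.slots[i] == key:
                return count


def average_probes(n, seed=0):
    """Insert n random 64-bit ints, then average the probes per lookup."""
    rng = random.Random(seed)
    keys = [rng.getrandbits(64) for _ in range(n)]
    table = ToyHashSet()
    for k in keys:
        table.add(k)
    return sum(table.probes(k) for k in keys) / n


for n in (1_000, 10_000, 100_000):
    print(n, round(average_probes(n), 2))
```

Linear probing is used only because it is the shortest thing to write. The point is that a bounded load factor keeps the expected probe count bounded, and that is what gives the O(1) average case.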

The rest of your message here refers to client code, not the system.
> The choice of hash algorithm (or several depending on the
> implementation) should specifically be chosen to reduce collisions to
> aid in efficient space utilisation and lookup times, but any
> implementation must allow for collisions.  There are certainly runtime
> methods of improving efficiency using amortized operations.
>
> As for poor implementations,
>
> class Foo(object):
>
>     ...
>
>     def __hash__(self):
>         return 0
>
> I seriously found that in some older code I had the misfortune of
> reading.  It didn't remain in that state for long.
>
> ~Andrew
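The `return 0` case is easy to measure. This sketch assumes CPython, since other implementations may compare keys differently. `Key`, `BadKey` and `eq_calls_for_missing_lookup` are hypothetical names. The code builds a dict, then counts how many times `__eq__` runs during one lookup of a key that was never inserted. With the sensible hash, the count should stay at zero, because CPython compares stored hash values before it calls `__eq__`. With the constant hash, every entry is a candidate, so the count should grow with the size of the dict.

```python
class Key:
    """Hypothetical key type that counts every call to __eq__."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        # Sensible hash: distinct small integers get distinct hashes.
        return hash(self.value)


class BadKey(Key):
    """Same equality as Key, but every instance hashes to 0, like Foo above."""

    def __hash__(self):
        return 0


def eq_calls_for_missing_lookup(key_type, n):
    """Build a dict of n keys, then count __eq__ calls for one failed lookup."""
    d = {key_type(i): i for i in range(n)}
    Key.eq_calls = 0           # ignore comparisons made while building d
    d.get(key_type(n))         # look up a value that was never inserted
    return Key.eq_calls


for n in (100, 500, 1000):
    print(n, eq_calls_for_missing_lookup(Key, n),
          eq_calls_for_missing_lookup(BadKey, n))
```

The fix belongs in the class's `__hash__`, which is client code, not in the dict implementation.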


-- 

DaveA