Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #68505 > unrolled thread

Unexpected comparisons in dict lookup

Started bySteven D'Aprano <steve+comp.lang.python@pearwood.info>
First post2014-03-18 14:20 +0000
Last post2014-03-18 14:15 -0600
Articles 2 — 2 participants

Back to article view | Back to comp.lang.python


Contents

  Unexpected comparisons in dict lookup Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-18 14:20 +0000
    Re: Unexpected comparisons in dict lookup Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-18 14:15 -0600

#68505 — Unexpected comparisons in dict lookup

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-03-18 14:20 +0000
SubjectUnexpected comparisons in dict lookup
Message-ID<53285617$0$29994$c3e8da3$5496439d@news.astraweb.com>
I stumbled across this unexpected behaviour with Python 2.7 and 3.3. When 
you look up a key in a dict, the key is sometimes compared against other 
keys twice instead of just once.

First, a class that reports when it is being tested for equality, with a 
fixed hash so that we get collisions inside the dict:

class Spam:
    def __init__(self, label):
        self.label = label
    def __str__(self):
        return self.label
    def __hash__(self):
        return 100
    def __eq__(self, other):
        print("comparing %s with %s" % (self, other))
        return self is other


Create a dict and prepare for collisions:

x = Spam("x")
y = Spam("y")
d = {100: 1}  # hash(100) == 100


When we add x to the dict, it collides with the existing key 100, so I 
expect one call to Spam.__eq__, and that's exactly what I get:

py> d[x] = 200
comparing x with 100


But when I try adding y to the dict, it collides with 100, then it 
collides with x, then it apparently collides with x a second time:

py> d[y] = 300
comparing y with 100
comparing x with y
comparing x with y


I expected only two calls to __eq__, not three: first comparing with 100, 
then comparing with x. Checking for keys gives the same result:

py> x in d
comparing x with 100
True
py> y in d
comparing y with 100
comparing x with y
comparing x with y
True


What's more, checking for a key which is not present also compares three 
times instead of twice:

py> Spam("z") in d
comparing z with 100
comparing x with z
comparing x with z
comparing y with z
False


I expected it to compare z with 100, then x *once*, then y, then return 
False.

Why is the dict lookup comparing against x twice? It doesn't seem to be 
the fault of x. If I create a slightly different dict, with the 
collisions in a different order:

py> e = {x: 100}
py> e[100] = 200
comparing x with 100
py> e[y] = 300
comparing x with y
comparing y with 100
comparing y with 100



-- 
Steven

[toc] | [next] | [standalone]


#68514

FromIan Kelly <ian.g.kelly@gmail.com>
Date2014-03-18 14:15 -0600
Message-ID<mailman.8258.1395173776.18130.python-list@python.org>
In reply to#68505
On Tue, Mar 18, 2014 at 8:20 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> I stumbled across this unexpected behaviour with Python 2.7 and 3.3. When
> you look up a key in a dict, the key is sometimes compared against other
> keys twice instead of just once.

>From what I can see in the code, it adds a perturbation based on the
upper bits of the hash value to the probing scheme, to reduce
collisions for keys with unequal hashes.  On the downside, this
cancels the guarantee that each bucket can only be checked at most
once.  The perturbation gradually shifts to 0 after a few iterations,
so every bucket can still be reached within O(n) iterations.

See the comments starting at "Major subtleties ahead":
http://hg.python.org/cpython/file/f8b40d33e45d/Objects/dictobject.c#l106

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web