Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #68505 > unrolled thread
| Started by | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| First post | 2014-03-18 14:20 +0000 |
| Last post | 2014-03-18 14:15 -0600 |
| Articles | 2 — 2 participants |
Back to article view | Back to comp.lang.python
Unexpected comparisons in dict lookup Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-18 14:20 +0000
Re: Unexpected comparisons in dict lookup Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-18 14:15 -0600
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-03-18 14:20 +0000 |
| Subject | Unexpected comparisons in dict lookup |
| Message-ID | <53285617$0$29994$c3e8da3$5496439d@news.astraweb.com> |
I stumbled across this unexpected behaviour with Python 2.7 and 3.3. When
you look up a key in a dict, the key is sometimes compared against other
keys twice instead of just once.
First, a class that reports when it is being tested for equality, with a
fixed hash so that we get collisions inside the dict:
class Spam:
def __init__(self, label):
self.label = label
def __str__(self):
return self.label
def __hash__(self):
return 100
def __eq__(self, other):
print("comparing %s with %s" % (self, other))
return self is other
Create a dict and prepare for collisions:
x = Spam("x")
y = Spam("y")
d = {100: 1} # hash(100) == 100
When we add x to the dict, it collides with the existing key 100, so I
expect one call to Spam.__eq__, and that's exactly what I get:
py> d[x] = 200
comparing x with 100
But when I try adding y to the dict, it collides with 100, then it
collides with x, then it apparently collides with x a second time:
py> d[y] = 300
comparing y with 100
comparing x with y
comparing x with y
I expected only two calls to __eq__, not three: first comparing with 100,
then comparing with x. Checking for keys gives the same result:
py> x in d
comparing x with 100
True
py> y in d
comparing y with 100
comparing x with y
comparing x with y
True
What's more, checking for a key which is not present also compares three
times instead of twice:
py> Spam("z") in d
comparing z with 100
comparing x with z
comparing x with z
comparing y with z
False
I expected it to compare z with 100, then x *once*, then y, then return
False.
Why is the dict lookup comparing against x twice? It doesn't seem to be
the fault of x. If I create a slightly different dict, with the
collisions in a different order:
py> e = {x: 100}
py> e[100] = 200
comparing x with 100
py> e[y] = 300
comparing x with y
comparing y with 100
comparing y with 100
--
Steven
[toc] | [next] | [standalone]
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-03-18 14:15 -0600 |
| Message-ID | <mailman.8258.1395173776.18130.python-list@python.org> |
| In reply to | #68505 |
On Tue, Mar 18, 2014 at 8:20 AM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > I stumbled across this unexpected behaviour with Python 2.7 and 3.3. When > you look up a key in a dict, the key is sometimes compared against other > keys twice instead of just once. >From what I can see in the code, it adds a perturbation based on the upper bits of the hash value to the probing scheme, to reduce collisions for keys with unequal hashes. On the downside, this cancels the guarantee that each bucket can only be checked at most once. The perturbation gradually shifts to 0 after a few iterations, so every bucket can still be reached within O(n) iterations. See the comments starting at "Major subtleties ahead": http://hg.python.org/cpython/file/f8b40d33e45d/Objects/dictobject.c#l106
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.python
csiph-web