Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!ecngs!feeder2.ecngs.de!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'algorithm': 0.03; 'attribute': 0.05; "'a'": 0.07; 'attributes': 0.07; 'bug.': 0.07; 'class,': 0.07; 'space.': 0.07; 'python': 0.09; 'be:': 0.09; 'compression': 0.09; 'iterate': 0.09; 'solution,': 0.09; 'subclass': 0.09; 'subclasses': 0.09; 'tuple': 0.09; 'def': 0.10; 'thread': 0.11; 'to:name:python-list': 0.15; "'a',": 0.16; 'a():': 0.16; 'class).': 0.16; 'compare.': 0.16; 'dictionaries': 0.16; 'iterated': 0.16; 'keys.': 0.16; 'reliably': 0.16; 'set,': 0.16; 'simplest': 0.16; 'subject:key': 0.16; 'tuple,': 0.16; 'unordered,': 0.16; '\xc2\xa0if': 0.16; 'wrote:': 0.17; 'basically': 0.17; 'comparing': 0.17; 'instance': 0.17; 'tim': 0.18; '>>>': 0.18; 'code,': 0.18; '(or': 0.18; 'skip:v 30': 0.20; 'sort': 0.21; 'trying': 0.21; '>>>': 0.22; 'are.': 0.22; 'back.': 0.22; 'elements': 0.23; 'to:2**1': 0.23; 'idea': 0.24; 'testing': 0.24; 'pass': 0.25; 'header:In-Reply-To:1': 0.25; 'wrote': 0.26; 'values': 0.26; '(e.g.': 0.27; 'ago': 0.27; 'order.': 0.27; 'set.': 0.27; 'message-id:@mail.gmail.com': 0.27; 'correct': 0.28; 'all.': 0.28; 'represent': 0.28; 'container': 0.29; 'dictionary': 0.29; 'equality': 0.29; 'hash': 0.29; 'definition': 0.29; 'skip:_ 10': 0.29; 'skip:& 10': 0.29; 'probably': 0.29; 'class': 0.29; 'this.': 0.29; "i'm": 0.29; 'classes': 0.30; "skip:' 10": 0.30; 'returned': 0.30; 'function': 0.30; 'could': 0.32; 'affects': 0.33; 'instances': 0.33; 'point,': 0.33; 'safely': 0.33; 'to:addr:python-list': 0.33; 'equal': 0.33; 'know.': 0.33; 'another': 0.33; 'skip:& 20': 0.33; 'received:google.com': 0.34; 'self': 0.34; 'returning': 0.35; 'stable': 0.35; 'received:209.85': 0.35; 'something': 0.35; 'there': 0.35; 'really': 0.36; 'but': 0.36; 'smaller': 0.36; 'method': 0.36; 'should': 0.36; 'bad': 0.37; 'uses': 0.37; 'skip:v 20': 0.37; 'received:209': 0.37; 'data': 0.37; 'subject:: ': 0.38; 'object': 0.38; 'things': 0.38; 'sure': 0.38; 'to:addr:python.org': 0.39; 'think': 0.40; 'your': 0.60; 'subject:, ': 0.61; 'containing': 0.61; 'back': 0.62; 'more': 0.63; 'confirm': 0.64; 'making': 0.64; 'taking': 0.65; 'want,': 0.65; 'account': 0.67; 'obvious': 0.71; '2013': 0.84; 'delaney': 0.84; 'preventing': 0.91 X-USANET-Received: from gwo2.mbox.net [127.0.0.1] by gwo2.mbox.net via mtad (C8.MAIN.3.82G) with ESMTP id 405RBRVqk4688Mo2; Mon, 18 Feb 2013 21:16:10 -0000 X-USANET-GWS2-Tagid: UNKN X-USANET-Source: 165.212.120.254 OUT tim.delaney@aptare.com S1P5HUB2.EXCHPROD.USA.NET X-USANET-MsgId: XID995RBRVqk6184Xo2 MIME-Version: 1.0 X-Received: by 10.182.235.49 with SMTP id uj17mr6623484obc.18.1361222169465; Mon, 18 Feb 2013 13:16:09 -0800 (PST) In-Reply-To: <662405274.6218650.1361217108266.JavaMail.root@sequans.com> References: <1349647696.6217309.1361216182980.JavaMail.root@sequans.com> <662405274.6218650.1361217108266.JavaMail.root@sequans.com> Date: Tue, 19 Feb 2013 08:16:09 +1100 Subject: Re: Instances as dictionary key, __hash__ and __eq__ From: Tim Delaney To: Jean-Michel Pichavant , python-list Content-Type: multipart/alternative; boundary="bcaec5396952df13f804d6063b3c" X-Originating-IP: [209.85.219.43] X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 192 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1361222182 news.xs4all.nl 6885 [2001:888:2000:d::a6]:41227 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:39129 --bcaec5396952df13f804d6063b3c Content-Type: text/plain; charset="UTF-8" On 19 February 2013 06:51, Jean-Michel Pichavant wrote: > Greetings, > > I opened something like a month ago a thread about hash functions and how > I could write classes which instances can be safely used as dictionary keys. > I though I had it but when I read back my code, I think I wrote yet > another bug. > > Consider the following simple (buggy) class, python 2.5 > > class FooSet(object): > """Define an algorithm set, containing pdcch/pdsch (or none).""" > def __init__(self, pdcch, pdsch): > self.pdcch = bool(pdcch) > self.pdsch = bool(pdsch) > # __hash__ and __eq__ allow to use the object as a dictionary key > def __hash__(self): > return hash((self.pdcch, self.pdsch)) > def __eq__(self, other): > return hash(self) == hash(other) > > Can you confirm that using the hash function for testing equality is a > very bad idea ? > Yes - it is a *very* bad idea. A hash by definition can produce collisions, since you are taking much larger amount of data and are trying to represent it in a smaller amount of space. It's effectively lossy compression - you can never reliably get the original back. > One obvious solution would be: > > def __eq__(self, other): > return self.pdsch = other.pdsch and self.pdcch == other.pdcch > This is a correct and the simplest way to do it. > But I was looking for a "standard" solution, that I could use for > basically all my container classes > > So I came up with these ones: > > def __hash__(self): > return hash(tuple(vars(self).values())) > def __eq__(self, other): > return vars(self) == vars(other) > > But I'm not sure about vars(self).values(), I don't really care about the > order of the values, but I need to be sure that for 2 equal dictionaries, > they will both return their values in the same order. > And that's the point, I'm not sure at all. > You cannot rely on this. Dictionaries are unordered, and the order that items are added affects the order that the elements will be iterated over. You could sort the vars by name (thus giving the stable order you need) but there's another flaw - vars() contains more than just the attributes you set. >>> class A(): ... pass ... >>> vars(A) mappingproxy({'__qualname__': 'A', '__dict__': , '__module__': '__main__', '__weakref__': , '__doc__': None}) So by using vars you are preventing instances of subclasses of your class from comparing equal to each other (or to instances of the base class). Additionally, If I'm making things much more complicated than they need to > be, let me know. > You are. There are ways to achieve what you want, but it requires a lot more setup and discipline. The simplest way is probably to have a _equal_fields() method that subclasses override, returning a tuple of the attributes that should be hashed. Then in __hash__() and __eq__ you iterate over the returned tuple, get the value for each attribute and either hash or compare. Of course, you have to take into account in __eq__ that the other instance may not have the same attributes (e.g. self is a subclass that uses extra attributes in its __hash__ and __eq__). Tim Delaney --bcaec5396952df13f804d6063b3c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On 19 February 2013 06:51, Jean-Michel Pichavant <je= anmichel@sequans.com> wrote:
Greetings,

I opened something like a month ago a thread about hash functions and how I= could write classes which instances can be safely used as dictionary keys.=
I though I had it but when I read back my code, I think I wrote yet another= bug.

Consider the following simple (buggy) class, python 2.5

class FooSet(object):
=C2=A0 =C2=A0 """Define an algorithm set, containing pdcch/p= dsch (or none)."""
=C2=A0 =C2=A0 def __init__(self, pdcch, pdsch):
=C2=A0 =C2=A0 =C2=A0 =C2=A0 self.pdcch =3D bool(pdcch)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 self.pdsch =3D bool(pdsch)
=C2=A0 =C2=A0 # __hash__ and __eq__ allow to use the object as a dictionary= key
=C2=A0 =C2=A0 def __hash__(self):
=C2=A0 =C2=A0 =C2=A0 =C2=A0 return hash((self.pdcch, self.pdsch))
=C2=A0 =C2=A0 def __eq__(self, other):
=C2=A0 =C2=A0 =C2=A0 =C2=A0 return hash(self) =3D=3D hash(other)

Can you confirm that using the hash function for testing equality is a very= bad idea ?

Yes - it is a *very* = bad idea. A hash by definition can produce collisions, since you are taking= much larger amount of data and are trying to represent it in a smaller amo= unt of space. It's effectively lossy compression - you can never reliab= ly get the original back.
=C2=A0
One obvious solution would be:

def __eq__(self, other):
=C2=A0 =C2=A0 return self.pdsch =3D other.pdsch and self.pdcch =3D=3D other= .pdcch

This is a correct and the = simplest way to do it.
=C2=A0
But I was looking for a "standard" solution, that I could use for= basically all my container classes

So I came up with these ones:

def __hash__(self):
=C2=A0 =C2=A0 return hash(tuple(vars(self).values()))
def __eq__(self, other):
=C2=A0 =C2=A0 return vars(self) =3D=3D vars(other)

But I'm not sure about vars(self).values(), I don't really care abo= ut the order of the values, but I need to be sure that for 2 equal dictiona= ries, they will both return their values in the same order.
And that's the point, I'm not sure at all.
You cannot rely on this. Dictionaries are unordered, and = the order that items are added affects the order that the elements will be = iterated over. You could sort the vars by name (thus giving the stable orde= r you need) but there's another flaw - vars() contains more than just t= he attributes you set.

>>> class A():
... =C2=A0 =C2= =A0 pass
...
>>> vars(A)
= mappingproxy({'__qualname__': 'A', '__dict__': <= attribute '__dict__' of 'A' objects>, '__module__= 9;:
'__main__', '__weakref__': <attribute '__weakre= f__' of 'A' objects>, '__doc__': None})
<= div>
So by using vars you are preventing instances of s= ubclasses of your class from comparing equal to each other (or to instances= of the base class).

Additionally, =C2=A0If I'm making things much more complicated than the= y need to be, let me know.

You ar= e. There are ways to achieve what you want, but it requires a lot more setu= p and discipline. The simplest way is probably to have a _equal_fields() me= thod that subclasses override, returning a tuple of the attributes that sho= uld be hashed. Then in __hash__() and __eq__ you iterate over the returned = tuple, get the value for each attribute and either hash or compare.

Of course, you have to take into account in= __eq__ that the other instance may not have the same attributes (e.g. self= is a subclass that uses extra attributes in its __hash__ and __eq__).

Tim Delaney
--bcaec5396952df13f804d6063b3c--