From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Hash stability
Date: Sun, 15 Jan 2012 11:36:00 +1100
In-Reply-To: <4f1107b7$0$29988$c3e8da3$5496439d@news.astraweb.com>

On Sat, Jan 14, 2012 at 3:42 PM, Steven D'Aprano wrote:
> On the Python Dev mailing list, there is a discussion going on about the
> stability of the hash function for strings.
>
> How many people rely on hash(some_string) being stable across Python
> versions? Does anyone have code that will be broken if the string hashing
> algorithm changes?

On reading your post I immediately thought that you could, if changing
algorithm, simultaneously fix the issue of malicious collisions, but
that appears to be what you're doing it for primarily :)

Suggestion: Create a subclass of dict, the SecureDict or something,
which could either perturb the hashes or even use a proper
cryptographic hash function; normal dictionaries can continue to use
the current algorithm. The description in Objects/dictnotes.txt
suggests that it's still well worth keeping the current system for
programmer-controlled dictionaries, and only change user-controlled
ones (such as POST data etc).
It would then be up to the individual framework and module authors to
make use of this, but it would not impose any cost on the myriad other
uses of dictionaries - there's no point adding extra load to every
name lookup just because of a security issue in an extremely narrow
situation. It would also mean that code relying on hash(str) stability
wouldn't be broken.

ChrisA