From: Ian Kelly
Date: Thu, 26 Jul 2012 13:28:26 -0600
Subject: Re: Generating valid identifiers
Newsgroups: comp.lang.python
In-Reply-To: <50116281$0$29978$c3e8da3$5496439d@news.astraweb.com>

On Thu, Jul 26, 2012 at 9:30 AM, Steven D'Aprano wrote:
> What happens if you get a collision?
>
> That is, you have two different long identifiers:
>
> a.b.c.d...something
> a.b.c.d...anotherthing
>
> which by bad luck both hash to the same value:
>
> a.b.c.d.$AABB99
> a.b.c.d.$AABB99
>
> (or whatever).

The odds of a given pair of identifiers having the same digest to 10
hex digits are 1 in 16^10, or approximately 1 in a trillion. If you
bought one lottery ticket a day at those odds, you would win
approximately once every 3 billion years.

But a hash collision alone is not enough: the two identifiers also
have to match exactly in the first 21 (or 30, or whatever) characters
of their actual names, and both have to be long enough to invoke the
truncating scheme in the first place.

The Oracle backend for Django uses this same approach with an MD5 sum
to ensure that identifiers will be no more than 30 characters long (a
hard limit imposed by Oracle). It actually truncates the hash to 4
digits, though, not 10. This hasn't caused any problems that I'm aware
of.
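
For anyone who wants to see the scheme and the arithmetic spelled out, here is a minimal sketch. The function name shorten_identifier, its parameters, and the exact layout (kept prefix immediately followed by the hex digits, no separator) are assumptions made for illustration; this is not Django's actual code. The defaults of 30 characters and 4 hash digits are just the numbers mentioned above.

```python
import hashlib


def shorten_identifier(name, max_length=30, hash_len=4):
    """Return name unchanged if it fits; otherwise keep a prefix and
    replace the tail with the first hash_len hex digits of its MD5 sum.

    Hypothetical helper: the name, defaults and layout are assumptions.
    """
    if len(name) <= max_length:
        return name
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:hash_len]
    # Keep as much of the original name as still leaves room for the digest.
    return name[:max_length - hash_len] + digest


# Number of distinct 10-hex-digit digests, i.e. the "1 in N" odds for a pair.
pair_odds = 16 ** 10  # 1099511627776, roughly a trillion

# One lottery ticket a day at those odds: expected years between wins.
years_per_win = pair_odds / 365.25  # roughly 3 billion

if __name__ == "__main__":
    print(shorten_identifier("short_name"))
    print(shorten_identifier("a_very_long_table_name_that_exceeds_the_limit"))
    print(pair_odds, round(years_per_win))
```

Two long names collide under this sketch only if they share the same kept prefix (26 characters with these defaults) and the same truncated digest, which is the point made above.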