


Re: Generating valid identifiers

In-Reply-To <50116281$0$29978$c3e8da3$5496439d@news.astraweb.com>
References <mailman.2604.1343305588.4697.python-list@python.org> <50116281$0$29978$c3e8da3$5496439d@news.astraweb.com>
From Ian Kelly <ian.g.kelly@gmail.com>
Date Thu, 26 Jul 2012 13:28:26 -0600
Subject Re: Generating valid identifiers
To Python <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.2628.1343330939.4697.python-list@python.org>



On Thu, Jul 26, 2012 at 9:30 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> What happens if you get a collision?
>
> That is, you have two different long identifiers:
>
> a.b.c.d...something
> a.b.c.d...anotherthing
>
> which by bad luck both hash to the same value:
>
> a.b.c.d.$AABB99
> a.b.c.d.$AABB99
>
> (or whatever).

The odds of a given pair of identifiers having digests that agree in
the first 10 hex digits are 1 in 16^10, or approximately 1 in a
trillion.  If you bought one lottery ticket a day at those odds, you
would win approximately once every 3 billion years.  But a hash
collision alone is not enough: the two names also have to match exactly in the first 21
(or 30, or whatever) characters of their actual names, and they have
to both be long enough to invoke the truncating scheme in the first
place.
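For illustration, here is a minimal sketch of the kind of truncating scheme being discussed, plus the arithmetic behind the odds quoted above. The function name `shorten_identifier`, the 21-character prefix, the `$` separator, and the use of MD5 are all assumptions taken loosely from the examples in this thread, not a fixed specification.

```python
import hashlib

# Hypothetical parameters; the thread only says "21 (or 30, or whatever)".
PREFIX_LEN = 21                        # characters of the original name kept verbatim
DIGEST_LEN = 10                        # hex digits of the MD5 digest kept
MAX_LEN = PREFIX_LEN + 1 + DIGEST_LEN  # prefix + '$' + digest


def shorten_identifier(name: str) -> str:
    """Return name unchanged if short enough, else prefix + '$' + truncated MD5."""
    if len(name) <= MAX_LEN:
        return name  # short names never enter the truncating scheme
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:DIGEST_LEN]
    return name[:PREFIX_LEN] + "$" + digest


# The odds from the paragraph above: one chance in 16**10 per pair.
space = 16 ** DIGEST_LEN  # number of distinct 10-hex-digit digests
years = space / 365.25    # buying one "ticket" per day
print(f"1 in {space:,}")                    # 1 in 1,099,511,627,776
print(f"~{years / 1e9:.1f} billion years")  # ~3.0 billion years

# Two long names sharing the same 21-character prefix: their shortened
# forms differ unless the first 10 hex digits of their MD5 digests match.
first = shorten_identifier("a.b.c.d." + "x" * 40 + ".something")
second = shorten_identifier("a.b.c.d." + "x" * 40 + ".anotherthing")
print(first)
print(second)
```

Note that a collision needs all three conditions at once: both names longer than `MAX_LEN`, an identical prefix, and matching truncated digests.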

The Oracle backend for Django uses this same approach with an MD5 sum
to ensure that identifiers will be no more than 30 characters long (a
hard limit imposed by Oracle).  It actually truncates the hash to 4
hex digits, though, not 10.  This hasn't caused any problems that I'm
aware of.
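A sketch of that style of truncation, modeled on the description above rather than copied from Django's source: the name `truncate_name_sketch` and its signature are hypothetical. It keeps the first `length - hash_len` characters and appends `hash_len` hex digits of the MD5 of the full name, so the result is exactly `length` characters and the same input always yields the same output.

```python
import hashlib


def truncate_name_sketch(name: str, length: int = 30, hash_len: int = 4) -> str:
    """Shorten name to at most `length` characters, repeatably.

    Keeps the first (length - hash_len) characters and appends the first
    hash_len hex digits of the MD5 digest of the full name.
    """
    if len(name) <= length:
        return name  # already within the limit; leave untouched
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:hash_len]
    return name[: length - hash_len] + digest


long_name = "myapp_somemodel_some_really_long_field_name"
short = truncate_name_sketch(long_name)
print(short, len(short))  # 26 kept characters + 4 hex digits
```

With `hash_len=4` there are only 16**4 = 65,536 possible suffixes, compared with about 1.1 trillion for 10 hex digits.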



Thread

Generating valid identifiers Laszlo Nagy <gandalf@shopzeus.com> - 2012-07-26 14:26 +0200
  Re: Generating valid identifiers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-07-26 15:30 +0000
    Re: Generating valid identifiers Laszlo Nagy <gandalf@shopzeus.com> - 2012-07-26 20:08 +0200
    Re: Generating valid identifiers Ian Kelly <ian.g.kelly@gmail.com> - 2012-07-26 13:28 -0600
      Re: Generating valid identifiers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-07-27 01:54 +0000
        Re: Generating valid identifiers Laszlo Nagy <gandalf@shopzeus.com> - 2012-07-27 09:34 +0200
    Re: Generating valid identifiers Ian Kelly <ian.g.kelly@gmail.com> - 2012-07-26 14:00 -0600
    Re: Generating valid identifiers Laszlo Nagy <gandalf@shopzeus.com> - 2012-07-27 09:28 +0200
    Re: Generating valid identifiers Laszlo Nagy <gandalf@shopzeus.com> - 2012-07-27 11:59 +0200
