| References | <mailman.2604.1343305588.4697.python-list@python.org> <50116281$0$29978$c3e8da3$5496439d@news.astraweb.com> <CALwzidnR-H=VCWAiC-qL95C_8C_A=+LMWvCAq4+0ZCM2-UHbVw@mail.gmail.com> |
|---|---|
| From | Ian Kelly <ian.g.kelly@gmail.com> |
| Date | 2012-07-26 14:00 -0600 |
| Subject | Re: Generating valid identifiers |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2630.1343332892.4697.python-list@python.org> |

On Thu, Jul 26, 2012 at 1:28 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> The odds of a given pair of identifiers having the same digest to 10
> hex digits are 1 in 16^10, or approximately 1 in a trillion. If you
> bought one lottery ticket a day at those odds, you would win
> approximately once every 3 billion years. But it's not enough just to
> have a hash collision, they also have to match exactly in the first 21
> (or 30, or whatever) characters of their actual names, and they have
> to both be long enough to invoke the truncating scheme in the first
> place.
>
> The Oracle backend for Django uses this same approach with an MD5 sum
> to ensure that identifiers will be no more than 30 characters long (a
> hard limit imposed by Oracle). It actually truncates the hash to 4
> digits, though, not 10. This hasn't caused any problems that I'm
> aware of.

As a side note, working out the odds of at least one hash collision among multiple tables is the birthday problem. At 4 hex digits there are 65536 possible digests, and it turns out that at 302 tables there is a >50% chance that at least one pair of those names has the same 4-digit digest. That doesn't mean you should be concerned if you have 302 tables in your Django Oracle database, though, because those colliding tables also have to match completely in the first 26 characters of their generated names, which is not that common. If a collision ever did occur, the resolution would be simple: manually set the name of one of the offending tables in the model definition.

With 16 ** 10 possible digests, the probability of collision hits 50% at 1234605 tables.
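
For anyone who wants to check the birthday-problem figures, here is a minimal sketch. It assumes the digests are uniformly distributed, which is only an approximation for a truncated MD5 sum, and the function name `tables_for_even_odds` is made up for this illustration rather than taken from Django or any library.

```python
import math


def tables_for_even_odds(digest_count):
    """Return the smallest number of names at which the chance of at
    least one shared digest reaches 50%.

    Assumption: each name's digest is drawn uniformly from
    digest_count possible values.
    """
    threshold = -math.log(2)  # log(0.5)
    log_all_distinct = 0.0    # log of P(all digests so far are distinct)
    names = 1
    while log_all_distinct > threshold:
        # The next name has to avoid the `names` digests already taken.
        # Summing logs avoids multiplying a long chain of values near 1.
        log_all_distinct += math.log1p(-names / digest_count)
        names += 1
    return names


print(tables_for_even_odds(16 ** 4))   # 302
print(tables_for_even_odds(16 ** 10))  # 1234605
```

The second call loops about 1.2 million times, so it takes a moment. Passing 365 gives 23, the classic birthday-problem answer.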