Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #33324

Re: Generate unique ID for URL

From Johannes Bauer <dfnsonfsduifb@gmx.de>
Newsgroups comp.lang.python
Subject Re: Generate unique ID for URL
Date 2012-11-14 12:29 +0100
Organization albasani.net
Message-ID <k7vva1$srq$1@news.albasani.net> (permalink)
References <0692e6a2-343c-4eb0-be57-fe5c815efb99@googlegroups.com> <roy-862116.20390413112012@news.panix.com>

Show all headers | View raw


On 14.11.2012 02:39, Roy Smith wrote:

> The next step is to reduce the number of bits you are encoding.  You 
> said in another post that "1 collision in 10 million hashes would be 
> tolerable".  So you need:
> 
>>>> math.log(10*1000*1000, 2)
> 23.25349666421154
> 
> 24 bits worth of key. 

Nope :-)

> Base64 encoded, that's only 4 characters.  
> Actually, I probably just proved that I don't really understand how 
> probabilities work, so maybe what you really need is 32 or 48 or 64 
> bits.

:-))

When doing these calculations, it's important to keep the birthday
paradox in mind (this is kind of counter-intuitive): The chance of a
collission raises tremendously when we're looking for *any* arbitrary
two hashes colliding within a certain namespace. The probability you've
calculated is the pre-image probability (which you also again need to
multiply with a factor of two, because when trying to collide one given
hash, in the mean case you'll only have to search *half* the namespace
before finding a collision).

There are three things you need to know before you can give an estimate:
1. The permissible probability of a collision (1e-7 in this case)
2. The hash size
3. The worst-case number of elements in the namespace

You neglected 3 completely -- but knowing this is really important. This
becomes obvious when looking at the extreme cases: Let's say you have a
hash of arbitrary size, but only hash one element. The chance of a
collision is *always* zero. Or look at a hash of size 2^n. Then put 2^n
+ 1 elements in the namespace. The chance of a collision is *always* one.

Doing the calculations (formulas can be found on wikipedia on the site
of the birthday phaenomenon), you can come up with these following
bitlenghts of the hash with a 1e-7 probability of collision in respect
to the worst-case number of elements

10k elements: 49 bit
100k elements: 56 bit
1e6 elements: 63 bit
100e6 elements: 76 bit
1e9 elements: 83 bit
1e12 elements: 102 bit

Best regards,
Johannes

-- 
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
 - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 15:20 -0800
  Re: Generate unique ID for URL John Gordon <gordon@panix.com> - 2012-11-13 23:34 +0000
    Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 15:56 -0800
      Re: Generate unique ID for URL Chris Kaynor <ckaynor@zindagigames.com> - 2012-11-13 16:26 -0800
      Re: Generate unique ID for URL Richard Baron Penman <richardbp@gmail.com> - 2012-11-14 11:41 +1100
        Re: Generate unique ID for URL Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-11-14 10:44 +0100
          Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-14 03:14 -0800
      Re: Generate unique ID for URL Christian Heimes <christian@python.org> - 2012-11-14 01:43 +0100
        Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 16:50 -0800
          Re: Generate unique ID for URL Christian Heimes <christian@python.org> - 2012-11-14 02:05 +0100
      Re: Generate unique ID for URL Christian Heimes <christian@python.org> - 2012-11-14 01:59 +0100
        Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 17:18 -0800
        Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 17:18 -0800
  Re: Generate unique ID for URL Miki Tebeka <miki.tebeka@gmail.com> - 2012-11-13 16:13 -0800
    Re: Generate unique ID for URL Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-14 02:04 +0000
      Re: Generate unique ID for URL Steve Howell <showell30@yahoo.com> - 2012-11-13 18:32 -0800
        Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 19:12 -0800
  Re: Generate unique ID for URL Roy Smith <roy@panix.com> - 2012-11-13 20:39 -0500
    Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 19:25 -0800
      Re: Generate unique ID for URL Roy Smith <roy@panix.com> - 2012-11-13 22:38 -0500
        Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 19:56 -0800
      Re: Generate unique ID for URL Chris Angelico <rosuav@gmail.com> - 2012-11-14 15:06 +1100
        Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 20:14 -0800
        Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 20:14 -0800
    Re: Generate unique ID for URL Richard <richardbp@gmail.com> - 2012-11-13 19:27 -0800
    Re: Generate unique ID for URL Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-11-14 12:29 +0100
      Re: Generate unique ID for URL Dave Angel <d@davea.name> - 2012-11-14 07:33 -0500
        Re: Generate unique ID for URL Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-11-14 14:00 +0100

csiph-web