hashing strings to integers (was: hashing strings to integers for sqlite3 keys)

From Adam Funk <a24061@ducksburg.com>
Newsgroups comp.lang.python
Subject hashing strings to integers (was: hashing strings to integers for sqlite3 keys)
Date 2014-05-23 11:27 +0100
Organization $CABAL
Message-ID <sck35bxdc5.ln2@news.ducksburg.com>
References (1 earlier) <mailman.10220.1400764235.18130.python-list@python.org> <05c15bxrpj.ln2@news.ducksburg.com> <mailman.10223.1400768058.18130.python-list@python.org> <k9f15bxoql.ln2@news.ducksburg.com> <mailman.10225.1400772863.18130.python-list@python.org>


On 2014-05-22, Peter Otten wrote:

> Adam Funk wrote:

>> Well, J*v* returns a byte array, so I used to do this:
>> 
>>     digester = MessageDigest.getInstance("MD5");
>>     ...
>>     digester.reset();
>>     byte[] digest = digester.digest(bytes);
>>     return new BigInteger(+1, digest);
>
> In Python 3 there's int.from_bytes()
>
>>>> h = hashlib.sha1(b"Hello world")
>>>> int.from_bytes(h.digest(), "little")
> 538059071683667711846616050503420899184350089339
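
A note on byte order, as a minimal sketch: Java's BigInteger(int signum, byte[] magnitude) reads the magnitude big-endian, so "big" rather than "little" is the byte order that should reproduce the Java value above. The helper name digest_to_int is made up for illustration.

```python
import hashlib


def digest_to_int(data: bytes, byteorder: str = "big") -> int:
    """Return the SHA-1 digest of `data` as a non-negative integer.

    digest_to_int is a hypothetical helper, not part of hashlib.
    """
    return int.from_bytes(hashlib.sha1(data).digest(), byteorder)


big = digest_to_int(b"Hello world")               # big-endian reading
little = digest_to_int(b"Hello world", "little")  # byte order used in the quoted session

# The big-endian value is the same number that the hex digest spells out.
same_as_hex = big == int(hashlib.sha1(b"Hello world").hexdigest(), 16)
```

Either byte order gives a usable integer; it only matters when the values have to agree with another system.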

Excellent, thanks for pointing that out.  I've just recently started
using Python 3 instead of 2, & appreciate pointers to new things like
that.  The only thing that really bugs me in Python 3 is that execfile
has been removed (I find it useful for testing things interactively).
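
For what it's worth, a rough stand-in for execfile can be built from compile() and exec(). This is only a sketch: the parameter names globals_ and locals_ are my own, and unlike the Python 2 builtin it runs the file in a fresh namespace by default instead of the caller's.

```python
def execfile(path, globals_=None, locals_=None):
    """Rough Python 3 stand-in for Python 2's execfile (a sketch only).

    Runs the file at `path` and returns the globals dict it ran in.
    """
    if globals_ is None:
        # Assumption: a fresh namespace, not the caller's as in Python 2.
        globals_ = {"__name__": "__main__", "__file__": path}
    with open(path, "rb") as f:
        # Passing the path to compile() keeps tracebacks pointing at the file.
        code = compile(f.read(), path, "exec")
    exec(code, globals_, locals_)
    return globals_
```

At the interactive prompt, execfile("test.py", globals()) would put the file's names into the current session, which is closer to the old behaviour.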


>> I dunno why language designers don't make it easy to get a single big
>> number directly out of these things.
>  
> You hardly ever need to manipulate the numerical value of the digest. And on 
> its way into the database it will be re-serialized anyway.

I now agree that my original plan to hash strings for the SQLite3
table was pointless, so I've changed the subject header.  :-)

I have had good reason to use int hashes in the past, however.  I was
doing some experiments with Andrei Broder's "sketches of shingles"
technique for finding partial duplication between documents, & you
need integers for that so you can do modulo arithmetic.
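
To illustrate where the modulo arithmetic comes in, here is a minimal sketch of one variant of the idea: hash every w-word shingle to an integer and keep only the hashes divisible by m. The function names, the use of SHA-1, and the default w and m are assumptions for the example, not details from the experiments mentioned above.

```python
import hashlib


def shingles(text, w=4):
    """Return the set of w-word shingles (contiguous word windows) in text."""
    words = text.split()
    return {" ".join(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}


def shingle_hash(shingle):
    """Map a shingle to a non-negative integer via its SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(shingle.encode("utf-8")).digest(), "big")


def sketch(text, w=4, m=4):
    """Keep only the shingle hashes that are 0 modulo m (a sampled sketch)."""
    return {h for h in map(shingle_hash, shingles(text, w)) if h % m == 0}


def resemblance(a, b):
    """Estimate resemblance as the Jaccard overlap of two sketches."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With m=1 every hash is kept; a larger m keeps roughly 1/m of them, which is the reason the hashes have to be integers in the first place.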

I've also used hashes of strings for other things involving
deduplication or fast lookups (because integer equality is faster than
string equality).  I guess if it's just for deduplication, though, a
set of byte arrays is as good as a set of int?
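
A small sketch of the two options, assuming SHA-1 digests and made-up sample data. One caveat that is certain: the digests have to be immutable bytes objects (which is what digest() returns), because a mutable bytearray is not hashable and cannot go into a set.

```python
import hashlib


def digest(s):
    """Return the SHA-1 digest of a string as an immutable bytes object."""
    return hashlib.sha1(s.encode("utf-8")).digest()


lines = ["spam", "eggs", "spam", "ham", "eggs"]  # made-up sample data

# Deduplicate on the raw digests ...
seen_bytes = {digest(s) for s in lines}

# ... or on the same digests converted to integers.
seen_ints = {int.from_bytes(d, "big") for d in seen_bytes}

# A bytearray cannot be a set member.
try:
    {bytearray(b"spam")}
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False
```

Both sets end up with one entry per distinct string, so for plain deduplication the integer conversion adds nothing.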


-- 
Classical Greek lent itself to the promulgation of a rich culture,
indeed, to Western civilization.  Computer languages bring us
doorbells that chime with thirty-two tunes, alt.sex.bestiality, and
Tetris clones.                                         (Stoll 1995)
