Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #72118

Re: hashing strings to integers

From Adam Funk <a24061@ducksburg.com>
Newsgroups comp.lang.python
Subject Re: hashing strings to integers
Date 2014-05-27 16:13 +0100
Organization $CABAL
Message-ID <amme5bxc3v.ln2@news.ducksburg.com> (permalink)
References (3 earlier) <mailman.10223.1400768058.18130.python-list@python.org> <k9f15bxoql.ln2@news.ducksburg.com> <mailman.10225.1400772863.18130.python-list@python.org> <sck35bxdc5.ln2@news.ducksburg.com> <mailman.10239.1400842778.18130.python-list@python.org>

Show all headers | View raw


On 2014-05-23, Chris Angelico wrote:

> On Fri, May 23, 2014 at 8:27 PM, Adam Funk <a24061@ducksburg.com> wrote:
>> I've also used hashes of strings for other things involving
>> deduplication or fast lookups (because integer equality is faster than
>> string equality).  I guess if it's just for deduplication, though, a
>> set of byte arrays is as good as a set of int?
>
> Want a better way to do that? Use a set or dict and let Python do the
> hashing. It'll be every bit as fast as explicit hashing, plus you get
> the bonus of simplicity.

Well, here's the way it works in my mind:

   I can store a set of a zillion strings (or a dict with a zillion
   string keys), but every time I test "if new_string in
   seen_strings", the computer hashes the new_string using some kind
   of "short hash", checks the set for matching buckets (I'm assuming
   this is how python tests set membership --- is that right?), then
   checks any hash-matches for string equality.  Testing string
   equality is slower than integer equality, and strings (unless they
   are really short) take up a lot more memory than long integers.

So for that kind of thing, I tend to store MD5 hashes or something
similar.  Is that wrong?


-- 
With the breakdown of the medieval system, the gods of chaos, lunacy,
and bad taste gained ascendancy.
                                                --- Ignatius J Reilly

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 12:47 +0100
  Re: hashing strings to integers for sqlite3 keys Peter Otten <__peter__@web.de> - 2014-05-22 14:58 +0200
    Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 14:41 +0100
      Re: hashing strings to integers for sqlite3 keys Chris Angelico <rosuav@gmail.com> - 2014-05-23 00:08 +1000
        Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 15:40 +0100
  Re: hashing strings to integers for sqlite3 keys Chris Angelico <rosuav@gmail.com> - 2014-05-22 23:03 +1000
    Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 14:47 +0100
  Re: hashing strings to integers for sqlite3 keys Tim Chase <python.list@tim.thechases.com> - 2014-05-22 08:09 -0500
    Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 14:54 +0100
      Re: hashing strings to integers for sqlite3 keys Chris Angelico <rosuav@gmail.com> - 2014-05-23 00:14 +1000
        Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 15:47 +0100
          Re: hashing strings to integers for sqlite3 keys Chris Angelico <rosuav@gmail.com> - 2014-05-23 01:09 +1000
          Re: hashing strings to integers for sqlite3 keys Peter Otten <__peter__@web.de> - 2014-05-22 17:34 +0200
            hashing strings to integers (was: hashing strings to integers for sqlite3 keys) Adam Funk <a24061@ducksburg.com> - 2014-05-23 11:27 +0100
              Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-05-23 11:36 +0100
                Re: hashing strings to integers Chris Angelico <rosuav@gmail.com> - 2014-05-23 21:01 +1000
              Re: hashing strings to integers (was: hashing strings to integers for sqlite3 keys) Chris Angelico <rosuav@gmail.com> - 2014-05-23 20:59 +1000
                Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-05-27 16:13 +0100
                Re: hashing strings to integers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-27 17:02 +0000
                Re: hashing strings to integers Chris Angelico <rosuav@gmail.com> - 2014-05-28 05:16 +1000
                Re: hashing strings to integers Dan Sommers <dan@tombstonezero.net> - 2014-05-28 01:55 +0000
                Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-06-03 11:29 +0100
                Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-06-03 11:32 +0100
              Re: hashing strings to integers Terry Reedy <tjreedy@udel.edu> - 2014-05-23 15:10 -0400
                Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-05-27 16:20 +0100
  Re: hashing strings to integers for sqlite3 keys alister <alister.nospam.ware@ntlworld.com> - 2014-05-22 14:48 +0000

csiph-web