Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #72123

Re: hashing strings to integers

References (5 earlier) <mailman.10225.1400772863.18130.python-list@python.org> <sck35bxdc5.ln2@news.ducksburg.com> <mailman.10239.1400842778.18130.python-list@python.org> <amme5bxc3v.ln2@news.ducksburg.com> <5384c539$0$29978$c3e8da3$5496439d@news.astraweb.com>
Date 2014-05-28 05:16 +1000
Subject Re: hashing strings to integers
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.10372.1401218186.18130.python-list@python.org> (permalink)

Show all headers | View raw


On Wed, May 28, 2014 at 3:02 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> But I know that Python is a high-level language with
> lots of high-level data structures like dicts which trade-off time and
> memory for programmer convenience, and that I'd want to see some real
> benchmarks proving that my application was too slow before giving up that
> convenience with a complicated strategy like this.

And they trade off that time and memory on the basis of X years of
development expertise. A while ago (about ten years or so, now... wow,
that's quite a while) I wrote a C++ program that needed an
ever-growing array; for simplicity, I went with a very basic system of
doubling the size every time, from a base of something like 1024 or
8192. (Note that it was storing and moving around only pointers, so
it's comparable to Python's list.) That means it has an average 25%
slack space at all times, more until its first allocation, and every
now and then it has a huge job of copying a pile of pointers into a
new array. (Imagine it's now at 16777216 and it needs to add the
16,777,217th string to the array. Bye-bye CPU caches.) These
boundaries became *user-visible pauses*, fortunately at predictable
points, but on a not-terrible computer it could cause a >1s pause just
copying heaps of pointers. How do you think a Python list will
perform, under the same workload (periodic appending of single strings
or small groups of strings, very frequent retrieval based on index -
probably about a 20:1 read:write ratio)? Not only would it be far more
convenient, it's probably going to outperform my hand-rolled code -
unless I'm so brilliant (or lucky) that I can stumble to something as
good as can be achieved with years of dedicated development. Yeah, I
don't think so.

Same goes for hashing algorithms. I can at least boast that I've never
written one of those... although I have several times written a binary
tree of one form or another, in order to provide comparable features
to a dict. And once again, now that I know how convenient and
performant high level languages can be (which may or may not have been
true 15 years ago, but I certainly didn't know it), I don't rewrite
what can be done better by just tossing in a couple of curly braces
and saying "here, map this to that".

ChrisA

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 12:47 +0100
  Re: hashing strings to integers for sqlite3 keys Peter Otten <__peter__@web.de> - 2014-05-22 14:58 +0200
    Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 14:41 +0100
      Re: hashing strings to integers for sqlite3 keys Chris Angelico <rosuav@gmail.com> - 2014-05-23 00:08 +1000
        Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 15:40 +0100
  Re: hashing strings to integers for sqlite3 keys Chris Angelico <rosuav@gmail.com> - 2014-05-22 23:03 +1000
    Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 14:47 +0100
  Re: hashing strings to integers for sqlite3 keys Tim Chase <python.list@tim.thechases.com> - 2014-05-22 08:09 -0500
    Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 14:54 +0100
      Re: hashing strings to integers for sqlite3 keys Chris Angelico <rosuav@gmail.com> - 2014-05-23 00:14 +1000
        Re: hashing strings to integers for sqlite3 keys Adam Funk <a24061@ducksburg.com> - 2014-05-22 15:47 +0100
          Re: hashing strings to integers for sqlite3 keys Chris Angelico <rosuav@gmail.com> - 2014-05-23 01:09 +1000
          Re: hashing strings to integers for sqlite3 keys Peter Otten <__peter__@web.de> - 2014-05-22 17:34 +0200
            hashing strings to integers (was: hashing strings to integers for sqlite3 keys) Adam Funk <a24061@ducksburg.com> - 2014-05-23 11:27 +0100
              Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-05-23 11:36 +0100
                Re: hashing strings to integers Chris Angelico <rosuav@gmail.com> - 2014-05-23 21:01 +1000
              Re: hashing strings to integers (was: hashing strings to integers for sqlite3 keys) Chris Angelico <rosuav@gmail.com> - 2014-05-23 20:59 +1000
                Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-05-27 16:13 +0100
                Re: hashing strings to integers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-27 17:02 +0000
                Re: hashing strings to integers Chris Angelico <rosuav@gmail.com> - 2014-05-28 05:16 +1000
                Re: hashing strings to integers Dan Sommers <dan@tombstonezero.net> - 2014-05-28 01:55 +0000
                Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-06-03 11:29 +0100
                Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-06-03 11:32 +0100
              Re: hashing strings to integers Terry Reedy <tjreedy@udel.edu> - 2014-05-23 15:10 -0400
                Re: hashing strings to integers Adam Funk <a24061@ducksburg.com> - 2014-05-27 16:20 +0100
  Re: hashing strings to integers for sqlite3 keys alister <alister.nospam.ware@ntlworld.com> - 2014-05-22 14:48 +0000

csiph-web