


Re: String interning in Python 3 - missing or moved?

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2012-01-24 15:47 +1100
Last post: 2012-01-24 15:47 +1100
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#19313 — Re: String interning in Python 3 - missing or moved?

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-01-24 15:47 +1100
Subject: Re: String interning in Python 3 - missing or moved?
Message-ID: <mailman.5007.1327380479.27778.python-list@python.org>

On Tue, Jan 24, 2012 at 3:18 PM, Terry Reedy <tjreedy@udel.edu> wrote:
> I think that the devs decided that interning is a minor internal
> optimization that users generally should not fiddle with (especially now
> that so much is done automatically anyway*), while having it as a builtin
> made it look like something they should pay attention to.
>
> *I am not sure but what hashes for strings either are or in 3.3 will always
> be cached.

I'm of the opinion that hash() shouldn't be relied upon, but
apparently there's code "out there" that would be broken if hash()
changed (and, quite reasonably, the devs don't want to make a sudden
change in a bug-fix release). String interning basically turns every
string into a completely opaque hash; you can use 'is' to test for
equality of two interned strings. Having intern() as a builtin cannot
encourage any worse behavior than relying on hash(), imho - both make
no promises of constancy across runs.
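
For anyone reading this on Python 3, where the builtin lives on as sys.intern(), here is a minimal sketch of the 'is' test mentioned above. The string contents and variable names are made up for illustration; the only real API used is sys.intern().

```python
import sys

# Build two equal strings at run time so they are created separately
# (identical literals could be merged by the compiler, hiding the effect).
parts = ["client", "supplied", "key"]
a = "-".join(parts)
b = "-".join(parts)

# Equal by value. Whether a and b already share identity is an
# implementation detail, so nothing is claimed about "a is b" here.
equal_before = (a == b)

# sys.intern() returns the canonical copy from the interned-string table,
# so two equal strings map to one object.
ia = sys.intern(a)
ib = sys.intern(b)

# Identity comparison is now a valid equality test for these two.
same_object = ia is ib
```

The identity is only meaningful while a reference to the interned string is kept, and only between strings that have both been passed through sys.intern().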

Lua and Pike both quite happily solved hash collision attacks in their
interning of strings by randomizing the hash used, because there's no
way to rely on it. Presumably (based on the intern() docs) Python can
do the same, if you explicitly intern your strings first. Is it worth
recommending that people do this with anything that is
client-provided, and then simply randomize the intern() hash? This
would allow hash() to be unchanged, intern() to still do exactly what
it's always done, and hash collision attacks to be eliminated.
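
To make the suggestion concrete, here is a toy sketch of an intern table whose bucket hash is salted per process while hash() is left alone. This is not how CPython, Lua or Pike implement interning; the class name SaltedInternTable, the bucket count and the choice of keyed BLAKE2b as the salted hash are all assumptions picked for illustration.

```python
import hashlib
import os


class SaltedInternTable:
    """Toy intern table (hypothetical); bucket choice uses a per-process salt."""

    def __init__(self, n_buckets=64):
        # Fresh random salt each run, so bucket placement cannot be
        # predicted from outside the process.
        self._salt = os.urandom(16)
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket_index(self, s):
        # Keyed BLAKE2b stands in for "a randomized hash"; the builtin
        # hash() is never consulted, so it can stay exactly as it is.
        data = s.encode("utf-8", "surrogatepass")
        digest = hashlib.blake2b(data, key=self._salt, digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self._buckets)

    def intern(self, s):
        # Return the stored copy if an equal string is already present,
        # otherwise store this one and return it.
        bucket = self._buckets[self._bucket_index(s)]
        for existing in bucket:
            if existing == s:
                return existing
        bucket.append(s)
        return s


table = SaltedInternTable()
first = table.intern("".join(["ab", "c"]))
second = table.intern("".join(["a", "bc"]))
same_object = first is second
```

Client-provided strings would go through table.intern() before being used as keys, and the keys would then be compared with 'is'. Whether that is practical for real dict keys is exactly the open question in the paragraph above.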

ChrisA




