Groups > comp.lang.python > #20043 > unrolled thread

Re: frozendict

Started by	Nathan Rice <nathan.alexander.rice@gmail.com>
First post	2012-02-08 22:43 -0500
Last post	2012-02-09 11:50 -0500
Articles	18 — 9 participants

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: frozendict Nathan Rice <nathan.alexander.rice@gmail.com> - 2012-02-08 22:43 -0500
    Re: frozendict Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-09 10:33 +0000
      Re: frozendict Nathan Rice <nathan.alexander.rice@gmail.com> - 2012-02-09 09:36 -0500
        Re: frozendict Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-09 14:52 +0000
          Re: frozendict Nathan Rice <nathan.alexander.rice@gmail.com> - 2012-02-09 10:19 -0500
            Re: frozendict Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-09 18:47 +0000
          Re: frozendict Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-09 09:35 -0700
            Re: frozendict Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-10 01:24 +0000
              Re: frozendict Nathan Rice <nathan.alexander.rice@gmail.com> - 2012-02-09 21:30 -0500
              Re: frozendict Terry Reedy <tjreedy@udel.edu> - 2012-02-09 22:33 -0500
              Re: frozendict Chris Angelico <rosuav@gmail.com> - 2012-02-10 21:08 +1100
              Re: frozendict Nathan Rice <nathan.alexander.rice@gmail.com> - 2012-02-10 11:53 -0500
              Re: frozendict Chris Rebert <clp2@rebertia.com> - 2012-02-10 09:00 -0800
              Re: frozendict Nathan Rice <nathan.alexander.rice@gmail.com> - 2012-02-10 13:14 -0500
                Re: frozendict John Nagle <nagle@animats.com> - 2012-02-10 10:57 -0800
                  Re: frozendict 88888 Dihedral <dihedral88888@googlemail.com> - 2012-02-10 21:52 -0800
                    Re: frozendict John Nagle <nagle@animats.com> - 2012-02-13 13:15 -0800
          Re: frozendict Nathan Rice <nathan.alexander.rice@gmail.com> - 2012-02-09 11:50 -0500

#20043 — Re: frozendict

From	Nathan Rice <nathan.alexander.rice@gmail.com>
Date	2012-02-08 22:43 -0500
Subject	Re: frozendict
Message-ID	<mailman.5558.1328759017.27778.python-list@python.org>

> Turn the question around: why should there be?
> Python is intentionally parsimonious in adding builtins.

For the same reason there are frozensets?

I put dicts in sets all the time.  I just tuple the items, but that
means you have to re-dict it on the way out to do anything useful with
it.  I am too lazy to write a frozendict or import one, but I would
use it if it was a builtin.


Nathan

[toc] | [next] | [standalone]

#20063

From	Duncan Booth <duncan.booth@invalid.invalid>
Date	2012-02-09 10:33 +0000
Message-ID	<Xns9FF46B35EDFF9duncanbooth@127.0.0.1>
In reply to	#20043

Nathan Rice <nathan.alexander.rice@gmail.com> wrote:

> I put dicts in sets all the time.  I just tuple the items, but that
> means you have to re-dict it on the way out to do anything useful with
> it.  I am too lazy to write a frozendict or import one, but I would
> use it if it was a builtin.
> 
I hope you sort the items before putting them in a tuple, otherwise how do 
you handle two identical dicts that return their items in a different 
order?


-- 
Duncan Booth http://kupuguy.blogspot.com

[toc] | [prev] | [next] | [standalone]

#20078

From	Nathan Rice <nathan.alexander.rice@gmail.com>
Date	2012-02-09 09:36 -0500
Message-ID	<mailman.5588.1328798176.27778.python-list@python.org>
In reply to	#20063

On Thu, Feb 9, 2012 at 5:33 AM, Duncan Booth
<duncan.booth@invalid.invalid> wrote:
> Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
>
>> I put dicts in sets all the time.  I just tuple the items, but that
>> means you have to re-dict it on the way out to do anything useful with
>> it.  I am too lazy to write a frozendict or import one, but I would
>> use it if it was a builtin.
>>
> I hope you sort the items before putting them in a tuple, otherwise how do
> you handle two identical dicts that return their items in a different
> order?

Two dicts created from the same inputs will return items in the same
arbitrary order.  As long as you don't insert or delete a key you're
fine.


Nathan

[toc] | [prev] | [next] | [standalone]

#20080

From	Duncan Booth <duncan.booth@invalid.invalid>
Date	2012-02-09 14:52 +0000
Message-ID	<Xns9FF497406322duncanbooth@127.0.0.1>
In reply to	#20078

Nathan Rice <nathan.alexander.rice@gmail.com> wrote:

> On Thu, Feb 9, 2012 at 5:33 AM, Duncan Booth
><duncan.booth@invalid.invalid> wrote:
>> Nathan Rice <nathan.alexander.rice@gmail.com> wrote:
>>
>>> I put dicts in sets all the time.  I just tuple the items, but that
>>> means you have to re-dict it on the way out to do anything useful
>>> with it.  I am too lazy to write a frozendict or import one, but I
>>> would use it if it was a builtin.
>>>
>> I hope you sort the items before putting them in a tuple, otherwise
>> how d 
> o
>> you handle two identical dicts that return their items in a different
>> order?
> 
> Two dicts created from the same inputs will return items in the same
> arbitrary order.  As long as you don't insert or delete a key you're
> fine.
> 
Two dicts that contain the same keys and values may or may not return them 
in the same order:

>>> dict.fromkeys('ia')
{'i': None, 'a': None}
>>> dict.fromkeys('ai')
{'a': None, 'i': None}

Would your system count those two dicts as the same?

If the sequence in which the keys were added to the dict (and deleted if 
appropriate) is exactly the same then it is likely but still not guaranteed 
that they will have the same order. The only ordering guarantee is that 
within a single dict keys and values will appear in a consistent order so 
long as you don't add/delete keys between calls.

-- 
Duncan Booth http://kupuguy.blogspot.com

[toc] | [prev] | [next] | [standalone]

#20081

From	Nathan Rice <nathan.alexander.rice@gmail.com>
Date	2012-02-09 10:19 -0500
Message-ID	<mailman.5593.1328800790.27778.python-list@python.org>
In reply to	#20080

>> Two dicts created from the same inputs will return items in the same
>> arbitrary order.  As long as you don't insert or delete a key you're
>> fine.
>>
> Two dicts that contain the same keys and values may or may not return them
> in the same order:
>
>>>> dict.fromkeys('ia')
> {'i': None, 'a': None}
>>>> dict.fromkeys('ai')
> {'a': None, 'i': None}
>
> Would your system count those two dicts as the same?
>
> If the sequence in which the keys were added to the dict (and deleted if
> appropriate) is exactly the same then it is likely but still not guaranteed
> that they will have the same order. The only ordering guarantee is that
> within a single dict keys and values will appear in a consistent order so
> long as you don't add/delete keys between calls.

As I said, two dictionaries created from the same input will be the
same...  'ai' != 'ia'.  If I need to hash a dict that I don't know was
created in a deterministic order, I'd frozenset(thedict.items()).


Nathan

[toc] | [prev] | [next] | [standalone]

#20094

From	Duncan Booth <duncan.booth@invalid.invalid>
Date	2012-02-09 18:47 +0000
Message-ID	<Xns9FF4BF1DD69ADduncanbooth@127.0.0.1>
In reply to	#20081

Nathan Rice <nathan.alexander.rice@gmail.com> wrote:

> As I said, two dictionaries created from the same input will be the
> same...  'ai' != 'ia'.  If I need to hash a dict that I don't know was
> created in a deterministic order, I'd frozenset(thedict.items()).
> 
Fair enough, the idea scares me, but it's your code and your risk.

BTW, make sure that if you ever copy any of those dictionaries they all get 
copied the same number of times as simply copying a dict can change the key 
order.

>>> dict.fromkeys('me')
{'e': None, 'm': None}
>>> dict(dict.fromkeys('me'))
{'m': None, 'e': None}
>>> dict(dict(dict.fromkeys('me')))
{'e': None, 'm': None}

-- 
Duncan Booth http://kupuguy.blogspot.com

[toc] | [prev] | [next] | [standalone]

#20086

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-02-09 09:35 -0700
Message-ID	<mailman.5597.1328805384.27778.python-list@python.org>
In reply to	#20080

On Thu, Feb 9, 2012 at 8:19 AM, Nathan Rice
<nathan.alexander.rice@gmail.com> wrote:
> As I said, two dictionaries created from the same input will be the
> same...

That's an implementation detail, not a guarantee.  It will hold for
current versions of CPython but not necessarily for other Python
implementations.

[toc] | [prev] | [next] | [standalone]

#20126

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-02-10 01:24 +0000
Message-ID	<4f3471e9$0$29986$c3e8da3$5496439d@news.astraweb.com>
In reply to	#20086

On Thu, 09 Feb 2012 09:35:52 -0700, Ian Kelly wrote:

> On Thu, Feb 9, 2012 at 8:19 AM, Nathan Rice
> <nathan.alexander.rice@gmail.com> wrote:
>> As I said, two dictionaries created from the same input will be the
>> same...
> 
> That's an implementation detail, not a guarantee.  It will hold for
> current versions of CPython but not necessarily for other Python
> implementations.

That day may be sooner than you think. It is very likely that in Python 
3.3, dict order will be randomized on creation as a side-effect of adding 
a random salt to hashes to prevent a serious vulnerability in dicts.

http://securitytracker.com/id/1026478

http://bugs.python.org/issue13703

If there is anyone still assuming that dicts have a predictable order, 
they're going to be in for a nasty surprise one of these days.

-- 
Steven

[toc] | [prev] | [next] | [standalone]

#20131

From	Nathan Rice <nathan.alexander.rice@gmail.com>
Date	2012-02-09 21:30 -0500
Message-ID	<mailman.5630.1328841051.27778.python-list@python.org>
In reply to	#20126

On Thu, Feb 9, 2012 at 8:24 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Thu, 09 Feb 2012 09:35:52 -0700, Ian Kelly wrote:
>
>> On Thu, Feb 9, 2012 at 8:19 AM, Nathan Rice
>> <nathan.alexander.rice@gmail.com> wrote:
>>> As I said, two dictionaries created from the same input will be the
>>> same...
>>
>> That's an implementation detail, not a guarantee.  It will hold for
>> current versions of CPython but not necessarily for other Python
>> implementations.
>
> That day may be sooner than you think. It is very likely that in Python
> 3.3, dict order will be randomized on creation as a side-effect of adding
> a random salt to hashes to prevent a serious vulnerability in dicts.
>
> http://securitytracker.com/id/1026478
>
> http://bugs.python.org/issue13703
>
>
> If there is anyone still assuming that dicts have a predictable order,
> they're going to be in for a nasty surprise one of these days.

The only thing needed to avoid the hash collision is that your hash
function is not not 100% predictable just by looking at the python
source code.  I don't see why every dict would have to be created
differently.  I would think having the most ubiquitous data structure
in your language be more predictable would be a priority.  Oh well....


Nathan

[toc] | [prev] | [next] | [standalone]

#20136

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-02-09 22:33 -0500
Message-ID	<mailman.5635.1328844905.27778.python-list@python.org>
In reply to	#20126

On 2/9/2012 9:30 PM, Nathan Rice wrote:

>> That day may be sooner than you think. It is very likely that in Python
>> 3.3, dict order will be randomized on creation as a side-effect of adding
>> a random salt to hashes to prevent a serious vulnerability in dicts.
>>
>> http://securitytracker.com/id/1026478
>>
>> http://bugs.python.org/issue13703
>>
>>
>> If there is anyone still assuming that dicts have a predictable order,
>> they're going to be in for a nasty surprise one of these days.
>
> The only thing needed to avoid the hash collision is that your hash
> function is not not 100% predictable just by looking at the python
> source code.  I don't see why every dict would have to be created
> differently.  I would think having the most ubiquitous data structure
> in your language be more predictable would be a priority.  Oh well....

I believe 'on creation' means 'on process startup', not on dict 
creation. There have, however, been several variants suggested, and the 
focus has been on choosing one first for past and current versions. 3.3 
is about 6 months off and hash for it may still be debated.

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]

#20151

From	Chris Angelico <rosuav@gmail.com>
Date	2012-02-10 21:08 +1100
Message-ID	<mailman.5645.1328868491.27778.python-list@python.org>
In reply to	#20126

On Fri, Feb 10, 2012 at 1:30 PM, Nathan Rice
<nathan.alexander.rice@gmail.com> wrote:
> The only thing needed to avoid the hash collision is that your hash
> function is not not 100% predictable just by looking at the python
> source code.  I don't see why every dict would have to be created
> differently.  I would think having the most ubiquitous data structure
> in your language be more predictable would be a priority.  Oh well....

It's perfectly predictable. If you put a series of keys into it, you
get those same keys back. Nobody ever promised anything about order.

If your hash function is not 100% predictable, that means it varies on
the basis of something that isn't part of either the Python
interpreter or the script being run. That means that, from one
execution to another of the exact same code, the results could be
different. The keys will come out in different orders.

ChrisA

[toc] | [prev] | [next] | [standalone]

#20173

From	Nathan Rice <nathan.alexander.rice@gmail.com>
Date	2012-02-10 11:53 -0500
Message-ID	<mailman.5670.1328892792.27778.python-list@python.org>
In reply to	#20126

On Fri, Feb 10, 2012 at 5:08 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Fri, Feb 10, 2012 at 1:30 PM, Nathan Rice
> <nathan.alexander.rice@gmail.com> wrote:
>> The only thing needed to avoid the hash collision is that your hash
>> function is not not 100% predictable just by looking at the python
>> source code.  I don't see why every dict would have to be created
>> differently.  I would think having the most ubiquitous data structure
>> in your language be more predictable would be a priority.  Oh well....
>
> It's perfectly predictable. If you put a series of keys into it, you
> get those same keys back. Nobody ever promised anything about order.
>
> If your hash function is not 100% predictable, that means it varies on
> the basis of something that isn't part of either the Python
> interpreter or the script being run. That means that, from one
> execution to another of the exact same code, the results could be
> different. The keys will come out in different orders.

I think having a hash function that is not referentially transparent
is a bad thing.  Basing your language on a non-deterministic function?
 Yeah...

A type factory that produces the dict type on interpreter
initialization (or, replaces the hash function, rather), and uses
time/system information/etc would solve the problem, while limiting
the introduced non-determinism.  I don't care if the order of
iteration for keys is different from interpreter run to run.

I have used frozenset(mydict.items()) when my requirements dictated.
It is a minor performance hit.

Lets also not forget that knowing an object is immutable lets you do a
lot of optimizations; it can be inlined, it is safe to convert to a
contiguous block of memory and stuff in cache, etc.  If you know the
input to a function is guaranteed to be frozen you can just go crazy.
Being able to freeze(anyobject) seems like a pretty clear win.
Whether or not it is pythonic is debatable.  I'd argue if the meaning
of pythonic in some context is limiting, we should consider updating
the term rather than being dogmatic.

Just my 2 cents...

Nathan

[toc] | [prev] | [next] | [standalone]

#20174

From	Chris Rebert <clp2@rebertia.com>
Date	2012-02-10 09:00 -0800
Message-ID	<mailman.5671.1328893258.27778.python-list@python.org>
In reply to	#20126

On Fri, Feb 10, 2012 at 8:53 AM, Nathan Rice
<nathan.alexander.rice@gmail.com> wrote:
<snip>
> Lets also not forget that knowing an object is immutable lets you do a
> lot of optimizations; it can be inlined, it is safe to convert to a
> contiguous block of memory and stuff in cache, etc.  If you know the
> input to a function is guaranteed to be frozen you can just go crazy.
> Being able to freeze(anyobject) seems like a pretty clear win.
> Whether or not it is pythonic is debatable.  I'd argue if the meaning
> of pythonic in some context is limiting, we should consider updating
> the term rather than being dogmatic.

It's been proposed previously and rejected:
PEP 351 -- The freeze protocol
http://www.python.org/dev/peps/pep-0351/

Cheers,
Chris

[toc] | [prev] | [next] | [standalone]

#20179

From	Nathan Rice <nathan.alexander.rice@gmail.com>
Date	2012-02-10 13:14 -0500
Message-ID	<mailman.5673.1328897669.27778.python-list@python.org>
In reply to	#20126

>> Lets also not forget that knowing an object is immutable lets you do a
>> lot of optimizations; it can be inlined, it is safe to convert to a
>> contiguous block of memory and stuff in cache, etc.  If you know the
>> input to a function is guaranteed to be frozen you can just go crazy.
>> Being able to freeze(anyobject) seems like a pretty clear win.
>> Whether or not it is pythonic is debatable.  I'd argue if the meaning
>> of pythonic in some context is limiting, we should consider updating
>> the term rather than being dogmatic.

Sweet, looking at the reason for rejection:

1. How dicts (and multiply nested objects) should be frozen was not
completely obvious
2. "frozen()" implies in place, thus confusing users
3. freezing something like a list is confusing because some list
methods would disappear or cause errors
4. Because automatic conversion in the proposal was seen as too
involved to be so magical
5. Because frozendicts are the main end user benefit, and using dicts
as keys was seen as "suspect"

Honestly, as far as #1, we already have copy and deepcopy, the can of
worms is already opened (and necessarily so).  For 2, choose a better
name?  For 3, we have abstract base classes now, which make a nice
distinction between mutable and immutable sequences; nominal types are
a crutch, thinking in terms of structure is much more powerful.  I
agree with point 4, if magic does anything besides make the code
conform to what an informed user would expect, it should be considered
heretical.  As for #5, I feel using a collection of key value
relations as a key in another collection is not inherently bad, it
just depends on the context... The mutability is the rub.  Also,
immutability provides scaffolding to improve performance and
concurrency (both of which are top tier language features).

I understand that this is one of those cases where Guido has a strong
"bad feeling" about something, and I think a consequence of that is
people tread lightly.  Perhaps I'm a bit of a language communist in
that regard (historically a dangerous philosophy :)

As an aside, I find it kind of schizophrenic how on one hand Python is
billed as a language for consenting adults (see duck typing, no data
hiding, etc) and on the other hand users need to be protected from
themselves.  Better to serve just one flavor of kool-aid imo.


Nathan

[toc] | [prev] | [next] | [standalone]

#20180

From	John Nagle <nagle@animats.com>
Date	2012-02-10 10:57 -0800
Message-ID	<4f356898$0$11996$742ec2ed@news.sonic.net>
In reply to	#20179

On 2/10/2012 10:14 AM, Nathan Rice wrote:
>>> Lets also not forget that knowing an object is immutable lets you do a
>>> lot of optimizations; it can be inlined, it is safe to convert to a
>>> contiguous block of memory and stuff in cache, etc.  If you know the
>>> input to a function is guaranteed to be frozen you can just go crazy.
>>> Being able to freeze(anyobject) seems like a pretty clear win.
>>> Whether or not it is pythonic is debatable.  I'd argue if the meaning
>>> of pythonic in some context is limiting, we should consider updating
>>> the term rather than being dogmatic.

     A real justification for the ability to make anything immutable is
to make it safely shareable between threads.  If it's immutable, it
doesn't have to be locked for access.  Mozilla's new "Rust"
language takes advantage of this.  Take a look at Rust's concurrency
semantics.  They've made some progress.

					John Nagle

[toc] | [prev] | [next] | [standalone]

#20212

From	88888 Dihedral <dihedral88888@googlemail.com>
Date	2012-02-10 21:52 -0800
Message-ID	<11277623.419.1328939579823.JavaMail.geo-discussion-forums@pbcql8>
In reply to	#20180

在 2012年2月11日星期六UTC+8上午2时57分34秒，John Nagle写道：
> On 2/10/2012 10:14 AM, Nathan Rice wrote:
> >>> Lets also not forget that knowing an object is immutable lets you do a
> >>> lot of optimizations; it can be inlined, it is safe to convert to a
> >>> contiguous block of memory and stuff in cache, etc.  If you know the
> >>> input to a function is guaranteed to be frozen you can just go crazy.
> >>> Being able to freeze(anyobject) seems like a pretty clear win.
> >>> Whether or not it is pythonic is debatable.  I'd argue if the meaning
> >>> of pythonic in some context is limiting, we should consider updating
> >>> the term rather than being dogmatic.
> 
>      A real justification for the ability to make anything immutable is
> to make it safely shareable between threads.  If it's immutable, it
> doesn't have to be locked for access.  Mozilla's new "Rust"
> language takes advantage of this.  Take a look at Rust's concurrency
> semantics.  They've made some progress.
> 
> 					John Nagl


Lets model the system as an asynchronous set of objects with multiple threads
performing operatons on objects as in the above.

This reminds me  the old problem solved before in the digital hardware.

[toc] | [prev] | [next] | [standalone]

#20369

From	John Nagle <nagle@animats.com>
Date	2012-02-13 13:15 -0800
Message-ID	<4f397d7a$0$12009$742ec2ed@news.sonic.net>
In reply to	#20212

On 2/10/2012 9:52 PM, 88888 Dihedral wrote:
> 在 2012年2月11日星期六UTC+8上午2时57分34秒，John Nagle写道：
>> On 2/10/2012 10:14 AM, Nathan Rice wrote:
>>>>> Lets also not forget that knowing an object is immutable lets you do a
>>>>> lot of optimizations; it can be inlined, it is safe to convert to a
>>>>> contiguous block of memory and stuff in cache, etc.  If you know the
>>>>> input to a function is guaranteed to be frozen you can just go crazy.
>>>>> Being able to freeze(anyobject) seems like a pretty clear win.
>>>>> Whether or not it is pythonic is debatable.  I'd argue if the meaning
>>>>> of pythonic in some context is limiting, we should consider updating
>>>>> the term rather than being dogmatic.
>>
>>       A real justification for the ability to make anything immutable is
>> to make it safely shareable between threads.  If it's immutable, it
>> doesn't have to be locked for access.  Mozilla's new "Rust"
>> language takes advantage of this.  Take a look at Rust's concurrency
>> semantics.  They've made some progress.
>>
>> 					John Nagl
>
>
> Lets model the system as an asynchronous set of objects with multiple threads
> performing operatons on objects as in the above.

    I'd argue for a concurrency system where everything is either 
immutable, unshared, synchronized, or owned by a synchronized object.
This eliminates almost all explicit locking.

    Python's use of immutability has potential in that direction, but
Python doesn't do anything with that concept.

					John Nagle

[toc] | [prev] | [next] | [standalone]

#20088

From	Nathan Rice <nathan.alexander.rice@gmail.com>
Date	2012-02-09 11:50 -0500
Message-ID	<mailman.5599.1328806255.27778.python-list@python.org>
In reply to	#20080

On Thu, Feb 9, 2012 at 11:35 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Thu, Feb 9, 2012 at 8:19 AM, Nathan Rice
> <nathan.alexander.rice@gmail.com> wrote:
>> As I said, two dictionaries created from the same input will be the
>> same...
>
> That's an implementation detail, not a guarantee.  It will hold for
> current versions of CPython but not necessarily for other Python
> implementations.

That is true.  The implications of the function that creates
dictionaries being non-deterministic are a little scary though, so I
suspect that it will hold :)

Nathan

[toc] | [prev] | [standalone]

csiph-web

Re: frozendict

Contents

#20043 — Re: frozendict

#20063

#20078

#20080

#20081

#20094

#20086

#20126

#20131

#20136

#20151

#20173

#20174

#20179

#20180

#20212

#20369

#20088