Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #26806 > unrolled thread

save dictionary to a file without brackets.

Started bygiuseppe.amatulli@gmail.com
First post2012-08-09 13:11 -0700
Last post2012-08-10 02:02 +0400
Articles 15 on this page of 55 — 18 participants

Back to article view | Back to comp.lang.python


Contents

  save dictionary to a file without brackets. giuseppe.amatulli@gmail.com - 2012-08-09 13:11 -0700
    Re: save dictionary to a file without brackets. Roman Vashkevich <vashkevichrb@gmail.com> - 2012-08-10 00:22 +0400
    Re: save dictionary to a file without brackets. Tim Chase <python.list@tim.thechases.com> - 2012-08-09 15:35 -0500
    Re: save dictionary to a file without brackets. Giuseppe Amatulli <giuseppe.amatulli@gmail.com> - 2012-08-09 15:35 -0500
    Re: save dictionary to a file without brackets. Gelonida N <gelonida@gmail.com> - 2012-08-09 22:35 +0200
    Re: save dictionary to a file without brackets. Giuseppe Amatulli <giuseppe.amatulli@gmail.com> - 2012-08-09 15:38 -0500
    Re: save dictionary to a file without brackets. Roman Vashkevich <vashkevichrb@gmail.com> - 2012-08-10 00:41 +0400
    Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-09 22:17 +0100
    Re: save dictionary to a file without brackets. Tim Chase <python.list@tim.thechases.com> - 2012-08-09 16:21 -0500
    Re: save dictionary to a file without brackets. Roman Vashkevich <vashkevichrb@gmail.com> - 2012-08-10 01:34 +0400
      Re: save dictionary to a file without brackets. Andrew Cooper <amc96@cam.ac.uk> - 2012-08-09 23:03 +0100
        Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 18:26 -0400
          Re: save dictionary to a file without brackets. Andrew Cooper <amc96@cam.ac.uk> - 2012-08-09 23:54 +0100
            Re: save dictionary to a file without brackets. Roy Smith <roy@panix.com> - 2012-08-09 19:05 -0400
              Re: save dictionary to a file without brackets. Chris Angelico <rosuav@gmail.com> - 2012-08-10 09:14 +1000
                Re: save dictionary to a file without brackets. Roy Smith <roy@panix.com> - 2012-08-09 19:24 -0400
                  Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-10 00:33 +0100
                  Re: save dictionary to a file without brackets. Tim Chase <python.list@tim.thechases.com> - 2012-08-09 19:16 -0500
                    Re: save dictionary to a file without brackets. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-10 08:54 +0000
                      Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-10 10:37 +0100
                        Re: save dictionary to a file without brackets. Roy Smith <roy@panix.com> - 2012-08-10 08:29 -0400
                          Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-10 15:47 +0100
                        Re: save dictionary to a file without brackets. alex23 <wuwei23@gmail.com> - 2012-08-12 17:15 -0700
                          Re: save dictionary to a file without brackets. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-13 08:05 +0000
                            Re: save dictionary to a file without brackets. alex23 <wuwei23@gmail.com> - 2012-08-13 09:16 -0700
                            Re: save dictionary to a file without brackets. rusi <rustompmody@gmail.com> - 2012-08-13 08:55 -0700
                          Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-13 13:37 +0100
                            Re: save dictionary to a file without brackets. alex23 <wuwei23@gmail.com> - 2012-08-13 09:14 -0700
                              Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-13 18:07 +0100
                                Re: save dictionary to a file without brackets. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-14 02:54 +0000
                                  Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-14 12:33 +0100
                                    Re: save dictionary to a file without brackets. Alister <alister.ware@ntlworld.com> - 2012-08-14 15:05 +0000
                                      Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-14 19:15 +0100
                          Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-13 18:43 +0100
                            Re: save dictionary to a file without brackets. alex23 <wuwei23@gmail.com> - 2012-08-13 18:32 -0700
                      Re: save dictionary to a file without brackets. Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-08-10 12:34 -0400
                  Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 20:27 -0400
                  Re: save dictionary to a file without brackets. Chris Angelico <rosuav@gmail.com> - 2012-08-10 10:31 +1000
            Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 19:38 -0400
        Re: save dictionary to a file without brackets. Tim Chase <python.list@tim.thechases.com> - 2012-08-09 17:39 -0500
        Re: save dictionary to a file without brackets. Chris Kaynor <ckaynor@zindagigames.com> - 2012-08-09 15:37 -0700
        Re: save dictionary to a file without brackets. Chris Angelico <rosuav@gmail.com> - 2012-08-10 08:53 +1000
          Re: save dictionary to a file without brackets. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-11 11:26 +0000
            Re: save dictionary to a file without brackets. 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-12 04:59 -0700
        Re: save dictionary to a file without brackets. Chris Angelico <rosuav@gmail.com> - 2012-08-10 09:01 +1000
        Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 19:42 -0400
        Re: save dictionary to a file without brackets. 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-09 22:35 -0700
    Re: save dictionary to a file without brackets. Terry Reedy <tjreedy@udel.edu> - 2012-08-09 17:46 -0400
    Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 17:47 -0400
      Re: save dictionary to a file without brackets. 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-10 10:46 -0700
      Re: save dictionary to a file without brackets. 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-10 10:46 -0700
    Re: save dictionary to a file without brackets. Chris Kaynor <ckaynor@zindagigames.com> - 2012-08-09 14:49 -0700
    Re: save dictionary to a file without brackets. Chris Kaynor <ckaynor@zindagigames.com> - 2012-08-09 14:51 -0700
    Re: save dictionary to a file without brackets. Giuseppe Amatulli <giuseppe.amatulli@gmail.com> - 2012-08-09 16:53 -0500
    Re: save dictionary to a file without brackets. Roman Vashkevich <vashkevichrb@gmail.com> - 2012-08-10 02:02 +0400

Page 3 of 3 — ← Prev page 1 2 [3]


#26827

FromChris Kaynor <ckaynor@zindagigames.com>
Date2012-08-09 15:37 -0700
Message-ID<mailman.3131.1344551903.4697.python-list@python.org>
In reply to#26824
On Thu, Aug 9, 2012 at 3:26 PM, Dave Angel <d@davea.name> wrote:
> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> On 09/08/2012 22:34, Roman Vashkevich wrote:
>>> Actually, they are different.
>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>>
>> Sligtly off topic, but looking up a value in a dictionary is actually
>> O(n) for all other entries in the dict which suffer a hash collision
>> with the searched entry.
>>
>> True, a sensible choice of hash function will reduce n to 1 in common
>> cases, but it becomes an important consideration for larger datasets.
>>
>> ~Andrew
>
> I'm glad you're wrong for CPython's dictionaries.  The only time the
> lookup would degenerate to O[n] would be if the hash table had only one
> slot.  CPython sensibly increases the hash table size when it becomes
> too small for efficiency.
>
>
> Where have you seen dictionaries so poorly implemented?

There are plenty of ways to make a pathological hash function that
will have that issue in CPython.

The very simple (and stupid):

class O(object):
    def __hash__(self):
        return 0
    def __eq__(self, other): # I am aware this is the default equals method.
        return self is other

Start adding those to a dictionary to get O(n) lookups.

Any case the hash return values modulus the dictionary hash table size
is constant will have similar results; powers of 2 are likely to
result in such behavior as well.

>
> --
>
> DaveA
>
> --
> http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [next] | [standalone]


#26828

FromChris Angelico <rosuav@gmail.com>
Date2012-08-10 08:53 +1000
Message-ID<mailman.3132.1344552827.4697.python-list@python.org>
In reply to#26824
On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote:
> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> O(n) for all other entries in the dict which suffer a hash collision
>> with the searched entry.
>>
>> True, a sensible choice of hash function will reduce n to 1 in common
>> cases, but it becomes an important consideration for larger datasets.
>
> I'm glad you're wrong for CPython's dictionaries.  The only time the
> lookup would degenerate to O[n] would be if the hash table had only one
> slot.  CPython sensibly increases the hash table size when it becomes
> too small for efficiency.
>
> Where have you seen dictionaries so poorly implemented?

In vanilla CPython up to version (I think) 3.3, where it's possible to
DoS the hash generator. Hash collisions are always possible, just
ridiculously unlikely unless deliberately exploited.

(And yes, I know an option was added to older versions to randomize
the hashes there too. It's not active by default, so "vanilla CPython"
is still vulnerable.)

ChrisA

[toc] | [prev] | [next] | [standalone]


#26919

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-08-11 11:26 +0000
Message-ID<5026416d$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to#26828
On Fri, 10 Aug 2012 08:53:43 +1000, Chris Angelico wrote:

> On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>>
>> I'm glad you're wrong for CPython's dictionaries.  The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot.  CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>> Where have you seen dictionaries so poorly implemented?
> 
> In vanilla CPython up to version (I think) 3.3, where it's possible to
> DoS the hash generator. Hash collisions are always possible, just
> ridiculously unlikely unless deliberately exploited.

Not so unlikely actually.

py> hash(3)
3
py> hash(2**64 + 2)
3

py> hash(-1)
-2
py> hash(-2)
-2


By its very nature, a hash function cannot fail to have collisions. The 
problem is that in general you have an essentially unlimited number of 
objects being mapped to a large but still finite number of hash values.



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#26943

From88888 Dihedral <dihedral88888@googlemail.com>
Date2012-08-12 04:59 -0700
Message-ID<a5bede18-ffa7-47d4-93c4-cd9d04d6587f@googlegroups.com>
In reply to#26919
Steven D'Aprano於 2012年8月11日星期六UTC+8下午7時26分37秒寫道:
> On Fri, 10 Aug 2012 08:53:43 +1000, Chris Angelico wrote:
> 
> 
> 
> > On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote:
> 
> >> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
> 
> >>> O(n) for all other entries in the dict which suffer a hash collision
> 
> >>> with the searched entry.
> 
> >>>
> 
> >>> True, a sensible choice of hash function will reduce n to 1 in common
> 
> >>> cases, but it becomes an important consideration for larger datasets.
> 
> >>
> 
> >> I'm glad you're wrong for CPython's dictionaries.  The only time the
> 
> >> lookup would degenerate to O[n] would be if the hash table had only one
> 
> >> slot.  CPython sensibly increases the hash table size when it becomes
> 
> >> too small for efficiency.
> 
> >>
> 
> >> Where have you seen dictionaries so poorly implemented?
> 
> > 
> 
> > In vanilla CPython up to version (I think) 3.3, where it's possible to
> 
> > DoS the hash generator. Hash collisions are always possible, just
> 
> > ridiculously unlikely unless deliberately exploited.
> 
> 
> 
> Not so unlikely actually.
> 
> 
> 
> py> hash(3)
> 
> 3
> 
> py> hash(2**64 + 2)
> 
> 3
> 
> 
> 
> py> hash(-1)
> 
> -2
> 
> py> hash(-2)
> 
> -2
> 
> 
> 
> 
> 
> By its very nature, a hash function cannot fail to have collisions. The 
> 
> problem is that in general you have an essentially unlimited number of 
> 
> objects being mapped to a large but still finite number of hash values.
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> Steven



Steven D'Aprano於 2012年8月11日星期六UTC+8下午7時26分37秒寫道:
> On Fri, 10 Aug 2012 08:53:43 +1000, Chris Angelico wrote:
> 
> 
> 
> > On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote:
> 
> >> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
> 
> >>> O(n) for all other entries in the dict which suffer a hash collision
> 
> >>> with the searched entry.
> 
> >>>
> 
> >>> True, a sensible choice of hash function will reduce n to 1 in common
> 
> >>> cases, but it becomes an important consideration for larger datasets.
> 
> >>
> 
> >> I'm glad you're wrong for CPython's dictionaries.  The only time the
> 
> >> lookup would degenerate to O[n] would be if the hash table had only one
> 
> >> slot.  CPython sensibly increases the hash table size when it becomes
> 
> >> too small for efficiency.
> 
> >>
> 
> >> Where have you seen dictionaries so poorly implemented?
> 
> > 
> 
> > In vanilla CPython up to version (I think) 3.3, where it's possible to
> 
> > DoS the hash generator. Hash collisions are always possible, just
> 
> > ridiculously unlikely unless deliberately exploited.
> 
> 
> 
> Not so unlikely actually.
> 
> 
> 
> py> hash(3)
> 
> 3
> 
> py> hash(2**64 + 2)
> 
> 3
> 
> 
> 
> py> hash(-1)
> 
> -2
> 
> py> hash(-2)
> 
> -2
> 
> 
> 
> 
> 
> By its very nature, a hash function cannot fail to have collisions. The 
> 
> problem is that in general you have an essentially unlimited number of 
> 
> objects being mapped to a large but still finite number of hash values.
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> Steven

Lets check the basic operations of a hash table or so-called a dictionary first.

If the dictionary is implemented toward faster in searching items, 
then it is slightly different in the insertion and the deletion operations of
(key, value) pairs.

[toc] | [prev] | [next] | [standalone]


#26830

FromChris Angelico <rosuav@gmail.com>
Date2012-08-10 09:01 +1000
Message-ID<mailman.3133.1344553280.4697.python-list@python.org>
In reply to#26824
On Fri, Aug 10, 2012 at 8:39 AM, Tim Chase
<python.list@tim.thechases.com> wrote:
> On 08/09/12 17:26, Dave Angel wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> I'm glad you're wrong for CPython's dictionaries.  The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot.  CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>> Where have you seen dictionaries so poorly implemented?
>
> PHP?
>
> http://www.phpclasses.org/blog/post/171-PHP-Vulnerability-May-Halt-Millions-of-Servers.html

That's the same hash collision attack that I alluded to above, and it
strikes *many* language implementations. Most released a patch fairly
quickly and quietly (Pike, Lua, V8 (JavaScript/ECMAScript), PHP), but
CPython dared not, on account of various applications depending on
hash order (at least for tests). It's not (for once) an indictment of
PHP (maybe that should be an "inarrayment"?), it's a consequence of a
hashing algorithm that favored simplicity over cryptographic
qualities.

(It feels weird to be defending PHP...)

ChrisA

[toc] | [prev] | [next] | [standalone]


#26836

FromDave Angel <d@davea.name>
Date2012-08-09 19:42 -0400
Message-ID<mailman.3138.1344555793.4697.python-list@python.org>
In reply to#26824
On 08/09/2012 06:53 PM, Chris Angelico wrote:
> On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>> I'm glad you're wrong for CPython's dictionaries.  The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot.  CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>> Where have you seen dictionaries so poorly implemented?
> In vanilla CPython up to version (I think) 3.3, where it's possible to
> DoS the hash generator. Hash collisions are always possible, just
> ridiculously unlikely unless deliberately exploited.
>
> (And yes, I know an option was added to older versions to randomize
> the hashes there too. It's not active by default, so "vanilla CPython"
> is still vulnerable.)
>
> ChrisA

Thank you to you and others, who have corrected my over-general
response.  I was not intending to claim anything about either a
deliberate DOS, nor a foolishly chosen hash function.  But the message I
was replying to seemed to me to claim that for large datasets, even a
good hash algorithm would end up giving O(n) performance.



-- 

DaveA

[toc] | [prev] | [next] | [standalone]


#26840

From88888 Dihedral <dihedral88888@googlemail.com>
Date2012-08-09 22:35 -0700
Message-ID<ae9568e5-2053-4997-afc3-203b492a8295@googlegroups.com>
In reply to#26824
Andrew Cooper於 2012年8月10日星期五UTC+8上午6時03分26秒寫道:
> On 09/08/2012 22:34, Roman Vashkevich wrote:
> 
> > Actually, they are different.
> 
> > Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> 
> > Dict uses hashing to get a value from the dict and this is why it's O(1).
> 
> > 
> 
> 
> 
> Sligtly off topic, but looking up a value in a dictionary is actually
> 
> O(n) for all other entries in the dict which suffer a hash collision
> 
> with the searched entry.
> 
> 
> 
> True, a sensible choice of hash function will reduce n to 1 in common
> 
> cases, but it becomes an important consideration for larger datasets.
> 
> 
> 
> ~Andrew

This is the classical problem of storing the hash collision items one by one.
Those items should be stored by some order.

[toc] | [prev] | [next] | [standalone]


#26818

FromTerry Reedy <tjreedy@udel.edu>
Date2012-08-09 17:46 -0400
Message-ID<mailman.3122.1344548828.4697.python-list@python.org>
In reply to#26806
On 8/9/2012 5:21 PM, Tim Chase wrote:
> On 08/09/12 15:41, Roman Vashkevich wrote:
>> 10.08.2012, в 0:35, Tim Chase написал(а):
>>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>>> 4 5 1
>>>>> 5 4 1
>>>>> 4 4 2
>>>>> 2 3 1
>>>>> 4 3 2
>>>>
>>>> for key in dict:
>>>> 	print key[0], key[1], dict[key]
>>>
>>> This might read more cleanly with tuple unpacking:
>>>
>>>   for (edge1, edge2), cost in d.iteritems(): # or .items()
>>>     print edge1, edge2, cost
>>>
>>> (I'm making the assumption that this is a edge/cost graph...use
>>> appropriate names according to what they actually mean)
>>
>> dict.items() is a list - linear access time whereas with 'for
>> key in dict:' access time is constant:
>> http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1
>
> That link doesn't actually discuss dict.{iter}items()
>
> Both are O(N) because you have to touch each item in the dict--you
> can't iterate over N entries in less than O(N) time.  For small
> data-sets, building the list and then iterating over it may be
> faster faster; for larger data-sets, the cost of building the list
> overshadows the (minor) overhead of a generator.  Either way, the
> iterate-and-fetch-the-associated-value of .items() & .iteritems()
> can (should?) be optimized in Python's internals to the point I
> wouldn't think twice about using the more readable version.

In 3.x, .keys, .values, and .items are set-like read-only views 
specifically designed for iteration. So in 3.x they are THE way to do so 
for whichever alternative is appropriate. Iterating by keys and then 
looking up values instead of yielding the values at the same time is 
extra work.

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]


#26819

FromDave Angel <d@davea.name>
Date2012-08-09 17:47 -0400
Message-ID<mailman.3123.1344548888.4697.python-list@python.org>
In reply to#26806
On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).

Sure, that's why

for key in dict:
	print key[0], key[1], dict[key]

is probably slower than

for (edge1, edge2), cost in d.iteritems(): # or .items()
  print edge1, edge2, cost


So, the latter is both faster and easier to read.  Why are you arguing against it?

Also, please stop top-posting.  It's impolite here, and makes it much harder to figure out who is saying what, in what order.



-- 

DaveA

[toc] | [prev] | [next] | [standalone]


#26884

From88888 Dihedral <dihedral88888@googlemail.com>
Date2012-08-10 10:46 -0700
Message-ID<mailman.3171.1344620771.4697.python-list@python.org>
In reply to#26819
Dave Angel於 2012年8月10日星期五UTC+8上午5時47分45秒寫道:
> On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
> 
> > Actually, they are different.
> 
> > Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> 
> > Dict uses hashing to get a value from the dict and this is why it's O(1).
> 
> 
> 
> Sure, that's why
> 
> 
> 
> for key in dict:
> 
> 	print key[0], key[1], dict[key]
> 
> 
> 
> is probably slower than
> 
> 
> 
> for (edge1, edge2), cost in d.iteritems(): # or .items()
> 
>   print edge1, edge2, cost
> 
> 
> 
> 
> 
> So, the latter is both faster and easier to read.  Why are you arguing against it?
> 
> 
> 
> Also, please stop top-posting.  It's impolite here, and makes it much harder to figure out who is saying what, in what order.
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> 
> 
> DaveA

OK, lets estimate the hash colision rate first.

For those items hashed to the same key, I'll store a sorted list with a
known lenth m to be accessed in O(LOG(M)). 

Of couse another hash can be attatched.

[toc] | [prev] | [next] | [standalone]


#26885

From88888 Dihedral <dihedral88888@googlemail.com>
Date2012-08-10 10:46 -0700
Message-ID<0786dd35-b624-4dfc-bbbb-a3e33aa43ae1@googlegroups.com>
In reply to#26819
Dave Angel於 2012年8月10日星期五UTC+8上午5時47分45秒寫道:
> On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
> 
> > Actually, they are different.
> 
> > Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> 
> > Dict uses hashing to get a value from the dict and this is why it's O(1).
> 
> 
> 
> Sure, that's why
> 
> 
> 
> for key in dict:
> 
> 	print key[0], key[1], dict[key]
> 
> 
> 
> is probably slower than
> 
> 
> 
> for (edge1, edge2), cost in d.iteritems(): # or .items()
> 
>   print edge1, edge2, cost
> 
> 
> 
> 
> 
> So, the latter is both faster and easier to read.  Why are you arguing against it?
> 
> 
> 
> Also, please stop top-posting.  It's impolite here, and makes it much harder to figure out who is saying what, in what order.
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> 
> 
> DaveA

OK, lets estimate the hash colision rate first.

For those items hashed to the same key, I'll store a sorted list with a
known lenth m to be accessed in O(LOG(M)). 

Of couse another hash can be attatched.

[toc] | [prev] | [next] | [standalone]


#26820

FromChris Kaynor <ckaynor@zindagigames.com>
Date2012-08-09 14:49 -0700
Message-ID<mailman.3124.1344548966.4697.python-list@python.org>
In reply to#26806
On Thu, Aug 9, 2012 at 2:34 PM, Roman Vashkevich <vashkevichrb@gmail.com> wrote:
>
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
>

Using "in" as an operator such as: "if key in dict" or "result = key
in dict" is O(1) as you say. Iterating on the dictionary requires
touching every item, and so is O(n), even though it also using "in" in
the command.

Here are a few quick timing tests I just ran with Python 2.6:

>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(1))')
0.078683853332734088
>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(10))')
0.17451784110969015
>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(100))')
1.1708168159579486

>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(1))')
0.14186911440299355
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(10))')
0.33836512561802579
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(100))')
2.2544262854249268

>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(1))')
0.10009793211446549
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(10))')
0.38825072496723578
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(100))')
3.3020098061049339


As can be seen here, a 1-item dictionary iterated in 0.07 seconds, 10
items in 0.17 seconds, and 100 items in 1.17 seconds. That is fairly
close to linear, especially when considering the overhead of a
complete no-op

Using iteritems, it appears to actually scale slightly better than
linear, though it is slower than just the plain iteration.

Doing a plain iteration, then looking up the keys to get the values
also appears to be linear, and is even slower than iteritems.

[toc] | [prev] | [next] | [standalone]


#26821

FromChris Kaynor <ckaynor@zindagigames.com>
Date2012-08-09 14:51 -0700
Message-ID<mailman.3125.1344549099.4697.python-list@python.org>
In reply to#26806
I realized, I should have done 10, 100, 1000 rather than 1, 10, 100
for better results, so here are the results for 1000 items. It still
maintains the same pattern:

>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(1000))')
10.166595947685153
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(1000))')
19.922474218828711
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(1000))')
31.007666660415282

Chris

On Thu, Aug 9, 2012 at 2:49 PM, Chris Kaynor <ckaynor@zindagigames.com> wrote:
> On Thu, Aug 9, 2012 at 2:34 PM, Roman Vashkevich <vashkevichrb@gmail.com> wrote:
>>
>> Actually, they are different.
>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>
>
> Using "in" as an operator such as: "if key in dict" or "result = key
> in dict" is O(1) as you say. Iterating on the dictionary requires
> touching every item, and so is O(n), even though it also using "in" in
> the command.
>
> Here are a few quick timing tests I just ran with Python 2.6:
>
>>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(1))')
> 0.078683853332734088
>>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(10))')
> 0.17451784110969015
>>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(100))')
> 1.1708168159579486
>
>>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(1))')
> 0.14186911440299355
>>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(10))')
> 0.33836512561802579
>>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(100))')
> 2.2544262854249268
>
>>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(1))')
> 0.10009793211446549
>>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(10))')
> 0.38825072496723578
>>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(100))')
> 3.3020098061049339
>
>
> As can be seen here, a 1-item dictionary iterated in 0.07 seconds, 10
> items in 0.17 seconds, and 100 items in 1.17 seconds. That is fairly
> close to linear, especially when considering the overhead of a
> complete no-op
>
> Using iteritems, it appears to actually scale slightly better than
> linear, though it is slower than just the plain iteration.
>
> Doing a plain iteration, then looking up the keys to get the values
> also appears to be linear, and is even slower than iteritems.

[toc] | [prev] | [next] | [standalone]


#26822

FromGiuseppe Amatulli <giuseppe.amatulli@gmail.com>
Date2012-08-09 16:53 -0500
Message-ID<mailman.3127.1344549233.4697.python-list@python.org>
In reply to#26806
Thanks a lot for the clarification.
Actually my problem is giving to raster dataset in geo-tif format find out
unique pair combination, count the number of observation
unique combination in rast1, count the number of observation
unique combination in rast2, count the number of observation

I try different solution and this seems to me the faster


Rast00=dsRast00.GetRasterBand(1).ReadAsArray()
Rast10=dsRast10.GetRasterBand(1).ReadAsArray()

mask=( Rast00 != 0 ) & ( Rast10 != 0  )  # may be this masking
operation can be included in the for loop

Rast00_mask= Rast00[mask]                # may be this masking
operation can be included in the for loop
Rast10_mask= Rast10[mask]                # may be this masking
operation can be included in the for loop

array2D = np.array(zip( Rast00_mask,Rast10_mask))

unique_u=dict()
unique_k1=dict()
unique_k2=dict()

for key1,key2 in  array2D :
    row = tuple((key1,key2))
    if row in unique_u:
        unique_u[row] += 1
    else:
        unique_u[row] = 1
    if key1 in unique_k1:
        unique_k1[key1] += 1
    else:
        unique_k1[key1] = 1
    if key2 in unique_k2:
        unique_k2[key2] += 1
    else:
        unique_k2[key2] = 1

output = open(dst_file_rast0010, "w")
for (a, b), c in unique_u.items():
    print(a, b, c, file=output)
output.close()

output = open(dst_file_rast00, "w")
for (a), b in unique_k1.items():
    print(a, b, file=output)
output.close()

output = open(dst_file_rast10, "w")
for (a), b in unique_k2.items():
    print(a, b, file=output)
output.close()

What do you think? is there a way to speed up the process?
Thanks
Giuseppe





On 9 August 2012 16:34, Roman Vashkevich <vashkevichrb@gmail.com> wrote:
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
>
> 10.08.2012, в 1:21, Tim Chase написал(а):
>
>> On 08/09/12 15:41, Roman Vashkevich wrote:
>>> 10.08.2012, в 0:35, Tim Chase написал(а):
>>>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>>>> 4 5 1
>>>>>> 5 4 1
>>>>>> 4 4 2
>>>>>> 2 3 1
>>>>>> 4 3 2
>>>>>
>>>>> for key in dict:
>>>>>    print key[0], key[1], dict[key]
>>>>
>>>> This might read more cleanly with tuple unpacking:
>>>>
>>>> for (edge1, edge2), cost in d.iteritems(): # or .items()
>>>>   print edge1, edge2, cost
>>>>
>>>> (I'm making the assumption that this is a edge/cost graph...use
>>>> appropriate names according to what they actually mean)
>>>
>>> dict.items() is a list - linear access time whereas with 'for
>>> key in dict:' access time is constant:
>>> http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1
>>
>> That link doesn't actually discuss dict.{iter}items()
>>
>> Both are O(N) because you have to touch each item in the dict--you
>> can't iterate over N entries in less than O(N) time.  For small
>> data-sets, building the list and then iterating over it may be
>> faster faster; for larger data-sets, the cost of building the list
>> overshadows the (minor) overhead of a generator.  Either way, the
>> iterate-and-fetch-the-associated-value of .items() & .iteritems()
>> can (should?) be optimized in Python's internals to the point I
>> wouldn't think twice about using the more readable version.
>>
>> -tkc
>>
>>
>



-- 
Giuseppe Amatulli
Web: www.spatial-ecology.net

[toc] | [prev] | [next] | [standalone]


#26823

FromRoman Vashkevich <vashkevichrb@gmail.com>
Date2012-08-10 02:02 +0400
Message-ID<mailman.3128.1344549727.4697.python-list@python.org>
In reply to#26806
10.08.2012, в 1:47, Dave Angel написал(а):

> On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
>> Actually, they are different.
>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>> Dict uses hashing to get a value from the dict and this is why it's O(1).
> 
> Sure, that's why
> 
> for key in dict:
> 	print key[0], key[1], dict[key]
> 
> is probably slower than
> 
> for (edge1, edge2), cost in d.iteritems(): # or .items()
>  print edge1, edge2, cost
> 
> 
> So, the latter is both faster and easier to read.  Why are you arguing against it?
> 
> Also, please stop top-posting.  It's impolite here, and makes it much harder to figure out who is saying what, in what order.
> 
> 
> 
> -- 
> 
> DaveA
> 

I'm not arguing at all. Sorry if it sounded like I was arguing.
Thanks for notifying me of the way messages should be sent.

Roman

[toc] | [prev] | [standalone]


Page 3 of 3 — ← Prev page 1 2 [3]

Back to top | Article view | comp.lang.python


csiph-web