Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #26806 > unrolled thread
| Started by | giuseppe.amatulli@gmail.com |
|---|---|
| First post | 2012-08-09 13:11 -0700 |
| Last post | 2012-08-10 02:02 +0400 |
| Articles | 15 on this page of 55 — 18 participants |
Back to article view | Back to comp.lang.python
save dictionary to a file without brackets. giuseppe.amatulli@gmail.com - 2012-08-09 13:11 -0700
Re: save dictionary to a file without brackets. Roman Vashkevich <vashkevichrb@gmail.com> - 2012-08-10 00:22 +0400
Re: save dictionary to a file without brackets. Tim Chase <python.list@tim.thechases.com> - 2012-08-09 15:35 -0500
Re: save dictionary to a file without brackets. Giuseppe Amatulli <giuseppe.amatulli@gmail.com> - 2012-08-09 15:35 -0500
Re: save dictionary to a file without brackets. Gelonida N <gelonida@gmail.com> - 2012-08-09 22:35 +0200
Re: save dictionary to a file without brackets. Giuseppe Amatulli <giuseppe.amatulli@gmail.com> - 2012-08-09 15:38 -0500
Re: save dictionary to a file without brackets. Roman Vashkevich <vashkevichrb@gmail.com> - 2012-08-10 00:41 +0400
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-09 22:17 +0100
Re: save dictionary to a file without brackets. Tim Chase <python.list@tim.thechases.com> - 2012-08-09 16:21 -0500
Re: save dictionary to a file without brackets. Roman Vashkevich <vashkevichrb@gmail.com> - 2012-08-10 01:34 +0400
Re: save dictionary to a file without brackets. Andrew Cooper <amc96@cam.ac.uk> - 2012-08-09 23:03 +0100
Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 18:26 -0400
Re: save dictionary to a file without brackets. Andrew Cooper <amc96@cam.ac.uk> - 2012-08-09 23:54 +0100
Re: save dictionary to a file without brackets. Roy Smith <roy@panix.com> - 2012-08-09 19:05 -0400
Re: save dictionary to a file without brackets. Chris Angelico <rosuav@gmail.com> - 2012-08-10 09:14 +1000
Re: save dictionary to a file without brackets. Roy Smith <roy@panix.com> - 2012-08-09 19:24 -0400
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-10 00:33 +0100
Re: save dictionary to a file without brackets. Tim Chase <python.list@tim.thechases.com> - 2012-08-09 19:16 -0500
Re: save dictionary to a file without brackets. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-10 08:54 +0000
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-10 10:37 +0100
Re: save dictionary to a file without brackets. Roy Smith <roy@panix.com> - 2012-08-10 08:29 -0400
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-10 15:47 +0100
Re: save dictionary to a file without brackets. alex23 <wuwei23@gmail.com> - 2012-08-12 17:15 -0700
Re: save dictionary to a file without brackets. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-13 08:05 +0000
Re: save dictionary to a file without brackets. alex23 <wuwei23@gmail.com> - 2012-08-13 09:16 -0700
Re: save dictionary to a file without brackets. rusi <rustompmody@gmail.com> - 2012-08-13 08:55 -0700
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-13 13:37 +0100
Re: save dictionary to a file without brackets. alex23 <wuwei23@gmail.com> - 2012-08-13 09:14 -0700
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-13 18:07 +0100
Re: save dictionary to a file without brackets. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-14 02:54 +0000
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-14 12:33 +0100
Re: save dictionary to a file without brackets. Alister <alister.ware@ntlworld.com> - 2012-08-14 15:05 +0000
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-14 19:15 +0100
Re: save dictionary to a file without brackets. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-08-13 18:43 +0100
Re: save dictionary to a file without brackets. alex23 <wuwei23@gmail.com> - 2012-08-13 18:32 -0700
Re: save dictionary to a file without brackets. Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-08-10 12:34 -0400
Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 20:27 -0400
Re: save dictionary to a file without brackets. Chris Angelico <rosuav@gmail.com> - 2012-08-10 10:31 +1000
Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 19:38 -0400
Re: save dictionary to a file without brackets. Tim Chase <python.list@tim.thechases.com> - 2012-08-09 17:39 -0500
Re: save dictionary to a file without brackets. Chris Kaynor <ckaynor@zindagigames.com> - 2012-08-09 15:37 -0700
Re: save dictionary to a file without brackets. Chris Angelico <rosuav@gmail.com> - 2012-08-10 08:53 +1000
Re: save dictionary to a file without brackets. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-08-11 11:26 +0000
Re: save dictionary to a file without brackets. 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-12 04:59 -0700
Re: save dictionary to a file without brackets. Chris Angelico <rosuav@gmail.com> - 2012-08-10 09:01 +1000
Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 19:42 -0400
Re: save dictionary to a file without brackets. 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-09 22:35 -0700
Re: save dictionary to a file without brackets. Terry Reedy <tjreedy@udel.edu> - 2012-08-09 17:46 -0400
Re: save dictionary to a file without brackets. Dave Angel <d@davea.name> - 2012-08-09 17:47 -0400
Re: save dictionary to a file without brackets. 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-10 10:46 -0700
Re: save dictionary to a file without brackets. 88888 Dihedral <dihedral88888@googlemail.com> - 2012-08-10 10:46 -0700
Re: save dictionary to a file without brackets. Chris Kaynor <ckaynor@zindagigames.com> - 2012-08-09 14:49 -0700
Re: save dictionary to a file without brackets. Chris Kaynor <ckaynor@zindagigames.com> - 2012-08-09 14:51 -0700
Re: save dictionary to a file without brackets. Giuseppe Amatulli <giuseppe.amatulli@gmail.com> - 2012-08-09 16:53 -0500
Re: save dictionary to a file without brackets. Roman Vashkevich <vashkevichrb@gmail.com> - 2012-08-10 02:02 +0400
Page 3 of 3 — ← Prev page 1 2 [3]
| From | Chris Kaynor <ckaynor@zindagigames.com> |
|---|---|
| Date | 2012-08-09 15:37 -0700 |
| Message-ID | <mailman.3131.1344551903.4697.python-list@python.org> |
| In reply to | #26824 |
On Thu, Aug 9, 2012 at 3:26 PM, Dave Angel <d@davea.name> wrote:
> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> On 09/08/2012 22:34, Roman Vashkevich wrote:
>>> Actually, they are different.
>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>>
>> Sligtly off topic, but looking up a value in a dictionary is actually
>> O(n) for all other entries in the dict which suffer a hash collision
>> with the searched entry.
>>
>> True, a sensible choice of hash function will reduce n to 1 in common
>> cases, but it becomes an important consideration for larger datasets.
>>
>> ~Andrew
>
> I'm glad you're wrong for CPython's dictionaries. The only time the
> lookup would degenerate to O[n] would be if the hash table had only one
> slot. CPython sensibly increases the hash table size when it becomes
> too small for efficiency.
>
>
> Where have you seen dictionaries so poorly implemented?
There are plenty of ways to make a pathological hash function that
will have that issue in CPython.
The very simple (and stupid):
class O(object):
def __hash__(self):
return 0
def __eq__(self, other): # I am aware this is the default equals method.
return self is other
Start adding those to a dictionary to get O(n) lookups.
Any case the hash return values modulus the dictionary hash table size
is constant will have similar results; powers of 2 are likely to
result in such behavior as well.
>
> --
>
> DaveA
>
> --
> http://mail.python.org/mailman/listinfo/python-list
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-08-10 08:53 +1000 |
| Message-ID | <mailman.3132.1344552827.4697.python-list@python.org> |
| In reply to | #26824 |
On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote: > On 08/09/2012 06:03 PM, Andrew Cooper wrote: >> O(n) for all other entries in the dict which suffer a hash collision >> with the searched entry. >> >> True, a sensible choice of hash function will reduce n to 1 in common >> cases, but it becomes an important consideration for larger datasets. > > I'm glad you're wrong for CPython's dictionaries. The only time the > lookup would degenerate to O[n] would be if the hash table had only one > slot. CPython sensibly increases the hash table size when it becomes > too small for efficiency. > > Where have you seen dictionaries so poorly implemented? In vanilla CPython up to version (I think) 3.3, where it's possible to DoS the hash generator. Hash collisions are always possible, just ridiculously unlikely unless deliberately exploited. (And yes, I know an option was added to older versions to randomize the hashes there too. It's not active by default, so "vanilla CPython" is still vulnerable.) ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-08-11 11:26 +0000 |
| Message-ID | <5026416d$0$29978$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #26828 |
On Fri, 10 Aug 2012 08:53:43 +1000, Chris Angelico wrote: > On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote: >> On 08/09/2012 06:03 PM, Andrew Cooper wrote: >>> O(n) for all other entries in the dict which suffer a hash collision >>> with the searched entry. >>> >>> True, a sensible choice of hash function will reduce n to 1 in common >>> cases, but it becomes an important consideration for larger datasets. >> >> I'm glad you're wrong for CPython's dictionaries. The only time the >> lookup would degenerate to O[n] would be if the hash table had only one >> slot. CPython sensibly increases the hash table size when it becomes >> too small for efficiency. >> >> Where have you seen dictionaries so poorly implemented? > > In vanilla CPython up to version (I think) 3.3, where it's possible to > DoS the hash generator. Hash collisions are always possible, just > ridiculously unlikely unless deliberately exploited. Not so unlikely actually. py> hash(3) 3 py> hash(2**64 + 2) 3 py> hash(-1) -2 py> hash(-2) -2 By its very nature, a hash function cannot fail to have collisions. The problem is that in general you have an essentially unlimited number of objects being mapped to a large but still finite number of hash values. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | 88888 Dihedral <dihedral88888@googlemail.com> |
|---|---|
| Date | 2012-08-12 04:59 -0700 |
| Message-ID | <a5bede18-ffa7-47d4-93c4-cd9d04d6587f@googlegroups.com> |
| In reply to | #26919 |
Steven D'Aprano於 2012年8月11日星期六UTC+8下午7時26分37秒寫道: > On Fri, 10 Aug 2012 08:53:43 +1000, Chris Angelico wrote: > > > > > On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote: > > >> On 08/09/2012 06:03 PM, Andrew Cooper wrote: > > >>> O(n) for all other entries in the dict which suffer a hash collision > > >>> with the searched entry. > > >>> > > >>> True, a sensible choice of hash function will reduce n to 1 in common > > >>> cases, but it becomes an important consideration for larger datasets. > > >> > > >> I'm glad you're wrong for CPython's dictionaries. The only time the > > >> lookup would degenerate to O[n] would be if the hash table had only one > > >> slot. CPython sensibly increases the hash table size when it becomes > > >> too small for efficiency. > > >> > > >> Where have you seen dictionaries so poorly implemented? > > > > > > In vanilla CPython up to version (I think) 3.3, where it's possible to > > > DoS the hash generator. Hash collisions are always possible, just > > > ridiculously unlikely unless deliberately exploited. > > > > Not so unlikely actually. > > > > py> hash(3) > > 3 > > py> hash(2**64 + 2) > > 3 > > > > py> hash(-1) > > -2 > > py> hash(-2) > > -2 > > > > > > By its very nature, a hash function cannot fail to have collisions. The > > problem is that in general you have an essentially unlimited number of > > objects being mapped to a large but still finite number of hash values. > > > > > > > > -- > > Steven Steven D'Aprano於 2012年8月11日星期六UTC+8下午7時26分37秒寫道: > On Fri, 10 Aug 2012 08:53:43 +1000, Chris Angelico wrote: > > > > > On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote: > > >> On 08/09/2012 06:03 PM, Andrew Cooper wrote: > > >>> O(n) for all other entries in the dict which suffer a hash collision > > >>> with the searched entry. > > >>> > > >>> True, a sensible choice of hash function will reduce n to 1 in common > > >>> cases, but it becomes an important consideration for larger datasets. > > >> > > >> I'm glad you're wrong for CPython's dictionaries. The only time the > > >> lookup would degenerate to O[n] would be if the hash table had only one > > >> slot. CPython sensibly increases the hash table size when it becomes > > >> too small for efficiency. > > >> > > >> Where have you seen dictionaries so poorly implemented? > > > > > > In vanilla CPython up to version (I think) 3.3, where it's possible to > > > DoS the hash generator. Hash collisions are always possible, just > > > ridiculously unlikely unless deliberately exploited. > > > > Not so unlikely actually. > > > > py> hash(3) > > 3 > > py> hash(2**64 + 2) > > 3 > > > > py> hash(-1) > > -2 > > py> hash(-2) > > -2 > > > > > > By its very nature, a hash function cannot fail to have collisions. The > > problem is that in general you have an essentially unlimited number of > > objects being mapped to a large but still finite number of hash values. > > > > > > > > -- > > Steven Lets check the basic operations of a hash table or so-called a dictionary first. If the dictionary is implemented toward faster in searching items, then it is slightly different in the insertion and the deletion operations of (key, value) pairs.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-08-10 09:01 +1000 |
| Message-ID | <mailman.3133.1344553280.4697.python-list@python.org> |
| In reply to | #26824 |
On Fri, Aug 10, 2012 at 8:39 AM, Tim Chase <python.list@tim.thechases.com> wrote: > On 08/09/12 17:26, Dave Angel wrote: >> On 08/09/2012 06:03 PM, Andrew Cooper wrote: >> I'm glad you're wrong for CPython's dictionaries. The only time the >> lookup would degenerate to O[n] would be if the hash table had only one >> slot. CPython sensibly increases the hash table size when it becomes >> too small for efficiency. >> >> Where have you seen dictionaries so poorly implemented? > > PHP? > > http://www.phpclasses.org/blog/post/171-PHP-Vulnerability-May-Halt-Millions-of-Servers.html That's the same hash collision attack that I alluded to above, and it strikes *many* language implementations. Most released a patch fairly quickly and quietly (Pike, Lua, V8 (JavaScript/ECMAScript), PHP), but CPython dared not, on account of various applications depending on hash order (at least for tests). It's not (for once) an indictment of PHP (maybe that should be an "inarrayment"?), it's a consequence of a hashing algorithm that favored simplicity over cryptographic qualities. (It feels weird to be defending PHP...) ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-08-09 19:42 -0400 |
| Message-ID | <mailman.3138.1344555793.4697.python-list@python.org> |
| In reply to | #26824 |
On 08/09/2012 06:53 PM, Chris Angelico wrote: > On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d@davea.name> wrote: >> On 08/09/2012 06:03 PM, Andrew Cooper wrote: >>> O(n) for all other entries in the dict which suffer a hash collision >>> with the searched entry. >>> >>> True, a sensible choice of hash function will reduce n to 1 in common >>> cases, but it becomes an important consideration for larger datasets. >> I'm glad you're wrong for CPython's dictionaries. The only time the >> lookup would degenerate to O[n] would be if the hash table had only one >> slot. CPython sensibly increases the hash table size when it becomes >> too small for efficiency. >> >> Where have you seen dictionaries so poorly implemented? > In vanilla CPython up to version (I think) 3.3, where it's possible to > DoS the hash generator. Hash collisions are always possible, just > ridiculously unlikely unless deliberately exploited. > > (And yes, I know an option was added to older versions to randomize > the hashes there too. It's not active by default, so "vanilla CPython" > is still vulnerable.) > > ChrisA Thank you to you and others, who have corrected my over-general response. I was not intending to claim anything about either a deliberate DOS, nor a foolishly chosen hash function. But the message I was replying to seemed to me to claim that for large datasets, even a good hash algorithm would end up giving O(n) performance. -- DaveA
[toc] | [prev] | [next] | [standalone]
| From | 88888 Dihedral <dihedral88888@googlemail.com> |
|---|---|
| Date | 2012-08-09 22:35 -0700 |
| Message-ID | <ae9568e5-2053-4997-afc3-203b492a8295@googlegroups.com> |
| In reply to | #26824 |
Andrew Cooper於 2012年8月10日星期五UTC+8上午6時03分26秒寫道:
> On 09/08/2012 22:34, Roman Vashkevich wrote:
>
> > Actually, they are different.
>
> > Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>
> > Dict uses hashing to get a value from the dict and this is why it's O(1).
>
> >
>
>
>
> Sligtly off topic, but looking up a value in a dictionary is actually
>
> O(n) for all other entries in the dict which suffer a hash collision
>
> with the searched entry.
>
>
>
> True, a sensible choice of hash function will reduce n to 1 in common
>
> cases, but it becomes an important consideration for larger datasets.
>
>
>
> ~Andrew
This is the classical problem of storing the hash collision items one by one.
Those items should be stored by some order.
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-08-09 17:46 -0400 |
| Message-ID | <mailman.3122.1344548828.4697.python-list@python.org> |
| In reply to | #26806 |
On 8/9/2012 5:21 PM, Tim Chase wrote:
> On 08/09/12 15:41, Roman Vashkevich wrote:
>> 10.08.2012, в 0:35, Tim Chase написал(а):
>>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>>> 4 5 1
>>>>> 5 4 1
>>>>> 4 4 2
>>>>> 2 3 1
>>>>> 4 3 2
>>>>
>>>> for key in dict:
>>>> print key[0], key[1], dict[key]
>>>
>>> This might read more cleanly with tuple unpacking:
>>>
>>> for (edge1, edge2), cost in d.iteritems(): # or .items()
>>> print edge1, edge2, cost
>>>
>>> (I'm making the assumption that this is a edge/cost graph...use
>>> appropriate names according to what they actually mean)
>>
>> dict.items() is a list - linear access time whereas with 'for
>> key in dict:' access time is constant:
>> http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1
>
> That link doesn't actually discuss dict.{iter}items()
>
> Both are O(N) because you have to touch each item in the dict--you
> can't iterate over N entries in less than O(N) time. For small
> data-sets, building the list and then iterating over it may be
> faster faster; for larger data-sets, the cost of building the list
> overshadows the (minor) overhead of a generator. Either way, the
> iterate-and-fetch-the-associated-value of .items() & .iteritems()
> can (should?) be optimized in Python's internals to the point I
> wouldn't think twice about using the more readable version.
In 3.x, .keys, .values, and .items are set-like read-only views
specifically designed for iteration. So in 3.x they are THE way to do so
for whichever alternative is appropriate. Iterating by keys and then
looking up values instead of yielding the values at the same time is
extra work.
--
Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | Dave Angel <d@davea.name> |
|---|---|
| Date | 2012-08-09 17:47 -0400 |
| Message-ID | <mailman.3123.1344548888.4697.python-list@python.org> |
| In reply to | #26806 |
On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
Sure, that's why
for key in dict:
print key[0], key[1], dict[key]
is probably slower than
for (edge1, edge2), cost in d.iteritems(): # or .items()
print edge1, edge2, cost
So, the latter is both faster and easier to read. Why are you arguing against it?
Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.
--
DaveA
[toc] | [prev] | [next] | [standalone]
| From | 88888 Dihedral <dihedral88888@googlemail.com> |
|---|---|
| Date | 2012-08-10 10:46 -0700 |
| Message-ID | <mailman.3171.1344620771.4697.python-list@python.org> |
| In reply to | #26819 |
Dave Angel於 2012年8月10日星期五UTC+8上午5時47分45秒寫道:
> On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
>
> > Actually, they are different.
>
> > Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>
> > Dict uses hashing to get a value from the dict and this is why it's O(1).
>
>
>
> Sure, that's why
>
>
>
> for key in dict:
>
> print key[0], key[1], dict[key]
>
>
>
> is probably slower than
>
>
>
> for (edge1, edge2), cost in d.iteritems(): # or .items()
>
> print edge1, edge2, cost
>
>
>
>
>
> So, the latter is both faster and easier to read. Why are you arguing against it?
>
>
>
> Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.
>
>
>
>
>
>
>
> --
>
>
>
> DaveA
OK, lets estimate the hash colision rate first.
For those items hashed to the same key, I'll store a sorted list with a
known lenth m to be accessed in O(LOG(M)).
Of couse another hash can be attatched.
[toc] | [prev] | [next] | [standalone]
| From | 88888 Dihedral <dihedral88888@googlemail.com> |
|---|---|
| Date | 2012-08-10 10:46 -0700 |
| Message-ID | <0786dd35-b624-4dfc-bbbb-a3e33aa43ae1@googlegroups.com> |
| In reply to | #26819 |
Dave Angel於 2012年8月10日星期五UTC+8上午5時47分45秒寫道:
> On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
>
> > Actually, they are different.
>
> > Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>
> > Dict uses hashing to get a value from the dict and this is why it's O(1).
>
>
>
> Sure, that's why
>
>
>
> for key in dict:
>
> print key[0], key[1], dict[key]
>
>
>
> is probably slower than
>
>
>
> for (edge1, edge2), cost in d.iteritems(): # or .items()
>
> print edge1, edge2, cost
>
>
>
>
>
> So, the latter is both faster and easier to read. Why are you arguing against it?
>
>
>
> Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.
>
>
>
>
>
>
>
> --
>
>
>
> DaveA
OK, lets estimate the hash colision rate first.
For those items hashed to the same key, I'll store a sorted list with a
known lenth m to be accessed in O(LOG(M)).
Of couse another hash can be attatched.
[toc] | [prev] | [next] | [standalone]
| From | Chris Kaynor <ckaynor@zindagigames.com> |
|---|---|
| Date | 2012-08-09 14:49 -0700 |
| Message-ID | <mailman.3124.1344548966.4697.python-list@python.org> |
| In reply to | #26806 |
On Thu, Aug 9, 2012 at 2:34 PM, Roman Vashkevich <vashkevichrb@gmail.com> wrote:
>
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
>
Using "in" as an operator such as: "if key in dict" or "result = key
in dict" is O(1) as you say. Iterating on the dictionary requires
touching every item, and so is O(n), even though it also using "in" in
the command.
Here are a few quick timing tests I just ran with Python 2.6:
>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(1))')
0.078683853332734088
>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(10))')
0.17451784110969015
>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(100))')
1.1708168159579486
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(1))')
0.14186911440299355
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(10))')
0.33836512561802579
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(100))')
2.2544262854249268
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(1))')
0.10009793211446549
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(10))')
0.38825072496723578
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(100))')
3.3020098061049339
As can be seen here, a 1-item dictionary iterated in 0.07 seconds, 10
items in 0.17 seconds, and 100 items in 1.17 seconds. That is fairly
close to linear, especially when considering the overhead of a
complete no-op
Using iteritems, it appears to actually scale slightly better than
linear, though it is slower than just the plain iteration.
Doing a plain iteration, then looking up the keys to get the values
also appears to be linear, and is even slower than iteritems.
[toc] | [prev] | [next] | [standalone]
| From | Chris Kaynor <ckaynor@zindagigames.com> |
|---|---|
| Date | 2012-08-09 14:51 -0700 |
| Message-ID | <mailman.3125.1344549099.4697.python-list@python.org> |
| In reply to | #26806 |
I realized, I should have done 10, 100, 1000 rather than 1, 10, 100
for better results, so here are the results for 1000 items. It still
maintains the same pattern:
>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(1000))')
10.166595947685153
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(1000))')
19.922474218828711
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(1000))')
31.007666660415282
Chris
On Thu, Aug 9, 2012 at 2:49 PM, Chris Kaynor <ckaynor@zindagigames.com> wrote:
> On Thu, Aug 9, 2012 at 2:34 PM, Roman Vashkevich <vashkevichrb@gmail.com> wrote:
>>
>> Actually, they are different.
>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>
>
> Using "in" as an operator such as: "if key in dict" or "result = key
> in dict" is O(1) as you say. Iterating on the dictionary requires
> touching every item, and so is O(n), even though it also using "in" in
> the command.
>
> Here are a few quick timing tests I just ran with Python 2.6:
>
>>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(1))')
> 0.078683853332734088
>>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(10))')
> 0.17451784110969015
>>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(100))')
> 1.1708168159579486
>
>>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(1))')
> 0.14186911440299355
>>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(10))')
> 0.33836512561802579
>>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(100))')
> 2.2544262854249268
>
>>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(1))')
> 0.10009793211446549
>>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(10))')
> 0.38825072496723578
>>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(100))')
> 3.3020098061049339
>
>
> As can be seen here, a 1-item dictionary iterated in 0.07 seconds, 10
> items in 0.17 seconds, and 100 items in 1.17 seconds. That is fairly
> close to linear, especially when considering the overhead of a
> complete no-op
>
> Using iteritems, it appears to actually scale slightly better than
> linear, though it is slower than just the plain iteration.
>
> Doing a plain iteration, then looking up the keys to get the values
> also appears to be linear, and is even slower than iteritems.
[toc] | [prev] | [next] | [standalone]
| From | Giuseppe Amatulli <giuseppe.amatulli@gmail.com> |
|---|---|
| Date | 2012-08-09 16:53 -0500 |
| Message-ID | <mailman.3127.1344549233.4697.python-list@python.org> |
| In reply to | #26806 |
Thanks a lot for the clarification.
Actually my problem is giving to raster dataset in geo-tif format find out
unique pair combination, count the number of observation
unique combination in rast1, count the number of observation
unique combination in rast2, count the number of observation
I try different solution and this seems to me the faster
Rast00=dsRast00.GetRasterBand(1).ReadAsArray()
Rast10=dsRast10.GetRasterBand(1).ReadAsArray()
mask=( Rast00 != 0 ) & ( Rast10 != 0 ) # may be this masking
operation can be included in the for loop
Rast00_mask= Rast00[mask] # may be this masking
operation can be included in the for loop
Rast10_mask= Rast10[mask] # may be this masking
operation can be included in the for loop
array2D = np.array(zip( Rast00_mask,Rast10_mask))
unique_u=dict()
unique_k1=dict()
unique_k2=dict()
for key1,key2 in array2D :
row = tuple((key1,key2))
if row in unique_u:
unique_u[row] += 1
else:
unique_u[row] = 1
if key1 in unique_k1:
unique_k1[key1] += 1
else:
unique_k1[key1] = 1
if key2 in unique_k2:
unique_k2[key2] += 1
else:
unique_k2[key2] = 1
output = open(dst_file_rast0010, "w")
for (a, b), c in unique_u.items():
print(a, b, c, file=output)
output.close()
output = open(dst_file_rast00, "w")
for (a), b in unique_k1.items():
print(a, b, file=output)
output.close()
output = open(dst_file_rast10, "w")
for (a), b in unique_k2.items():
print(a, b, file=output)
output.close()
What do you think? is there a way to speed up the process?
Thanks
Giuseppe
On 9 August 2012 16:34, Roman Vashkevich <vashkevichrb@gmail.com> wrote:
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
>
> 10.08.2012, в 1:21, Tim Chase написал(а):
>
>> On 08/09/12 15:41, Roman Vashkevich wrote:
>>> 10.08.2012, в 0:35, Tim Chase написал(а):
>>>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>>>> 4 5 1
>>>>>> 5 4 1
>>>>>> 4 4 2
>>>>>> 2 3 1
>>>>>> 4 3 2
>>>>>
>>>>> for key in dict:
>>>>> print key[0], key[1], dict[key]
>>>>
>>>> This might read more cleanly with tuple unpacking:
>>>>
>>>> for (edge1, edge2), cost in d.iteritems(): # or .items()
>>>> print edge1, edge2, cost
>>>>
>>>> (I'm making the assumption that this is a edge/cost graph...use
>>>> appropriate names according to what they actually mean)
>>>
>>> dict.items() is a list - linear access time whereas with 'for
>>> key in dict:' access time is constant:
>>> http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1
>>
>> That link doesn't actually discuss dict.{iter}items()
>>
>> Both are O(N) because you have to touch each item in the dict--you
>> can't iterate over N entries in less than O(N) time. For small
>> data-sets, building the list and then iterating over it may be
>> faster faster; for larger data-sets, the cost of building the list
>> overshadows the (minor) overhead of a generator. Either way, the
>> iterate-and-fetch-the-associated-value of .items() & .iteritems()
>> can (should?) be optimized in Python's internals to the point I
>> wouldn't think twice about using the more readable version.
>>
>> -tkc
>>
>>
>
--
Giuseppe Amatulli
Web: www.spatial-ecology.net
[toc] | [prev] | [next] | [standalone]
| From | Roman Vashkevich <vashkevichrb@gmail.com> |
|---|---|
| Date | 2012-08-10 02:02 +0400 |
| Message-ID | <mailman.3128.1344549727.4697.python-list@python.org> |
| In reply to | #26806 |
10.08.2012, в 1:47, Dave Angel написал(а):
> On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
>> Actually, they are different.
>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>
> Sure, that's why
>
> for key in dict:
> print key[0], key[1], dict[key]
>
> is probably slower than
>
> for (edge1, edge2), cost in d.iteritems(): # or .items()
> print edge1, edge2, cost
>
>
> So, the latter is both faster and easier to read. Why are you arguing against it?
>
> Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.
>
>
>
> --
>
> DaveA
>
I'm not arguing at all. Sorry if it sounded like I was arguing.
Thanks for notifying me of the way messages should be sent.
Roman
[toc] | [prev] | [standalone]
Page 3 of 3 — ← Prev page 1 2 [3]
Back to top | Article view | comp.lang.python
csiph-web