Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #16431 > unrolled thread
| Started by | Peter Otten <__peter__@web.de> |
|---|---|
| First post | 2011-11-30 13:47 +0100 |
| Last post | 2011-12-02 04:33 +0000 |
| Articles | 11 on this page of 51 — 14 participants |
Back to article view | Back to comp.lang.python
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: order independent hash? Peter Otten <__peter__@web.de> - 2011-11-30 13:47 +0100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-01 07:35 -0800
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-01 07:35 -0800
Re: order independent hash? Dave Angel <d@davea.name> - 2011-12-01 10:52 -0500
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-01 08:17 -0800
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-01 08:17 -0800
Re: order independent hash? Chris Angelico <rosuav@gmail.com> - 2011-12-02 03:18 +1100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-01 20:29 -0800
Re: order independent hash? Chris Angelico <rosuav@gmail.com> - 2011-12-02 16:00 +1100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-01 21:48 -0800
Re: order independent hash? Lie Ryan <lie.1296@gmail.com> - 2011-12-05 00:40 +1100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-01 21:48 -0800
Re: order independent hash? Roel Schroeven <rschroev_nospam_ml@fastmail.fm> - 2011-12-04 14:41 +0100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-04 07:39 -0800
Re: order independent hash? Ian Kelly <ian.g.kelly@gmail.com> - 2011-12-04 10:41 -0700
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-04 10:06 -0800
Re: order independent hash? Ian Kelly <ian.g.kelly@gmail.com> - 2011-12-04 13:13 -0700
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-04 15:17 -0800
Re: order independent hash? Matt Joiner <anacrolix@gmail.com> - 2011-12-05 10:21 +1100
Re: order independent hash? Ian Kelly <ian.g.kelly@gmail.com> - 2011-12-04 16:24 -0700
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-04 16:52 -0800
Re: order independent hash? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-12-05 02:00 +0000
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-04 16:52 -0800
Re: order independent hash? Lie Ryan <lie.1296@gmail.com> - 2011-12-05 12:59 +1100
Re: order independent hash? Ethan Furman <ethan@stoneleaf.us> - 2011-12-04 18:08 -0800
Re: order independent hash? Ben Finney <ben+python@benfinney.id.au> - 2011-12-05 15:52 +1100
Re: order independent hash? Matt Joiner <anacrolix@gmail.com> - 2011-12-05 14:02 +1100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-06 21:24 -0800
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-06 21:24 -0800
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-04 15:17 -0800
Re: order independent hash? Terry Reedy <tjreedy@udel.edu> - 2011-12-04 19:28 -0500
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-04 10:06 -0800
Re: order independent hash? Hrvoje Niksic <hniksic@xemacs.org> - 2011-12-02 10:53 +0100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-02 03:48 -0800
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-02 09:00 -0800
Re: order independent hash? Terry Reedy <tjreedy@udel.edu> - 2011-12-02 16:40 -0500
Re: order independent hash? Hrvoje Niksic <hniksic@xemacs.org> - 2011-12-04 14:55 +0100
Re: order independent hash? Chris Angelico <rosuav@gmail.com> - 2011-12-05 01:08 +1100
Re: order independent hash? Hrvoje Niksic <hniksic@xemacs.org> - 2011-12-07 14:28 +0100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-07 09:48 -0800
Re: order independent hash? Tim Chase <python.list@tim.thechases.com> - 2011-12-04 12:57 -0600
Re: order independent hash? Hrvoje Niksic <hniksic@xemacs.org> - 2011-12-08 10:30 +0100
Re: order independent hash? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-12-09 11:27 +0000
Re: order independent hash? Hrvoje Niksic <hniksic@xemacs.org> - 2011-12-09 17:51 +0100
Re: order independent hash? Chris Angelico <rosuav@gmail.com> - 2011-12-10 04:13 +1100
Re: order independent hash? Lie Ryan <lie.1296@gmail.com> - 2011-12-11 10:58 +1100
Re: order independent hash? Chris Angelico <rosuav@gmail.com> - 2011-12-11 11:17 +1100
Re: order independent hash? Lie Ryan <lie.1296@gmail.com> - 2011-12-11 14:04 +1100
Re: order independent hash? Lie Ryan <lie.1296@gmail.com> - 2011-12-05 00:29 +1100
Re: order independent hash? 88888 Dihedral <dihedral88888@googlemail.com> - 2011-12-01 20:29 -0800
Re: order independent hash? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-12-02 04:33 +0000
Page 3 of 3 — ← Prev page 1 2 [3]
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2011-12-04 12:57 -0600 |
| Message-ID | <mailman.3277.1323025080.27778.python-list@python.org> |
| In reply to | #16608 |
On 12/04/11 08:08, Chris Angelico wrote: > 2011/12/5 Hrvoje Niksic<hniksic@xemacs.org>: >> If a Python >> implementation tried to implement dict as a tree, instances of classes >> that define only __eq__ and __hash__ would not be correctly inserted in >> such a dict. > > Couldn't you just make a tree of hash values? Okay, that's probably > not the most useful way to do things, but technically it'd comply with > the spec. From an interface perspective, I suppose it would work. However one of the main computer-science reasons for addressing by a hash is to get O(1) access to items (modulo pessimal hash structures/algorithms which can approach O(N) if everything hashes to the same value/bucket), rather than the O(logN) time you'd get from a tree. So folks reaching for a hash/map might be surprised if performance degraded with the size of the contents. -tkc
[toc] | [prev] | [next] | [standalone]
| From | Hrvoje Niksic <hniksic@xemacs.org> |
|---|---|
| Date | 2011-12-08 10:30 +0100 |
| Message-ID | <87y5un9zs6.fsf@xemacs.org> |
| In reply to | #16624 |
Tim Chase <python.list@tim.thechases.com> writes: > From an interface perspective, I suppose it would work. However one > of the main computer-science reasons for addressing by a hash is to > get O(1) access to items (modulo pessimal hash structures/algorithms > which can approach O(N) if everything hashes to the same > value/bucket), rather than the O(logN) time you'd get from a tree. So > folks reaching for a hash/map might be surprised if performance > degraded with the size of the contents. In a language like Python, the difference between O(1) and O(log n) is not the primary reason why programmers use dict; they use it because it's built-in, efficient compared to alternatives, and convenient to use. If Python dict had been originally implemented as a tree, I'm sure it would be just as popular. Omitting the factor of O(log n) as functionally equivalent to O(1) is applicable to many situations and is sometimes called "soft-O" notation. One example from practice is the pre-2011 C++, where the standardization committee failed to standardize hash tables on time for the 1998 standard. Although this was widely recognized as an oversight, a large number of programs simply used tree-based std::maps and never noticed a practical difference between between average-constant-time and logarithmic complexity lookups.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2011-12-09 11:27 +0000 |
| Message-ID | <4ee1f093$0$29977$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #16813 |
On Thu, 08 Dec 2011 10:30:01 +0100, Hrvoje Niksic wrote: > In a language like Python, the difference between O(1) and O(log n) is > not the primary reason why programmers use dict; they use it because > it's built-in, efficient compared to alternatives, and convenient to > use. If Python dict had been originally implemented as a tree, I'm sure > it would be just as popular. Except for people who needed dicts with tens of millions of items. Remember also that dicts are used for looking up names in Python. Nearly all method calls, attribute accesses, global name lookups, function calls, etc. go through at least one and potentially multiple dict lookups. The simple statement: n = len(x.y) + len(z) likely requires nine dict lookups, and potentially more. In even a small application, there could be tens of millions of dict lookups; changing each of them from O(1) to O(log N) could result in a measurable slowdown to Python code in real applications. That is why dicts are highly optimized for speed. As fast as dicts are, sometimes they aren't fast enough. One common micro- optimization for tight loops and time-critical code is to create local variables from globals or builtins, because local variable access bypasses dict lookup. So people would notice if dicts were slower. -- Steven
[toc] | [prev] | [next] | [standalone]
| From | Hrvoje Niksic <hniksic@xemacs.org> |
|---|---|
| Date | 2011-12-09 17:51 +0100 |
| Message-ID | <87ty59adtx.fsf@xemacs.org> |
| In reply to | #16907 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes: > Except for people who needed dicts with tens of millions of items. Huge tree-based dicts would be somewhat slower than today's hash-based dicts, but they would be far from unusable. Trees are often used to organize large datasets for quick access. The case of dicts which require frequent access, such as those used to implement namespaces, is different, and more interesting. Those dicts are typically quite small, and for them the difference between O(log n) and O(1) is negligible in both theory (since n is "small", i.e. bounded) and practice. In fact, depending on the details of the implementation, the lookup in a small tree could even be marginally faster.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-12-10 04:13 +1100 |
| Message-ID | <mailman.3480.1323450805.27778.python-list@python.org> |
| In reply to | #16923 |
On Sat, Dec 10, 2011 at 3:51 AM, Hrvoje Niksic <hniksic@xemacs.org> wrote: > The case of dicts which require frequent access, such as those used to > implement namespaces, is different, and more interesting. Those dicts > are typically quite small, and for them the difference between O(log n) > and O(1) is negligible in both theory (since n is "small", i.e. bounded) > and practice. In fact, depending on the details of the implementation, > the lookup in a small tree could even be marginally faster. This is something where, I am sure, far greater minds than mine delve... but, would a splay tree be effective for name lookups? In most cases, you'll have a huge puddle of names of which you use the tiniest fraction; and a splay tree would, in effect, automatically optimize itself to handle tight loops. ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Lie Ryan <lie.1296@gmail.com> |
|---|---|
| Date | 2011-12-11 10:58 +1100 |
| Message-ID | <mailman.3503.1323561541.27778.python-list@python.org> |
| In reply to | #16907 |
On 12/09/2011 10:27 PM, Steven D'Aprano wrote: > On Thu, 08 Dec 2011 10:30:01 +0100, Hrvoje Niksic wrote: > >> In a language like Python, the difference between O(1) and O(log n) is >> not the primary reason why programmers use dict; they use it because >> it's built-in, efficient compared to alternatives, and convenient to >> use. If Python dict had been originally implemented as a tree, I'm sure >> it would be just as popular. > > Except for people who needed dicts with tens of millions of items. who should be using a proper DBMS in any case.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-12-11 11:17 +1100 |
| Message-ID | <mailman.3505.1323562670.27778.python-list@python.org> |
| In reply to | #16907 |
On Sun, Dec 11, 2011 at 10:58 AM, Lie Ryan <lie.1296@gmail.com> wrote: > On 12/09/2011 10:27 PM, Steven D'Aprano wrote: >> Except for people who needed dicts with tens of millions of items. > > who should be using a proper DBMS in any case. Not necessarily. "Database" usually implies disk-based and relational, features that may well be quite superfluous; and a pure-memory database with no relational facilities... is basically a dict. So why not use one? ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Lie Ryan <lie.1296@gmail.com> |
|---|---|
| Date | 2011-12-11 14:04 +1100 |
| Message-ID | <mailman.3507.1323572713.27778.python-list@python.org> |
| In reply to | #16907 |
On 12/11/2011 11:17 AM, Chris Angelico wrote: > On Sun, Dec 11, 2011 at 10:58 AM, Lie Ryan<lie.1296@gmail.com> wrote: >> On 12/09/2011 10:27 PM, Steven D'Aprano wrote: >>> Except for people who needed dicts with tens of millions of items. >> >> who should be using a proper DBMS in any case. > > Not necessarily. "Database" usually implies disk-based and relational, > features that may well be quite superfluous; and a pure-memory > database with no relational facilities... is basically a dict. So why > not use one? It is very unlikely you'd have millions of items in a dict and you're not planning to do any data processing at all. In any case, there are very few use cases which requires the use of a dict with millions of items that wouldn't be better served by a proper database.
[toc] | [prev] | [next] | [standalone]
| From | Lie Ryan <lie.1296@gmail.com> |
|---|---|
| Date | 2011-12-05 00:29 +1100 |
| Message-ID | <mailman.3262.1323005372.27778.python-list@python.org> |
| In reply to | #16523 |
On 12/02/2011 03:29 PM, 88888 Dihedral wrote: > > I clear my point a hash is a collection of (key, value) pairs that have > well defined methods and behavior to be used in programming. > > The basic operations of a hash normally includes the following: > > 1. insertion of a (key, value) pair into the hash > 2. deletion of a (key, value) from the hash > 3. inquiring a hash by a key to retrieve the value if the (key, value) > pair available in the hash. If no key matched, the hash will return > a not found result. > > The hash can grow with (k,v) pairs accumulated in the run time. > An auto memory management mechanism is required for a hash of a non-fixed size of (k,v) pairs. > > Some implementations of a hash might pose some restrictions of k and v > for some reasons. But in object programming k and v can be objects > to be manipulated by the programmer. Strictly speaking, what you're describing is just a dictionary/mapping abstract data type (ADT), not a hashtable. Hashtable is a particular way to implement the dictionary/mapping ADT. Python's dictionary is implemented as hashtable, but there are other ways to implement a dictionary/mapping, such as using a sorted tree. For a data structure to be considered a Hashtable, in addition to having the properties of a dictionary that you described, the data structure must also uses a "hashing function" to encode the dictionary's keys into integer that will be used to calculate the index for the corresponding value in its internal array. A hashtable also must provide mechanism to deal with hash collisions to maintains its invariants as a dictionary.
[toc] | [prev] | [next] | [standalone]
| From | 88888 Dihedral <dihedral88888@googlemail.com> |
|---|---|
| Date | 2011-12-01 20:29 -0800 |
| Message-ID | <mailman.3215.1322800184.27778.python-list@python.org> |
| In reply to | #16503 |
On Friday, December 2, 2011 12:18:29 AM UTC+8, Chris Angelico wrote: > On Fri, Dec 2, 2011 at 2:52 AM, Dave Angel <d...@davea.name> wrote: > > On 12/01/2011 10:35 AM, 88888 Dihedral wrote: > >> I knew a hash can replace a bi-directional linked list. > >> The value can be a multi-field string to be parsed for further actions. > >> Is this what you are asking? > > > > A hash is a number, so I don't see how it can replace any kind of linked > > list. Perhaps you're thinking of some other language. > > A hashtable is a form of data structure that involves hashing values > (ie calculating hashes for them) and storing them for easier retrieval > than a simple linked list offers. That may be what you're thinking of. > > ChrisA I clear my point a hash is a collection of (key, value) pairs that have well defined methods and behavior to be used in programming. The basic operations of a hash normally includes the following: 1. insertion of a (key, value) pair into the hash 2. deletion of a (key, value) from the hash 3. inquiring a hash by a key to retrieve the value if the (key, value) pair available in the hash. If no key matched, the hash will return a not found result. The hash can grow with (k,v) pairs accumulated in the run time. An auto memory management mechanism is required for a hash of a non-fixed size of (k,v) pairs. Some implementations of a hash might pose some restrictions of k and v for some reasons. But in object programming k and v can be objects to be manipulated by the programmer.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2011-12-02 04:33 +0000 |
| Message-ID | <4ed85535$0$29988$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #16503 |
On Fri, 02 Dec 2011 03:18:29 +1100, Chris Angelico wrote: > On Fri, Dec 2, 2011 at 2:52 AM, Dave Angel <d@davea.name> wrote: >> On 12/01/2011 10:35 AM, 88888 Dihedral wrote: >>> I knew a hash can replace a bi-directional linked list. The value can >>> be a multi-field string to be parsed for further actions. Is this >>> what you are asking? >> >> A hash is a number, so I don't see how it can replace any kind of >> linked list. Perhaps you're thinking of some other language. > > A hashtable is a form of data structure that involves hashing values (ie > calculating hashes for them) and storing them for easier retrieval than > a simple linked list offers. That may be what you're thinking of. Python dicts are hash tables. -- Steven
[toc] | [prev] | [standalone]
Page 3 of 3 — ← Prev page 1 2 [3]
Back to top | Article view | comp.lang.python
csiph-web