Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #67985 > unrolled thread

Re: Tuples and immutability

Started byDuncan Booth <duncan.booth@invalid.invalid>
First post2014-03-07 09:33 +0000
Last post2014-03-08 19:55 -0700
Articles 20 on this page of 108 — 24 participants

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Tuples and immutability Duncan Booth <duncan.booth@invalid.invalid> - 2014-03-07 09:33 +0000
    Re: Tuples and immutability Ben Finney <ben+python@benfinney.id.au> - 2014-03-07 22:04 +1100
    Re: Tuples and immutability Chris Angelico <rosuav@gmail.com> - 2014-03-07 22:11 +1100
    Re: Tuples and immutability Peter Otten <__peter__@web.de> - 2014-03-07 12:38 +0100
    Re: Tuples and immutability Chris Angelico <rosuav@gmail.com> - 2014-03-07 22:45 +1100
    Re: Tuples and immutability Alister <alister.ware@ntlworld.com> - 2014-03-07 11:51 +0000
      Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-07 11:23 -0700
    Re: Tuples and immutability Roy Smith <roy@panix.com> - 2014-03-07 08:37 -0500
    Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-08 15:17 +1300
      Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-07 21:17 -0700
        Balanced trees (was: Re: Tuples and immutability) Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 10:34 +0200
          Re: Balanced trees (was: Re: Tuples and immutability) Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 04:03 -0700
            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 13:26 +0200
          Re: Balanced trees (was: Re: Tuples and immutability) Dan Stromberg <drsalists@gmail.com> - 2014-03-08 11:58 -0800
          Re: Balanced trees Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-08 20:37 +0000
            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 23:21 +0200
              Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-08 17:22 -0500
                Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:17 +0200
              Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-08 17:31 -0800
                Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:27 +0200
                  Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 14:20 -0700
                    Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 23:32 +0200
                      Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 14:37 -0700
                        Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 23:43 +0200
                          Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 15:08 -0700
                            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 00:24 +0200
                          Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-09 18:04 -0600
                            Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 03:24 +0000
                      Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 03:24 +0000
                        Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 08:16 +0200
                          Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 08:53 +0000
                            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 11:41 +0200
                              Re: Balanced trees Ned Batchelder <ned@nedbatchelder.com> - 2014-03-10 06:57 -0400
                                Re: Balanced trees Rustom Mody <rustompmody@gmail.com> - 2014-03-10 09:01 -0700
                                  Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 02:02 +0000
                                    Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 22:20 -0400
                                      Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 13:29 +1100
                            Re: Balanced trees Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-10 09:51 +0000
                          Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 09:59 -0400
                            Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:20 +1100
                            Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:24 +1100
                              Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 19:08 +0200
                                Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 04:17 +1100
                                  Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 19:34 +0200
                                    Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-13 12:40 -0600
                                    Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-13 23:57 +0000
                                      Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-13 20:12 -0700
                                        Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-14 05:02 +0000
                                Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 19:24 -0400
                                  Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 10:27 +1100
                                  Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 01:26 +0000
                                    Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 12:45 +1100
                                    Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-10 21:38 -0600
                                    Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 15:28 +1100
                                      Re: Balanced trees "Rhodri James" <rhodri@wildebst.org.uk> - 2014-03-12 00:57 +0000
                                    Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-11 12:12 +0200
                                      Re: Balanced trees alex23 <wuwei23@gmail.com> - 2014-03-12 10:13 +1000
                                  Re: Balanced trees Alister <alister.ware@ntlworld.com> - 2014-03-11 09:25 +0000
                                  Re: Balanced trees Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-12 10:08 +0100
                            Re: Balanced trees Tim Chase <python.list@tim.thechases.com> - 2014-03-10 11:33 -0500
                            Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:39 +1100
                            Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-10 18:05 -0700
                              Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 22:13 -0400
                        Re: Balanced trees Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-03-10 19:57 -0400
          Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-15 01:13 +0000
            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-18 00:05 +0200
              Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 12:26 -0700
                Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-18 22:55 +0200
                  Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 14:45 -0700
                    Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 00:03 +0200
                      Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 15:21 -0700
                        Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 01:11 +0200
                          Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 01:15 +0000
                            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 10:49 +0200
                              Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 13:42 +0000
                                Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 15:54 +0200
                                Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-19 10:06 -0400
                        Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 01:15 +0000
                          Re: Balanced trees Ethan Furman <ethan@stoneleaf.us> - 2014-03-19 08:15 -0700
                    Re: Balanced trees "Rhodri James" <rhodri@wildebst.org.uk> - 2014-03-20 02:16 +0000
                      Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-19 19:34 -0700
                  Re: Balanced trees Chris Kaynor <ckaynor@zindagigames.com> - 2014-03-18 18:02 -0700
              Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-18 13:18 -0700
          blist in standard library  (was Re: Balanced trees) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-15 12:31 +0000
          Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-17 14:16 -0700
          Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-18 00:08 +0000
          Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-17 18:01 -0700
          Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-18 07:46 +0000
        Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-09 13:40 +1300
          Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 19:39 -0700
            Re: Tuples and immutability Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:35 +0200
            Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-10 11:03 +1300
              Re: Tuples and immutability Terry Reedy <tjreedy@udel.edu> - 2014-03-09 19:00 -0400
              Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-09 17:42 -0600
                Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 02:37 +0000
                  Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-10 02:35 -0600
                    Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 09:13 +0000
                    Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-11 18:15 +1300
                Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-11 18:03 +1300
                  Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-11 04:39 -0600
                    Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 16:46 +0000
                      Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-12 10:23 +1300
                      Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-11 17:06 -0600
                        Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-12 23:20 +0000
                          Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-12 19:35 -0600
                          Re: Tuples and immutability Terry Reedy <tjreedy@udel.edu> - 2014-03-12 22:09 -0400
        Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-09 13:45 +1300
          Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 19:55 -0700

Page 2 of 6 — ← Prev page 1 [2] 3 4 5 6  Next page →


#68091 — Re: Balanced trees

FromDan Stromberg <drsalists@gmail.com>
Date2014-03-09 14:20 -0700
SubjectRe: Balanced trees
Message-ID<mailman.7963.1394400019.18130.python-list@python.org>
In reply to#68076
On Sun, Mar 9, 2014 at 1:27 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Dan Stromberg <drsalists@gmail.com>:
>
>> On Sat, Mar 8, 2014 at 1:21 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>> If I had to choose between a hash table and AVL (or RB) tree in the
>>> standard library, it would definitely have to be the latter. It is more
>>> generally usable, has fewer corner cases and probably has an equal
>>> performance even in hash tables' sweet spot.
>>
>> Actually, in the performance comparison I mentioned previously, I
>> compared Python dict's to a bunch of different balanced trees and one
>> unbalanced tree. The dictionary was much faster, though granted, it
>> was the only one in C.
>
> Yes, that is one major detail. There must be many more.

This is not just a detail: O(1) tends to be beat O(logn) pretty easily
for large n.

[toc] | [prev] | [next] | [standalone]


#68092 — Re: Balanced trees

FromMarko Rauhamaa <marko@pacujo.net>
Date2014-03-09 23:32 +0200
SubjectRe: Balanced trees
Message-ID<87a9czrrax.fsf@elektro.pacujo.net>
In reply to#68091
Dan Stromberg <drsalists@gmail.com>:

> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
> for large n.

There is no O(1) hash table.


Marko

[toc] | [prev] | [next] | [standalone]


#68094 — Re: Balanced trees

FromDan Stromberg <drsalists@gmail.com>
Date2014-03-09 14:37 -0700
SubjectRe: Balanced trees
Message-ID<mailman.7965.1394401067.18130.python-list@python.org>
In reply to#68092
On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Dan Stromberg <drsalists@gmail.com>:
>
>> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
>> for large n.
>
> There is no O(1) hash table.

http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1

[toc] | [prev] | [next] | [standalone]


#68095 — Re: Balanced trees

FromMarko Rauhamaa <marko@pacujo.net>
Date2014-03-09 23:43 +0200
SubjectRe: Balanced trees
Message-ID<8761nnrqsk.fsf@elektro.pacujo.net>
In reply to#68094
Dan Stromberg <drsalists@gmail.com>:

> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> There is no O(1) hash table.
>
> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1

Please elaborate.


Marko

[toc] | [prev] | [next] | [standalone]


#68098 — Re: Balanced trees

FromDan Stromberg <drsalists@gmail.com>
Date2014-03-09 15:08 -0700
SubjectRe: Balanced trees
Message-ID<mailman.7967.1394402908.18130.python-list@python.org>
In reply to#68095
On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Dan Stromberg <drsalists@gmail.com>:
>
>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>> There is no O(1) hash table.
>>
>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
>
> Please elaborate.

A hash table of fixed size is O(1) until you fill it so full that you
start getting enough collisions to make it O(n), as one bucket becomes
a linked list.  This is because the hash function is O(1), and looking
up a value in a C array is O(1).

A more flexible hash table will not have a fixed size; instead it will
grow itself as needed.  This growth operation tends to be O(n), but
happens vanishingly infrequently as the table grows, so it's still
amortized O(1).

[toc] | [prev] | [next] | [standalone]


#68100 — Re: Balanced trees

FromMarko Rauhamaa <marko@pacujo.net>
Date2014-03-10 00:24 +0200
SubjectRe: Balanced trees
Message-ID<87zjkzqada.fsf@elektro.pacujo.net>
In reply to#68098
Dan Stromberg <drsalists@gmail.com>:

> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>>> There is no O(1) hash table.
> [...]
>
> it's still amortized O(1).

So we are in perfect agreement.

Hash tables are a useful family of techniques but involve quite a bit of
cost/benefit heuristics. You can only claim O(1) if your hash table is
at least as large as the number of elements.

As for comparing them with balanced trees, I think one of the main
advantages hash tables have is better CPU cache performance. A tree
involves much more jumping around the RAM than a hash table.

Still, trees have many things going for them:

 * no need to find a good hashing function

 * no need to spend time on calculating the hash value

 * no need to find optimal hash table sizes -- no costly resizing

And of course, the main point:

 * trees are ordered

Note that most key comparisons tend to be very quick as you don't need
to traverse the whole key to locate the element. Also, it is hard to say
if going O(log n) levels deep in the tree is slower than calculating the
hash value, although, as I said, the latter operation tends to benefit
from a cache locality.

Trees are also crucial when you don't have exact matches, but for
example, when you are looking for prefix or wild-card matches.


Marko

[toc] | [prev] | [next] | [standalone]


#68110 — Re: Balanced trees

FromIan Kelly <ian.g.kelly@gmail.com>
Date2014-03-09 18:04 -0600
SubjectRe: Balanced trees
Message-ID<mailman.7975.1394409929.18130.python-list@python.org>
In reply to#68095
On Sun, Mar 9, 2014 at 4:08 PM, Dan Stromberg <drsalists@gmail.com> wrote:
> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> Dan Stromberg <drsalists@gmail.com>:
>>
>>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>>> There is no O(1) hash table.
>>>
>>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
>>
>> Please elaborate.
>
> A hash table of fixed size is O(1) until you fill it so full that you
> start getting enough collisions to make it O(n), as one bucket becomes
> a linked list.  This is because the hash function is O(1), and looking
> up a value in a C array is O(1).
>
> A more flexible hash table will not have a fixed size; instead it will
> grow itself as needed.  This growth operation tends to be O(n), but
> happens vanishingly infrequently as the table grows, so it's still
> amortized O(1).

A hash table can also only ever be O(1) in the average case.  Worst
case, everything you put into the hash table has the same hash value,
and so the time to fetch an item is entirely dependent on the
collision resolution scheme -- e.g. O(n) if a linked list or linear
probe is used, or perhaps O(log n) if each bucket is a balanced binary
tree.

There are schemes such as cuckoo hashing that allow true O(1) worst
case access, but they have other drawbacks -- inserts with cuckoo
hashing are amortized O(1), and it is even possible for an insert to
fail entirely after spending exponential time trying to find a way to
arrange things.

[toc] | [prev] | [next] | [standalone]


#68118 — Re: Balanced trees

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-03-10 03:24 +0000
SubjectRe: Balanced trees
Message-ID<531d305d$1$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to#68110
On Sun, 09 Mar 2014 18:04:46 -0600, Ian Kelly wrote:

> On Sun, Mar 9, 2014 at 4:08 PM, Dan Stromberg <drsalists@gmail.com>
> wrote:
>> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa <marko@pacujo.net>
>> wrote:
>>> Dan Stromberg <drsalists@gmail.com>:
>>>
>>>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa <marko@pacujo.net>
>>>> wrote:
>>>>> There is no O(1) hash table.
>>>>
>>>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-
o1
>>>
>>> Please elaborate.
>>
>> A hash table of fixed size is O(1) until you fill it so full that you
>> start getting enough collisions to make it O(n), as one bucket becomes
>> a linked list.  This is because the hash function is O(1), and looking
>> up a value in a C array is O(1).
>>
>> A more flexible hash table will not have a fixed size; instead it will
>> grow itself as needed.  This growth operation tends to be O(n), but
>> happens vanishingly infrequently as the table grows, so it's still
>> amortized O(1).
> 
> A hash table can also only ever be O(1) in the average case.

Since this discussion has apparently devolved into people trying to out-
pedant each other, I'm going to dispute that. That will depend on the 
distribution of hash collisions. With a sufficiently big table, 
sufficiently small set of possible data, a sufficiently good hash 
function, or some combination of the above, a hash table may be O(1) even 
in the worst case.

E.g. if you hash a number n to itself, e.g. hash(42) == 42, your data 
consists of single byte numbers 0 through 255, and your hash table has 
256 slots, *there are no collisions* and every lookup is O(1).

There are technical meanings for Big Oh notation, which can be quite 
complicated. There's Big O, Little o, Big Ω (omega), Little ω (omega), 
Big Θ (theta), although I don't think there's a Little θ. They can refer 
to the best case, worst case, average (mean) case, median case, "typical" 
case where you're making some guess of what occurs typically, and any 
other situation you care about. The algorithmic complexity can apply 
consistently to each and every run of the algorithm, or they can be 
amortized over many runs. If you're doing a technical analysis of an 
algorithm, this pedantic detail is important.

But when people are just talking informally in a hand-wavy manner, they 
usually say "O(foo)" when they actually mean "for typical data under 
typical conditions, the algorithm runs with a complexity of the order of 
foo". And that's perfectly okay, just like it's perfectly okay to 
describe the Earth as a sphere even though a pedant will call it a 
wrinkly asymmetrical oblate spheroid.




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

[toc] | [prev] | [next] | [standalone]


#68117 — Re: Balanced trees

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-03-10 03:24 +0000
SubjectRe: Balanced trees
Message-ID<531d3058$0$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to#68092
On Sun, 09 Mar 2014 23:32:54 +0200, Marko Rauhamaa wrote:

> Dan Stromberg <drsalists@gmail.com>:
> 
>> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
>> for large n.
> 
> There is no O(1) hash table.

Of course there are.

While it is true that hash tables *in general* are not *always* O(1), the 
claim that hash tables are O(1) is *less wrong* than your claim that 
there is "no O(1) hash table".

Proof: I create a hash table that accepts unsigned bytes as keys, where 
the hash is the value of the byte. So byte 0x42 hashes to 0x42, or 
decimal 66. I give the hash table 256 slots, numbered from 0 to 255. 
Every hash takes constant time, there are no collisions at all, lookups, 
insertions and deletions all take constant time regardless of how full 
the table is. The best, worst and average cases are not just O(1) but 
also Ω(1).

Since I have proven that there is *at least one* hash table that is O(1) 
for best, worst and average case, your claim there are no such hash 
tables is completely wrong, and certainly more wrong than the claim that 
Python dicts are O(1) (which is only a little bit wrong, but typically 
right).

See also "The Relativity Of Wrong":

http://chem.tufts.edu/answersinscience/relativityofwrong.htm




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

[toc] | [prev] | [next] | [standalone]


#68121 — Re: Balanced trees

FromMarko Rauhamaa <marko@pacujo.net>
Date2014-03-10 08:16 +0200
SubjectRe: Balanced trees
Message-ID<87eh2atw6s.fsf@elektro.pacujo.net>
In reply to#68117
Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

> Proof: I create a hash table that accepts unsigned bytes as keys, where 

The O(f(n)) notation has no meaning when n is limited.

This thing is not just pedantry. The discussion was how a balanced tree
fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
presented as a proof that balanced trees don't have a chance in practice
or theory. I wasn't so easily convinced.


Marko

[toc] | [prev] | [next] | [standalone]


#68131 — Re: Balanced trees

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-03-10 08:53 +0000
SubjectRe: Balanced trees
Message-ID<531d7d73$0$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to#68121
On Mon, 10 Mar 2014 08:16:43 +0200, Marko Rauhamaa wrote:

> Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
> 
>> Proof: I create a hash table that accepts unsigned bytes as keys, where
> 
> The O(f(n)) notation has no meaning when n is limited.

It has obvious meaning: O(1) means that look-ups take constant time, not 
(for example) proportional to the number of keys in the table.


> This thing is not just pedantry. 

Yes it is. You're not even being pedantic for the sake of educating 
people and helping them learn. If that were the case, I would completely 
understand. Rather, it looks to me that you're being obnoxiously pedantic 
for the sake of trying to get attention. I think you are trolling. Take 
your comment that started this dispute:

    [responding to Dan Stromberg]
    > This is not just a detail: O(1) tends to be beat O(logn) 
    > pretty easily for large n.

    [to which your entire response was]
    There is no O(1) hash table.


As pedantry, it is an utter failure: it's *wrong*, for starters, but you 
didn't even make a pretence of trying to justify it. It's not 
educational, and it doesn't advance your argument in any way. And just 
*two posts later*, you acknowledged without apology or embarrassment that 
hash tables actually are O(1). So it seems that you didn't even believe 
your claim when you made it. This is why I think you are trolling.

All your comment accomplished was to wrongly contradict a well-
established and often-repeated fact that hash tables are usually O(1):

https://duckduckgo.com/html/?q=big+oh+hash+tables

which is great if your aim is to trick people into giving you attention 
(as I suppose I am giving you attention now) but useless for advancing 
the debate or educating people.

It is as if you had declared "Actually the Allies had lost World War 
Two", and then tried to justify such a ridiculously false statement on 
the basis that while they didn't actually *lose* according to the 
normally accepted meaning of the word, some of the Allies may have failed 
to accomplish every single one of their objectives in the war.


> The discussion was how a balanced tree
> fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
> presented as a proof that balanced trees don't have a chance in practice
> or theory. I wasn't so easily convinced.

If you had wanted to put the case for balanced trees, you could mention 
that in practice they perform comparably to hash tables for reasonable 
sizes of data, and may require less memory. You could have made an 
attempt to teach people about the difference between O and Ω complexity, 
the difference between best case, worst case, average case, and typical 
behaviour. You could have mentioned the difference between amortized and 
non-amortized behaviour, or go into detail about the various assumptions 
made when doing Big Oh analysis (e.g. that all "simple" operations take 
constant time, which is not strictly speaking true).

Any of these things would have helped people understand your position 
better. But you did *none* of these things. It seems to me that your 
stated position is actually irrelevant to you, what you want is not 
better data structures in Python, but for people to respond to your posts.

In other words, I suspect you are trolling.

If I am right, that certainly would explain your apparent inability to 
understand the difference between "is" and == operators, your insistence 
that object IDs are addresses, and your declaration that object identity 
is philosophically untenable.



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

[toc] | [prev] | [next] | [standalone]


#68135 — Re: Balanced trees

FromMarko Rauhamaa <marko@pacujo.net>
Date2014-03-10 11:41 +0200
SubjectRe: Balanced trees
Message-ID<8738iqtmp6.fsf@elektro.pacujo.net>
In reply to#68131
Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

> If I am right, that certainly would explain your apparent inability to
> understand the difference between "is" and == operators, your
> insistence that object IDs are addresses, and your declaration that
> object identity is philosophically untenable.

You and I certainly have a very hard time communicating. Your
paraphrasing of my positions demonstrates that.


Marko

[toc] | [prev] | [next] | [standalone]


#68137 — Re: Balanced trees

FromNed Batchelder <ned@nedbatchelder.com>
Date2014-03-10 06:57 -0400
SubjectRe: Balanced trees
Message-ID<mailman.7990.1394449045.18130.python-list@python.org>
In reply to#68135
On 3/10/14 5:41 AM, Marko Rauhamaa wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
>
>> If I am right, that certainly would explain your apparent inability to
>> understand the difference between "is" and == operators, your
>> insistence that object IDs are addresses, and your declaration that
>> object identity is philosophically untenable.
>
> You and I certainly have a very hard time communicating. Your
> paraphrasing of my positions demonstrates that.
>

Marko, welcome to the community.  We enjoy technical discourse, even 
debate.  It's clear that you bring a lot of knowledge, intelligence, and 
passion.  But you have been involved in a number of long contentious 
threads in the last few weeks.

You are right that you and Steven have had a hard time communicating. 
You are part of "you and Steven", it would be at least polite to 
consider that part of the reason for the difficulty has to do with your 
style.  It can be brief and contrarian, which puts people off.  Perhaps 
if you tried to understand the gap and bridge it more, people would be 
less inclined to think that you were trying to widen the gap.

--Ned.

>
> Marko
>


-- 
Ned Batchelder, http://nedbatchelder.com

[toc] | [prev] | [next] | [standalone]


#68152 — Re: Balanced trees

FromRustom Mody <rustompmody@gmail.com>
Date2014-03-10 09:01 -0700
SubjectRe: Balanced trees
Message-ID<cb333f78-785d-4700-a40c-00614f90f88f@googlegroups.com>
In reply to#68137
On Monday, March 10, 2014 4:27:13 PM UTC+5:30, Ned Batchelder wrote:
> You are right that you and Steven have had a hard time communicating. 
> You are part of "you and Steven", it would be at least polite to 
> consider that part of the reason for the difficulty has to do with your 
> style.  It can be brief and contrarian, which puts people off.  Perhaps 
> if you tried to understand the gap and bridge it more, people would be 
> less inclined to think that you were trying to widen the gap.

> --Ned.


Hi Ned

As you know on the whole I am thankful to you that you keep some order on this list.

I however find it strange and one-sided that you pull up Marko and not Steven given

1. Steven's response is almost entirely vituperative and that is in response to

2. Being pointed out that a finite-input table-lookup being called a
hash-function is a rather nonsensical claim and goes counter
to the basic tenets of asymptotic notation. (In CS unlike in math 'asymptote' is always infinity) IOW

3. If I start off with "I am going to out-pedant you..." and then goof up my 
attempted pedantry whose fault is it?

[toc] | [prev] | [next] | [standalone]


#68190 — Re: Balanced trees

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-03-11 02:02 +0000
SubjectRe: Balanced trees
Message-ID<531e6eca$0$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to#68152
On Mon, 10 Mar 2014 09:01:23 -0700, Rustom Mody wrote:

> 2. Being pointed out that a finite-input table-lookup being called a
> hash-function is a rather nonsensical claim and goes counter to the
> basic tenets of asymptotic notation. (In CS unlike in math 'asymptote'
> is always infinity) IOW

That's backwards. In maths "asymptote" always implies infinity. Within 
any finite range, there must be a minimum distance that a line approaches 
a curve, beyond which it gets no closer. But that's not an asymptote: an 
asymptote requires that the line approaches the curve arbitrarily 
closely, which requires taking the limit approaching infinity.

But in computer science, while it may be possible to ignore all real-
world considerations and imagine what happens when the size of your list 
approaches infinity, that's not terribly common or useful. In reality, 
all lists and hash tables contain only a finite number of slots, many 
data types only have a finite number of possible keys, and beyond a 
certain point the Big Oh analysis breaks down. Computer scientists are 
not so foolish as to not understand this.

Big Oh analysis doesn't require behaviour as the input approaches 
infinity, but rather as the input exceeds a certain (usually unstated) 
size. "For large enough N, this function scales as O(N**2)" is a typical 
conclusion. Not "As N approaches infinity, this function scales as 
O(N**2)".

Many data types used as keys only have a fixed number of possible values 
(256 bytes, 65536 short ints, etc.) and even those with no fixed limit 
still have a practical finite limit. There's only so much matter in the 
universe, so talking about limits as the amount of data approaches 
infinity is nonsense. Where would you store it?

My example may have been extreme in its simplicity, but if you find 
yourself in the lucky circumstance that you can afford a slot for every 
possible key, and have a perfect hash function that guarantees no 
collisions, then you will have the same conclusion: guaranteed O(1) 
lookups, insertions and deletions.

http://en.wikipedia.org/wiki/Perfect_hash_function




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

[toc] | [prev] | [next] | [standalone]


#68192 — Re: Balanced trees

FromRoy Smith <roy@panix.com>
Date2014-03-10 22:20 -0400
SubjectRe: Balanced trees
Message-ID<roy-87AEA3.22201010032014@news.panix.com>
In reply to#68190
In article <531e6eca$0$29994$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> There's only so much matter in the 
> universe, so talking about limits as the amount of data approaches 
> infinity is nonsense. Where would you store it?

Funny you should ask that.  I just finished watching 
https://en.wikipedia.org/wiki/11001001 a few minutes ago.

[toc] | [prev] | [next] | [standalone]


#68200 — Re: Balanced trees

FromChris Angelico <rosuav@gmail.com>
Date2014-03-11 13:29 +1100
SubjectRe: Balanced trees
Message-ID<mailman.8035.1394517383.18130.python-list@python.org>
In reply to#68192
On Tue, Mar 11, 2014 at 1:20 PM, Roy Smith <roy@panix.com> wrote:
> In article <531e6eca$0$29994$c3e8da3$5496439d@news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>
>> There's only so much matter in the
>> universe, so talking about limits as the amount of data approaches
>> infinity is nonsense. Where would you store it?
>
> Funny you should ask that.  I just finished watching
> https://en.wikipedia.org/wiki/11001001 a few minutes ago.

Multiple universes and/or sub-atomic monkeys.

http://tools.ietf.org/html/rfc2795

ChrisA

[toc] | [prev] | [next] | [standalone]


#68136 — Re: Balanced trees

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2014-03-10 09:51 +0000
SubjectRe: Balanced trees
Message-ID<mailman.7989.1394445097.18130.python-list@python.org>
In reply to#68131
On 10/03/2014 08:53, Steven D'Aprano wrote:
>
> In other words, I suspect you are trolling.
>

s/suspect/know/ , he didn't make captain of my dream team for nothing, 
you know :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com

[toc] | [prev] | [next] | [standalone]


#68143 — Re: Balanced trees

FromRoy Smith <roy@panix.com>
Date2014-03-10 09:59 -0400
SubjectRe: Balanced trees
Message-ID<roy-BE3C76.09590110032014@news.panix.com>
In reply to#68121
In article <87eh2atw6s.fsf@elektro.pacujo.net>,
 Marko Rauhamaa <marko@pacujo.net> wrote:

> Steven D'Aprano <steve+comp.lang.python@pearwood.info>:
> 
> > Proof: I create a hash table that accepts unsigned bytes as keys, where 
> 
> The O(f(n)) notation has no meaning when n is limited.
> 
> This thing is not just pedantry. The discussion was how a balanced tree
> fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
> presented as a proof that balanced trees don't have a chance in practice
> or theory. I wasn't so easily convinced.
> 
> 
> Marko

On the other hand, log n, for n = 1 million, is just 20.  It's not hard 
to imagine a hash function which costs 20x what a node traversal does, 
in which case, the log n lookup is ahead for all n < 1 million.

Looking at the Songza source, I see we have one user-defined hash 
function:

    def __hash__(self):
        return hash((self.song,
                     self.user,
                     self.add_time,
                     self.delete_time,
                     self.play_first))

where those things are 2 bson.ObjectId's, 2 datetimes, and a boolean.  I 
wouldn't be surprised if that falls into the 20x node traversal bucket.  
In this case, convenience was more important than speed, so it doesn't 
matter.

The hash vs. tree argument can get very complicated.  For example, if 
your tree is not completely resident in memory, the cost of paging in a 
node will swamp everything else, and improving lookup speed will boil 
down to reducing the number of I/O operations you need.  An expensive 
hash plus a linear walk through a collision chain which was resident in 
a single memory block will beat traversing two nodes, each of which had 
to be paged in separately.

[toc] | [prev] | [next] | [standalone]


#68156 — Re: Balanced trees

FromChris Angelico <rosuav@gmail.com>
Date2014-03-11 03:20 +1100
SubjectRe: Balanced trees
Message-ID<mailman.8006.1394468411.18130.python-list@python.org>
In reply to#68143
On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith <roy@panix.com> wrote:
> The hash vs. tree argument can get very complicated.  For example, if
> your tree is not completely resident in memory, the cost of paging in a
> node will swamp everything else, and improving lookup speed will boil
> down to reducing the number of I/O operations you need.  An expensive
> hash plus a linear walk through a collision chain which was resident in
> a single memory block will beat traversing two nodes, each of which had
> to be paged in separately.

Indeed, which is broadly an extension of the "cache locality" argument.

I've never actually yearned for any of the advanced operations a tree
can give (over a hash table). Usually, by the time I'm looking for
that sort of thing, I really want an on-disk database - that solves
all the problems of paging (the DBMS will make sure it reads in a
minimum of index and data pages), and a good SQL database can handle
multiple indexes in a space-efficient way. Plus, what you said about
log 1,000,000? By the time you're looking at a million records, you
probably (a) need to have them on disk somewhere anyway, and (b) don't
want to read them all into RAM before you begin.

ChrisA

[toc] | [prev] | [next] | [standalone]


Page 2 of 6 — ← Prev page 1 [2] 3 4 5 6  Next page →

Back to top | Article view | comp.lang.python


csiph-web