Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #67985 > unrolled thread

Re: Tuples and immutability

Started byDuncan Booth <duncan.booth@invalid.invalid>
First post2014-03-07 09:33 +0000
Last post2014-03-08 19:55 -0700
Articles 20 on this page of 108 — 24 participants

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Tuples and immutability Duncan Booth <duncan.booth@invalid.invalid> - 2014-03-07 09:33 +0000
    Re: Tuples and immutability Ben Finney <ben+python@benfinney.id.au> - 2014-03-07 22:04 +1100
    Re: Tuples and immutability Chris Angelico <rosuav@gmail.com> - 2014-03-07 22:11 +1100
    Re: Tuples and immutability Peter Otten <__peter__@web.de> - 2014-03-07 12:38 +0100
    Re: Tuples and immutability Chris Angelico <rosuav@gmail.com> - 2014-03-07 22:45 +1100
    Re: Tuples and immutability Alister <alister.ware@ntlworld.com> - 2014-03-07 11:51 +0000
      Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-07 11:23 -0700
    Re: Tuples and immutability Roy Smith <roy@panix.com> - 2014-03-07 08:37 -0500
    Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-08 15:17 +1300
      Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-07 21:17 -0700
        Balanced trees (was: Re: Tuples and immutability) Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 10:34 +0200
          Re: Balanced trees (was: Re: Tuples and immutability) Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 04:03 -0700
            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 13:26 +0200
          Re: Balanced trees (was: Re: Tuples and immutability) Dan Stromberg <drsalists@gmail.com> - 2014-03-08 11:58 -0800
          Re: Balanced trees Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-08 20:37 +0000
            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 23:21 +0200
              Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-08 17:22 -0500
                Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:17 +0200
              Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-08 17:31 -0800
                Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:27 +0200
                  Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 14:20 -0700
                    Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 23:32 +0200
                      Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 14:37 -0700
                        Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 23:43 +0200
                          Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 15:08 -0700
                            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 00:24 +0200
                          Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-09 18:04 -0600
                            Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 03:24 +0000
                      Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 03:24 +0000
                        Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 08:16 +0200
                          Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 08:53 +0000
                            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 11:41 +0200
                              Re: Balanced trees Ned Batchelder <ned@nedbatchelder.com> - 2014-03-10 06:57 -0400
                                Re: Balanced trees Rustom Mody <rustompmody@gmail.com> - 2014-03-10 09:01 -0700
                                  Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 02:02 +0000
                                    Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 22:20 -0400
                                      Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 13:29 +1100
                            Re: Balanced trees Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-10 09:51 +0000
                          Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 09:59 -0400
                            Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:20 +1100
                            Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:24 +1100
                              Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 19:08 +0200
                                Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 04:17 +1100
                                  Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 19:34 +0200
                                    Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-13 12:40 -0600
                                    Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-13 23:57 +0000
                                      Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-13 20:12 -0700
                                        Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-14 05:02 +0000
                                Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 19:24 -0400
                                  Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 10:27 +1100
                                  Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 01:26 +0000
                                    Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 12:45 +1100
                                    Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-10 21:38 -0600
                                    Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 15:28 +1100
                                      Re: Balanced trees "Rhodri James" <rhodri@wildebst.org.uk> - 2014-03-12 00:57 +0000
                                    Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-11 12:12 +0200
                                      Re: Balanced trees alex23 <wuwei23@gmail.com> - 2014-03-12 10:13 +1000
                                  Re: Balanced trees Alister <alister.ware@ntlworld.com> - 2014-03-11 09:25 +0000
                                  Re: Balanced trees Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-12 10:08 +0100
                            Re: Balanced trees Tim Chase <python.list@tim.thechases.com> - 2014-03-10 11:33 -0500
                            Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:39 +1100
                            Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-10 18:05 -0700
                              Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 22:13 -0400
                        Re: Balanced trees Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-03-10 19:57 -0400
          Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-15 01:13 +0000
            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-18 00:05 +0200
              Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 12:26 -0700
                Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-18 22:55 +0200
                  Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 14:45 -0700
                    Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 00:03 +0200
                      Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 15:21 -0700
                        Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 01:11 +0200
                          Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 01:15 +0000
                            Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 10:49 +0200
                              Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 13:42 +0000
                                Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 15:54 +0200
                                Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-19 10:06 -0400
                        Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 01:15 +0000
                          Re: Balanced trees Ethan Furman <ethan@stoneleaf.us> - 2014-03-19 08:15 -0700
                    Re: Balanced trees "Rhodri James" <rhodri@wildebst.org.uk> - 2014-03-20 02:16 +0000
                      Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-19 19:34 -0700
                  Re: Balanced trees Chris Kaynor <ckaynor@zindagigames.com> - 2014-03-18 18:02 -0700
              Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-18 13:18 -0700
          blist in standard library  (was Re: Balanced trees) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-15 12:31 +0000
          Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-17 14:16 -0700
          Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-18 00:08 +0000
          Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-17 18:01 -0700
          Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-18 07:46 +0000
        Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-09 13:40 +1300
          Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 19:39 -0700
            Re: Tuples and immutability Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:35 +0200
            Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-10 11:03 +1300
              Re: Tuples and immutability Terry Reedy <tjreedy@udel.edu> - 2014-03-09 19:00 -0400
              Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-09 17:42 -0600
                Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 02:37 +0000
                  Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-10 02:35 -0600
                    Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 09:13 +0000
                    Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-11 18:15 +1300
                Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-11 18:03 +1300
                  Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-11 04:39 -0600
                    Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 16:46 +0000
                      Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-12 10:23 +1300
                      Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-11 17:06 -0600
                        Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-12 23:20 +0000
                          Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-12 19:35 -0600
                          Re: Tuples and immutability Terry Reedy <tjreedy@udel.edu> - 2014-03-12 22:09 -0400
        Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-09 13:45 +1300
          Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 19:55 -0700

Page 3 of 6 — ← Prev page 1 2 [3] 4 5 6  Next page →


#68157 — Re: Balanced trees

FromChris Angelico <rosuav@gmail.com>
Date2014-03-11 03:24 +1100
SubjectRe: Balanced trees
Message-ID<mailman.8007.1394468681.18130.python-list@python.org>
In reply to#68143
On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith <roy@panix.com> wrote:
> Looking at the Songza source, I see we have one user-defined hash
> function:
>
>     def __hash__(self):
>         return hash((self.song,
>                      self.user,
>                      self.add_time,
>                      self.delete_time,
>                      self.play_first))
>
> where those things are 2 bson.ObjectId's, 2 datetimes, and a boolean.  I
> wouldn't be surprised if that falls into the 20x node traversal bucket.
> In this case, convenience was more important than speed, so it doesn't
> matter.

The only difference between a tree and a hash here is that the tree
might be able to short-cut the comparisons. But if there are a whole
bunch of songs with the same "song" and "user", then the tree has to
compare (song->song? same; user->user? same; add_time->add_time?
left/right) multiple times. So what you're saying is that the hash
table has consistency but the tree could do better or could do worse.
(Imagine, worst case, all one million records have the same
song/user/add_time and you need to do twenty comparisons involving
four fields. That's gotta be worse than one hashing of five fields.)

ChrisA

[toc] | [prev] | [next] | [standalone]


#68163 — Re: Balanced trees

FromMarko Rauhamaa <marko@pacujo.net>
Date2014-03-10 19:08 +0200
SubjectRe: Balanced trees
Message-ID<8761nmrnfk.fsf@elektro.pacujo.net>
In reply to#68157
Chris Angelico <rosuav@gmail.com>:

> The only difference between a tree and a hash here is that the tree
> might be able to short-cut the comparisons. But if there are a whole
> bunch of songs with the same "song" and "user", then the tree has to
> compare (song->song? same; user->user? same; add_time->add_time?
> left/right) multiple times. So what you're saying is that the hash
> table has consistency but the tree could do better or could do worse.
> (Imagine, worst case, all one million records have the same
> song/user/add_time and you need to do twenty comparisons involving
> four fields. That's gotta be worse than one hashing of five fields.)

You are right that in the end there probably is an equality test
involving all fields of the key. However, the intermediate comparisons
are often dealt with much more immediately. For example, to compare the
words "hello" and "world", you only need to look at the first letter.

Anyway, this whole debate is rather unnecessary since every developer is
supposed to have both weapons in their arsenal.


Marko

[toc] | [prev] | [next] | [standalone]


#68164 — Re: Balanced trees

FromChris Angelico <rosuav@gmail.com>
Date2014-03-11 04:17 +1100
SubjectRe: Balanced trees
Message-ID<mailman.8013.1394471845.18130.python-list@python.org>
In reply to#68163
On Tue, Mar 11, 2014 at 4:08 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> The only difference between a tree and a hash here is that the tree
>> might be able to short-cut the comparisons. But if there are a whole
>> bunch of songs with the same "song" and "user", then the tree has to
>> compare (song->song? same; user->user? same; add_time->add_time?
>> left/right) multiple times. So what you're saying is that the hash
>> table has consistency but the tree could do better or could do worse.
>> (Imagine, worst case, all one million records have the same
>> song/user/add_time and you need to do twenty comparisons involving
>> four fields. That's gotta be worse than one hashing of five fields.)
>
> You are right that in the end there probably is an equality test
> involving all fields of the key. However, the intermediate comparisons
> are often dealt with much more immediately. For example, to compare the
> words "hello" and "world", you only need to look at the first letter.

Right, which is what I meant about the consistency of a hash table.
You know, with a hash table, that you're going to need to calculate
the hash across everything. The tree lets you cheaply add
disambiguation fields, knowing they'll be used only if they're needed.

Oddly enough, the consistency in design is inverted by the
predictability under attack. If an attacker might be able to generate
hash collisions, the tree (assuming it's self-balancing) will have its
worst-case much closer to its average/typical cases, which means the
tree is more consistent. Strange how that goes, sometimes.

> Anyway, this whole debate is rather unnecessary since every developer is
> supposed to have both weapons in their arsenal.

Supposed to have? What does that mean, a language isn't ISO-compliant
unless it provides both? With a high level language like Python, using
the provided hash table will almost always cream any hand-built tree,
no matter how advantageous the data is to the tree.

ChrisA

[toc] | [prev] | [next] | [standalone]


#68166 — Re: Balanced trees

FromMarko Rauhamaa <marko@pacujo.net>
Date2014-03-10 19:34 +0200
SubjectRe: Balanced trees
Message-ID<87zjkyq7nr.fsf@elektro.pacujo.net>
In reply to#68164
Chris Angelico <rosuav@gmail.com>:

> Supposed to have? What does that mean, a language isn't ISO-compliant
> unless it provides both?

It's an ancient, fundamental data structure, right up there with dynamic
lists. There's no reason it shouldn't be available in every programming
environment.

> With a high level language like Python, using the provided hash table
> will almost always cream any hand-built tree, no matter how
> advantageous the data is to the tree.

The main thing is there are use cases where order is essential. That's
why I have had to implement the AVL tree in Python myself. No biggie,
but a C implementation would probably be much faster. Also, a standard
version would likely be reviewed and tested better and have all Pythonic
accessors in place.


Marko

[toc] | [prev] | [next] | [standalone]


#68339 — Re: Balanced trees

FromIan Kelly <ian.g.kelly@gmail.com>
Date2014-03-13 12:40 -0600
SubjectRe: Balanced trees
Message-ID<mailman.8133.1394736421.18130.python-list@python.org>
In reply to#68166
On Mon, Mar 10, 2014 at 11:34 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> The main thing is there are use cases where order is essential. That's
> why I have had to implement the AVL tree in Python myself. No biggie,
> but a C implementation would probably be much faster. Also, a standard
> version would likely be reviewed and tested better and have all Pythonic
> accessors in place.

That last is actually quite easy to achieve, especially using the
built-in ABCs.  Just subclass from collections.MutableMapping and
implement __len__, __iter__, __getitem__, __setitem__ and __delitem__;
and default implementations of the rest will be mixed in for you.  Or
you can override those with optimized implementations.

[toc] | [prev] | [next] | [standalone]


#68344 — Re: Balanced trees

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-03-13 23:57 +0000
SubjectRe: Balanced trees
Message-ID<532245fa$0$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to#68166
On Mon, 10 Mar 2014 19:34:48 +0200, Marko Rauhamaa wrote:

>> With a high level language like Python, using the provided hash table
>> will almost always cream any hand-built tree, no matter how
>> advantageous the data is to the tree.
> 
> The main thing is there are use cases where order is essential. That's
> why I have had to implement the AVL tree in Python myself.

from collections import OrderedDict

gives you a fast, ordered mapping. In this case, the order is that of 
insertion order. If you want sort order instead, for "small" amounts of 
data, say below a million or ten million items, you're likely to have 
acceptable if not superior results by just sorting them items when and as 
needed.

Otherwise, I expect that following the basic technique of OrderedDict, 
and building your data structure from a dict and an associated list, 
keeping the two in sync, will be faster than a pure Python implementation 
of an AVL tree. But of course only testing it will make that clear.

http://code.activestate.com/recipes/576693-ordered-dictionary-for-py24/

Modifying the above recipe to keep items in something other than 
insertion order is left as an exercise.




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

[toc] | [prev] | [next] | [standalone]


#68354 — Re: Balanced trees

FromDan Stromberg <drsalists@gmail.com>
Date2014-03-13 20:12 -0700
SubjectRe: Balanced trees
Message-ID<mailman.8142.1394766770.18130.python-list@python.org>
In reply to#68344
On Thu, Mar 13, 2014 at 4:57 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Mon, 10 Mar 2014 19:34:48 +0200, Marko Rauhamaa wrote:
>
>>> With a high level language like Python, using the provided hash table
>>> will almost always cream any hand-built tree, no matter how
>>> advantageous the data is to the tree.
>>
>> The main thing is there are use cases where order is essential. That's
>> why I have had to implement the AVL tree in Python myself.
>
> from collections import OrderedDict
>
> gives you a fast, ordered mapping. In this case, the order is that of
> insertion order. If you want sort order instead, for "small" amounts of
> data, say below a million or ten million items, you're likely to have
> acceptable if not superior results by just sorting them items when and as
> needed.

This is one of my pet things.

Sorting the same (slightly tweaked) data inside of a tight loop is
rarely a good idea - despite the fact that the "sort" itself tends to
be O(n) with Python's rather awesome builtin sorting algorithm.  This
is because sorting inside a loop tends to yield O(n^2) algorithms, or
even O((n^2)*logn).

But if you're sorting once at the end of whatever other processing,
sorting is great.  It's (still) O(nlogn), but it's got a terrific
constant.

[toc] | [prev] | [next] | [standalone]


#68356 — Re: Balanced trees

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-03-14 05:02 +0000
SubjectRe: Balanced trees
Message-ID<53228d5d$0$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to#68354
On Thu, 13 Mar 2014 20:12:41 -0700, Dan Stromberg wrote:

> Sorting the same (slightly tweaked) data inside of a tight loop is
> rarely a good idea - despite the fact that the "sort" itself tends to be
> O(n) with Python's rather awesome builtin sorting algorithm.  This is
> because sorting inside a loop tends to yield O(n^2) algorithms, or even
> O((n^2)*logn).

I agree with you except for the word "rarely". It depends on how much 
data you have, and in my experience most people think that N = 100 is a 
lot :-)

Remember that Big Oh doesn't say anything about the constant of 
proportionality. If you have one function that behaves like O(N) but with 
a constant factor of 1000, and another function that behaves like 
O(N**2) with a constant factor of 0.001, the second, theoretically 
slower, function will actually be faster for N < one million.

So *in practice*, a solution that involves a theoretically worse Big Oh 
function but a significantly lower constant factor may turn out to be 
better for all the values you care about. Given the difference in speed 
between Python's built-ins, and the code you can write yourself in pure 
Python, it is normally better to push as much work as you can onto the 
built-ins.

And of course, we should not make assumptions as to what is faster and 
what is slower. After 15+ years, I'm still surprised about what ends up 
being faster in Python.


> But if you're sorting once at the end of whatever other processing,
> sorting is great.  It's (still) O(nlogn), but it's got a terrific
> constant.

Exactly!



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#68179 — Re: Balanced trees

FromRoy Smith <roy@panix.com>
Date2014-03-10 19:24 -0400
SubjectRe: Balanced trees
Message-ID<roy-F30152.19240710032014@news.panix.com>
In reply to#68163
In article <8761nmrnfk.fsf@elektro.pacujo.net>,
 Marko Rauhamaa <marko@pacujo.net> wrote:

> Anyway, this whole debate is rather unnecessary since every developer is
> supposed to have both weapons in their arsenal.

The problem with having a choice is that it opens up the possibility of 
making the wrong one :-)

As this discussion has shown, figuring out whether a hash table or a 
tree is better for a given problem is non-trivial.  My guess is that if 
you gave 1000 typical developers both data structures and let them pick 
freely, the number of cases where it really mattered and the developer 
picked the right one would be approximately equal to the number of cases 
where they picked the wrong one.

[toc] | [prev] | [next] | [standalone]


#68180 — Re: Balanced trees

FromChris Angelico <rosuav@gmail.com>
Date2014-03-11 10:27 +1100
SubjectRe: Balanced trees
Message-ID<mailman.8025.1394494024.18130.python-list@python.org>
In reply to#68179
On Tue, Mar 11, 2014 at 10:24 AM, Roy Smith <roy@panix.com> wrote:
> As this discussion has shown, figuring out whether a hash table or a
> tree is better for a given problem is non-trivial.  My guess is that if
> you gave 1000 typical developers both data structures and let them pick
> freely, the number of cases where it really mattered and the developer
> picked the right one would be approximately equal to the number of cases
> where they picked the wrong one.

And both would be utterly dwarfed by the cases where it doesn't
matter, and everyone's paying a cost (having to choose) for no
benefit.

ChrisA

[toc] | [prev] | [next] | [standalone]


#68187 — Re: Balanced trees

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2014-03-11 01:26 +0000
SubjectRe: Balanced trees
Message-ID<531e6643$0$29994$c3e8da3$5496439d@news.astraweb.com>
In reply to#68179
On Mon, 10 Mar 2014 19:24:07 -0400, Roy Smith wrote:

> In article <8761nmrnfk.fsf@elektro.pacujo.net>,
>  Marko Rauhamaa <marko@pacujo.net> wrote:
> 
>> Anyway, this whole debate is rather unnecessary since every developer
>> is supposed to have both weapons in their arsenal.
> 
> The problem with having a choice is that it opens up the possibility of
> making the wrong one :-)
> 
> As this discussion has shown, figuring out whether a hash table or a
> tree is better for a given problem is non-trivial.  My guess is that if
> you gave 1000 typical developers both data structures and let them pick
> freely, the number of cases where it really mattered and the developer
> picked the right one would be approximately equal to the number of cases
> where they picked the wrong one.

You're very optimistic.

In my experience, the average developer has an amazing talent for 
pessimising code when they think they are optimising it.




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

[toc] | [prev] | [next] | [standalone]


#68189 — Re: Balanced trees

FromChris Angelico <rosuav@gmail.com>
Date2014-03-11 12:45 +1100
SubjectRe: Balanced trees
Message-ID<mailman.8032.1394502844.18130.python-list@python.org>
In reply to#68187
On Tue, Mar 11, 2014 at 12:26 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> In my experience, the average developer has an amazing talent for
> pessimising code when they think they are optimising it.

I remember a number of incidents from personal experience when I was a
*very* average developer. One time, writing C++ code, I looked at the
disassembly and decided the compiler was doing a terrible job. No no,
I could make this so much better by using the 80x86 "REP MOVSW"
command (or commands, depending on your point of view). That would be
so much better than all those separate operations the silly compiler
was doing! Roughly an hour of fiddling later, making sure it all still
worked correctly, I discover that... hmm, it's not actually any
faster. Turns out the 80x86 string opcodes are really inefficient;
they're short (a one-byte command that says "read a
byte/word/doubleword from DS:SI, write it to ES:DI, and increment or
decrement SI and DI"), but not fast. In my defense, I at least did
measure before-and-after, and learned that I should back out that
change :)

ChrisA

[toc] | [prev] | [next] | [standalone]


#68194 — Re: Balanced trees

FromIan Kelly <ian.g.kelly@gmail.com>
Date2014-03-10 21:38 -0600
SubjectRe: Balanced trees
Message-ID<mailman.8033.1394509126.18130.python-list@python.org>
In reply to#68187
On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Mar 11, 2014 at 12:26 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> In my experience, the average developer has an amazing talent for
>> pessimising code when they think they are optimising it.
>
> I remember a number of incidents from personal experience when I was a
> *very* average developer. One time, writing C++ code, I looked at the
> disassembly and decided the compiler was doing a terrible job. No no,
> I could make this so much better by using the 80x86 "REP MOVSW"
> command (or commands, depending on your point of view). That would be
> so much better than all those separate operations the silly compiler
> was doing! Roughly an hour of fiddling later, making sure it all still
> worked correctly, I discover that... hmm, it's not actually any
> faster. Turns out the 80x86 string opcodes are really inefficient;
> they're short (a one-byte command that says "read a
> byte/word/doubleword from DS:SI, write it to ES:DI, and increment or
> decrement SI and DI"), but not fast. In my defense, I at least did
> measure before-and-after, and learned that I should back out that
> change :)

Better to have tried and failed though than to have simply accepted
what the compiler was doing with no verification at all.

[toc] | [prev] | [next] | [standalone]


#68195 — Re: Balanced trees

FromChris Angelico <rosuav@gmail.com>
Date2014-03-11 15:28 +1100
SubjectRe: Balanced trees
Message-ID<mailman.8034.1394512113.18130.python-list@python.org>
In reply to#68187
On Tue, Mar 11, 2014 at 2:38 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico <rosuav@gmail.com> wrote:
>> No no,
>> I could make this so much better by using the 80x86 "REP MOVSW"
>> command (or commands, depending on your point of view). That would be
>> so much better than all those separate operations the silly compiler
>> was doing! Roughly an hour of fiddling later, making sure it all still
>> worked correctly, I discover that... hmm, it's not actually any
>> faster.
>
> Better to have tried and failed though than to have simply accepted
> what the compiler was doing with no verification at all.

Maybe. But I've learned now that one guy who used to do assembly
language programming on an 8086 is unlikely to discover something that
the many authors of a C compiler haven't noticed. Yes, it's possible
there'll be something specific to my code, like if I'm doing a
strcpy-like operation that isn't *actually* strcpy (the function will
be optimized heavily, but a C-level loop might not be recognized), but
it's more likely the compiler knows better than I do.

That, by the way, was before I realized that *interpreter* writers are
more expert than I am, too, and therefore that I can trust a
heavily-optimized high level language to run my code faster than I
could write equivalent C.

ChrisA

[toc] | [prev] | [next] | [standalone]


#68254 — Re: Balanced trees

From"Rhodri James" <rhodri@wildebst.org.uk>
Date2014-03-12 00:57 +0000
SubjectRe: Balanced trees
Message-ID<op.xck3mxcw5079vu@gnudebeest>
In reply to#68195
On Tue, 11 Mar 2014 04:28:25 -0000, Chris Angelico <rosuav@gmail.com>  
wrote:

> On Tue, Mar 11, 2014 at 2:38 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
>> On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico <rosuav@gmail.com>  
>> wrote:
>>> No no,
>>> I could make this so much better by using the 80x86 "REP MOVSW"
>>> command (or commands, depending on your point of view). That would be
>>> so much better than all those separate operations the silly compiler
>>> was doing! Roughly an hour of fiddling later, making sure it all still
>>> worked correctly, I discover that... hmm, it's not actually any
>>> faster.
>>
>> Better to have tried and failed though than to have simply accepted
>> what the compiler was doing with no verification at all.
>
> Maybe. But I've learned now that one guy who used to do assembly
> language programming on an 8086 is unlikely to discover something that
> the many authors of a C compiler haven't noticed. Yes, it's possible
> there'll be something specific to my code, like if I'm doing a
> strcpy-like operation that isn't *actually* strcpy (the function will
> be optimized heavily, but a C-level loop might not be recognized), but
> it's more likely the compiler knows better than I do.

That might be true for x86, but it isn't true for ARM for example.   
Apparently it's algorithmically hard to generate ARM code that makes  
efficient use of conditional instructions, and I can almost always write  
tighter, faster code than the compiler generates in consequence.  It's not  
always worth the extra coding time (longer now that I'm not in practise),  
but that's a separate matter.


-- 
Rhodri James *-* Wildebeest Herder to the Masses

[toc] | [prev] | [next] | [standalone]


#68206 — Re: Balanced trees

FromMarko Rauhamaa <marko@pacujo.net>
Date2014-03-11 12:12 +0200
SubjectRe: Balanced trees
Message-ID<87wqg1oxh4.fsf@elektro.pacujo.net>
In reply to#68187
Steven D'Aprano <steve+comp.lang.python@pearwood.info>:

> On Mon, 10 Mar 2014 19:24:07 -0400, Roy Smith wrote:
>
>> In article <8761nmrnfk.fsf@elektro.pacujo.net>,
>>  Marko Rauhamaa <marko@pacujo.net> wrote:
>> 
>>> Anyway, this whole debate is rather unnecessary since every developer
>>> is supposed to have both weapons in their arsenal.
>> 
>> The problem with having a choice is that it opens up the possibility of
>> making the wrong one :-)
>
> [...]
>
> In my experience, the average developer has an amazing talent for 
> pessimising code when they think they are optimising it.

Java is the straitjacket that prevents the "average developer" from
shooting himself in the foot.

Python should let skilled professionals do their work. Thankfully, for
the most part, it does.


Marko

[toc] | [prev] | [next] | [standalone]


#68253 — Re: Balanced trees

Fromalex23 <wuwei23@gmail.com>
Date2014-03-12 10:13 +1000
SubjectRe: Balanced trees
Message-ID<lfo8qf$9sm$1@dont-email.me>
In reply to#68206
On 11/03/2014 8:12 PM, Marko Rauhamaa wrote:
> Python should let skilled professionals do their work. Thankfully, for
> the most part, it does.

Skilled professionals don't solely rely on the standard library, either. 
If you know you need a balanced tree, you'll also know where to find an 
implementation of one.

[toc] | [prev] | [next] | [standalone]


#68205 — Re: Balanced trees

FromAlister <alister.ware@ntlworld.com>
Date2014-03-11 09:25 +0000
SubjectRe: Balanced trees
Message-ID<ODATu.44616$a74.40039@fx20.am4>
In reply to#68179
On Mon, 10 Mar 2014 19:24:07 -0400, Roy Smith wrote:

> In article <8761nmrnfk.fsf@elektro.pacujo.net>,
>  Marko Rauhamaa <marko@pacujo.net> wrote:
> 
>> Anyway, this whole debate is rather unnecessary since every developer
>> is supposed to have both weapons in their arsenal.
> 
> The problem with having a choice is that it opens up the possibility of
> making the wrong one :-)
> 
> As this discussion has shown, figuring out whether a hash table or a
> tree is better for a given problem is non-trivial.  My guess is that if
> you gave 1000 typical developers both data structures and let them pick
> freely, the number of cases where it really mattered and the developer
> picked the right one would be approximately equal to the number of cases
> where they picked the wrong one.

Perhaps you are not familiar with the 50/50/90 rule?

Given any 50/50 choice you are 90% certain to pick the wrong one :-)



-- 
I hold it, that a little rebellion, now and then, is a good thing...
		-- Thomas Jefferson

[toc] | [prev] | [next] | [standalone]


#68263 — Re: Balanced trees

FromAntoon Pardon <antoon.pardon@rece.vub.ac.be>
Date2014-03-12 10:08 +0100
SubjectRe: Balanced trees
Message-ID<mailman.8075.1394615335.18130.python-list@python.org>
In reply to#68179
Op 11-03-14 00:24, Roy Smith schreef:

> In article <8761nmrnfk.fsf@elektro.pacujo.net>,
>  Marko Rauhamaa <marko@pacujo.net> wrote:
>
>> Anyway, this whole debate is rather unnecessary since every developer is
>> supposed to have both weapons in their arsenal.
> The problem with having a choice is that it opens up the possibility of 
> making the wrong one :-)

This is just a standard defense for the status quo. Introducing the decimal
module also added a choice.

> As this discussion has shown, figuring out whether a hash table or a 
> tree is better for a given problem is non-trivial.  My guess is that if 
> you gave 1000 typical developers both data structures and let them pick 
> freely, the number of cases where it really mattered and the developer 
> picked the right one would be approximately equal to the number of cases 
> where they picked the wrong one.

You are only illustrating one part. How about all those cases now where the
wrong choice is more or less forced on the developer for lack of the alternative?

-- 
Antoon Pardon

[toc] | [prev] | [next] | [standalone]


#68159 — Re: Balanced trees

FromTim Chase <python.list@tim.thechases.com>
Date2014-03-10 11:33 -0500
SubjectRe: Balanced trees
Message-ID<mailman.8009.1394469217.18130.python-list@python.org>
In reply to#68143
On 2014-03-11 03:24, Chris Angelico wrote:
> Imagine, worst case, all one million records have the same
> song/user/add_time and you need to do twenty comparisons involving
> four fields. That's gotta be worse than one hashing of five fields.

And if you have one million songs that are indistinguishable, you
have Pop Top-40 playlists.  And that is *definitely* worse than
hashes OR trees.  :-)

-tkc

[toc] | [prev] | [next] | [standalone]


Page 3 of 6 — ← Prev page 1 2 [3] 4 5 6  Next page →

Back to top | Article view | comp.lang.python


csiph-web