Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #67985 > unrolled thread
| Started by | Duncan Booth <duncan.booth@invalid.invalid> |
|---|---|
| First post | 2014-03-07 09:33 +0000 |
| Last post | 2014-03-08 19:55 -0700 |
| Articles | 20 on this page of 108 — 24 participants |
Back to article view | Back to comp.lang.python
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Tuples and immutability Duncan Booth <duncan.booth@invalid.invalid> - 2014-03-07 09:33 +0000
Re: Tuples and immutability Ben Finney <ben+python@benfinney.id.au> - 2014-03-07 22:04 +1100
Re: Tuples and immutability Chris Angelico <rosuav@gmail.com> - 2014-03-07 22:11 +1100
Re: Tuples and immutability Peter Otten <__peter__@web.de> - 2014-03-07 12:38 +0100
Re: Tuples and immutability Chris Angelico <rosuav@gmail.com> - 2014-03-07 22:45 +1100
Re: Tuples and immutability Alister <alister.ware@ntlworld.com> - 2014-03-07 11:51 +0000
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-07 11:23 -0700
Re: Tuples and immutability Roy Smith <roy@panix.com> - 2014-03-07 08:37 -0500
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-08 15:17 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-07 21:17 -0700
Balanced trees (was: Re: Tuples and immutability) Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 10:34 +0200
Re: Balanced trees (was: Re: Tuples and immutability) Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 04:03 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 13:26 +0200
Re: Balanced trees (was: Re: Tuples and immutability) Dan Stromberg <drsalists@gmail.com> - 2014-03-08 11:58 -0800
Re: Balanced trees Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-08 20:37 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 23:21 +0200
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-08 17:22 -0500
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:17 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-08 17:31 -0800
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:27 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 14:20 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 23:32 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 14:37 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 23:43 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 15:08 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 00:24 +0200
Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-09 18:04 -0600
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 03:24 +0000
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 03:24 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 08:16 +0200
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 08:53 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 11:41 +0200
Re: Balanced trees Ned Batchelder <ned@nedbatchelder.com> - 2014-03-10 06:57 -0400
Re: Balanced trees Rustom Mody <rustompmody@gmail.com> - 2014-03-10 09:01 -0700
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 02:02 +0000
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 22:20 -0400
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 13:29 +1100
Re: Balanced trees Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-10 09:51 +0000
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 09:59 -0400
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:20 +1100
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:24 +1100
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 19:08 +0200
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 04:17 +1100
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 19:34 +0200
Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-13 12:40 -0600
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-13 23:57 +0000
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-13 20:12 -0700
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-14 05:02 +0000
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 19:24 -0400
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 10:27 +1100
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 01:26 +0000
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 12:45 +1100
Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-10 21:38 -0600
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 15:28 +1100
Re: Balanced trees "Rhodri James" <rhodri@wildebst.org.uk> - 2014-03-12 00:57 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-11 12:12 +0200
Re: Balanced trees alex23 <wuwei23@gmail.com> - 2014-03-12 10:13 +1000
Re: Balanced trees Alister <alister.ware@ntlworld.com> - 2014-03-11 09:25 +0000
Re: Balanced trees Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-12 10:08 +0100
Re: Balanced trees Tim Chase <python.list@tim.thechases.com> - 2014-03-10 11:33 -0500
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:39 +1100
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-10 18:05 -0700
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 22:13 -0400
Re: Balanced trees Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-03-10 19:57 -0400
Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-15 01:13 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-18 00:05 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 12:26 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-18 22:55 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 14:45 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 00:03 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 15:21 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 01:11 +0200
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 01:15 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 10:49 +0200
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 13:42 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 15:54 +0200
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-19 10:06 -0400
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 01:15 +0000
Re: Balanced trees Ethan Furman <ethan@stoneleaf.us> - 2014-03-19 08:15 -0700
Re: Balanced trees "Rhodri James" <rhodri@wildebst.org.uk> - 2014-03-20 02:16 +0000
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-19 19:34 -0700
Re: Balanced trees Chris Kaynor <ckaynor@zindagigames.com> - 2014-03-18 18:02 -0700
Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-18 13:18 -0700
blist in standard library (was Re: Balanced trees) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-15 12:31 +0000
Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-17 14:16 -0700
Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-18 00:08 +0000
Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-17 18:01 -0700
Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-18 07:46 +0000
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-09 13:40 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 19:39 -0700
Re: Tuples and immutability Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:35 +0200
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-10 11:03 +1300
Re: Tuples and immutability Terry Reedy <tjreedy@udel.edu> - 2014-03-09 19:00 -0400
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-09 17:42 -0600
Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 02:37 +0000
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-10 02:35 -0600
Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 09:13 +0000
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-11 18:15 +1300
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-11 18:03 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-11 04:39 -0600
Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 16:46 +0000
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-12 10:23 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-11 17:06 -0600
Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-12 23:20 +0000
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-12 19:35 -0600
Re: Tuples and immutability Terry Reedy <tjreedy@udel.edu> - 2014-03-12 22:09 -0400
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-09 13:45 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 19:55 -0700
Page 5 of 6 — ← Prev page 1 2 3 4 [5] 6 Next page →
| From | Dan Stromberg <drsalists@gmail.com> |
|---|---|
| Date | 2014-03-19 19:34 -0700 |
| Subject | Re: Balanced trees |
| Message-ID | <mailman.8287.1395282892.18130.python-list@python.org> |
| In reply to | #68565 |
On Wed, Mar 19, 2014 at 7:16 PM, Rhodri James <rhodri@wildebst.org.uk> wrote: > 65536 is a suspiciously round number. You might want to double- > check that there's no 16-bit overflow causing something unexpected. It's because I'm using powers of 2. All the numbers in the report are round in hex.
[toc] | [prev] | [next] | [standalone]
| From | Chris Kaynor <ckaynor@zindagigames.com> |
|---|---|
| Date | 2014-03-18 18:02 -0700 |
| Subject | Re: Balanced trees |
| Message-ID | <mailman.8267.1395191378.18130.python-list@python.org> |
| In reply to | #68518 |
[Multipart message — attachments visible in raw view] — view raw
> > On Tue, Mar 18, 2014 at 1:55 PM, Marko Rauhamaa <marko@pacujo.net> wrote: > Dan Stromberg <drsalists@gmail.com>: >> > The results are at >> > >> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03/ > > Size: 1048576, duration: 75.3, dictionary type: dict > [...] > Size: 262144, duration: 66.1, dictionary type: AVL_tree > [...] > Size: 65536, duration: 77.3, dictionary type: blist.sorteddict > Taking a quick look at this, I think it might be made much clearer if the number/second were added to the output. While it can be computed off the displayed data, it would make it much easier to compare in the table view by including it. Something like: Size: 1048576, duration: 75.3, dictionary type: 13925/second: dict Size: 262144, duration: 66.1, dictionary type: 3965/second: AVL_tree Size: 65536, duration: 77.3, dictionary type: 847/second: blist.sorteddict Chris
[toc] | [prev] | [next] | [standalone]
| From | Daniel Stutzbach <stutzbach@google.com> |
|---|---|
| Date | 2014-03-18 13:18 -0700 |
| Subject | Re: Balanced trees |
| Message-ID | <mailman.8259.1395173969.18130.python-list@python.org> |
| In reply to | #68464 |
[Multipart message — attachments visible in raw view] — view raw
On Tue, Mar 18, 2014 at 12:26 PM, Dan Stromberg <drsalists@gmail.com> wrote: > In short, blist.sorteddict didn't do that well, despite being in C. > blist.blist is written in C, but blist.sorteddict is written in Python on top of blist.blist. It won't perform as well as a class written entirely in C that has similar asymptotic properties. If the objects in the tree are non-trivial (i.e., not a built-in type), the cost of calling the __lt__ method may be more important than the choice of blist vs. AVL tree, but I haven't tested it. -- Daniel Stutzbach
[toc] | [prev] | [next] | [standalone]
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-03-15 12:31 +0000 |
| Subject | blist in standard library (was Re: Balanced trees) |
| Message-ID | <mailman.8155.1394886716.18130.python-list@python.org> |
| In reply to | #68026 |
On 15/03/2014 01:13, Joshua Landau wrote: > On 8 March 2014 20:37, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote: >> I've found this link useful http://kmike.ru/python-data-structures/ >> >> I also don't want all sorts of data structures added to the Python library. >> I believe that there are advantages to leaving specialist data structures on >> pypi or other sites, plus it means Python in a Nutshell can still fit in >> your pocket and not a 40 ton articulated lorry, unlike the Java equivalent. > > The thing we really need is for the blist containers to become stdlib > (but not to replace the current list implementation). The rejected PEP > (http://legacy.python.org/dev/peps/pep-3128/) misses a few important > points, largely in how the "log(n)" has a really large base: > random.choice went from 1.2µs to 1.6µs from n=1 to n=10⁸, vs 1.2µs for > a standard list. > > Further, it's worth considering a few advantages: > > * copy is O(1), allowing code to avoid mutation by just copying its > input, which is good practice. > > * FIFO is effectively O(1), as the time just about doubles from n=1 to > n=10⁸ so will never actually branch that much. There is still a speed > benefit of collections.deque, but it's much, much less significant. > This is very useful when considering usage as a multi-purpose data > structure, and removes demand for explicit linked lists (which have > foolishly been reimplemented loads of times). > > * It reduces demand for trees: > * There are efficient implementations of sortedlist, sortedset and > sorteddict. > * Slicing, slice assignment and slice deletion are really fast. > * Addition of lists is sublinear. Instead of > "list(itertools.chain(...))", one can add in a loop and end up > *faster*. > > I think blist isn't very popular not because it isn't really good, but > because it isn't a specialised structure. It is, however, almost there > for almost every circumstance. This can help keep the standard library > clean, especially of tree data structures. > > Here's what we kill: > > * Linked lists and doubly-linked lists, which are scarily popular for > whatever reason. Sometimes people claim that collections.deque isn't > powerful enough for whatever they want, and blist will almost > definitely sate those cases. > > * Balanced trees, with blist.sortedlist. This is actually needed right now. > > * Poor performance in the cases where a lot of list merging and pruning happens. > > * Most uses of bisect. > > * Some instances where two data structures are used in parallel in > order to keep performance fast on disparate operations (like `x in y` > and `y[i]`). > > Now, I understand there are downsides to blist. Particularly, I've > looked through the "benchmarks" and they seem untruthful. Further, > we'd need a maintainer. Finally, nobody jumps at blists because > they're rarely the obvious solution. Rather, they attempt to be a > different general solution. Hopefully, though, a stdlib inclusion > could make them a lot more obvious, and support in some current > libraries could make them feel more at home. > > I don't know whether this is a good idea, but I do feel that it is > more promising and general than having a graph in the standard > library. > I'd raise this on python-dev or python ideas as a result of reading PEP 3128. Specifically the second paragraph states Raymond Hettinger's sage advice:- "Depending on its success as a third-party module, it still has a chance for inclusion in the collections module. The essential criteria for that is whether it is a superior choice for some real-world use cases. I've scanned my own code and found no instances where BList would have been preferable to a regular list. However, that scan has a selection bias because it doesn't reflect what I would have written had BList been available. So, after a few months, I intend to poll comp.lang.python for BList success stories. If they exist, then I have no problem with inclusion in the collections module. After all, its learning curve is near zero -- the only cost is the clutter factor stemming from indecision about the most appropriate data structure for a given task." -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com
[toc] | [prev] | [next] | [standalone]
| From | Daniel Stutzbach <stutzbach@google.com> |
|---|---|
| Date | 2014-03-17 14:16 -0700 |
| Subject | Re: Balanced trees |
| Message-ID | <mailman.8225.1395091055.18130.python-list@python.org> |
| In reply to | #68026 |
[Multipart message — attachments visible in raw view] — view raw
On Fri, Mar 14, 2014 at 6:13 PM, Joshua Landau <joshua@landau.ws> wrote: > > Now, I understand there are downsides to blist. Particularly, I've > looked through the "benchmarks" and they seem untruthful. I worked hard to make those benchmarks as fair as possible. I recognize that evaluating your own work always runs the risk of introducing hidden biases, and I welcome input on how they could be improved. -- Daniel Stutzbach
[toc] | [prev] | [next] | [standalone]
| From | Joshua Landau <joshua@landau.ws> |
|---|---|
| Date | 2014-03-18 00:08 +0000 |
| Subject | Re: Balanced trees |
| Message-ID | <mailman.8229.1395101355.18130.python-list@python.org> |
| In reply to | #68026 |
On 17 March 2014 21:16, Daniel Stutzbach <stutzbach@google.com> wrote: > On Fri, Mar 14, 2014 at 6:13 PM, Joshua Landau <joshua@landau.ws> wrote: >> >> Now, I understand there are downsides to blist. Particularly, I've >> looked through the "benchmarks" and they seem untruthful. > > I worked hard to make those benchmarks as fair as possible. I recognize > that evaluating your own work always runs the risk of introducing hidden > biases, and I welcome input on how they could be improved. Thanks. First, I want to state that there are two aspects to my claim. The first is that these benchmarks to not represent typical use-cases. I will not go too far into this, though, because it's mostly obvious. The second is that of the the flaws in the benchmarks themselves. I'll go through in turn some that are apparent to me: "Create from an iterator" gives me relatively different results when I run it (Python 3). "Delete a slice" is fudged from its inclusion of multiplication, which is far faster on blists. I admit that it's not obvious how to fix this. "First in, first out (FIFO)" should be "x.append(0); x.pop(0)". "Last in, first out (LIFO)" should use "pop()" over "pop(-1)", although I admit it shouldn't make a meaningful difference. "Sort *" are really unfair because they put initialisation in the timed part and all have keys. The benchmarks on Github are less bad, but the website really should include all of them and fix the remaining problems. I do understand that TimSort isn't the most suited algorithm, though, so I won't read too far into these results. Further, some of these tests don't show growth where they should, such as in getitem. The growth is readily apparent when measured as such: >>> python -m timeit -s "from random import choice; import blist; lst = blist.blist(range(10**0))" "choice(lst)" 1000000 loops, best of 3: 1.18 usec per loop >>> python -m timeit -s "from random import choice; import blist; lst = blist.blist(range(10**8))" "choice(lst)" 1000000 loops, best of 3: 1.56 usec per loop Lower size ranges are hidden by the function-call overhead. Perhaps this effect is to do with caching, in which case the limits of the cache should be explained more readily. Nevertheless, my enthusiasm for blist as an alternative stdlib implementation remains. There are obvious and large advantages to be had, sometimes when you wouldn't even expect. The slower aspects of blist are also rarely part of the bottlenecks of programs. So yeah, go for it.
[toc] | [prev] | [next] | [standalone]
| From | Daniel Stutzbach <stutzbach@google.com> |
|---|---|
| Date | 2014-03-17 18:01 -0700 |
| Subject | Re: Balanced trees |
| Message-ID | <mailman.8233.1395104535.18130.python-list@python.org> |
| In reply to | #68026 |
[Multipart message — attachments visible in raw view] — view raw
On Mon, Mar 17, 2014 at 5:08 PM, Joshua Landau <joshua@landau.ws> wrote: > Thanks. First, I want to state that there are two aspects to my > claim. The first is that these benchmarks to not represent typical > use-cases. I will not go too far into this, though, because it's > mostly obvious. > I would love to have include macro-benchmarks. I keep waiting for the PyPy benchmark suite to get ported to Python 3... > "Create from an iterator" gives me relatively different results when I > run it (Python 3). > The graphs were originally created to compare vanilla Python with a Python modified to use blist as the built-in list type. I think I used Python 3.1, but I'm not certain. As I recall, the built-in type has a few small advantages over any third-party extension type, so that might be what you're seeing. Alternately, something may have changed between Python versions. > "Delete a slice" is fudged from its inclusion of multiplication, which > is far faster on blists. I admit that it's not obvious how to fix > this. > I could move the initialization into the timed part, similar to what I did for sort (see below). That has downsides too, of course, but it might be an improvement. > "First in, first out (FIFO)" should be "x.append(0); x.pop(0)". > Wow, I mangled that one badly. > "Last in, first out (LIFO)" should use "pop()" over "pop(-1)", > although I admit it shouldn't make a meaningful difference. > I like pop(-1) because it's explicit rather than implicit. I agree it shouldn't make a meaningful difference. > "Sort *" are really unfair because they put initialisation in the > timed part That's a limitation of timeit. The setup step is only executed once. If I put the initialization there, every sort after the first one would be sorting a pre-sorted list. If you compare the "Create form an iterator" and "Sort a random list", you'll see that the initialization cost is dwarfed by the sorting cost for n > 15 or so. > and all have keys. If you use classes with __lt__ methods instead of keys, the cost is dominated by the calls to __lt__. You're right that I should include both, though. > >>> python -m timeit -s "from random import choice; import blist; lst = > blist.blist(range(10**0))" "choice(lst)" > 1000000 loops, best of 3: 1.18 usec per loop > > >>> python -m timeit -s "from random import choice; import blist; lst = > blist.blist(range(10**8))" "choice(lst)" > 1000000 loops, best of 3: 1.56 usec per loop > > Lower size ranges are hidden by the function-call overhead. > Perhaps this effect is to do with caching, in which case the limits of > the cache should be explained more readily. > That's definitely a cache issue, which is always a risk with micro-benchmarks. I see growth even for the built-in list: gnusto:~$ python -m timeit -s "from random import choice; lst = list(range(10**0))" "choice(lst)" 1000000 loops, best of 3: 0.349 usec per loop gnusto:~$ python -m timeit -s "from random import choice; lst = list(range(10**8))" "choice(lst)" 1000000 loops, best of 3: 0.634 usec per loop I agree it's more interesting to pick items randomly instead of always querying the same index. The overhead of choice() is kind of a problem, though. Since I'm only plotting up to 10**5, I'd expect these to look more or less flat. Thanks for all of the feedback. I filed a bug with myself to improve the metrics: https://github.com/DanielStutzbach/blist/issues/64 -- Daniel Stutzbach
[toc] | [prev] | [next] | [standalone]
| From | Joshua Landau <joshua@landau.ws> |
|---|---|
| Date | 2014-03-18 07:46 +0000 |
| Subject | Re: Balanced trees |
| Message-ID | <mailman.8245.1395128850.18130.python-list@python.org> |
| In reply to | #68026 |
On 18 March 2014 01:01, Daniel Stutzbach <stutzbach@google.com> wrote:
> I would love to have include macro-benchmarks. I keep waiting for the PyPy
> benchmark suite to get ported to Python 3...
*grins*
>> "Delete a slice" is fudged from its inclusion of multiplication, which
>> is far faster on blists. I admit that it's not obvious how to fix
>> this.
>
> I could move the initialization into the timed part, similar to what I did
> for sort (see below). That has downsides too, of course, but it might be an
> improvement.
You could try making a baseline and subtracting it:
timer("del x[len(x)//4:3*len(x)//4]; x *= 2") - timer("x * 2")
Not ideal, but closer, assuming that the multiplication isn't much
larger than the deletion. Error would be summed.
>> "Sort *" are really unfair because they put initialisation in the
>> timed part
>
> That's a limitation of timeit. The setup step is only executed once. If I
> put the initialization there, every sort after the first one would be
> sorting a pre-sorted list. If you compare the "Create form an iterator" and
> "Sort a random list", you'll see that the initialization cost is dwarfed by
> the sorting cost for n > 15 or so.
This argument is slightly less convincing without the overhead of the
keys. It might be worth doing a subtraction and adding some error-bars
as I suggest above. Nevertheless, I do agree for n > some small n,
which is all that matters anyway.
>> and all have keys.
>
> If you use classes with __lt__ methods instead of keys, the cost is
> dominated by the calls to __lt__. You're right that I should include both,
> though.
This argument doesn't make sense to me. The only time this happens is
when you have a non-primitive and your transformation gives a
primitive which has optimised comparisons. This typically only happens
when the key is a getitem or getattr, in which case it's just
meaningless overhead. I see little reason to care about the key's cost
in those cases.
> That's definitely a cache issue, which is always a risk with
> micro-benchmarks.
>
> I agree it's more interesting to pick items randomly instead of always
> querying the same index. The overhead of choice() is kind of a problem,
> though. Since I'm only plotting up to 10**5, I'd expect these to look more
> or less flat.
You could try jumping around to avoid the cache without using random
numbers. Something like "idx = (idx + LARGE_PRIME) % n" might have less
overhead. Further, the subtraction method would probably work fine for
that.
Also, I don't think the cache is all bad. Chances are a lot of list
accesses have a lot of data locality.
> Thanks for all of the feedback.
Thanks in turn for the module :).
[toc] | [prev] | [next] | [standalone]
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2014-03-09 13:40 +1300 |
| Message-ID | <bo1rjpFll5mU1@mid.individual.net> |
| In reply to | #68023 |
Ian Kelly wrote:
> class LessThanFilter:
>
> def __init__(self, the_list):
> self._the_list = the_list
>
> def __getitem__(self, bound):
> return [x for x in self._the_list if x < bound]
>
>
> filter = LessThanFilter([10, 20, 30, 40, 50])
> filter[25] += [15, 17, 23]
>
> Should that last line not raise an exception?
In this case it will fail to catch what is probably an error,
but you can't expect the language to find all your bugs for
you. If you wrote the same bug this way:
filter[25].extend([15, 17, 23])
it wouldn't be caught either.
What's happening is that we're trying to use the syntax
a += b to mean two different things:
1) Shorthand for a = a + b
2) A way of expressing an in-place modification, such
as a.extend(b)
Case (2) is not really an assignment at all, so arguably
it shouldn't require the LHS to support assignment.
--
Greg
[toc] | [prev] | [next] | [standalone]
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-03-08 19:39 -0700 |
| Message-ID | <mailman.7944.1394332838.18130.python-list@python.org> |
| In reply to | #68055 |
On Sat, Mar 8, 2014 at 5:40 PM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote: > Ian Kelly wrote: >> >> class LessThanFilter: >> >> def __init__(self, the_list): >> self._the_list = the_list >> >> def __getitem__(self, bound): >> return [x for x in self._the_list if x < bound] >> >> >> filter = LessThanFilter([10, 20, 30, 40, 50]) >> filter[25] += [15, 17, 23] >> >> Should that last line not raise an exception? > > > In this case it will fail to catch what is probably an error, > but you can't expect the language to find all your bugs for > you. If you wrote the same bug this way: > > filter[25].extend([15, 17, 23]) > > it wouldn't be caught either. > > What's happening is that we're trying to use the syntax > a += b to mean two different things: > > 1) Shorthand for a = a + b > > 2) A way of expressing an in-place modification, such > as a.extend(b) > > Case (2) is not really an assignment at all, so arguably > it shouldn't require the LHS to support assignment. In my view the second one is wrong. a += b should be understood as being equivalent to a = a + b, but with the *possible* and by no means guaranteed optimization that the operation may be performed in-place. In fact, if you read the documentation for lists, you may notice that while they clearly cover the + operator and the extend method, they do not explicitly document the list class's += operator. So although I'm not entirely sure whether it is intentional or not, and I would be quite surprised if some implementation were actually to differ on this point, the language does *not* from what I can see guarantee that the += operator on lists is equivalent to calling .extend. That having been said, code that uses += and relies on the operation to be performed in-place should be considered buggy. If you need the operation to be performed in-place, then use in-place methods like list.extend. If you need the operation not to be performed in-place, then use a = a + b. If you're ambivalent on the in-place issue and just want to write polymorphic code, that's when you should consider using +=.
[toc] | [prev] | [next] | [standalone]
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-03-09 11:35 +0200 |
| Message-ID | <87vbvn3eak.fsf@elektro.pacujo.net> |
| In reply to | #68060 |
Ian Kelly <ian.g.kelly@gmail.com>: > In my view the second one is wrong. a += b should be understood as > being equivalent to a = a + b, but with the *possible* and by no means > guaranteed optimization that the operation may be performed in-place. Some call it an optimization, others call it a side effect. Anyway, that's how it has been explicitly defined in the language specification so that's the reality whether one likes it or not. Marko
[toc] | [prev] | [next] | [standalone]
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2014-03-10 11:03 +1300 |
| Message-ID | <bo46qcF6dofU1@mid.individual.net> |
| In reply to | #68060 |
Ian Kelly wrote: > In my view the second one is wrong. a += b should be understood as > being equivalent to a = a + b, but with the *possible* and by no means > guaranteed optimization that the operation may be performed in-place. This interpretation is at odds with the Language Reference, section 6.2.1, Augmented Assignment Statements: "An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect... when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead." Note that it says "when possible", not "if the implementation feels like it". > In fact, if you read the documentation for lists, you may notice that > while they clearly cover the + operator and the extend method, they do > not explicitly document the list class's += operator. The "when possible" clause, together with the fact that lists are mutable, implies that it *will* be done in-place. There is no need to document all the in-place operators explicitly for every type. -- Greg
[toc] | [prev] | [next] | [standalone]
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2014-03-09 19:00 -0400 |
| Message-ID | <mailman.7971.1394406046.18130.python-list@python.org> |
| In reply to | #68096 |
On 3/9/2014 6:03 PM, Gregory Ewing wrote: > Ian Kelly wrote: >> In my view the second one is wrong. a += b should be understood as >> being equivalent to a = a + b, but with the *possible* and by no means >> guaranteed optimization that the operation may be performed in-place. > > This interpretation is at odds with the Language Reference, > section 6.2.1, Augmented Assignment Statements: > > "An augmented assignment expression like x += 1 can be rewritten as x = > x + 1 to > achieve a similar, but not exactly equal effect... when possible, the > actual operation is performed > in-place, meaning that rather than creating a new object and assigning > that to > the target, the old object is modified instead." > > Note that it says "when possible", not "if the implementation > feels like it". > >> In fact, if you read the documentation for lists, you may notice that >> while they clearly cover the + operator and the extend method, they do >> not explicitly document the list class's += operator. > > The "when possible" clause, together with the fact that lists > are mutable, implies that it *will* be done in-place. There > is no need to document all the in-place operators explicitly > for every type. The discussion of .__iop__ in the datamodel chapter was just revised slightly. http://bugs.python.org/issue19953 -- Terry Jan Reedy
[toc] | [prev] | [next] | [standalone]
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-03-09 17:42 -0600 |
| Message-ID | <mailman.7972.1394408612.18130.python-list@python.org> |
| In reply to | #68096 |
On Sun, Mar 9, 2014 at 4:03 PM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote: > Ian Kelly wrote: >> >> In my view the second one is wrong. a += b should be understood as >> being equivalent to a = a + b, but with the *possible* and by no means >> guaranteed optimization that the operation may be performed in-place. > > > This interpretation is at odds with the Language Reference, > section 6.2.1, Augmented Assignment Statements: > > "An augmented assignment expression like x += 1 can be rewritten as x = x + > 1 to > achieve a similar, but not exactly equal effect... when possible, the actual > operation is performed > > in-place, meaning that rather than creating a new object and assigning that > to > the target, the old object is modified instead." > > Note that it says "when possible", not "if the implementation > feels like it". That's quite vague, and not much stronger a guarantee than "maybe". It's technically "possible" for this augmented assignment to be performed in place: x = 12 x += 4 But it's not done in-place, because ints are meant to be immutable. In any case, this means that whether the operation is actually performed in-place is an implementation detail -- if not of the Python implementation then at least of the class -- and not something the user should take for granted.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-03-10 02:37 +0000 |
| Message-ID | <531d2551$0$29994$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #68107 |
On Sun, 09 Mar 2014 17:42:42 -0600, Ian Kelly wrote: > On Sun, Mar 9, 2014 at 4:03 PM, Gregory Ewing > <greg.ewing@canterbury.ac.nz> wrote: >> Note that it says "when possible", not "if the implementation feels >> like it". > > That's quite vague, and not much stronger a guarantee than "maybe". It's > technically "possible" for this augmented assignment to be performed in > place: > > x = 12 > x += 4 > > But it's not done in-place, because ints are meant to be immutable. That's incorrect. Ints aren't merely "meant" to be immutable, which implies that's it's optional, they are defined by the language specification and the reference implementation as immutable. Any interpreter where ints are mutable *is not Python*. > In any case, this means that whether the operation is actually performed > in-place is an implementation detail -- if not of the Python > implementation then at least of the class -- and not something the user > should take for granted. Whether += operates in place or not is part of the interface of the class, not the implementation. Would you say that whether list.append operates in place or creates a new list is an implementation detail? Whether str.upper() creates a new string or modifies the existing one in place? Mutability versus immutability is part of the interface, not implementation, not withstanding that somebody could create an alternative class with the opposite behaviour: a MutableStr, or ImmutableList. -- Steven D'Aprano http://import-that.dreamwidth.org/
[toc] | [prev] | [next] | [standalone]
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-03-10 02:35 -0600 |
| Message-ID | <mailman.7984.1394440586.18130.python-list@python.org> |
| In reply to | #68113 |
On Sun, Mar 9, 2014 at 8:37 PM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote: > On Sun, 09 Mar 2014 17:42:42 -0600, Ian Kelly wrote: > >> On Sun, Mar 9, 2014 at 4:03 PM, Gregory Ewing >> <greg.ewing@canterbury.ac.nz> wrote: > >>> Note that it says "when possible", not "if the implementation feels >>> like it". >> >> That's quite vague, and not much stronger a guarantee than "maybe". It's >> technically "possible" for this augmented assignment to be performed in >> place: >> >> x = 12 >> x += 4 >> >> But it's not done in-place, because ints are meant to be immutable. > > That's incorrect. Ints aren't merely "meant" to be immutable, which > implies that's it's optional, they are defined by the language > specification and the reference implementation as immutable. Any > interpreter where ints are mutable *is not Python*. That's true, but is beside the point, which is that "when possible" is not very meaningful. >> In any case, this means that whether the operation is actually performed >> in-place is an implementation detail -- if not of the Python >> implementation then at least of the class -- and not something the user >> should take for granted. > > Whether += operates in place or not is part of the interface of the > class, not the implementation. > > Would you say that whether list.append operates in place or creates a new > list is an implementation detail? Whether str.upper() creates a new > string or modifies the existing one in place? Of course not. list.append is documented as modifying the list. str.upper is documented as returning a copy of the string. > Mutability versus > immutability is part of the interface, not implementation, not > withstanding that somebody could create an alternative class with the > opposite behaviour: a MutableStr, or ImmutableList. If the in-place behavior of += is held to be part of the interface, then we must accept that += is not polymorphic across mutable and immutable types, which in my mind largely* defeats the purpose of having it. After all, there should be one -- and preferably only one -- obvious way to do it. If you want in-place concatenation, the obvious way to do it is by calling extend. If you want copy concatenation, the obvious way to do it is with the + operator. Why then should not just mutable sequences but immutable sequences as well even offer the += operator? * The one exception I can think of is numpy, where there is no more obvious way to do in-place addition, and in that context I would consider the in-place behavior to be part of the interface.
[toc] | [prev] | [next] | [standalone]
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-03-10 09:13 +0000 |
| Message-ID | <531d8244$0$29994$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #68127 |
On Mon, 10 Mar 2014 02:35:36 -0600, Ian Kelly wrote:
> On Sun, Mar 9, 2014 at 8:37 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> On Sun, 09 Mar 2014 17:42:42 -0600, Ian Kelly wrote:
>>
>>> On Sun, Mar 9, 2014 at 4:03 PM, Gregory Ewing
>>> <greg.ewing@canterbury.ac.nz> wrote:
>>
>>>> Note that it says "when possible", not "if the implementation feels
>>>> like it".
>>>
>>> That's quite vague, and not much stronger a guarantee than "maybe".
>>> It's technically "possible" for this augmented assignment to be
>>> performed in place:
>>>
>>> x = 12
>>> x += 4
>>>
>>> But it's not done in-place, because ints are meant to be immutable.
>>
>> That's incorrect. Ints aren't merely "meant" to be immutable, which
>> implies that's it's optional, they are defined by the language
>> specification and the reference implementation as immutable. Any
>> interpreter where ints are mutable *is not Python*.
>
> That's true, but is beside the point, which is that "when possible" is
> not very meaningful.
It's meaningful. It refers not to ints, but the infinite number of
possible classes which might include augmented assignment. Some of them
will be immutable, in which case it is not possible for += etc. to be
performed in-place. Some of them will be mutable, but there won't be any
reasonable (or even unreasonable) way to perform += in-place.
But for some mutable classes, it will be possible to perform += in place,
in which case the docs say that they have to do so.
>>> In any case, this means that whether the operation is actually
>>> performed in-place is an implementation detail -- if not of the Python
>>> implementation then at least of the class -- and not something the
>>> user should take for granted.
>>
>> Whether += operates in place or not is part of the interface of the
>> class, not the implementation.
>>
>> Would you say that whether list.append operates in place or creates a
>> new list is an implementation detail? Whether str.upper() creates a new
>> string or modifies the existing one in place?
>
> Of course not. list.append is documented as modifying the list.
> str.upper is documented as returning a copy of the string.
Right. And += is documented as modifying the list too.
>> Mutability versus
>> immutability is part of the interface, not implementation, not
>> withstanding that somebody could create an alternative class with the
>> opposite behaviour: a MutableStr, or ImmutableList.
>
> If the in-place behavior of += is held to be part of the interface, then
> we must accept that += is not polymorphic across mutable and immutable
> types,
I'm fine with that.
> which in my mind largely* defeats the purpose of having it.
Not to my mind. I think that having augmented assignment is worthwhile
even if it behaves differently and incompatibly for different classes.
After all, so does * (multiplication):
py> x = 24
py> x*x
576
but:
py> x = []
py> x*x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'list'
I don't think that we ought to throw away * and I don't think we ought to
throw away *= either.
> After all, there should be one -- and preferably only one -- obvious way
> to do it. If you want in-place concatenation, the obvious way to do it
> is by calling extend. If you want copy concatenation, the obvious way
> to do it is with the + operator. Why then should not just mutable
> sequences but immutable sequences as well even offer the += operator?
*shrug*
I can see that each individual operation:
list.extend
list +
list +=
makes good sense in isolation, but I can also see that the combination
don't quite gel together as smoothly as we might hope.
--
Steven D'Aprano
http://import-that.dreamwidth.org/
[toc] | [prev] | [next] | [standalone]
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2014-03-11 18:15 +1300 |
| Message-ID | <bo7kfnFsehvU1@mid.individual.net> |
| In reply to | #68127 |
Ian Kelly wrote: > If the in-place behavior of += is held to be part of the interface, > then we must accept that += is not polymorphic across mutable and > immutable types, That's quite correct, it's not. As I said, it's one notation doing double duty. Usually there isn't any confusion, because you know whether any particular instance of it is intended to operate on a mutable or immutable type. If that's not the case -- if you're writing a function intended to operate on a variety of types, some mutable and some not -- then using in-place operators would not be appropriate. > If you want in-place concatenation, the > obvious way to do it is by calling extend. It might be the obvious way for that particular operation on that particular type. But what about all the others? What's the obvious way to spell in-place set intersection, for example? (Quickly -- no peeking at the docs!) The in-place operators provide a standardised spelling for in-place versions of all the binary operations. That's a useful thing to have, I think. -- Greg
[toc] | [prev] | [next] | [standalone]
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2014-03-11 18:03 +1300 |
| Message-ID | <bo7jo9Fsaf1U1@mid.individual.net> |
| In reply to | #68107 |
Ian Kelly wrote: > It's technically "possible" for this augmented assignment to be > performed in place: > > x = 12 > x += 4 > > But it's not done in-place, because ints are meant to be immutable. Which means it's *not* possible, because doing so would violate the documented properties of the int type. > In any case, this means that whether the operation is actually > performed in-place is an implementation detail The implementation could theoretically perform this optimisation if there are no other references to the object. But this will be completely invisible, because to even find out whether it's the same object, you need to keep another reference to the original object, preventing the optimisation from being performed. As far as observable effects are concerned, it's quite clear: mutable objects can *always* be updated in-place, and immutable objects can *never* be. -- Greg
[toc] | [prev] | [next] | [standalone]
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-03-11 04:39 -0600 |
| Message-ID | <mailman.8040.1394534428.18130.python-list@python.org> |
| In reply to | #68196 |
On Mon, Mar 10, 2014 at 11:03 PM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote: > As far as observable effects are concerned, it's > quite clear: mutable objects can *always* be updated > in-place, and immutable objects can *never* be. Hm. Consider the circle-ellipse problem. Briefly, a circle is-an ellipse, so in an inheritance hierarchy it is natural to make Circle a subclass of Ellipse. Now suppose the Ellipse has a stretch method that mutates the ellipse by changing the length of one of its axes while preserving the other. To avoid violating LSP, the Circle class must support all the methods of its ancestor. However it cannot, because the stretch method would invalidate the invariant of the Circle class that both of its axes must always be equal. There are a number of possible solutions. One possibility would be to copy the Circle as an Ellipse and return the new object instead of mutating it. Then you have the situation where, given a mutable object x that satisfies isinstance(x, Ellipse), the stretch method *may* be able to update the object in-place, or it *may* not. I can't think of a reasonable example that would replace the stretch method here with an augmented assignment, but then it is rather late. > It might be the obvious way for that particular operation on > that particular type. But what about all the others? > What's the obvious way to spell in-place set intersection, > for example? (Quickly -- no peeking at the docs!) You mean set.intersection_update? The in-place set methods are not hard to remember, because they all end in _update.
[toc] | [prev] | [next] | [standalone]
Page 5 of 6 — ← Prev page 1 2 3 4 [5] 6 Next page →
Back to top | Article view | comp.lang.python
csiph-web