Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #68553
| From | Roy Smith <roy@panix.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Balanced trees |
| Date | 2014-03-19 10:06 -0400 |
| Organization | PANIX Public Access Internet and UNIX, NYC |
| Message-ID | <roy-0F0BB4.10062919032014@news.panix.com> (permalink) |
| References | (15 earlier) <mailman.8264.1395181297.18130.python-list@python.org> <87r45zf6fu.fsf@elektro.pacujo.net> <5328efab$0$29994$c3e8da3$5496439d@news.astraweb.com> <87pplipo82.fsf@elektro.pacujo.net> <53299eac$0$29994$c3e8da3$5496439d@news.astraweb.com> |
In article <53299eac$0$29994$c3e8da3$5496439d@news.astraweb.com>,
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> If you have a million items, then the odds that those
> million items happen to be sorted (the worst-case order) are 1 in a
> million factorial. That's a rather small number, small enough that we can
> discount it from ever happening, not in a million lifetimes of the
> Universe.
If the items are coming from some inherently random process, then I
agree. But, not all data sources are random.
Imagine you had a web site, and wanted to store various bits of data
from the stream of input requests. You decided that the data structure
you were going to use was a balanced tree. If your keys were, say, a
hash of the client's IP address, then it's a pretty good guess they're
random. But, what if the keys were the time the request was received?
Those are obviously not random, and using them as a keys in a balanced
tree would give you sub-optimum performance.
This is not a hypothetical scenario. Songza uses MongoDB as our
database. The indexes are balanced trees. One of our biggest
collections has a multi-component key, one of the components being the
request time truncated to the hour. Insertion time into that collection
has an obvious sawtooth shape, with performance degrading as each hour
progresses and jumping back up as the time rolls over to the next hour.
This is due (we believe :-)) to the constant rebalancing of the index
trees.
Almost-sorted data sets are very common. For example, here's a pipeline
I run a lot (from memory, could have gotten some detail wrong):
grep pattern lofgile | awk '{print $7}' | sed 's/:[0-9][0-9]$//' | sort
| uniq -c
Field 7 is the timestamp for when a request came in. What I'm doing
here is counting how many requests of a certain type came in during each
minute of the day. Logging isn't exactly in chronological order, so I
need to sort the times before I count them. But, it's in *almost*
chronological order; a sort which had pathologically bad behavior on
almost sorted data would be unusable here.
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Re: Tuples and immutability Duncan Booth <duncan.booth@invalid.invalid> - 2014-03-07 09:33 +0000
Re: Tuples and immutability Ben Finney <ben+python@benfinney.id.au> - 2014-03-07 22:04 +1100
Re: Tuples and immutability Chris Angelico <rosuav@gmail.com> - 2014-03-07 22:11 +1100
Re: Tuples and immutability Peter Otten <__peter__@web.de> - 2014-03-07 12:38 +0100
Re: Tuples and immutability Chris Angelico <rosuav@gmail.com> - 2014-03-07 22:45 +1100
Re: Tuples and immutability Alister <alister.ware@ntlworld.com> - 2014-03-07 11:51 +0000
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-07 11:23 -0700
Re: Tuples and immutability Roy Smith <roy@panix.com> - 2014-03-07 08:37 -0500
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-08 15:17 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-07 21:17 -0700
Balanced trees (was: Re: Tuples and immutability) Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 10:34 +0200
Re: Balanced trees (was: Re: Tuples and immutability) Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 04:03 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 13:26 +0200
Re: Balanced trees (was: Re: Tuples and immutability) Dan Stromberg <drsalists@gmail.com> - 2014-03-08 11:58 -0800
Re: Balanced trees Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-08 20:37 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-08 23:21 +0200
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-08 17:22 -0500
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:17 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-08 17:31 -0800
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:27 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 14:20 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 23:32 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 14:37 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 23:43 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-09 15:08 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 00:24 +0200
Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-09 18:04 -0600
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 03:24 +0000
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 03:24 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 08:16 +0200
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 08:53 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 11:41 +0200
Re: Balanced trees Ned Batchelder <ned@nedbatchelder.com> - 2014-03-10 06:57 -0400
Re: Balanced trees Rustom Mody <rustompmody@gmail.com> - 2014-03-10 09:01 -0700
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 02:02 +0000
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 22:20 -0400
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 13:29 +1100
Re: Balanced trees Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-10 09:51 +0000
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 09:59 -0400
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:20 +1100
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:24 +1100
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 19:08 +0200
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 04:17 +1100
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-10 19:34 +0200
Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-13 12:40 -0600
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-13 23:57 +0000
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-13 20:12 -0700
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-14 05:02 +0000
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 19:24 -0400
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 10:27 +1100
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 01:26 +0000
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 12:45 +1100
Re: Balanced trees Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-10 21:38 -0600
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 15:28 +1100
Re: Balanced trees "Rhodri James" <rhodri@wildebst.org.uk> - 2014-03-12 00:57 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-11 12:12 +0200
Re: Balanced trees alex23 <wuwei23@gmail.com> - 2014-03-12 10:13 +1000
Re: Balanced trees Alister <alister.ware@ntlworld.com> - 2014-03-11 09:25 +0000
Re: Balanced trees Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2014-03-12 10:08 +0100
Re: Balanced trees Tim Chase <python.list@tim.thechases.com> - 2014-03-10 11:33 -0500
Re: Balanced trees Chris Angelico <rosuav@gmail.com> - 2014-03-11 03:39 +1100
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-10 18:05 -0700
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-10 22:13 -0400
Re: Balanced trees Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-03-10 19:57 -0400
Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-15 01:13 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-18 00:05 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 12:26 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-18 22:55 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 14:45 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 00:03 +0200
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-18 15:21 -0700
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 01:11 +0200
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 01:15 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 10:49 +0200
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 13:42 +0000
Re: Balanced trees Marko Rauhamaa <marko@pacujo.net> - 2014-03-19 15:54 +0200
Re: Balanced trees Roy Smith <roy@panix.com> - 2014-03-19 10:06 -0400
Re: Balanced trees Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-19 01:15 +0000
Re: Balanced trees Ethan Furman <ethan@stoneleaf.us> - 2014-03-19 08:15 -0700
Re: Balanced trees "Rhodri James" <rhodri@wildebst.org.uk> - 2014-03-20 02:16 +0000
Re: Balanced trees Dan Stromberg <drsalists@gmail.com> - 2014-03-19 19:34 -0700
Re: Balanced trees Chris Kaynor <ckaynor@zindagigames.com> - 2014-03-18 18:02 -0700
Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-18 13:18 -0700
blist in standard library (was Re: Balanced trees) Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-03-15 12:31 +0000
Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-17 14:16 -0700
Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-18 00:08 +0000
Re: Balanced trees Daniel Stutzbach <stutzbach@google.com> - 2014-03-17 18:01 -0700
Re: Balanced trees Joshua Landau <joshua@landau.ws> - 2014-03-18 07:46 +0000
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-09 13:40 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 19:39 -0700
Re: Tuples and immutability Marko Rauhamaa <marko@pacujo.net> - 2014-03-09 11:35 +0200
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-10 11:03 +1300
Re: Tuples and immutability Terry Reedy <tjreedy@udel.edu> - 2014-03-09 19:00 -0400
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-09 17:42 -0600
Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 02:37 +0000
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-10 02:35 -0600
Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-10 09:13 +0000
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-11 18:15 +1300
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-11 18:03 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-11 04:39 -0600
Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-11 16:46 +0000
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-12 10:23 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-11 17:06 -0600
Re: Tuples and immutability Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-03-12 23:20 +0000
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-12 19:35 -0600
Re: Tuples and immutability Terry Reedy <tjreedy@udel.edu> - 2014-03-12 22:09 -0400
Re: Tuples and immutability Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-09 13:45 +1300
Re: Tuples and immutability Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-08 19:55 -0700
csiph-web