Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #19973
| From | Martin Gregorie <martin@address-in-sig.invalid> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: SortedMap question? |
| Date | 2012-11-26 02:20 +0000 |
| Organization | UK Free Software Network |
| Message-ID | <k8ujlm$enu$1@localhost.localdomain> (permalink) |
| References | <k8rkud$i3o$1@dont-email.me> <q4qoyde15lpk$.13amx8nbfrj28.dlg@40tude.net> <k8t8dp$tv5$1@dont-email.me> <k8tljj$6jr$1@localhost.localdomain> <k8ucsh$o2v$1@dont-email.me> |
On Sun, 25 Nov 2012 16:25:02 -0800, Knute Johnson wrote: > So for the actual case that I am working on, I have a collection of > about 5000 elements and am using an Integer as the key. Data is rarely > changed but often accessed. There should never be a case where the > searched for key will not exist. What would you use, a HashMap or a > TreeMap? > I had an additional reason for thinking that an ordered storage method would help us: compression. We had, for the day, a very large data warehouse in which many of the keys were relatively large. It was storing phone network-related data and, for instance, it turned out that things like IMEIs and dialed numbers are very much larger than a 32 bit soft database key (sdbk), so we got a worthwhile compression merely by holding a single copy of, say, a phone number as a dimension table replacing it by the sdbk in the call- related data. Rinse, wash and repeat for all the dimension tables in that database. This gave as good a compression ratio as the zip algorithm and without the compress/decompress cpu overhead, but it meant that we needed to do the sdbk substitutions before we stored the records. At this point we found a problem: theoretically the datawarehouse could do the lookups but in practise it hit a brick wall at 350 lookups/second and we needed at least 15,000/sec to handle the projected daily input volume. Hence the giant lookup table: the last time I saw it there were around 300M phone numbers in the tree. We didn't have the memory to touch that with only one number per node but fortunately a late night brain storming session came up with a cunning plot. We discovered that, although actual the phone number space is sparsely populated, the least significant two digits are quite densely grouped into clusters, so after digging statistics out of a sample of 500,000 numbers, we constructed the key from the phone number less the last two digits and stored these as a bitmap. This is why we needed an ordered map: its job was to ensure that the numbers in the cluster all hit the same node. The lookup procedure was: - take off the last two digits - use the prefix as the B-tree index - test the bit indexed by the last two digits. - if unset, there is no match Please excuse the use of xcess bandwidth, but hopefully somebody may find this type of approach helpful. -- martin@ | Martin Gregorie gregorie. | Essex, UK org |
Back to comp.lang.java.programmer | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:23 -0800
Re: SortedMap question? Joerg Meier <joergmmeier@arcor.de> - 2012-11-25 00:57 +0100
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:49 -0500
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:59 -0500
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:02 +0000
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:24 -0500
Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 21:25 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 22:24 -0500
Re: SortedMap question? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:45 -0800
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:29 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 09:02 -0500
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:47 +0000
Re: SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-25 16:25 -0800
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:36 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 22:58 -0500
Re: SortedMap question? Lew <lewbloch@gmail.com> - 2012-11-25 16:53 -0800
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-26 02:20 +0000
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 14:44 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:18 -0500
Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 17:35 -0500
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:58 -0500
Re: SortedMap question? Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 12:45 -0700
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-26 14:26 -0600
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 14:16 -0600
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:21 -0500
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 17:07 -0600
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 20:05 -0500
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 01:34 +0000
Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 20:36 -0600
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-12-15 12:27 +0000
Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 01:36 +0100
Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:00 -0500
Re: SortedMap question? markspace <-@.> - 2012-11-24 17:34 -0800
Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:28 +0100
Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:49 +0000
Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:00 +0000
csiph-web