Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.java.programmer > #19973

Re: SortedMap question?

From Martin Gregorie <martin@address-in-sig.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: SortedMap question?
Date 2012-11-26 02:20 +0000
Organization UK Free Software Network
Message-ID <k8ujlm$enu$1@localhost.localdomain> (permalink)
References <k8rkud$i3o$1@dont-email.me> <q4qoyde15lpk$.13amx8nbfrj28.dlg@40tude.net> <k8t8dp$tv5$1@dont-email.me> <k8tljj$6jr$1@localhost.localdomain> <k8ucsh$o2v$1@dont-email.me>

Show all headers | View raw


On Sun, 25 Nov 2012 16:25:02 -0800, Knute Johnson wrote:

> So for the actual case that I am working on, I have a collection of
> about 5000 elements and am using an Integer as the key.  Data is rarely
> changed but often accessed.  There should never be a case where the
> searched for key will not exist.  What would you use, a HashMap or a
> TreeMap?
>
I had an additional reason for thinking that an ordered storage method 
would help us: compression. 

We had, for the day, a very large data warehouse in which many of the 
keys were relatively large. It was storing phone network-related data 
and, for instance, it turned out that things like IMEIs and dialed 
numbers are very much larger than a 32 bit soft database key (sdbk), so 
we got a worthwhile compression merely by holding a single copy of, say, 
a phone number as a dimension table replacing it by the sdbk in the call-
related data. Rinse, wash and repeat for all the dimension tables in that 
database. This gave as good a compression ratio as the zip algorithm and 
without the compress/decompress cpu overhead, but it meant that we needed 
to do the sdbk substitutions before we stored the records. At this point 
we found a problem: theoretically the datawarehouse could do the lookups 
but in practise it hit a brick wall at 350 lookups/second and we needed 
at least 15,000/sec to handle the projected daily input volume. 

Hence the giant lookup table: the last time I saw it there were around 
300M phone numbers in the tree. We didn't have the memory to touch that 
with only one number per node but fortunately a late night brain storming 
session came up with a cunning plot. We discovered that, although actual 
the phone number space is sparsely populated, the least significant two 
digits are quite densely grouped into clusters, so after digging 
statistics out of a sample of 500,000 numbers, we constructed the key 
from the phone number less the last two digits and stored these as a 
bitmap. This is why we needed an ordered map: its job was to ensure that 
the numbers in the cluster all hit the same node. The lookup procedure 
was: 
- take off the last two digits
- use the prefix as the B-tree index
- test the bit indexed by the last two digits.
- if unset, there is no match

Please excuse the use of xcess bandwidth, but hopefully somebody may find 
this type of approach helpful.
 

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Back to comp.lang.java.programmer | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:23 -0800
  Re: SortedMap question? Joerg Meier <joergmmeier@arcor.de> - 2012-11-25 00:57 +0100
    Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:49 -0500
      Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 08:59 -0500
        Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:02 +0000
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
        Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:24 -0500
          Re: SortedMap question? David Lamb <dalamb@cs.queensu.ca> - 2012-11-25 21:25 -0500
            Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 22:24 -0500
            Re: SortedMap question? Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:45 -0800
              Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:29 -0500
    Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 09:02 -0500
      Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:47 +0000
        Re: SortedMap question? Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-25 16:25 -0800
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:36 -0500
            Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 22:58 -0500
          Re: SortedMap question? Lew <lewbloch@gmail.com> - 2012-11-25 16:53 -0800
          Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-26 02:20 +0000
      Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 14:20 -0500
        Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 14:44 -0500
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:18 -0500
            Re: SortedMap question? Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-25 17:35 -0500
              Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 19:58 -0500
              Re: SortedMap question? Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 12:45 -0700
                Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-26 14:26 -0600
        Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 14:16 -0600
          Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 16:21 -0500
            Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 17:07 -0600
              Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-25 20:05 -0500
          Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 01:34 +0000
            Re: SortedMap question? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-11-25 20:36 -0600
              Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-12-15 12:27 +0000
  Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 01:36 +0100
  Re: SortedMap question? Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 20:00 -0500
  Re: SortedMap question? markspace <-@.> - 2012-11-24 17:34 -0800
    Re: SortedMap question? Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:28 +0100
      Re: SortedMap question? Martin Gregorie <martin@address-in-sig.invalid> - 2012-11-25 17:49 +0000
  Re: SortedMap question? "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 14:00 +0000

csiph-web