Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #19886
| From | v_borchert@despammed.com (Volker Borchert) |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: optimsed HashMap |
| Date | 2012-11-24 08:05 +0000 |
| Organization | Private site at Eddersheim, Germany |
| Message-ID | <k8pv53$7gl$1@Gaia.teknon.de> (permalink) |
| References | <8i70b8d0pm6ibk03ti4t2pv60jd0bctlcs@4ax.com> |
Roedy Green wrote:
> Is there something like HashMap but that optimised when nearly always
> the thing you are looking up is not in the list, and when you can add
> the list of words to look up and then freeze it.
>
> I have to scan an entire website looking up every word.
Unless said web site is www.archive.org, just create a HashMap
and use a Collections.unmodifiableMap() wrapper for lookup. I'd
expect memory to become a problem way earlier than processor.
Depending on how often each String is looked up, and how long the
Strings are, a data structure that does not need String.hashCode()
might be faster.
I'd consider a tree along the following lines:
Characters not in [a-zA-Z] are expressed as `nnnn - the ASCII value of
'`' is 'a' - 1. Characters in [A-Z] are lowercased. Nodes contain a
boolean and a Node[27]. The tree contains a word if there is a path
from the root to a Node whose boolean is true and that is reached via
Node[String.charAt(l)] at level l.
For example, a tree containing exactly
"a", "at", "and", "no", "nil", "not"
would be
N0 = { false, [ null, N1, 12*null, N2, 12*null ] } // ""
N1 = { true, [ null, 13*null, N3, 5*null, N4, 6*null ] } // "a"
N2 = { false, [ null, 8*null, N5, 5*null, N6, 11*null ] } // "n"
N3 = { false, [ null, 13*null, N7, 12*null ] } // "an"
N4 = { true, [ null, 26*null ] } // "at"
N5 = { false, [ null, 11*null, N8, 14*null ] } // "ni"
N6 = { true, [ null, 19*null, N9, 6*null ] } // "no"
N7 = { true, [ null, 26*null ] } // "and"
N8 = { true, [ null, 26*null ] } // "nil"
N9 = { true, [ null, 26*null ] } // "not"
For a Map, replace the boolean with a field holding the value.
Possible improvements:
- Since we have three (32bit VM) or seven (64bit VM) padding bytes
in each Node, we could use two of them to hold a short indicating
the maximum length of Strings in the subtree.
- Instantiate Node[] only if actually needed to hold a child, and only
large enough to hold all currently known children. Use one of the
padding bytes to hold an index offset, such that a Node with only
a single child only holds a Node[1]:
N4 = { true, 0, null }
N6 = { true, 20, [ N9 ] }
- Use an interface and specialized concrete classes for
- leaf node (no fields at all, because boolean will always be true)
- single child node { boolean, Node }
or even "true" and "false" single child nodes
- ...
but this will considerable increase preprocessing cost.
- If many of your words contain non-[a-z] characters, provide array
positions for the most common of these, or use a linear probing
hash table for the non-[a-z] children.
- If the tree is very sparse, doing so even for the [a-z] children
might be better in the long run because more of it fits into caches.
This surely has been invented before and given a nice name...
Apache's or Google's collection libraries might even feature
high quality implementations.
--
"I'm a doctor, not a mechanic." Dr Leonard McCoy <mccoy@ncc1701.starfleet.fed>
"I'm a mechanic, not a doctor." Volker Borchert <v_borchert@despammed.com>
Back to comp.lang.java.programmer | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-23 17:12 -0800
Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 20:19 -0500
Re: optimsed HashMap markspace <-@.> - 2012-11-23 17:33 -0800
Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-23 22:42 -0800
Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 03:34 -0800
Re: optimsed HashMap Knute Johnson <nospam@knutejohnson.com> - 2012-11-24 08:39 -0800
Re: optimsed HashMap Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:14 -0800
Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 13:24 -0500
Re: optimsed HashMap markspace <-@.> - 2012-11-24 10:44 -0800
Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 13:40 +0000
Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-26 22:03 +0100
Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-26 23:32 +0100
Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-27 03:24 +0100
Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-27 03:35 +0100
Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-27 08:44 -0500
Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-27 14:20 -0800
Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-30 03:35 +0100
Re: optimsed HashMap Patricia Shanahan <pats@acm.org> - 2012-11-23 19:51 -0800
Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-24 10:21 +0000
Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 03:39 -0800
Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-24 16:24 +0100
Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 13:50 +0000
Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:30 +0100
Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 21:13 +0000
Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 13:16 -0500
Re: optimsed HashMap v_borchert@despammed.com (Volker Borchert) - 2012-11-24 08:05 +0000
Re: optimsed HashMap Silvio <silvio@internet.com> - 2012-11-26 11:57 +0100
Re: optimsed HashMap Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 11:13 -0700
Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:44 -0800
Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:28 -0500
Re: optimsed HashMap Arved Sandstrom <asandstrom2@eastlink.ca> - 2012-11-27 06:01 -0400
Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-27 08:56 -0500
Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-27 14:16 -0800
csiph-web