| From | Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: optimsed HashMap |
| Date | 2012-11-27 03:35 +0100 |
| Organization | A noiseless patient Spider |
| Message-ID | <k918u3$686$1@dont-email.me> |
| References | (1 earlier) <k8p85p$hqr$1@dont-email.me> <8ip0b8p7blu31eub502so8cus1h9so3m9s@4ax.com> <k90les$q1j$1@dont-email.me> <ahi90qF48bvU1@mid.individual.net> <k918bb$45a$1@dont-email.me> |
On 27/11/2012 03:24, Daniele Futtorovic allegedly wrote:
> On 26/11/2012 23:32, Robert Klemme allegedly wrote:
>> On 11/26/2012 10:03 PM, Daniele Futtorovic wrote:
>>> On 24/11/2012 07:42, Roedy Green allegedly wrote:
>>
>>>> You go through the files for a website looking at each word of text
>>>> (avoiding HTML markup) in the HashMap. If you find it you replace it.
>>>>
>>>> Most of the time the word you look up is not in the list.
>>>>
>>>> This is a time-consuming process. I would like to speed it up.
>>>
>>> You might want to intern() the input to avoid having to recompute the
>>> hash every time (if applicable). Other than that, you'll either be
>>> wanting a better hashing algorithm, to avoid collisions, or indeed
>>> something altogether fancier (but riskier in terms of RoI).
>>
>> How would interning help? The input is read only once anyway
>
> Depends on the input, of course. But natural text on the web (which
> appears to be what this is about) is quite likely to contain the same
> words more than once each.
>
>> and if you
>> mean to intern individual words of the input then how does the JVM do
>> the interning?
>
> Like it does all interning? I must admit I couldn't lay out the details
> off the top of my head, but the JLS should have them within reasonable
> accuracy.
>
> Of course, this would only be an option for a batch-like program. You
> wouldn't want to clutter the string pool of a long-running app.
>
> Interning would also perhaps allow one to use an IdentityHashMap,
> thus doing away with the (probably relatively costly) string comparisons.
>
> For sure, this wouldn't be a replacement for more sophisticated
> solutions, but could be one of the things to try if it is to be kept "simple".
>
>> My guess would be that some form of hashing would be
>> used there as well - plus that internal data structure must be thread
>> safe...
>
> True.
> Hm.
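According to Roedy himself (<http://www.mindprod.com/jgloss/interned.html#UNDERTHEHOOD>), the JVM uses a HashMap for intern()'d string lookup. Interning a word would then cost a hash lookup of its own, on top of the one it was meant to speed up, so there may be no point in doing it after all.

-- DF.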
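
For anyone who wants to measure rather than guess, here is a minimal sketch of the two variants discussed above: a plain HashMap lookup, and an IdentityHashMap whose keys and probes are both intern()'d. The class name, method names and the sample words are made up for illustration and are not taken from Roedy's program. Note that every probe still has to go through intern(), which is where the extra lookup comes in.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class InternLookupSketch {

    // Hypothetical replacement table; the entries are illustrative only.
    static Map<String, String> byEquals() {
        Map<String, String> table = new HashMap<>();
        table.put("colour", "color");
        table.put("optimsed", "optimised");
        return table;
    }

    // Same entries, but compared by reference (==) instead of equals().
    // This only works if every key AND every probe is interned.
    static Map<String, String> byIdentity(Map<String, String> source) {
        Map<String, String> table = new IdentityHashMap<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            table.put(e.getKey().intern(), e.getValue());
        }
        return table;
    }

    // Replaces each space-separated word that has an entry in the table.
    static String replaceWords(String text, Map<String, String> table, boolean internProbe) {
        StringBuilder out = new StringBuilder();
        for (String word : text.split(" ")) {
            // intern() is itself a lookup in the JVM's string pool.
            String probe = internProbe ? word.intern() : word;
            String replacement = table.get(probe);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(replacement != null ? replacement : word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "an optimsed colour map";
        Map<String, String> equalsTable = byEquals();
        Map<String, String> identityTable = byIdentity(equalsTable);

        // equals()-based lookup, no interning needed.
        System.out.println(replaceWords(text, equalsTable, false));
        // Identity-based lookup with interned probes.
        System.out.println(replaceWords(text, identityTable, true));
        // Identity-based lookup WITHOUT interning: the words produced by
        // split() are fresh objects, so they do not match the interned keys.
        System.out.println(replaceWords(text, identityTable, false));
    }
}
```

The first two calls give the same replacements; whether the second is any faster depends on how expensive intern() is on the JVM at hand, so it would have to be timed on real input before drawing conclusions.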