


Re: optimsed HashMap

From Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: optimsed HashMap
Date 2012-11-27 03:24 +0100
Organization A noiseless patient Spider
Message-ID <k918bb$45a$1@dont-email.me>
References <8i70b8d0pm6ibk03ti4t2pv60jd0bctlcs@4ax.com> <k8p85p$hqr$1@dont-email.me> <8ip0b8p7blu31eub502so8cus1h9so3m9s@4ax.com> <k90les$q1j$1@dont-email.me> <ahi90qF48bvU1@mid.individual.net>



On 26/11/2012 23:32, Robert Klemme allegedly wrote:
> On 11/26/2012 10:03 PM, Daniele Futtorovic wrote:
>> On 24/11/2012 07:42, Roedy Green allegedly wrote:
> 
>>> You go through the files for a website looking at each word of text
>>> (avoiding HTML markup) in the HashMap. If you find it you replace it.
>>>
>>> Most of the time the word you look up is not in the list.
>>>
>>> This is a time-consuming process.  I would like to speed it up.
>>
>> You might want to intern() the input to avoid having to recompute the
>> hash every time (if applicable). Other than that, you'll either be
>> wanting a better hashing algorithm, to avoid collisions, or indeed
>> something altogether fancier (but riskier in terms of RoI).
> 
> How would interning help?  The input is read only once anyway 

Depends on the input, of course. But natural text on the web (which
appears to be what this is about) is quite likely to contain the same
words more than once each.

> and if you
> mean to intern individual words of the input then how does the JVM do
> the interning?  

Like it does all interning? I must admit I couldn't lay out the details
off the top of my head, but the JLS should have them within reasonable
accuracy.
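
For reference, the observable behaviour of intern() can be shown with a small sketch. The class and method names below are made up for illustration. It relies only on the documented contract of String.intern(): it returns a canonical instance for any given character sequence, and string literals are themselves interned.

```java
final class InternDemo {
    // Hypothetical helper: true when both arguments resolve to the same
    // canonical instance in the string pool.
    static boolean samePooledInstance(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // Two distinct objects with equal contents, as a tokenizer
        // reading words from a file would typically produce.
        String a = new String("lookup");
        String b = new String("lookup");

        System.out.println(a == b);                   // false: different objects
        System.out.println(a.equals(b));              // true: same characters
        System.out.println(samePooledInstance(a, b)); // true: one pooled instance
        System.out.println(a.intern() == "lookup");   // true: literals are interned
    }
}
```

How the pool is implemented internally is up to the JVM; the sketch only shows what a caller can observe.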

Of course, this would only be an option for a batch-like program. You
wouldn't want to clutter the string pool of a long-running app.

Interning would also perhaps allow one to use an IdentityHashMap, and
thus do away with the (probably relatively costly) string comparisons.
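
A minimal sketch of that idea, assuming a simple word-to-replacement table (the class name InternedReplacer and its methods are hypothetical, not anything from the original program). Every key is interned on the way in and every looked-up word is interned before the lookup, so the reference comparison used by IdentityHashMap is sufficient.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class InternedReplacer {
    // IdentityHashMap compares keys with == instead of equals(),
    // so it only works here because every key is interned.
    private final Map<String, String> replacements = new IdentityHashMap<>();

    void put(String word, String replacement) {
        replacements.put(word.intern(), replacement);
    }

    // Returns the replacement, or the word itself when it is not in the table.
    String replace(String word) {
        String r = replacements.get(word.intern());
        return r != null ? r : word;
    }

    public static void main(String[] args) {
        InternedReplacer replacer = new InternedReplacer();
        replacer.put("colour", "color");

        StringBuilder out = new StringBuilder();
        for (String token : "the colour of the sky".split(" ")) {
            if (out.length() > 0) out.append(' ');
            out.append(replacer.replace(token));
        }
        System.out.println(out); // the color of the sky
    }
}
```

Note that intern() still has to find the word in the pool on every call, so whether this beats a plain HashMap is something to measure rather than assume.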

For sure, this wouldn't be a replacement for more sophisticated
solutions, but could be one of the things to try if it is to be kept "simple".

> My guess would be that some form of hashing would be
> used there as well - plus that internal data structure must be thread
> safe...

True.

-- 
DF.



