


Re: optimsed HashMap

From Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: optimsed HashMap
Date 2012-11-27 03:35 +0100
Organization A noiseless patient Spider
Message-ID <k918u3$686$1@dont-email.me> (permalink)
References (1 earlier) <k8p85p$hqr$1@dont-email.me> <8ip0b8p7blu31eub502so8cus1h9so3m9s@4ax.com> <k90les$q1j$1@dont-email.me> <ahi90qF48bvU1@mid.individual.net> <k918bb$45a$1@dont-email.me>



On 27/11/2012 03:24, Daniele Futtorovic allegedly wrote:
> On 26/11/2012 23:32, Robert Klemme allegedly wrote:
>> On 11/26/2012 10:03 PM, Daniele Futtorovic wrote:
>>> On 24/11/2012 07:42, Roedy Green allegedly wrote:
>>
>>>> You go through the files for a website looking at each word of text
>>>> (avoiding HTML markup) in the HashMap. If you find it you replace it.
>>>>
>>>> Most of the time the word you look up is not in the list.
>>>>
>>>> This is a time-consuming process.  I would like to speed it up.
>>>
>>> You might want to intern() the input to avoid having to recompute the
>>> hash every time (if applicable). Other than that, you'll either be
>>> wanting a better hashing algorithm, to avoid collisions, or indeed
>>> something altogether fancier (but riskier in terms of RoI).
>>
>> How would interning help?  The input is read only once anyway 
> 
> Depends on the input, of course. But natural text on the web (which
> appears to be what this is about) is quite likely to contain the same
> words more than once each.
> 
>> and if you
>> mean to intern individual words of the input then how does the JVM do
>> the interning?  
> 
> Like it does all interning? I must admit I couldn't lay out the details
> off the top of my head, but the JLS should have them within reasonable
> accuracy.
> 
> Of course, this would only be an option for a batch-like program. You
> wouldn't want to clutter the string pool of a long-running app.
> 
> Interning would also perhaps allow one to use an IdentityHashMap, and
> thus do away with the (probably relatively costly) string comparisons.
> 
> For sure, this wouldn't be a replacement for more sophisticated
> solutions, but could be one of the things to try if it is to be kept "simple".
> 
>> My guess would be that some form of hashing would be
>> used there as well - plus that internal data structure must be thread
>> safe...
> 
> True.
> 
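The lookup loop Roedy describes in the quoted text (scan each word, look it up in a HashMap, replace it on a hit) can be sketched roughly as follows. This is a minimal illustration, not his code: the class name WordReplacer and the letters-only word pattern are assumptions, and skipping HTML markup is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordReplacer {
    // Assumed tokenizer: a "word" is a run of Unicode letters.
    // Skipping HTML markup is NOT handled in this sketch.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    private final Map<String, String> replacements;

    public WordReplacer(Map<String, String> replacements) {
        this.replacements = new HashMap<>(replacements);
    }

    /** Replaces every word found in the map; all other text is copied unchanged. */
    public String replace(String text) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            String word = m.group();
            // One hash lookup per word; most lookups are expected to miss.
            String replacement = replacements.get(word);
            out.append(text, last, m.start());      // text between words
            out.append(replacement != null ? replacement : word);
            last = m.end();
        }
        out.append(text, last, text.length());      // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("colour", "color");
        WordReplacer r = new WordReplacer(map);
        System.out.println(r.replace("The colour of the sky.")); // The color of the sky.
    }
}
```

Because m.group() creates a new String for every occurrence, each lookup computes that word's hash afresh; that per-word cost is what the interning suggestion in the quoted text was aiming at.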

Hm. According to Roedy himself
(<http://www.mindprod.com/jgloss/interned.html#UNDERTHEHOOD>), the JVM
uses a HashMap for intern()'d string lookup. So each intern() call would
itself cost a hash lookup, and there may be no point in doing it after
all.
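The intern()-plus-IdentityHashMap idea from the quoted exchange can be sketched as below, assuming every key is interned when it is inserted; the class name InternedLookup is hypothetical. The comments mark where the hidden hash lookup happens, which is why the saving may be small or nil.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class InternedLookup {
    // Compares keys with == and hashes them with System.identityHashCode,
    // so it only works if every key and every probe is the canonical instance.
    private final Map<String, String> replacements = new IdentityHashMap<>();

    /** Keys are interned so that identity comparison is valid for lookups. */
    public void put(String word, String replacement) {
        replacements.put(word.intern(), replacement);
    }

    /**
     * Interns the incoming word, then looks it up by reference.
     * intern() itself searches the JVM's string pool by content, which is
     * a hash-based lookup; only the IdentityHashMap step avoids
     * String.hashCode() and String.equals().
     */
    public String get(String word) {
        return replacements.get(word.intern());
    }

    public static void main(String[] args) {
        InternedLookup lookup = new InternedLookup();
        lookup.put("colour", "color");
        // new String(...) builds a distinct object with equal contents.
        String word = new String("colour");
        System.out.println(lookup.get(word));       // color
        System.out.println(lookup.get("flavour"));  // null
    }
}
```

As noted earlier in the thread, this is only sensible for a batch-like program, since every probe word ends up in the string pool.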

-- 
DF.



Thread

optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-23 17:12 -0800
  Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-23 20:19 -0500
  Re: optimsed HashMap markspace <-@.> - 2012-11-23 17:33 -0800
    Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-23 22:42 -0800
      Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 03:34 -0800
        Re: optimsed HashMap Knute Johnson <nospam@knutejohnson.com> - 2012-11-24 08:39 -0800
          Re: optimsed HashMap Knute Johnson <nospam@rabbitbrush.frazmtn.com> - 2012-11-24 15:14 -0800
      Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 13:24 -0500
      Re: optimsed HashMap markspace <-@.> - 2012-11-24 10:44 -0800
        Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 13:40 +0000
      Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-26 22:03 +0100
        Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-26 23:32 +0100
          Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-27 03:24 +0100
            Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-27 03:35 +0100
              Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-27 08:44 -0500
                Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-27 14:20 -0800
                Re: optimsed HashMap Daniele Futtorovic <da.futt.news@laposte-dot-net.invalid> - 2012-11-30 03:35 +0100
  Re: optimsed HashMap Patricia Shanahan <pats@acm.org> - 2012-11-23 19:51 -0800
    Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-24 10:21 +0000
      Re: optimsed HashMap Roedy Green <see_website@mindprod.com.invalid> - 2012-11-24 03:39 -0800
        Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-24 16:24 +0100
          Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-25 13:50 +0000
            Re: optimsed HashMap Robert Klemme <shortcutter@googlemail.com> - 2012-11-25 15:30 +0100
      Re: optimsed HashMap "Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> - 2012-11-26 21:13 +0000
    Re: optimsed HashMap Arne Vajhøj <arne@vajhoej.dk> - 2012-11-24 13:16 -0500
  Re: optimsed HashMap v_borchert@despammed.com (Volker Borchert) - 2012-11-24 08:05 +0000
  Re: optimsed HashMap Silvio <silvio@internet.com> - 2012-11-26 11:57 +0100
  Re: optimsed HashMap Jim Janney <jjanney@shell.xmission.com> - 2012-11-26 11:13 -0700
  Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-26 15:44 -0800
    Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-26 20:28 -0500
      Re: optimsed HashMap Arved Sandstrom <asandstrom2@eastlink.ca> - 2012-11-27 06:01 -0400
        Re: optimsed HashMap Eric Sosman <esosman@comcast-dot-net.invalid> - 2012-11-27 08:56 -0500
      Re: optimsed HashMap Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-11-27 14:16 -0800
