Path: csiph.com!usenet.pasdenom.info!gegeweb.org!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail From: Daniele Futtorovic Newsgroups: comp.lang.java.programmer Subject: Re: optimsed HashMap Date: Tue, 27 Nov 2012 03:35:39 +0100 Organization: A noiseless patient Spider Lines: 54 Message-ID: References: <8i70b8d0pm6ibk03ti4t2pv60jd0bctlcs@4ax.com> <8ip0b8p7blu31eub502so8cus1h9so3m9s@4ax.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Injection-Date: Tue, 27 Nov 2012 02:35:47 +0000 (UTC) Injection-Info: mx04.eternal-september.org; posting-host="a9556dffb1d3cfdf28d52ddfeb38f36b"; logging-data="6406"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19LIW1WculO34JkZMZT9Jjm" User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 In-Reply-To: Cancel-Lock: sha1:/db+dfA7Fegf+hX5Gbj6VxObOv4= Xref: csiph.com comp.lang.java.programmer:19993 On 27/11/2012 03:24, Daniele Futtorovic allegedly wrote: > On 26/11/2012 23:32, Robert Klemme allegedly wrote: >> On 11/26/2012 10:03 PM, Daniele Futtorovic wrote: >>> On 24/11/2012 07:42, Roedy Green allegedly wrote: >> >>>> You go through the files for a website looking at each word of text >>>> (avoiding HTML markup) in the HashMap. If you find it you replace it. >>>> >>>> Most of the time word you look up is not in the list. >>>> >>>> This is a time-consuming process. I would like to speed it up. >>> >>> You might want to intern() the input to avoid having to recompute the >>> hash every time (if applicable). Other than that, you'll either be >>> wanting a better hashing algorithm, to avoid collisions, or indeed >>> something altogether fancier (but riskier in terms or RoI). >> >> How would interning help? The input is read only once anyway > > Depends on the input, of course. But natural text on the web (which > appears to be what this is about) is quite likely to contain the same > words more than once each. > >> and if you >> mean to intern individual words of the input then how does the JVM do >> the interning? > > Like it does all interning? I must admit I couldn't lay out the details > off the top of my head, but the JLS should have them within reasonable > accuracy. > > Of course, this would only be an option for a batch-like program. You > wouldn't want to clutter the string pool of a long-running app. > > Interning would also perhaps allow one to use an IdentityHashMap, and > thus doing away with the (probably relatively costly) string comparisons. > > For sure, this wouldn't be a replacement for more sophisticated > solutions, but could one of the things to try if it is to be kept "simple". > >> My guess would be that some form of hashing would be >> used there as well - plus that internal data structure must be thread >> safe... > > True. > Hm. According to Roedy himself (), the JVM uses a HashMap for intern()'d string lookup. So there may be no point in doing it after all. -- DF.