Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!eternal-september.org!feeder.eternal-september.org!mx04.eternal-september.org!.POSTED!not-for-mail From: Jim Janney Newsgroups: comp.lang.java.programmer Subject: Re: optimsed HashMap Date: Mon, 26 Nov 2012 11:13:16 -0700 Organization: spotted and inconstant Lines: 21 Message-ID: References: <8i70b8d0pm6ibk03ti4t2pv60jd0bctlcs@4ax.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: mx04.eternal-september.org; posting-host="75975abe3fe3503ca7350803ab98e478"; logging-data="26644"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX198+koPMrjc1eDnBiSQQvX8" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux) Cancel-Lock: sha1:mXpUiRQPitiSm1oSdwyw5aFdfsM= sha1:S/vIv01HIOxugkecMiBd4m+cCXo= Xref: csiph.com comp.lang.java.programmer:19981 Roedy Green writes: > Is there something like HashMap but that optimised when nearly always > the thing you are looking up is not in the list, and when you can add > the list of words to look up and then freeze it. > > I have to scan an entire website looking up every word. In this case, where you are always looking up a newly created string, it's likely that computing the hash code takes more time than the actual lookup: java.lang.String.hashCode() looks at every character in the string. You could experiment with a weaker hash code that is cheaper to compute but generates more collisions, for example only hash the first n characters for some small value of n. Or take the length of the string (very cheap to compute) and check it against a BitSet. Or do the same with the first (or last) character. How well these strategies work will depend very much on the actual data. -- Jim Janney