Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!.POSTED!not-for-mail
From: supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
Newsgroups: comp.lang.java.programmer
Subject: Re: Why "lock" functionality is introduced for all the objects?
Date: Wed, 29 Jun 2011 01:05:29 -0400
Organization: supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
Lines: 79
Message-ID:
References:
NNTP-Posting-Host: iGmuHcWtyc5pbaBTyNZhJQ.user.speranza.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: WinVN 0.99.12z (x86 32bit)
X-Notice: Filtered by postfilter v. 0.8.2
Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:5763

On 28/06/2011 3:34 PM, Patricia Shanahan wrote:
> On 6/28/2011 11:54 AM,
> supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
> wrote:
>> On 28/06/2011 2:42 PM, Patricia Shanahan wrote:
>>> Each String instance has the following fields:
>>>
>>> private final char value[];
>>> private final int offset;
>>> private final int count;
>>> private int hash;
>>>
>>> There are 12 bytes in addition to the char array. The offset and count
>>> fields allow quick sub-string construction, and hash is used to cache
>>> the hashCode result.
>>
>> Oh, geez, even *more* overhead. And let's not forget the array has its
>> own separate object header and length field!
>
> The array may be shared by several String objects.

It usually won't be. Really, how often does anyone use .substring except
for a very short-lived object that usually is fed directly into
StringBuilder.append() or something that calls that under the hood, or
else to an I/O write operation?

> In general, many trade-offs in Java, not just the decision to make every
> object capable of being a lock, assume that other considerations are
> more important than minimizing memory use.
> For example, caching the hash
> code pays four bytes per String in order to have a hash code that
> depends on the entire string, without paying the cost of calculating it
> repeatedly when a String is used as a hash table key.

Funnily enough, using four characters (if there are that many, else the
whole string) from near the middle of the string would probably work
nearly as well, even for the fairly common cases of many strings sharing
a common prefix, suffix, or both. Strings with highly regular middles
and variable ends are not very common by contrast. And what does that
require?

    int mid = length >> 1; // emphasizing that a cheap shift op works
    int start = max(mid-2,0);
    int end = min(mid+2, length);
    int hash = 0;
    int fct = 1;
    for (int i = start; i < end; ++i) {
        hash += fct*content[i];
        fct *= 256;
    }

For the common case of Latin-1 strings this turns the characters there
into the hash bytes directly. Throw in some unicode characters and it
gets a bit more interesting as the characters may affect two bytes of
the hash each, except the last one of the four.

Of course, they could also have used a smarter caching strategy. When is
hash caching useful? When the string's in a hash map and going to be
looked up in it frequently. But this turns into two subcases:

1. The string already in the hash map is the same *object* as the
   string used for lookup.

2. The strings are not the same object, though they have the same
   content.

In the latter case, the string passed to get() is obviously not interned
and is probably being constructed anew each time, likely from I/O reads.
Caching its hash is useless since it's going to be GC'd and recreated
sans cached hash.
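Going back to the middle-of-string hash for a moment: here is a self-contained version of that fragment for anyone who wants to try it. This is only a sketch. The class name MidHash, the method name midHash, and the use of CharSequence in place of the raw content[] array are my own choices for illustration, not anything in the JDK.

```java
final class MidHash {
    private MidHash() {}

    /**
     * Hashes at most four chars taken from around the middle of s.
     * Hypothetical helper, not a JDK method.
     */
    static int midHash(CharSequence s) {
        int length = s.length();
        int mid = length >> 1;                 // cheap shift instead of a divide
        int start = Math.max(mid - 2, 0);
        int end = Math.min(mid + 2, length);
        int hash = 0;
        int fct = 1;                           // 1, 256, 65536, 16777216
        for (int i = start; i < end; ++i) {
            hash += fct * s.charAt(i);         // each char lands one byte higher
            fct *= 256;                        // wraps to 0 after the fourth step, unused by then
        }
        return hash;
    }

    public static void main(String[] args) {
        // Shared prefix and suffix, different middles: the hashes differ.
        System.out.println(Integer.toHexString(midHash("prefix-AAAA-suffix")));
        System.out.println(Integer.toHexString(midHash("prefix-BBBB-suffix")));
    }
}
```

The flip side is visible in the same code: any two strings of equal length that agree on those four middle chars collide, whatever the rest of them looks like.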
In the former case, the string probably *is* interned, in which case the
smart place for the hash cache is in the *string interning table* rather
than in the individual string objects, particularly if you could arrange
the under-the-hood implementation to use an int[] to hold *all* the
hashes instead of separate int fields all over the system.

> If, for your purposes, minimal memory use is very important, you may
> want to consider other languages with other trade-offs.

And here I thought they were trying to heavily push Java for use on
mobile phones and other devices with limited memory.
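
P.S. To make the interning-table idea concrete, here is a rough user-level sketch of a table that keeps every cached hash in one int[] next to the canonical strings. Everything in it is hypothetical: InternTable, intern, get, hashOf and size are names I made up, and this is not how the JVM's own intern table works. It also assumes callers hold on to a small integer id rather than the String itself, since otherwise you would have to hash the string just to find its cached hash. And because it runs on stock Java, it still calls String.hashCode() once per lookup and cannot remove the real hash field from String; it only shows the layout.

```java
import java.util.Arrays;

final class InternTable {
    private String[] strings = new String[16]; // canonical instances, indexed by id
    private int[] hashes = new int[16];        // hashes[id] is the cached hash of strings[id]
    private int[] slots = new int[32];         // open-addressing index: id + 1, or 0 if empty
    private int size = 0;

    /** Returns the id of the canonical copy of s, adding s if it is new. */
    int intern(String s) {
        int h = s.hashCode();
        int found = find(s, h);
        if (found >= 0) {
            return found;
        }
        if (size == strings.length) {
            grow();
        }
        int id = size++;
        strings[id] = s;
        hashes[id] = h;
        place(id);
        return id;
    }

    String get(int id) { return strings[id]; }

    /** Reads the hash from the shared int[]; no per-object field involved. */
    int hashOf(int id) { return hashes[id]; }

    int size() { return size; }

    // Linear probing; slots is always twice as long as strings, so an
    // empty slot always exists and the loop terminates.
    private int find(String s, int h) {
        int mask = slots.length - 1;
        for (int i = h & mask; slots[i] != 0; i = (i + 1) & mask) {
            int id = slots[i] - 1;
            if (hashes[id] == h && strings[id].equals(s)) {
                return id;
            }
        }
        return -1;
    }

    private void place(int id) {
        int mask = slots.length - 1;
        int i = hashes[id] & mask;
        while (slots[i] != 0) {
            i = (i + 1) & mask;
        }
        slots[i] = id + 1;
    }

    // Doubles all three arrays and re-indexes using the cached hashes,
    // so no string is rehashed during growth.
    private void grow() {
        strings = Arrays.copyOf(strings, strings.length * 2);
        hashes = Arrays.copyOf(hashes, hashes.length * 2);
        slots = new int[slots.length * 2];
        for (int id = 0; id < size; id++) {
            place(id);
        }
    }
}
```

Usage would look like int id = table.intern(key); followed by table.hashOf(id) wherever the hash is needed. Whether that beats a per-object int field in practice depends on how many strings are interned at all, which I have not measured.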