From: supercalifragilisticexpialadiamaticonormalizeringelimatisticantations <supercalifragilisticexpialadiamaticonormalizeringelimatisticantations@averylongandannoyingdomainname.com>
Newsgroups: comp.lang.java.programmer
Subject: Re: Why "lock" functionality is introduced for all the objects?
Date: Wed, 29 Jun 2011 01:05:29 -0400
Organization: supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
Lines: 79
Message-ID: <iuebqn$ld8$1@speranza.aioe.org>
References: <d0bb9e06-16f0-4282-a37e-47e9ca9630ec@r2g2000vbj.googlegroups.com> <iuce66$2nh$1@news.albasani.net> <iud083$2jr$1@news.onet.pl> <iud1u5$f4r$1@news.albasani.net> <iud4f6$lcm$1@news.onet.pl> <iud5js$nno$1@news.albasani.net> <iud66f$8al$1@speranza.aioe.org> <J5KdnRBP4eYkvZfTnZ2dnUVZ_jqdnZ2d@earthlink.com> <iud80u$ci0$2@speranza.aioe.org> <rYCdnRgQGaQjsZfTnZ2dnUVZ_hydnZ2d@earthlink.com>

On 28/06/2011 3:34 PM, Patricia Shanahan wrote:
> On 6/28/2011 11:54 AM,
> supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
> wrote:
>> On 28/06/2011 2:42 PM, Patricia Shanahan wrote:
>>> Each String instance has the following fields:
>>>
>>> private final char value[];
>>> private final int offset;
>>> private final int count;
>>> private int hash;
>>>
>>> There are 12 bytes in addition to the char array. The offset and count
>>> fields allow quick sub-string construction, and hash is used to cache
>>> the hashCode result.
>>
>> Oh, geez, even *more* overhead. And let's not forget the array has its
>> own separate object header and length field!
>
> The array may be shared by several String objects.

It usually won't be. Really, how often does anyone use .substring except 
to make a very short-lived object that is usually fed directly into 
StringBuilder.append(), or something that calls that under the hood, or 
else into an I/O write operation?

> In general, many trade-offs in Java, not just the decision to make every
> object capable of being a lock, assume that other considerations are
> more important than minimizing memory use. For example, caching the hash
> code pays four bytes per String in order to have a hash code that
> depends on the entire string, without paying the cost of calculating it
> repeatedly when a String is used as a hash table key.

Funnily enough, using four characters (if there are that many, else the 
whole string) from near the middle of the string would probably work 
nearly as well, even for the fairly common cases of many strings sharing 
a common prefix, suffix, or both. Strings with highly regular middles 
and variable ends are not very common by contrast. And what does that 
require?

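// length and content stand for the string's length and its char[] contents
int mid = length >> 1; // emphasizing that a cheap shift op works
int start = Math.max(mid - 2, 0);
int end = Math.min(mid + 2, length);
int hash = 0;
int fct = 1; // multiplier for each byte position: 1, 256, 65536, 16777216
for (int i = start; i < end; ++i) {
     hash += fct * content[i];
     fct *= 256;
}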

For the common case of Latin-1 strings this turns the characters there 
into the hash bytes directly. Throw in some characters outside Latin-1 
and it gets a bit more interesting, as each such character may affect 
two bytes of the hash, except the last one of the four, whose high byte 
overflows out of the int.
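
For anyone who wants to try it, here is a self-contained version of the 
snippet. The names MidHash and midHash are made up for illustration, and 
it takes a CharSequence because String's internal array isn't accessible 
from outside. It's a sketch of the idea, not what java.lang.String does.

```java
public final class MidHash {
    // Hypothetical helper: hashes up to four chars taken from the middle of s.
    public static int midHash(CharSequence s) {
        int length = s.length();
        int mid = length >> 1;
        int start = Math.max(mid - 2, 0);
        int end = Math.min(mid + 2, length);
        int hash = 0;
        int fct = 1; // 1, 256, 65536, 16777216: one byte position per char
        for (int i = start; i < end; ++i) {
            hash += fct * s.charAt(i);
            fct *= 256;
        }
        return hash;
    }

    public static void main(String[] args) {
        // 'a'..'d' are 0x61..0x64, so each lands in its own byte.
        System.out.println(Integer.toHexString(midHash("abcd"))); // prints 64636261
        // Same prefix and suffix, different middle: the hashes differ.
        System.out.println(midHash("com.example.Foo.handler")
                != midHash("com.example.Bar.handler")); // prints true
    }
}
```

Strings shorter than four characters just use everything they have, and 
the empty string hashes to 0.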

Of course, they could also have used a smarter caching strategy. When is 
hash caching useful? When the string's in a hash map and going to be 
looked up in it frequently. But this turns into two subcases:

1. The string already in the hash map is the same *object* as the
    string used for lookup.
2. The strings are not the same object, though they have the same
    content.

In the latter case, the string passed to get() is obviously not interned 
and is probably being constructed anew each time, likely from I/O reads. 
Caching its hash is useless since it's going to be GC'd and recreated 
sans cached hash. In the former case, the string probably *is* interned, 
in which case the smart place for the hash cache is in the *string 
interning table* rather than in the individual string objects, 
particularly if you could arrange the under-the-hood implementation to 
use an int[] to hold *all* the hashes instead of separate int fields all 
over the system.
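
To make that concrete, here is a rough sketch of an interning table that 
keeps all the hashes in a single int[]. Everything in it (the InternTable 
class, the int ids it hands out, the method names) is hypothetical and 
has nothing to do with how the JVM's real intern table works. It also 
assumes callers hold on to the int id rather than the String, which is 
how they get at the cached hash without a per-object field. It uses the 
ordinary String.hashCode() to fill the cache.

```java
import java.util.Arrays;

public final class InternTable {
    private String[] strings = new String[8]; // canonical copies, indexed by id
    private int[] hashes = new int[8];        // hashes[id] caches strings[id].hashCode()
    private int[] slots = new int[16];        // open-addressing index: id + 1, or 0 if empty
    private int count;

    /** Returns the id of the canonical copy of s, adding s if it is absent. */
    public int intern(String s) {
        if (count == strings.length) grow();  // keeps slots at most half full
        int h = s.hashCode();
        int mask = slots.length - 1;          // slots.length is a power of two
        int i = h & mask;
        while (slots[i] != 0) {
            int id = slots[i] - 1;
            // Compare the cached ints first; only equal hashes reach equals().
            if (hashes[id] == h && strings[id].equals(s)) return id;
            i = (i + 1) & mask;               // linear probing
        }
        int id = count++;
        strings[id] = s;
        hashes[id] = h;
        slots[i] = id + 1;
        return id;
    }

    public String stringOf(int id) { return strings[id]; }

    public int hashOf(int id) { return hashes[id]; }

    private void grow() {
        strings = Arrays.copyOf(strings, strings.length * 2);
        hashes = Arrays.copyOf(hashes, hashes.length * 2);
        slots = new int[slots.length * 2];
        int mask = slots.length - 1;
        for (int id = 0; id < count; id++) {
            int i = hashes[id] & mask;        // rehash from the cache, no recomputation
            while (slots[i] != 0) i = (i + 1) & mask;
            slots[i] = id + 1;
        }
    }
}
```

Two equal strings intern to the same id, and hashOf(id) is then just an 
array read. Resizing never touches the characters again either, since it 
rebuilds the index from the cached ints.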

> If, for your purposes, minimal memory use is very important, you may
> want to consider other languages with other trade-offs.

And here I thought they were trying to heavily push Java for use on 
mobile phones and other devices with limited memory.