From: supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
Newsgroups: comp.lang.java.programmer
Subject: Re: Why "lock" functionality is introduced for all the objects?
Date: Tue, 28 Jun 2011 14:52:10 -0400

On 28/06/2011 2:33 PM, Lew wrote:
> On 06/28/2011 02:23 PM,
> supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
> wrote:
>> On 28/06/2011 2:13 PM, Lew wrote:
>>> Michal Kleczek wrote:
>>>> Lew wrote:
>>>>> Show me the numbers. What penalty?
>>>>
>>>> It is (almost) twice as much memory as it could be and twice as much GC
>>>> cycles. Almost because in real world the number of objects that you
>>>> need to
>>>
>>> Nonsense. It's an extra 4 bytes per object. Most objects are much larger
>>> than 4 bytes,
>>
>> Bullpuckey, other than that a nontrivial object is always at least 12
>> bytes
>
> So 4 bytes overhead is less than 100%, as I said.

I didn't dispute that. I disputed "most objects are much larger than 4 bytes". Most objects are only a little bit larger than 4 bytes.

> Most strings in a typical program are non-empty and generally longer
> than two bytes.

A lot longer, though, or only a little?

> A good percentage are interned.

Not in my experience.

> Strings in many runtime contexts refer to substrings of those already
> in memory, saving overhead.

Not in my experience. And substring sharing is a three-edged sword, with two possible downsides:

1. A small string hangs onto a much larger char array than is needed, the rest of which is unused but can't be collected.

2. Small strings are even less efficient if one adds an offset as well as a length field to the string, besides the pointer to the char array.

And let's not forget that a string incurs the object overhead twice, once for the string and once for the embedded array, assuming that array ISN'T (and it usually isn't) shared. So we're looking at one object header going along with 12 bytes of offset, length, and pointer to array; then another going along with 4 bytes of length and 2 bytes per character for the actual array contents. For an eight-character string we have 16 bytes of actual data and 32 bytes of overhead from two redundant (if the array isn't shared) length fields, an offset field, a pointer, and two 8-byte object headers. That's 33% meat and 67% fat, folks. For an EIGHT character string.

A substrings-are-separate implementation fares somewhat better: eight-byte object header, four-byte pointer, eight-byte object header, four-byte length, array contents, for 24 rather than 32 bytes of cruft. Still 60% overhead.

If Java had const and typedef and auxiliary methods so you could just declare that String is another name for const char[] and tack on the String methods, you'd get away with just 12 bytes of overhead: array object header and length field. Now the 8-character string is actually more than 50% meat instead of fat.

Well, unless you count all the empty space between the probably-ASCII bytes ... encoding them internally as UTF-8 would save a lot more space in the common case. Maybe we should assume that only about 60% of the space taken up by the actual chars in the string is non-wasted, presuming a low but nonzero prevalence of characters outside of ISO-8859-1; now a ten-character string (20 bytes of char data) has eight wasted bytes internally, plus the object header/various fields of overhead. Still somewhat icky.

Java strings are quite inefficient any way you slice 'em.
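For anyone who wants to check the string arithmetic above, here is a rough back-of-envelope sketch. It measures nothing on a real JVM; it only adds up the sizes assumed in this post (8-byte object header, 4-byte int or reference, 2-byte char), and the class and method names are made up for illustration.

```java
// Back-of-envelope model only. The sizes are the assumptions used in this
// post (8-byte header, 4-byte int/reference, 2-byte char), not measurements.
final class StringOverhead {
    static final int HEADER = 8; // assumed object header size in bytes
    static final int WORD = 4;   // assumed size of an int or a reference
    static final int CHAR = 2;   // one UTF-16 code unit

    // String with offset, length and array pointer, plus its own char[].
    static int sharedLayoutOverhead() {
        return HEADER + 3 * WORD   // String: header + offset + length + pointer
             + HEADER + WORD;      // char[]: header + length
    }

    // String holding only a pointer to its own char[].
    static int separateLayoutOverhead() {
        return HEADER + WORD       // String: header + pointer
             + HEADER + WORD;      // char[]: header + length
    }

    // Hypothetical "String is just a const char[]" layout.
    static int bareArrayOverhead() {
        return HEADER + WORD;      // char[]: header + length
    }

    // Overhead as a rounded percentage of the total footprint.
    static long overheadPercent(int overheadBytes, int chars) {
        int dataBytes = chars * CHAR;
        return Math.round(100.0 * overheadBytes / (overheadBytes + dataBytes));
    }

    public static void main(String[] args) {
        int chars = 8;
        System.out.println("shared substrings:   "
                + overheadPercent(sharedLayoutOverhead(), chars) + "% overhead");
        System.out.println("separate substrings: "
                + overheadPercent(separateLayoutOverhead(), chars) + "% overhead");
        System.out.println("bare char[]:         "
                + overheadPercent(bareArrayOverhead(), chars) + "% overhead");
    }
}
```

Under those assumptions the three layouts come to 32, 24 and 12 bytes of overhead for an eight-character string. Plug in a different header or reference size and the percentages shift accordingly.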
But at least we can get their lengths in O(1) instead of O(n). Take *that*, C weenies! Oh, wait, most strings are short and it wouldn't take many cycles to find their lengths at O(n) anyway ...

> Integer objects make up a small fraction of most programs. Many Integer
> instances are shared, especially if one follows best practices. Not a
> lot of memory pressure there.

Not my experience again, not since 1.5. Before autoboxing was introduced you might have been right; now I expect there's a lot of "hidden" (i.e., the capitalized classname doesn't appear in code much) use of boxed primitives, particularly in collections.

> You show only that the overhead of 4 bytes per object is less than 100%
> of the object's memory footprint, which is what I said.

Keep on attacking that straw man ...

> Which footprint can be reduced by HotSpot, to the point of pulling an
> object out of the heap altogether.

???

> Where are the numbers? Everyone's arguing from speculation. Show me the
> numbers.
>
> Real numbers. From actual runs. What is the overhead, really? Stop
> making shit up.

Stop accusing me of lying when I've done nothing of the sort.

> Show me the numbers.

http://c2.com/cgi/wiki?JavaObjectOverheadIsRidiculous

People ran tests and found an 8 byte overhead per object, much as was claimed earlier in this thread.

Oh, and that an array of java.awt.Points containing pairs of ints is 60% overhead and 40% actual integer values in the limit of many array elements, where the array's own fixed overheads (object header, length field) become negligible. That works out to 20 bytes per element: 8 bytes of ints, 8 bytes of object header, and another 4 bytes *on top of* those. The extra 4 bytes are most likely the reference the array holds to each Point, not an extra field in Point itself.
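One way to read the 60/40 figure for the Point array, as a sketch only: assume an 8-byte object header, 4-byte ints, and a 4-byte reference slot in the array for each Point. The reference slot is my assumption about where the extra 4 bytes go, and the class name is made up; nothing here is measured.

```java
// Arithmetic sketch only, under assumed sizes (not measured on a JVM):
// 8-byte object header, 4-byte int, 4-byte reference per array slot.
final class PointArrayOverhead {
    static final int HEADER = 8; // assumed object header size in bytes
    static final int INT = 4;    // size of an int
    static final int REF = 4;    // assumed size of one array slot (a reference)

    // Bytes of actual integer data per Point: x and y.
    static int dataBytesPerElement() {
        return 2 * INT;
    }

    // Total bytes per array element in the limit of many elements:
    // the slot in the array, the Point's header, and its two ints.
    static int totalBytesPerElement() {
        return REF + HEADER + dataBytesPerElement();
    }

    // Share of the footprint that is actual integer data, rounded.
    static long dataPercent() {
        return Math.round(100.0 * dataBytesPerElement() / totalBytesPerElement());
    }

    public static void main(String[] args) {
        System.out.println(totalBytesPerElement() + " bytes per element, "
                + dataPercent() + "% of it integer data");
    }
}
```

With those sizes the model lands on 20 bytes per element and 40% data, which matches the figure reported on the wiki page.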
And in regard to the original topic of this thread, http://c2.com/cgi/wiki?EveryObjectIsaMonitor raises some very good points. One is that Java could have forced people to use the java.util.concurrent classes (while making "synchronized" exception-safely lock a ReentrantLock, or similar), or could have made objects lockable only if they implemented a Lockable interface or inherited from a Monitor class. Either way, code would have had to document its concurrency semantics and explicitly declare which objects and which types of objects were meant to be used as monitors. This more-self-documenting-code point, in which intended-to-be-locked is part of something's type and subject to type safety, was not raised in this thread.

Until now.
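A minimal sketch of what the "lockable is part of the type" idea could look like in today's Java. Lockable, monitor() and locked() are hypothetical names invented for this example; only Lock and ReentrantLock are real JDK classes.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class LockableDemo {
    // Hypothetical stand-in for "synchronized (x) { ... }". It only compiles
    // for objects whose type declares that they are meant to be locked.
    static void locked(Lockable target, Runnable body) {
        Lock lock = target.monitor();
        lock.lock();
        try {
            body.run();
        } finally {
            lock.unlock(); // released even if body throws
        }
    }

    // Two threads each increment 10,000 times, then one more locked increment.
    static int runDemo() {
        Counter counter = new Counter();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // ReentrantLock is reentrant, so increment() can re-acquire the
        // lock that locked() already holds on this thread.
        locked(counter, counter::increment);
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}

// Hypothetical interface, not part of the JDK.
interface Lockable {
    Lock monitor();
}

// A class that opts in to being a monitor by saying so in its type.
final class Counter implements Lockable {
    private final Lock lock = new ReentrantLock();
    private int value;

    @Override
    public Lock monitor() {
        return lock;
    }

    void increment() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock();
        }
    }

    int get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```

Passing a plain Object or a String to locked() is a compile error here, which is the self-documenting part: only types that declare themselves Lockable can be used as monitors.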