Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!news.musoftware.de!wum.musoftware.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Robert Klemme <shortcutter@googlemail.com>
Newsgroups: comp.lang.java.programmer
Subject: Re: Bulk Array Element Allocation, is it faster?
Date: Sun, 25 Sep 2011 14:16:49 +0200
Lines: 102
Message-ID: <9e8kdhF6lmU1@mid.individual.net>
References: <j5lvf0$bhl$1@news.albasani.net> <Ypydnd8C3rt7G-PTnZ2dnUVZ_iydnZ2d@earthlink.com> <9e8fplF19bU1@mid.individual.net> <j5n3qv$dbo$1@news.albasani.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net eQmlFnc/W6t9yNex71QgOQLIQHD0I0KMkrrb8zZwWp+7gJ86E=
Cancel-Lock: sha1:GA8qRIIxbWWA3ZwR5hTjztkMV30=
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.21) Gecko/20110831 Lightning/1.0b2 Thunderbird/3.1.13
In-Reply-To: <j5n3qv$dbo$1@news.albasani.net>
Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:8291

On 09/25/2011 01:38 PM, Jan Burse wrote:
> Robert Klemme schrieb:
>> On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
>>> On 9/24/2011 6:17 PM, Jan Burse wrote:
>>>> Dear All,
>>>>
>>>> I just wonder wether modern JIT do optimize
>>>> Code of the following form:
>>>>
>>>> Bla[] bla = new Bla[n];
>>>> for (int i=0; i<n; i++) {
>>>> bla[i] = new Bla();
>>>> }
>>>>
>>>> When I use the above in my code, my application
>>>> spends 8800 ms in the benchmark I have.
>>>>
>>>> When I then change the code to:
>>>>
>>>> Bla[] bla = new Bla[n];
>>>>
>>>> ...
>>>>
>>>> if (bla[i]==null)
>>>> bla[i] = new Bla();
>>>>
>>>> ..
>>>>
>>>> So instead of allocating all elements at once,
>>>> I will allocate them on demand in the subsequent
>>>> flow of my application.
>>>>
>>>> When I use the above in my code, my application
>>>> now suddently sends 9600 ms in the benchmark
>>>> I have.
>>>>
>>>> So I wonder whether eventually the JIT has
>>>> a way to detect the bulk allocate of the first
>>>> version and transform it into something more
>>>> efficient than my lazy allocation.
>>>>
>>>> Any clues?
>>>
>>> You also need to consider the general optimization of processors in
>>> favor of doing efficiently those things they have done in the recent
>>> past.
>>>
>>> When you do the allocation all at once, the code and data for "new
>>> Bla()" is in cache on the second and subsequent calls. There may be
>>> cache conflicts between "new Bla()" and the actual work, leading to
>>> many more cache misses when you interleave them.
>>>
>>> Doing the initialization on demand may be adding an unpredictable
>>> conditional branch to the subsequent flow.
>>
>> This and the fact that lazy initialization has concurrency issues when
>> used in a multi threading context (which e.g. final members do not have)
>> has made me use this approach less frequently. Also, for short lived
>> objects in total it might be much more efficient to just allocate the
>> object even if it is not used because it won't survive new generation
>> anyway. I think the lazy init idiom only really pays off if construction
>> is expensive (e.g. because it involves IO or time consuming
>> calculations). In all other cases it's better to just unconditionally
>> create and let GC work. And because of improvements in JVM technology
>> the balance has moved a lot away from the lazy side because allocation
>> and deallocation overhead became smaller than in earlier Java versions.
>>
>> Kind regards
>>
>> robert
>
> Yes, really seems so. Looks that the infinite heap idea
> works (notion borrowed from a talk by Cliff Click found
> on the net, should indicate that we just do this, allocate
> and don't care much).
>
> I made some additional benchmarks, in the present case the lazy
> does not so much good in that it saves allocates. I got the
> following figures for the application at hand:
>
> Bulk: 84'393'262 allocates
> Lazy: 81'662'843 allocates
>
> So the application does not have many exit points in the flow
> following the array creation, except for the last exit point
> when anyway all objects were needed.
>
> But still I have some doubts! The array I allocate are only
> 4 elements long or so? Why is there still such a big
> difference between allocating in bulk only 4 elements and
> doing them one after the other?
>
> Overhead by the lazy detection itself? I doubt this also
> since the array is anyway accessed, and the lazy check
> is a small null pointer check.

Yes, but the cost is not in the check but in the branching on processor 
level (see what Patricia wrote).

Kind regards

	robert