Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!news.albasani.net!.POSTED!not-for-mail
From: Jan Burse <janburse@fastmail.fm>
Newsgroups: comp.lang.java.programmer
Subject: Re: Bulk Array Element Allocation, is it faster?
Date: Sun, 25 Sep 2011 13:38:06 +0200
Organization: albasani.net
Lines: 98
Message-ID: <j5n3qv$dbo$1@news.albasani.net>
References: <j5lvf0$bhl$1@news.albasani.net> <Ypydnd8C3rt7G-PTnZ2dnUVZ_iydnZ2d@earthlink.com> <9e8fplF19bU1@mid.individual.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: news.albasani.net ma+Kq1MFUXfjWcYHYKz2w8XGWHKZKyafQcxk/SM6WW6OUKyC75riGdTUnn8c2BKQORFncO0chRsU1F8mj4Nc1S9DKw0yJj1Cv7sNMCF6weokAVA/SDpjvzhWUyErPisr
NNTP-Posting-Date: Sun, 25 Sep 2011 11:38:07 +0000 (UTC)
Injection-Info: news.albasani.net; logging-data="L0uV6I6dHwvNQkutilNuSfyzlwqxEsOt493l2xw0wvzucAQqy8dCGh+wyqjvvCGUmPxFtlAuWPjTrBzxYHDX2nA6LX7B/EyOdTl8TQaUfh6mV6CLhW/RJBOJp1RauBIk"; mail-complaints-to="abuse@albasani.net"
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20110902 Firefox/6.0.2 SeaMonkey/2.3.3
In-Reply-To: <9e8fplF19bU1@mid.individual.net>
Cancel-Lock: sha1:KbZzrk143QFplyySxflR3ixSQqc=
Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:8289

Robert Klemme schrieb:
> On 09/25/2011 03:41 AM, Patricia Shanahan wrote:
>> On 9/24/2011 6:17 PM, Jan Burse wrote:
>>> Dear All,
>>>
>>> I just wonder wether modern JIT do optimize
>>> Code of the following form:
>>>
>>> Bla[] bla = new Bla[n];
>>> for (int i=0; i<n; i++) {
>>> bla[i] = new Bla();
>>> }
>>>
>>> When I use the above in my code, my application
>>> spends 8800 ms in the benchmark I have.
>>>
>>> When I then change the code to:
>>>
>>> Bla[] bla = new Bla[n];
>>>
>>> ...
>>>
>>> if (bla[i]==null)
>>> bla[i] = new Bla();
>>>
>>> ..
>>>
>>> So instead of allocating all elements at once,
>>> I will allocate them on demand in the subsequent
>>> flow of my application.
>>>
>>> When I use the above in my code, my application
>>> now suddently sends 9600 ms in the benchmark
>>> I have.
>>>
>>> So I wonder whether eventually the JIT has
>>> a way to detect the bulk allocate of the first
>>> version and transform it into something more
>>> efficient than my lazy allocation.
>>>
>>> Any clues?
>>
>> You also need to consider the general optimization of processors in
>> favor of doing efficiently those things they have done in the recent
>> past.
>>
>> When you do the allocation all at once, the code and data for "new
>> Bla()" is in cache on the second and subsequent calls. There may be
>> cache conflicts between "new Bla()" and the actual work, leading to
>> many more cache misses when you interleave them.
>>
>> Doing the initialization on demand may be adding an unpredictable
>> conditional branch to the subsequent flow.
>
> This and the fact that lazy initialization has concurrency issues when
> used in a multi threading context (which e.g. final members do not have)
> has made me use this approach less frequently. Also, for short lived
> objects in total it might be much more efficient to just allocate the
> object even if it is not used because it won't survive new generation
> anyway. I think the lazy init idiom only really pays off if construction
> is expensive (e.g. because it involves IO or time consuming
> calculations). In all other cases it's better to just unconditionally
> create and let GC work. And because of improvements in JVM technology
> the balance has moved a lot away from the lazy side because allocation
> and deallocation overhead became smaller than in earlier Java versions.
>
> Kind regards
>
> robert

Yes, really seems so. Looks that the infinite heap idea
works (notion borrowed from a talk by Cliff Click found
on the net, should indicate that we just do this, allocate
and don't care much).

I made some additional benchmarks, in the present case the lazy
does not so much good in that it saves allocates. I got the
following figures for the application at hand:

   Bulk:  84'393'262 allocates
   Lazy:  81'662'843 allocates

So the application does not have many exit points in the flow
following the array creation, except for the last exit point
when anyway all objects were needed.

But still I have some doubts! The array I allocate are only
4 elements long or so? Why is there still such a big
difference between allocating in bulk only 4 elements and
doing them one after the other?

Overhead by the lazy detection itself? I doubt this also
since the array is anyway accessed, and the lazy check
is a small null pointer check.

I am still puzzled.

Bye