Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!news.musoftware.de!wum.musoftware.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail From: Robert Klemme Newsgroups: comp.lang.java.programmer Subject: Re: Bulk Array Element Allocation, is it faster? Date: Sun, 25 Sep 2011 14:16:49 +0200 Lines: 102 Message-ID: <9e8kdhF6lmU1@mid.individual.net> References: <9e8fplF19bU1@mid.individual.net> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Trace: individual.net eQmlFnc/W6t9yNex71QgOQLIQHD0I0KMkrrb8zZwWp+7gJ86E= Cancel-Lock: sha1:GA8qRIIxbWWA3ZwR5hTjztkMV30= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.21) Gecko/20110831 Lightning/1.0b2 Thunderbird/3.1.13 In-Reply-To: Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:8291 On 09/25/2011 01:38 PM, Jan Burse wrote: > Robert Klemme schrieb: >> On 09/25/2011 03:41 AM, Patricia Shanahan wrote: >>> On 9/24/2011 6:17 PM, Jan Burse wrote: >>>> Dear All, >>>> >>>> I just wonder wether modern JIT do optimize >>>> Code of the following form: >>>> >>>> Bla[] bla = new Bla[n]; >>>> for (int i=0; i>>> bla[i] = new Bla(); >>>> } >>>> >>>> When I use the above in my code, my application >>>> spends 8800 ms in the benchmark I have. >>>> >>>> When I then change the code to: >>>> >>>> Bla[] bla = new Bla[n]; >>>> >>>> ... >>>> >>>> if (bla[i]==null) >>>> bla[i] = new Bla(); >>>> >>>> .. >>>> >>>> So instead of allocating all elements at once, >>>> I will allocate them on demand in the subsequent >>>> flow of my application. >>>> >>>> When I use the above in my code, my application >>>> now suddently sends 9600 ms in the benchmark >>>> I have. >>>> >>>> So I wonder whether eventually the JIT has >>>> a way to detect the bulk allocate of the first >>>> version and transform it into something more >>>> efficient than my lazy allocation. >>>> >>>> Any clues? >>> >>> You also need to consider the general optimization of processors in >>> favor of doing efficiently those things they have done in the recent >>> past. >>> >>> When you do the allocation all at once, the code and data for "new >>> Bla()" is in cache on the second and subsequent calls. There may be >>> cache conflicts between "new Bla()" and the actual work, leading to >>> many more cache misses when you interleave them. >>> >>> Doing the initialization on demand may be adding an unpredictable >>> conditional branch to the subsequent flow. >> >> This and the fact that lazy initialization has concurrency issues when >> used in a multi threading context (which e.g. final members do not have) >> has made me use this approach less frequently. Also, for short lived >> objects in total it might be much more efficient to just allocate the >> object even if it is not used because it won't survive new generation >> anyway. I think the lazy init idiom only really pays off if construction >> is expensive (e.g. because it involves IO or time consuming >> calculations). In all other cases it's better to just unconditionally >> create and let GC work. And because of improvements in JVM technology >> the balance has moved a lot away from the lazy side because allocation >> and deallocation overhead became smaller than in earlier Java versions. >> >> Kind regards >> >> robert > > Yes, really seems so. Looks that the infinite heap idea > works (notion borrowed from a talk by Cliff Click found > on the net, should indicate that we just do this, allocate > and don't care much). > > I made some additional benchmarks, in the present case the lazy > does not so much good in that it saves allocates. I got the > following figures for the application at hand: > > Bulk: 84'393'262 allocates > Lazy: 81'662'843 allocates > > So the application does not have many exit points in the flow > following the array creation, except for the last exit point > when anyway all objects were needed. > > But still I have some doubts! The array I allocate are only > 4 elements long or so? Why is there still such a big > difference between allocating in bulk only 4 elements and > doing them one after the other? > > Overhead by the lazy detection itself? I doubt this also > since the array is anyway accessed, and the lazy check > is a small null pointer check. Yes, but the cost is not in the check but in the branching on processor level (see what Patricia wrote). Kind regards robert