From: Eric Sosman
Newsgroups: comp.lang.java.programmer
Subject: Re: Bulk Array Element Allocation, is it faster?
Date: Sun, 25 Sep 2011 10:59:25 -0400

On 9/25/2011 10:04 AM, Jan Burse wrote:
> Robert Klemme wrote:
>> Yes, but the cost is not in the check but in the branching on processor
>> level (see what Patricia wrote).
>
> Depends on the processor and on the branch. If
> new is just heap -= size, what some papers suggest,
> then it might not be important.
>
> But if new is much more, then sure the branch
> interrupts the normal code flow so much that
> instruction pipelining gets out of sync. And
> the speed gain by instruction overlapping
> is lost.
>
> But my hypothesis is more that something
> algorithmically on a higher level happens than
> something on the lower hardware level.
>
> So I also found something about "Lock
> coarsening"(*), so if the new needs some lock
> this lock could be acquired before the initialization
> loop and released after the initialization
> loop. [...]

    I'm speculating almost as wildly as you are, but I strongly
doubt that a lock is acquired.
Object creation happens so often that I'm sure the JVM
implementors will use something like compare-and-swap on any
platform that provides it (which I think means "all" nowadays).

    Even the check for "Should I wait for the garbage collector
to finish?" uses no lock, on at least some platforms.  The JVMs
on those platforms dedicate an entire memory page to serve as a
flag, whose state is recorded not in its content but in its MMU
protection bits.  To see if it's safe to allocate, an allocating
thread just stores something in the special page; if the store
works, allocation is safe.  If GC has raised its STOP sign, the
page is write-protected and the store generates a hardware trap --
very high overhead when it happens, to be sure, but extremely low
(not even a conditional branch!) in the common case.

> Would need n locking instruction pairs. But I
> would still need some confirmation that JITs
> are able to do such an optimization on a higher
> level in the present case.

    Again, I point out that the bulk and lazy variants do not do
the same thing.  Consider, for example

    class Bla {
        private static int master_seqno = 0;
        public final int seqno = ++master_seqno;
    }

Observe that the value of bla[42].seqno can differ between the two
variants; it would therefore be an error to "optimize" either by
transforming it into the other.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
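
    A minimal sketch of the seqno point, assuming "bulk" means filling
the whole array up front in index order and "lazy" means creating an
element only when it is first touched.  The names SeqnoDemo, bulk, lazy
and reset are made up for illustration and are not from the thread;
reset exists only so that both variants start counting from zero.

```java
public class SeqnoDemo {
    static class Bla {
        private static int master_seqno = 0;
        public final int seqno = ++master_seqno;

        // Hypothetical helper, not in the original class: restart the
        // counter so the two variants can be compared side by side.
        static void reset() { master_seqno = 0; }
    }

    // Bulk variant: construct every element before any of them is used.
    public static int bulk(int n, int probe) {
        Bla.reset();
        Bla[] bla = new Bla[n];
        for (int i = 0; i < n; i++) {
            bla[i] = new Bla();
        }
        return bla[probe].seqno;
    }

    // Lazy variant: construct an element only on first access.
    public static int lazy(int n, int probe) {
        Bla.reset();
        Bla[] bla = new Bla[n];
        if (bla[probe] == null) {
            bla[probe] = new Bla();
        }
        return bla[probe].seqno;
    }

    public static void main(String[] args) {
        System.out.println("bulk: bla[42].seqno = " + bulk(100, 42)); // 43
        System.out.println("lazy: bla[42].seqno = " + lazy(100, 42)); // 1
    }
}
```

In the bulk variant bla[42] is the 43rd object constructed; in the lazy
variant, touched first, it is the 1st.  The constructor's side effect is
observable, so neither loop can legally be rewritten into the other.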