From: Eric Sosman <esosman@ieee-dot-org.invalid>
Newsgroups: comp.lang.java.programmer
Subject: Re: Bulk Array Element Allocation, is it faster?
Date: Sun, 25 Sep 2011 10:59:25 -0400

On 9/25/2011 10:04 AM, Jan Burse wrote:
> Robert Klemme schrieb:
>> Yes, but the cost is not in the check but in the branching on processor
>> level (see what Patricia wrote).
>
> Depends on the processor and on the branch. If
> new is just heap -= size, what some papers suggest,
> then it might not be important.
>
> But if new is much more, then sure the branch
> interrupts the normal code flow so much that
> instruction pipelining gets out of sync. And
> the speed gain by instruction overlapping
> is lost.
>
> But my hypothesis is more that something
> algorithmically on a higher level happens than
> something on the lower hardware level.
>
> So I also found something about "Lock
> coarsening"(*), so if the new needs some lock
> this lock could be acquired before the initialization
> loop and released after the initialization
> loop. [...]

     I'm speculating almost as wildly as you are, but I strongly
doubt that a lock is acquired.  Object creation happens so often
that I'm sure the JVM implementors will use something like compare-
and-swap on any platform that provides it (which I think means "all"
nowadays).
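
     To make the compare-and-swap idea concrete, here is a minimal
sketch of a lock-free bump allocator.  It is purely illustrative: the
BumpAllocator class, its arena of abstract offsets, and the -1 "full"
result are all invented for this example, and nothing here claims to
be how any actual JVM allocates.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: a bump-pointer arena whose "top" pointer
// is advanced with compare-and-swap instead of under a lock.
final class BumpAllocator {
    private final AtomicLong top = new AtomicLong(0);
    private final long limit;

    BumpAllocator(long limit) {
        this.limit = limit;
    }

    /** Returns the start offset of the new block, or -1 if it does not fit. */
    long allocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > limit) {
                return -1;              // arena exhausted
            }
            // Succeeds only if no other thread moved "top" in the meantime.
            if (top.compareAndSet(oldTop, newTop)) {
                return oldTop;
            }
            // Lost the race: loop and retry against the new value of "top".
        }
    }
}
```

The uncontended path is one read, one add, one comparison and one CAS;
no thread ever blocks while another is in the middle of allocating.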

     Even the check for "Should I wait for the garbage collector to
finish?" uses no lock, on at least some platforms.  Their JVMs
dedicate an entire memory page to serve as a flag, whose state is
not recorded in its content but in its MMU protection bits.  To see
if it's safe to allocate, an allocating thread just stores something
in the special page; if the store works allocation is safe.  If GC
has raised its STOP sign, the page is write-protected and the store
generates a hardware trap -- very high overhead, to be sure, but
extremely low (not even a conditional branch!) in the common case.

> Would need n locking instruction pairs. But I
> would still need some confirmation that JITs
> are able to do such an optimization on a higher
> level in the present case.

     Again, I point out that the bulk and lazy variants do not do
the same thing.  Consider, for example

	class Bla {
	    private static int master_seqno = 0;
	    public final int seqno = ++master_seqno;
	}

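Observe that the value of bla[42].seqno differs in general between
the two variants: bulk creation hands out sequence numbers in index
order, while lazy creation hands them out in whatever order the
elements are first needed.  It would therefore be an error to
"optimize" either by transforming it into the other.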
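
     A small demonstration of the difference follows.  It assumes
"bulk" means filling the whole array in a loop up front and "lazy"
means creating an element on first use; the SeqnoDemo wrapper, the
bulk() and lazy() helpers, and resetForDemo() are made up for the
example (the reset exists only so both variants can run in one
program).

```java
// Hypothetical demo harness around the Bla class shown above.
class SeqnoDemo {
    static class Bla {
        private static int master_seqno = 0;
        public final int seqno = ++master_seqno;

        // Demo-only helper so each variant starts counting from zero.
        static void resetForDemo() {
            master_seqno = 0;
        }
    }

    // Bulk variant: construct every element in index order up front.
    static int bulk(int n, int probe) {
        Bla.resetForDemo();
        Bla[] bla = new Bla[n];
        for (int i = 0; i < n; i++) {
            bla[i] = new Bla();
        }
        return bla[probe].seqno;
    }

    // Lazy variant: construct an element only when it is first needed.
    static int lazy(int n, int probe) {
        Bla.resetForDemo();
        Bla[] bla = new Bla[n];
        if (bla[probe] == null) {
            bla[probe] = new Bla();
        }
        return bla[probe].seqno;
    }

    public static void main(String[] args) {
        System.out.println("bulk: bla[42].seqno = " + bulk(100, 42)); // 43
        System.out.println("lazy: bla[42].seqno = " + lazy(100, 42)); // 1
    }
}
```

In the bulk run bla[42] is the 43rd object constructed; in the lazy
run, where element 42 happens to be touched first, it is the 1st.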

-- 
Eric Sosman
esosman@ieee-dot-org.invalid