From: BGB <cr88192@hotmail.com>
Newsgroups: comp.lang.java.programmer
Subject: Re: higher precision doubles
Date: Wed, 10 Aug 2011 08:53:51 -0700
Organization: albasani.net
Lines: 107
Message-ID: <j1u9rf$9cu$1@news.albasani.net>
References: <j1hqc1$1ch$1@news.albasani.net> <cOqdnfXfM99lhaDTnZ2dnUVZ_vednZ2d@earthlink.com> <j1qdrt$30a$1@speranza.aioe.org> <j1rio5$ucd$1@dont-email.me> <4e41ef89$0$306$14726298@news.sunsite.dk> <j1ssit$nc0$1@dont-email.me>

On 8/9/2011 8:06 PM, Joshua Cranmer wrote:
> On 8/9/2011 9:40 PM, Arne Vajhøj wrote:
>> On 8/9/2011 11:11 AM, Joshua Cranmer wrote:
>>> On 8/8/2011 11:42 PM,
>>> supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
>>> wrote:
>>>> How does that interact with JIT, though? On x86, the simplest way for
>>>> JIT to make non-strictfp code use the FPU would be to just load the
>>>> initial values into the (80-bit-wide!) registers and perform FADDs,
>>>> FMULs, etc. on them. As long as the computation stayed in registers the
>>>> higher precision then ought to remain in effect -- for JITted code.
>>>> Adding extra code to mask off 16 of the register bits (or the mantissa
>>>> subset of the extra bits) after every FP op would slow things down. Is
>>>> the JLS interpreted to require the JIT do this (for non-strictfp code)?
>>>> And, if not, what does the HotSpot JIT do in actuality?
>>>
>>> All modern x86 processors sport the SSE-style instructions, which can do
>>> 32-bit and 64-bit instructions (also in a SIMD format) without touching
>>> the FPU, and I suspect that these are slightly faster than using the x87
>>> FPU instructions. I wouldn't be surprised if the JIT emitted SSE in the
>>> vast majority of cases, so that JIT'd non-strictfp code would end up
>>> returning the same results as JIT'd strictfp code.
>>
>> Is that possible in 32 bit mode?
>>

SSE2, yes, is available in 32-bit mode.
it is not used by default by C compilers targeting 32-bit x86, though.
x86-64 made it mandatory, and the 64-bit ABIs use it by default for
floating point.
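
to make the strictfp question concrete, here is a minimal Java sketch (the class and method names are made up for illustration). it multiplies and then divides a value near the top of the double range: with 64-bit SSE2 arithmetic the intermediate overflows, whereas an x87 register with its wider exponent could hold it. no claim is made here about what any particular HotSpot build emits.

```java
// Sketch only: ExtendedRangeDemo and its methods are hypothetical names.
final class ExtendedRangeDemo {
    // strictfp forces pure IEEE 754 double semantics for every intermediate.
    // (Since Java 17 all code is evaluated strictly and the keyword is redundant.)
    static strictfp double strictScale(double x) {
        return (x * 10.0) / 10.0;
    }

    // Same expression without strictfp. On older JVMs this was allowed to use
    // an extended exponent range for the intermediate product.
    static double looseScale(double x) {
        return (x * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        double big = 1e308;
        // 1e308 * 10 does not fit in a 64-bit double, so the strict result
        // is Infinity rather than 1e308.
        System.out.println("strict: " + strictScale(big));
        System.out.println("loose:  " + looseScale(big));
    }
}
```

if both lines print the same thing, the non-strict code is behaving like strict code on that JVM, which is what one would expect from SSE2-based code generation.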

>> Practically all CPU's today are 64 bit capable, but many still run
>> 32 bit desktop OS'es.
>
> I don't recall x86's mode-switching semantics off the top of my head,
> but I do believe that it is possible to run 64-bit instructions in
> 32-bit mode. The problem is the C ABI, particularly register saves and
> restores, don't handle 64-bit stuff properly when compiled with 32-bit
> targets for compilers.
>

not exactly...

64-bit operations can only be used (at all) in "long mode", which can
also run 32-bit code. otherwise, one is in "legacy mode", which only has
16-bit and 32-bit instructions.

it is, however, possible to create a 32-bit OS in long mode (mostly the
same as before), which could in turn run 64-bit code in processes.
however, running in long mode, one can no longer make use of VM86
(Virtual 8086) mode, segmented addressing, or several other rarely-used
features (TSS-based processes anyone?...), and several instructions are
officially dropped (IIRC, they were dropped from the Opteron, but Intel
partly re-added them in their implementation, and AMD followed Intel's
lead AFAIK).

I am not currently aware of any OSes which have done the above.

AFAIK, the OS would be mostly the same as in 32-bit legacy mode, apart 
from needing to use new page-tables and a few other things.


> SSE2 dates back to something like the Pentium II, so it's not a 64-bit
> mode thing. Although I think SSE2 itself may only be limited to single
> precision floats.
>

no. SSE was added in the Pentium 3, and SSE2 (IIRC) in the Pentium 4;
on the AMD side, SSE2 arrived with the Athlon 64 and Opteron lines.

Pentium 2 only had MMX, which was considerably worse (64-bit vectors of
bytes and shorts, aliased to the FPU registers), and which was rarely
used, as most people would rather have a working FPU than lame byte vectors.

SSE added new registers (XMM) and single-precision float operations,
both vector and scalar.
SSE2 added doubles (scalar and vector), and most MMX instructions were
retrofitted onto the SSE registers (allowing byte and word-vector
operations and similar).

SSE3/... mostly added more modest extensions.

XOP (AMD) and AVX (Intel) basically add a much more drastic set of new 
features (as well as allowing basically 3-5 register instruction forms 
and 256-bit YMM registers, which can hold 4 doubles or 8 floats).
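
the lane counts above are just register width divided by element width. a tiny Java sketch of that arithmetic (LaneCount and lanes are names invented for this example, not part of any API):

```java
// Sketch only: LaneCount is a hypothetical helper for the arithmetic above.
final class LaneCount {
    // Number of elements of the given width that fit in one register.
    static int lanes(int registerBits, int elementBits) {
        return registerBits / elementBits;
    }

    public static void main(String[] args) {
        System.out.println("MMX (64-bit)  bytes:   " + lanes(64, Byte.SIZE));
        System.out.println("XMM (128-bit) floats:  " + lanes(128, Float.SIZE));
        System.out.println("XMM (128-bit) doubles: " + lanes(128, Double.SIZE));
        System.out.println("YMM (256-bit) floats:  " + lanes(256, Float.SIZE));
        System.out.println("YMM (256-bit) doubles: " + lanes(256, Double.SIZE));
    }
}
```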


> Poking around the hotspot source code for Java 7 does indicate that sse2
> support is in the JIT, although I don't know the entire set of
> circumstances that triggers it.
>

probably CPU support.
there are few good reasons not to use it anymore.
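
as far as I know there is no standard Java API for asking whether the CPU has SSE2; what a program can see is the architecture the JVM was built for. a small sketch, where ArchInfo is a made-up name and "sun.arch.data.model" is a HotSpot-specific property that may be missing on other JVMs:

```java
// Sketch only: ArchInfo is a hypothetical class name.
final class ArchInfo {
    static String describe() {
        // "os.arch" is a standard system property (e.g. "x86" or "amd64").
        String arch = System.getProperty("os.arch");
        // "sun.arch.data.model" is implementation-specific, so supply a fallback.
        String model = System.getProperty("sun.arch.data.model", "unknown");
        return "os.arch=" + arch + ", data model=" + model;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

this only tells 32-bit from 64-bit JVMs apart; it says nothing about which instructions the JIT actually chose.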

now as for performance (x87 vs SSE), there are tradeoffs either way:
naive scalar SSE can be slower than well-generated x87, but is generally 
faster than naive x87.

vector SSE is, at this point, generally somewhat faster than trying to
use x87 (in most cases; a few operations, such as vector dot-product,
are faster on x87 unless SSE3 or SSE4 instructions are used). vector SSE
was slightly slower than x87 on the Pentium 3.
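
for reference, this is the sort of dot-product loop being talked about, written as plain Java (DotProduct is a name made up here). Java gives no direct access to x87 or SSE instructions, so whether this ends up as x87, scalar SSE, or vector SSE is entirely the JIT's decision:

```java
// Sketch only: DotProduct is a hypothetical class name.
final class DotProduct {
    // Straightforward scalar dot product; the additions happen in index order.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.println(dot(a, b)); // 1*4 + 2*5 + 3*6 = 32.0
    }
}
```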


x87 still has a few features which SSE doesn't, such as the
trigonometric instructions (FSIN, FCOS, ...), so x87 can still be used
for these (doing the calculation manually with SSE is generally slower).
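
by "manual calculation" one means evaluating the function in software from adds and multiplies. a rough Java sketch using a plain Taylor series (ManualSine and taylorSin are invented names; real math libraries use argument reduction and tuned polynomials, which this does not attempt):

```java
// Sketch only: ManualSine is a hypothetical class, not how any JVM computes sin.
final class ManualSine {
    // Taylor series around 0: x - x^3/3! + x^5/5! - ...
    // Only sensible for small |x|, since there is no argument reduction.
    static double taylorSin(double x, int terms) {
        double term = x; // current series term, starts at x^1/1!
        double sum = x;
        for (int n = 1; n < terms; n++) {
            // Next term = previous term * (-x^2) / ((2n) * (2n + 1)).
            term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        double x = 0.5;
        System.out.println("taylor:   " + taylorSin(x, 10));
        System.out.println("Math.sin: " + Math.sin(x));
    }
}
```

the loop uses nothing but multiplies, divides and adds, so it needs no x87-only instruction; how its speed compares to FSIN depends on the CPU and on how many terms are needed.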