From: BGB
Newsgroups: comp.lang.java.programmer
Subject: Re: higher precision doubles
Date: Wed, 10 Aug 2011 08:53:51 -0700

On 8/9/2011 8:06 PM, Joshua Cranmer wrote:
> On 8/9/2011 9:40 PM, Arne Vajhøj wrote:
>> On 8/9/2011 11:11 AM, Joshua Cranmer wrote:
>>> On 8/8/2011 11:42 PM,
>>> supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
>>> wrote:
>>>> How does that interact with JIT, though? On x86, the simplest way for
>>>> JIT to make non-strictfp code use the FPU would be to just load the
>>>> initial values into the (80-bit-wide!) registers and perform FADDs,
>>>> FMULs, etc. on them. As long as the computation stayed in registers the
>>>> higher precision then ought to remain in effect -- for JITted code.
>>>> Adding extra code to mask off 16 of the register bits (or the mantissa
>>>> subset of the extra bits) after every FP op would slow things down. Is
>>>> the JLS interpreted to require the JIT do this (for non-strictfp code)?
>>>> And, if not, what does the HotSpot JIT do in actuality?
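For reference on the JLS side of that question: before Java 17, JLS section 4.2.3 let non-strictfp code use a "double-extended-exponent" value set, which widens the exponent range (at least 15 exponent bits) but keeps the 53-bit significand. So extra mantissa bits were never permitted, even without strictfp; the observable difference was intermediate overflow/underflow. Below is a minimal sketch of that case; the class and method names are made up for illustration. Since Java 17 (JEP 306) all floating-point evaluation is strict, so on a current JVM (or any JVM whose JIT uses SSE2) both lines should print Infinity. An old x87-based JVM was permitted, but not required, to return Double.MAX_VALUE from the non-strictfp version.

```java
// Hypothetical demo class: intermediate overflow under IEEE 754 double semantics.
public class ExtendedExponentDemo {

    // Explicitly FP-strict: x * 2.0 must be rounded to a double, so it overflows to +Infinity
    // for x = Double.MAX_VALUE, and Infinity / 2.0 is still Infinity.
    // (javac 17+ warns that strictfp is redundant, since all code is strict there.)
    static strictfp double strictRoundTrip(double x) {
        return x * 2.0 / 2.0;
    }

    // No strictfp: before Java 17 a JVM was permitted (not required) to keep x * 2.0 in an
    // extended exponent range and return x; since Java 17 this behaves exactly like the method above.
    static double defaultRoundTrip(double x) {
        return x * 2.0 / 2.0;
    }

    public static void main(String[] args) {
        double x = Double.MAX_VALUE;
        System.out.println("strictfp : " + strictRoundTrip(x));
        System.out.println("default  : " + defaultRoundTrip(x));
    }
}
```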
>>>
>>> All modern x86 processors sport the SSE-style instructions, which can do
>>> 32-bit and 64-bit instructions (also in a SIMD format) without touching
>>> the FPU, and I suspect that these are slightly faster than using the x87
>>> FPU instructions. I wouldn't be surprised if the JIT emitted SSE in the
>>> vast majority of cases, so that JIT'd non-strictfp code would end up
>>> returning the same results as JIT'd strictfp code.
>>
>> Is that possible in 32 bit mode?
>>
SSE2, yes, is available in 32-bit mode. It is not used by default by 32-bit C compilers, though. x86-64 made it mandatory, and the 64-bit ABIs use it by default for floating point.

>> Practically all CPU's today are 64 bit capable, but many still run
>> 32 bit desktop OS'es.
>
> I don't recall x86's mode-switching semantics off the top of my head,
> but I do believe that it is possible to run 64-bit instructions in
> 32-bit mode. The problem is the C ABI, particularly register saves and
> restores, don't handle 64-bit stuff properly when compiled with 32-bit
> targets for compilers.
>
Not exactly... 64-bit operations can only be used (at all) in "long mode", though 32-bit operations can be used there as well. Otherwise, one is in "legacy mode", which only has 32-bit instructions.

It is, however, possible to create a 32-bit OS in long mode (mostly the same as before), which could in turn run 64-bit code in processes. However, when running in long mode, one can no longer make use of VM86 (virtual-8086) mode, segmented addressing, or several other rarely used features (TSS-based processes, anyone?...), and several instructions are officially dropped (IIRC, they were dropped from the Opteron, but Intel partly re-added them in their implementation, and AMD followed Intel's lead AFAIK).

I am not currently aware of any OSes which have done the above. AFAIK, the OS would be mostly the same as in 32-bit legacy mode, apart from needing to use new page tables and a few other things.
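Back to the Java side, the "same results as strictfp" point comes down to every + and - being rounded to a 53-bit double, which is what scalar SSE2 code does. For example, doubles above 2^53 are 2 apart, so the intermediate 1e16 + 1 cannot be represented and rounds (ties-to-even) back to 1e16. With x87 registers left at their full 64-bit significand, that intermediate would be exact and the difference would come out as 1. A rough sketch, with hypothetical class and method names, using BigDecimal to stand in for the exact computation:

```java
import java.math.BigDecimal;

// Hypothetical demo class: 53-bit double rounding vs. an exact (unlimited precision) result.
public class DoubleRoundingDemo {

    // Each operation is rounded to the nearest double, as Java requires even without strictfp.
    static double roundedDifference(double big) {
        return (big + 1.0) - big;
    }

    // The same expression evaluated exactly; new BigDecimal(double) keeps the double's exact value.
    static BigDecimal exactDifference(double big) {
        BigDecimal b = new BigDecimal(big);
        return b.add(BigDecimal.ONE).subtract(b);
    }

    public static void main(String[] args) {
        double big = 1e16; // above 2^53, so adjacent doubles are 2 apart
        System.out.println("double     : " + roundedDifference(big));
        System.out.println("BigDecimal : " + exactDifference(big));
    }
}
```

On a conforming JVM the first line prints 0.0 and the second prints 1.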
> SSE2 dates back to something like the Pentium II, so it's not a 64-bit
> mode thing. Although I think SSE2 itself may only be limited to single
> precision floats.
>
No. SSE was added in the Pentium III, and SSE2 (and later extensions) arrived, IIRC, with the Pentium 4 and, on the AMD side, the Athlon 64/Opteron line. The Pentium II only had MMX, which was considerably worse (64-bit integer vectors of bytes, shorts, etc., aliased onto the x87 FPU registers) and was rarely used, as most people would rather have a working FPU than lame byte vectors.

SSE added new 128-bit XMM registers and single-precision floating-point operations (both packed and scalar). SSE2 added scalar and packed doubles, and most MMX instructions were retrofitted onto the XMM registers (allowing byte- and word-vector operations and similar). SSE3 and later mostly added more modest extensions. XOP (AMD) and AVX (Intel) add a much more drastic set of new features (as well as allowing 3- to 5-operand instruction forms and 256-bit YMM registers, which can hold 4 doubles or 8 floats).

> Poking around the hotspot source code for Java 7 does indicate that sse2
> support is in the JIT, although I don't know the entire set of
> circumstances that triggers it.
>
Probably whether the CPU supports it. There are few good reasons not to use it anymore.

Now, as for performance (x87 vs SSE), there are tradeoffs either way: naive scalar SSE can be slower than well-generated x87, but is generally faster than naive x87.

Vector SSE is, at this point, generally somewhat faster than trying to use x87. A few operations, such as a vector dot product, can still be faster on x87 unless SSE3 or SSE4 instructions (horizontal adds, DPPS) are used. Vector SSE was slightly slower than x87 on the Pentium III.

x87 still has a few features which SSE doesn't, such as the trigonometric functions (FSIN, FCOS, FPTAN, ...), so x87 can still be used for these (calculating them manually is slower).
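In Java, the trig point shows up as Math vs StrictMath. Math.sin only has to be within 1 ulp of the exact result, so the JVM may substitute faster platform-specific code; whether that ends up being x87 FSIN, SSE code, or something else is a JVM implementation detail the spec does not pin down. StrictMath.sin, by contrast, must return the same bits as the fdlibm algorithms. A small sketch (class and method names are hypothetical) that compares the two and reports the worst difference in ulps:

```java
// Hypothetical demo class: compares Math.sin (may be intrinsified) with StrictMath.sin (fdlibm).
public class SinComparison {

    // Largest |Math.sin(x) - StrictMath.sin(x)| over the inputs, in ulps of the StrictMath result.
    static double maxUlpDifference(double[] inputs) {
        double worst = 0.0;
        for (double x : inputs) {
            double fast = Math.sin(x);         // spec: within 1 ulp of the exact result
            double strict = StrictMath.sin(x); // spec: bit-for-bit the fdlibm result on every platform
            double ulps = Math.abs(fast - strict) / Math.ulp(strict);
            worst = Math.max(worst, ulps);
        }
        return worst;
    }

    public static void main(String[] args) {
        double[] inputs = {0.5, 1.0, Math.PI / 3, 100.0, 1e6, 1e22};
        for (double x : inputs) {
            System.out.printf("x=%-12g Math.sin=%.17g StrictMath.sin=%.17g%n",
                    x, Math.sin(x), StrictMath.sin(x));
        }
        System.out.println("max difference (ulps): " + maxUlpDifference(inputs));
    }
}
```

The Math.sin column may vary between JVMs and CPUs; the StrictMath.sin column should print the same digits everywhere, and the spec bounds keep the two within a few ulps of each other.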