From: minforth
Newsgroups: comp.lang.forth
Subject: Re: Parsing timestamps?
Date: Thu, 10 Jul 2025 07:37:02 +0200

On 10.07.2025 at 06:32, Paul Rubin wrote:
> minforth writes:
>> You don't need 64-bit doubles for signal or image processing.
>> Most vector/matrix operations on streaming data don't require
>> them either. Whether SSE2 is adequate or not to handle such data
>> depends on the application.
>
> Sure, and for that matter, AI inference uses 8 bit and even 4 bit
> floating point.

Or fuzzy control, for instance.

> Kahan on the other hand was interested in engineering
> and scientific applications like PDE solvers (airfoils, fluid dynamics,
> FEM, etc.). That's an area where roundoff error builds up after many
> iterations, thus extended precision.

That's why I use Kahan summation for dot products. It is slower, but the
accumulated rounding error stays small.

A while ago I read an article on this issue in which the author(s)
extensively tested different dot-product algorithms on many serial data
sets from finance, geology, the oil industry, meteorology, etc.
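For reference, here is a minimal sketch of a Kahan-compensated dot product, written in Python purely for illustration (no Forth code is given in the post; kahan_dot and naive_dot are hypothetical names). Each product is still rounded once; only the running sum carries a compensation term. That is also where the extra cost comes from: several more additions per element and a loop-carried dependency on the compensation.

```python
from typing import Sequence


def naive_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain left-to-right dot product, for comparison."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b  # product rounded, then sum rounded
    return total


def kahan_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product whose running sum uses Kahan compensated summation.

    Each product a*b is still rounded once; only the low-order bits lost
    in the additions are tracked and fed back into the next term.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    comp = 0.0  # rounding error left over from the previous addition
    for a, b in zip(x, y):
        term = a * b - comp        # correct the new term by the old error
        t = total + term           # this addition may lose low-order bits
        comp = (t - total) - term  # (what was actually added) - (what we wanted)
        total = t
    return total
```

For example, with x = [1.0] followed by ten copies of 1e-16 and y all ones, naive_dot returns exactly 1.0, because each 1e-16 is below half an ulp of 1.0 and gets absorbed, while kahan_dot returns a value close to the true 1 + 1e-15.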
The article's target criterion was an acceptable balance between
computational speed and small error. The 'winner' was a chained
fused-multiply-add (FMA) algorithm: many CPUs and GPUs perform FMA in
hardware, and it makes for shorter code (good for caching). It also
lends itself to parallelization: the data sets are recursively halved
until the vectors are of manageable size, and the pieces are then
computed in parallel. I don't do parallelization, but I was still
surprised by how good the results were with FMA alone. In other words,
a wider floating-point format is not always the way to go.

Anyhow, the first step is to select the best FP rounding method ...