From: Giovanni Azua
Newsgroups: comp.lang.java.programmer
Subject: Re: particle container in Java
Date: Sun, 06 Nov 2011 15:15:04 +0100

Hi Roedy,

On 11/6/11 12:07 PM, in article
8dqcb7dpnonje061cp69hr944tinltg0eq@4ax.com, "Roedy Green" wrote:

> On Sat, 05 Nov 2011 09:16:03 -0700, BGB wrote,
> quoted or indirectly quoted someone who said :
>
>> I try where possible to keep things in the 40-60fps range,
>
> Another totally different approach is to fob the work off on the GPU.
> I have not coded at low level for a very long time, so I don't know
> what modern video cards can do, and how standard the APIs are.
>
> But, it seem to me what you are doing is common to many games, and
> surely modern video cards have some way of helping out. They have
> some skookum processing power. I would think some parallel processing
> hardware would really make this sort of thing fly.
>
Absolutely :) When I saw this OP, the first thing that crossed my mind
was "why is he using Java for this ... ?". These are the reasons I would
stay away from Java and rather use C++ to do such a job:

1) Manual optimization for the memory hierarchies e.g.
scalar replacement, and unrolling loops by the factor that maximizes
cache hits for the cache specification of the underlying system. This
needs manual analysis and manual work, since e.g. possible memory
aliasing and function calls prevent C++ compilers from doing such
optimizations automatically. The same concepts apply to CUDA, see 3)
below.

2) Same reason as 1): manual vectorization, i.e. SSE, SSE2, SSE3, SSE4
etc., e.g.

    #include <xmmintrin.h>
    alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f}; // _mm_load_ps needs 16-byte alignment
    __m128 t = _mm_load_ps(a);    // load four floats into one SSE register
    __m128 b = _mm_mul_ps(t, t);  // element-wise multiply: b = {1, 4, 9, 16}

3) CUDA, the nVidia C++ extensions that allow embedding GPU code and
invoking it from the CPU. Here is an example that matches the OP:

http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html

It needs a lot of brainwork to get it right, i.e. to be able to
outperform the CPU in numerical methods, e.g. MMM and MVM (matrix-matrix
and matrix-vector multiplication). Trying to implement a Fast Multipole
Method with CUDA would be a daunting task ... but it is not impossible:

Best regards,
Giovanni
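
To illustrate point 1) above, here is a minimal sketch of scalar replacement combined with loop unrolling, applied to a matrix-vector multiply (MVM). The function names, the row-major layout and the unroll factor of 4 are my own assumptions for illustration only; the best factor depends on the registers and caches of the target machine and has to be found by measurement.

```cpp
#include <cstddef>

// Baseline: y[i] is read and written through memory on every inner
// iteration, because the compiler cannot prove that y does not alias
// A or x.
void mvm_naive(const float* A, const float* x, float* y,
               std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i) {
        y[i] = 0.0f;
        for (std::size_t j = 0; j < cols; ++j) {
            y[i] += A[i * cols + j] * x[j];
        }
    }
}

// Scalar replacement: the running sums live in local variables, which
// the compiler can keep in registers. The inner loop is unrolled by 4
// (an assumed factor) with independent accumulators.
void mvm_scalar_replaced(const float* A, const float* x, float* y,
                         std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i) {
        const float* row = A + i * cols;  // row-major layout assumed
        float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
        std::size_t j = 0;
        for (; j + 4 <= cols; j += 4) {
            acc0 += row[j]     * x[j];
            acc1 += row[j + 1] * x[j + 1];
            acc2 += row[j + 2] * x[j + 2];
            acc3 += row[j + 3] * x[j + 3];
        }
        float acc = (acc0 + acc1) + (acc2 + acc3);
        for (; j < cols; ++j) {           // remainder when cols % 4 != 0
            acc += row[j] * x[j];
        }
        y[i] = acc;                       // single store per row
    }
}
```

Both functions compute the same product, but the second one changes the order of the floating-point additions, so results can differ in the last bits for general inputs.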