Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!nuzba.szn.dk!pnx.dk!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Giovanni Azua <bravegag@hotmail.com>
Newsgroups: comp.lang.java.programmer
Subject: Re: particle container in Java
Date: Sun, 06 Nov 2011 15:15:04 +0100
Lines: 56
Message-ID: <CADC54F8.8D01%bravegag@hotmail.com>
References: <17fed3e9-15e0-466c-bb24-10e74633ea1b@t8g2000yql.googlegroups.com> <CAD8D679.891A%bravegag@hotmail.com> <j8vtuh$qhm$1@news.albasani.net> <4mfab7hjme9lrklbo9k3415fljail4cgl4@4ax.com> <j93ngj$a4n$1@news.albasani.net> <8dqcb7dpnonje061cp69hr944tinltg0eq@4ax.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
User-Agent: Microsoft-Entourage/12.31.0.110725
Thread-Topic: particle container in Java
Thread-Index: AcycjnpqhY+S1+OQ1EiFkz7xE/B+0A==
Xref: x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:9652

Hi Roedy,

On 11/6/11 12:07 PM, in article 8dqcb7dpnonje061cp69hr944tinltg0eq@4ax.com,
"Roedy Green" <see_website@mindprod.com.invalid> wrote:
> On Sat, 05 Nov 2011 09:16:03 -0700, BGB <cr88192@hotmail.com> wrote,
> quoted or indirectly quoted someone who said :
> 
>> I try where possible to keep things in the 40-60fps range,
> 
> Another totally different approach is to fob the work off on the GPU.
> I have not coded at low level for a very long time, so I don't know
> what modern video cards can do, and how standard the APIs are.
> 
> But, it seem to me what you are doing is common to many games, and
> surely modern video cards have some way of helping out.  They have
> some skookum processing power.  I would think some parallel processing
> hardware would really make this sort of thing fly.
>
Absolutely :) When I saw the original post, the first thing that crossed my
mind was "why is he using Java for this ...?". These are the reasons I would
stay away from Java and use C++ for such a job:

1) Manual optimization for the memory hierarchy, e.g. scalar replacement
   and unrolling loops to the degree that maximizes the cache hit rate for
   the cache configuration of the target system. This needs manual analysis
   and hand-written code, since e.g. possible memory aliasing and opaque
   function calls prevent C++ compilers from applying such optimizations
   automatically. The same concepts apply to CUDA, see 3) below.

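2) For the same reason as 1), manual vectorization, i.e. SSE, SSE2, SSE3,
   SSE4 etc., e.g.
    #include <xmmintrin.h>
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    __m128 t = _mm_loadu_ps(a);   /* unaligned load: a[] need not be 16-byte aligned */
    __m128 b = _mm_mul_ps(t, t);  /* element-wise square of a[] */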

3) NVIDIA's CUDA C++ extensions, which allow embedding GPU code and invoking
   it from the CPU. Here is an example that matches the OP:
   http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html

   It takes a lot of brainwork to get it right, i.e. to actually
   outperform the CPU in numerical kernels such as matrix-matrix (MMM) and
   matrix-vector (MVM) multiplication. Implementing a Fast Multipole Method
   with CUDA would be a daunting task ... but it is not impossible:

<http://arxiv.org/pdf/1010.1482>

<http://arxiv.org/abs/1010.1482>
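
To make 1) and 2) a bit more concrete, here is a minimal sketch of my own (a
toy sum-of-squares kernel, not taken from the OP's code; the function names
are made up for illustration). It shows a baseline loop where aliasing gets
in the compiler's way, the same loop after scalar replacement and 4x
unrolling, and an SSE version. Whether the hand-tuned variants are actually
faster depends on the compiler, flags and hardware, so measure before
trusting any of it.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Baseline: the accumulator lives behind a float pointer. The compiler has
// to assume *out may alias an element of x[], so it cannot simply keep the
// running sum in a register for the whole loop.
void sum_sq_naive(const float* x, std::size_t n, float* out) {
    *out = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        *out += x[i] * x[i];
    }
}

// Scalar replacement + 4x unrolling: the sum is kept in four independent
// local variables (registers), which also shortens the dependency chain.
// The unroll factor 4 is an arbitrary choice for this sketch.
float sum_sq_unrolled(const float* x, std::size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * x[i];
        s1 += x[i + 1] * x[i + 1];
        s2 += x[i + 2] * x[i + 2];
        s3 += x[i + 3] * x[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        s += x[i] * x[i];
    }
    return s;
}

// Manual SSE vectorization: four floats are squared and accumulated per
// iteration. Unaligned loads are used so x[] needs no special alignment.
float sum_sq_sse(const float* x, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(x + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(v, v));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);  // spill the four partial sums
    float s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; ++i) {  // scalar tail
        s += x[i] * x[i];
    }
    return s;
}
```

Note that the unrolled and SSE variants add the terms in a different order
than the baseline, so for general floating-point input the results can
differ in the last bits.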
 
Best regards,
Giovanni