Groups > comp.lang.java.programmer > #13554 > unrolled thread

Pattern suggestion

Started by	FrenKy <frenky__nn@gmail.com>
First post	2012-04-15 16:11 +0200
Last post	2012-04-15 21:58 -0400
Articles	12 — 10 participants

Back to article view | Back to comp.lang.java.programmer

  Pattern suggestion FrenKy <frenky__nn@gmail.com> - 2012-04-15 16:11 +0200
    Re: Pattern suggestion Rui Maciel <rui.maciel@gmail.com> - 2012-04-15 16:04 +0100
      Re: Pattern suggestion Lew <noone@lewscanon.com> - 2012-04-15 08:15 -0700
    Re: Pattern suggestion markspace <-@.> - 2012-04-15 08:17 -0700
      Re: Pattern suggestion Arne Vajhøj <arne@vajhoej.dk> - 2012-04-15 22:01 -0400
    Re: Pattern suggestion Jan Burse <janburse@fastmail.fm> - 2012-04-15 17:41 +0200
      Re: Pattern suggestion Jan Burse <janburse@fastmail.fm> - 2012-04-16 00:37 +0200
    Re: Pattern suggestion Patricia Shanahan <pats@acm.org> - 2012-04-15 09:17 -0700
      Re: Pattern suggestion Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-04-15 13:57 -0300
        Re: Pattern suggestion Martin Gregorie <martin@address-in-sig.invalid> - 2012-04-15 19:56 +0000
          Re: Pattern suggestion Robert Klemme <shortcutter@googlemail.com> - 2012-04-16 09:55 +0200
    Re: Pattern suggestion Arne Vajhøj <arne@vajhoej.dk> - 2012-04-15 21:58 -0400

#13554 — Pattern suggestion

From	FrenKy <frenky__nn@gmail.com>
Date	2012-04-15 16:11 +0200
Subject	Pattern suggestion
Message-ID	<jmel0t$jrh$1@news2.carnet.hr>

Hi *,
I have a huge file (~10GB) which I'm reading line by line. Each line has 
to be analyzed by many number of different analyzers. The problem I have 
is that to make it at least a bit performance optimized due to sometimes 
time consuming processing (usually because of delays due to external 
interfaces) i would need to make it heavily multithreaded.
File should be read only once to reduce IO on disks.

So I need "1 driver to many workers" pattern where workers are 
multithreaded.

I have a solution now based on Observable/Observer that I use (and it 
works) but I'm not sure if it is the best way.

Any suggestion would be appreciated.
Thanks in advance!

[toc] | [next] | [standalone]

#13555

From	Rui Maciel <rui.maciel@gmail.com>
Date	2012-04-15 16:04 +0100
Message-ID	<jmeo1d$mg4$1@speranza.aioe.org>
In reply to	#13554

FrenKy wrote:

> Hi *,
> I have a huge file (~10GB) which I'm reading line by line. Each line has
> to be analyzed by many number of different analyzers. The problem I have
> is that to make it at least a bit performance optimized due to sometimes
> time consuming processing (usually because of delays due to external
> interfaces) i would need to make it heavily multithreaded.
> File should be read only once to reduce IO on disks.
> 
> So I need "1 driver to many workers" pattern where workers are
> multithreaded.
> 
> I have a solution now based on Observable/Observer that I use (and it
> works) but I'm not sure if it is the best way.
> 
> Any suggestion would be appreciated.
> Thanks in advance!

What about this:

Consider three types of component: the file reader, data analyzer and a data 
queue.

The data queue component acts as a data buffer.  This component is 
responsible for managing the data queue, and supplying the data analyzer 
components with the data from the file.  It buffers the data contained in a 
file by storing it in multiple data snippets in such a way that can be 
accessed concurrently by each data analyzer components.  Once a data snippet 
is analyzed by all data analyzers, the data queue frees the resource.

The file reader component is responsible for supplying the data queue 
component with data read from a file.  It accomplishes this by pushing data 
snippets on top of the data queue on request.

The data analyzer component requests data snippets from the data queue 
component.  Once the data is analyzed, the data analyzer component releases 
the resource and requests more data from the file by signaling the data 
queue for the next data snippet.

This scheme would, at least in theory, provide you with a way to have 
multiple data analyzers analyze independent parts of a file without forcing 
a set of data analyziers to wait for a component to finish processing a data 
snippet.  You can also avoid the observer pattern with this sort of 
solution, and therefore end up with a component which is simpler to 
understand and a bit more robust to change.

Hope this helps,
Rui Maciel

[toc] | [prev] | [next] | [standalone]

#13556

From	Lew <noone@lewscanon.com>
Date	2012-04-15 08:15 -0700
Message-ID	<jmeolq$bo7$1@news.albasani.net>
In reply to	#13555

On 04/15/2012 08:04 AM, Rui Maciel wrote:
> FrenKy wrote:
>
>> Hi *,
>> I have a huge file (~10GB) which I'm reading line by line. Each line has
>> to be analyzed by many number of different analyzers. The problem I have
>> is that to make it at least a bit performance optimized due to sometimes
>> time consuming processing (usually because of delays due to external
>> interfaces) i would need to make it heavily multithreaded.
>> File should be read only once to reduce IO on disks.
>>
>> So I need "1 driver to many workers" pattern where workers are
>> multithreaded.
>>
>> I have a solution now based on Observable/Observer that I use (and it
>> works) but I'm not sure if it is the best way.

Well, we aren't either.

"Best" is a very loose term, and it encompasses multiple often 
mutually-exclusive metrics. What is your metric for "best"?

How do you measure it?

Even after you answer that, we don't know. We haven't seen your code. You 
might have a simply brilliant implementation of what you call 
"Observable/Observer" that simply blows any other approach out of the water. 
Conversely, your "Observable/Observer" label may be wildly inapplicable, and 
the code utter crap, and even were that the correct pattern the execution 
thereof might have barely crawled out of the crapper.

We simply don't know. You maybe have the best, worst or somewhere between of 
all possible implementations. I hope it's nearer the first than the second.

Would you go to a nutritionist and say, "I'm on a high-carb diet, is that the 
best way?" It might be, if the carbs are starchy like pasta and you're at the 
shore running marathons. It might not be, if you're eating only Twinkies and 
beer and watching your fourth straight /Jersey Shore/ marathon.

Tell you what. How about you share what you've done and then, based on 
information, we provide feedback?

http://sscce.org/

-- 
Lew

[toc] | [prev] | [next] | [standalone]

#13557

From	markspace <-@.>
Date	2012-04-15 08:17 -0700
Message-ID	<jmeor9$n3o$1@dont-email.me>
In reply to	#13554

On 4/15/2012 7:11 AM, FrenKy wrote:
> So I need "1 driver to many workers" pattern where workers are
> multithreaded.
>
> I have a solution now based on Observable/Observer that I use (and it
> works) but I'm not sure if it is the best way.

Map-Reduce is the current standard pattern for big searches like these, 
I believe.  Check Google and Wikipedia to get started. I also recommend 
O'Reilly's "Hadoop" book.

There's is also Fork-Join, an older pattern that's a little more labor 
intensive in terms of code, imo.

[toc] | [prev] | [next] | [standalone]

#13567

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-04-15 22:01 -0400
Message-ID	<4f8b7d7e$0$293$14726298@news.sunsite.dk>
In reply to	#13557

On 4/15/2012 11:17 AM, markspace wrote:
> On 4/15/2012 7:11 AM, FrenKy wrote:
>> So I need "1 driver to many workers" pattern where workers are
>> multithreaded.
>>
>> I have a solution now based on Observable/Observer that I use (and it
>> works) but I'm not sure if it is the best way.
>
> Map-Reduce is the current standard pattern for big searches like these,
> I believe. Check Google and Wikipedia to get started. I also recommend
> O'Reilly's "Hadoop" book.
>
> There's is also Fork-Join, an older pattern that's a little more labor
> intensive in terms of code, imo.

Even though both are about parallellizing a workload, then I
think that those are mostly used for different purposes.

Hadoop is great for processing TB/PB residing on disk
on multiple computers.

Fork Join is great for processing MB/GB residing in
memory on multiple cores on the same computer.

Arne

[toc] | [prev] | [next] | [standalone]

#13558

From	Jan Burse <janburse@fastmail.fm>
Date	2012-04-15 17:41 +0200
Message-ID	<jmeq76$f72$1@news.albasani.net>
In reply to	#13554

FrenKy schrieb:
> Hi *,
> I have a huge file (~10GB) which I'm reading line by line. Each line has
> to be analyzed by many number of different analyzers. The problem I have
> is that to make it at least a bit performance optimized due to sometimes
> time consuming processing (usually because of delays due to external
> interfaces) i would need to make it heavily multithreaded.
> File should be read only once to reduce IO on disks.
>
> So I need "1 driver to many workers" pattern where workers are
> multithreaded.
>
> I have a solution now based on Observable/Observer that I use (and it
> works) but I'm not sure if it is the best way.
>
> Any suggestion would be appreciated.
> Thanks in advance!

Some penny of thoughts:

- Check whether the bottleneck is writing the
   result of the analysers and not reading the file,
   you might add extra workers and queses for writing
   the result.

- Check whether the analysers are all equally fast,
   otherwise the least performant analyser will
   delay the processing, even with some queues,
   since they will be limited in size.

Bye

[toc] | [prev] | [next] | [standalone]

#13565

From	Jan Burse <janburse@fastmail.fm>
Date	2012-04-16 00:37 +0200
Message-ID	<jmfijo$3t8$1@news.albasani.net>
In reply to	#13558

Jan Burse schrieb:
>
> - Check whether the analysers are all equally fast,
>    otherwise the least performant analyser will
>    delay the processing, even with some queues,
>    since they will be limited in size.

Assume you have 2 analyzers, one slow and one
fast. And 4 cores.

Load junks of 1GB into memory, split the 1GB into
two 500MB. And let:

    1st 500MB slow analyzer with core 1 affinity
    2nd 500MB slow analyzer with core 2 affinity
    1GB fast analyzer with core 3 affinity

Or let the 2nd slow analyer start at 5GB from the
beginning.

Bye

BTW: Last time I looked at shootout:
http://shootout.alioth.debian.org/

many of the problems now have parallel
implementations. The test maschine is 4 core.
If you do a little calculation for the Java
results, you will see that they have an average
core  utilization of a factor of 2.6, similar
to the Erlang results.

Maybe you can pick up some ideas of how to
code parallel solutions from there. Otherwise
there is a nice book by Doug Lea:

Concurrent Programming in Java™: Design Principles and Pattern
http://www.amazon.com/Concurrent-Programming-Java-Principles-Pattern/dp/0201310090

[toc] | [prev] | [next] | [standalone]

#13560

From	Patricia Shanahan <pats@acm.org>
Date	2012-04-15 09:17 -0700
Message-ID	<WradnUhB_qEbaRfSnZ2dnUVZ_i2dnZ2d@earthlink.com>
In reply to	#13554

On 4/15/2012 7:11 AM, FrenKy wrote:
> Hi *,
> I have a huge file (~10GB) which I'm reading line by line. Each line has
> to be analyzed by many number of different analyzers. The problem I have
> is that to make it at least a bit performance optimized due to sometimes
> time consuming processing (usually because of delays due to external
> interfaces) i would need to make it heavily multithreaded.
> File should be read only once to reduce IO on disks.
>
> So I need "1 driver to many workers" pattern where workers are
> multithreaded.
>
> I have a solution now based on Observable/Observer that I use (and it
> works) but I'm not sure if it is the best way.

I suggest taking a look at java.util.concurrent.ThreadPoolExecutor and
related classes.

Try to minimize ordering relationships between processing on the lines,
so that you can overlap work on multiple lines as much as possible.

Patricia

[toc] | [prev] | [next] | [standalone]

#13561

From	Arved Sandstrom <asandstrom3minus1@eastlink.ca>
Date	2012-04-15 13:57 -0300
Message-ID	<b6Dir.3201$bU5.353@newsfe04.iad>
In reply to	#13560

On 12-04-15 01:17 PM, Patricia Shanahan wrote:
> On 4/15/2012 7:11 AM, FrenKy wrote:
>> Hi *,
>> I have a huge file (~10GB) which I'm reading line by line. Each line has
>> to be analyzed by many number of different analyzers. The problem I have
>> is that to make it at least a bit performance optimized due to sometimes
>> time consuming processing (usually because of delays due to external
>> interfaces) i would need to make it heavily multithreaded.
>> File should be read only once to reduce IO on disks.
>>
>> So I need "1 driver to many workers" pattern where workers are
>> multithreaded.
>>
>> I have a solution now based on Observable/Observer that I use (and it
>> works) but I'm not sure if it is the best way.
> 
> I suggest taking a look at java.util.concurrent.ThreadPoolExecutor and
> related classes.
> 
> Try to minimize ordering relationships between processing on the lines,
> so that you can overlap work on multiple lines as much as possible.
> 
> Patricia

I agree. A problem description like this, java.util.concurrent is the
first thing that pops into my head. markspace mentioned map-reduce, and
there is specifically fork-join in Java 1.7 (they are similar insofar as
they are algorithms for dividing problems); I don't know if any of
that's involved because the line analysis may be independent. IOW, this
may not be a distributable problem, this may be millions of individual
problems.

java.util.concurrent will definitely have something. It could well be
that the processing of each line is isolated, and I'd assuredly be
thinking of ThreadPoolExecutor or something similar for managing these.
It has a lot of tuning options including queues. If the analyzers for
each line have to coordinate (and maybe there's some final processing
after all complete) there are classes for that too, like CyclicBarrier.

AHS
-- 
A fly was very close to being called a "land," cause that's what they do
half the time.
-- Mitch Hedberg

[toc] | [prev] | [next] | [standalone]

#13562

From	Martin Gregorie <martin@address-in-sig.invalid>
Date	2012-04-15 19:56 +0000
Message-ID	<jmf957$8p7$1@localhost.localdomain>
In reply to	#13561

On Sun, 15 Apr 2012 13:57:42 -0300, Arved Sandstrom wrote:

> On 12-04-15 01:17 PM, Patricia Shanahan wrote:
>> On 4/15/2012 7:11 AM, FrenKy wrote:
>>> Hi *,
>>> I have a huge file (~10GB) which I'm reading line by line. Each line
>>> has to be analyzed by many number of different analyzers. The problem
>>> I have is that to make it at least a bit performance optimized due to
>>> sometimes time consuming processing (usually because of delays due to
>>> external interfaces) i would need to make it heavily multithreaded.
>>> File should be read only once to reduce IO on disks.
>>>
>>> So I need "1 driver to many workers" pattern where workers are
>>> multithreaded.
>>>
>>> I have a solution now based on Observable/Observer that I use (and it
>>> works) but I'm not sure if it is the best way.
>> 
>> I suggest taking a look at java.util.concurrent.ThreadPoolExecutor and
>> related classes.
>> 
>> Try to minimize ordering relationships between processing on the lines,
>> so that you can overlap work on multiple lines as much as possible.
>> 
>> Patricia
> 
> I agree. A problem description like this, java.util.concurrent is the
> first thing that pops into my head. markspace mentioned map-reduce, and
> there is specifically fork-join in Java 1.7 (they are similar insofar as
> they are algorithms for dividing problems); I don't know if any of
> that's involved because the line analysis may be independent. IOW, this
> may not be a distributable problem, this may be millions of individual
> problems.
> 
> java.util.concurrent will definitely have something. It could well be
> that the processing of each line is isolated, and I'd assuredly be
> thinking of ThreadPoolExecutor or something similar for managing these.
> It has a lot of tuning options including queues. If the analyzers for
> each line have to coordinate (and maybe there's some final processing
> after all complete) there are classes for that too, like CyclicBarrier.
> 
Yes. Since the OP doesn't give any indication that you can decide that 
the analysers needed for each line can be selected by some sort of fast, 
simple inspection, about all you can do is:

    foreach line l
        foreach analyser a
            start a thread for a(l)
        wait for all threads to finish

At first glance you might thinkusing a queue per analyser would help but, 
with the data volumes quoted that will soon fall apart if any analyser is 
more than trivially slower than the rest. As the OP has already said that 
some analysers can be much slower due to external interface delays (I 
presume that means waiting for DNS queries, etc.), I think he's stuck 
with the sort of logic I sketched out. After processing has gotten under 
way and any analyser-specific queues have filled up, the performance of 
any more complex logic will degrade to the above long before the input 
has been completely read and processed. 

In summary, don't try to do anything more sophisticated than the above.
 

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

[toc] | [prev] | [next] | [standalone]

#13575

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-04-16 09:55 +0200
Message-ID	<9v21k0FmcU1@mid.individual.net>
In reply to	#13562

On 15.04.2012 21:56, Martin Gregorie wrote:
> On Sun, 15 Apr 2012 13:57:42 -0300, Arved Sandstrom wrote:
>
>> On 12-04-15 01:17 PM, Patricia Shanahan wrote:
>>> On 4/15/2012 7:11 AM, FrenKy wrote:
>>>> Hi *,
>>>> I have a huge file (~10GB) which I'm reading line by line. Each line
>>>> has to be analyzed by many number of different analyzers. The problem
>>>> I have is that to make it at least a bit performance optimized due to
>>>> sometimes time consuming processing (usually because of delays due to
>>>> external interfaces) i would need to make it heavily multithreaded.
>>>> File should be read only once to reduce IO on disks.
>>>>
>>>> So I need "1 driver to many workers" pattern where workers are
>>>> multithreaded.
>>>>
>>>> I have a solution now based on Observable/Observer that I use (and it
>>>> works) but I'm not sure if it is the best way.

Observer does seem weird for this.  Fork / join approaches seem much 
more natural.

>>> I suggest taking a look at java.util.concurrent.ThreadPoolExecutor and
>>> related classes.

In the past we had a few issues with TPE because it seemed to try to be 
too smart about resource usage (threads can die if min and max are set 
differently).  I don't remember the details but it might be worth 
checking the official bug database.

>>> Try to minimize ordering relationships between processing on the lines,
>>> so that you can overlap work on multiple lines as much as possible.

+1

>> java.util.concurrent will definitely have something. It could well be
>> that the processing of each line is isolated, and I'd assuredly be
>> thinking of ThreadPoolExecutor or something similar for managing these.
>> It has a lot of tuning options including queues. If the analyzers for
>> each line have to coordinate (and maybe there's some final processing
>> after all complete) there are classes for that too, like CyclicBarrier.
>>
> Yes. Since the OP doesn't give any indication that you can decide that
> the analysers needed for each line can be selected by some sort of fast,
> simple inspection, about all you can do is:
>
>      foreach line l
>          foreach analyser a
>              start a thread for a(l)
>          wait for all threads to finish

Starting threads and waiting for their termination only makes sense if 
all the results need to be processed together.  If that's not the case 
I'd rather use ThreadPoolExecutor (or something similar which is pretty 
easily coded) and just throw tasks into the queue.

> At first glance you might thinkusing a queue per analyser would help but,
> with the data volumes quoted that will soon fall apart if any analyser is
> more than trivially slower than the rest. As the OP has already said that
> some analysers can be much slower due to external interface delays (I
> presume that means waiting for DNS queries, etc.), I think he's stuck
> with the sort of logic I sketched out. After processing has gotten under
> way and any analyser-specific queues have filled up, the performance of
> any more complex logic will degrade to the above long before the input
> has been completely read and processed.
>
> In summary, don't try to do anything more sophisticated than the above.

I would suggest a change depending on the answer to the question: is 
there a fixed format of lines which needs to be parsed identical for 
each analysis?  If so I'd avoid multiple identical parse steps and write 
a variant like this:

     foreach line l
         dat = parse(l)
         foreach analyser a
             start a thread for a(dat)
         wait for all threads to finish

If results of analysis do not have to be aligned I'd simply do

     queue = new TPE

     foreach line l
         dat = parse(l)
         foreach analyzer a
             queue.execute(a.task(dat))

     queue.shutdown
     while (!queue.awaitTermination(...)) {/* nop */}

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#13566

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-04-15 21:58 -0400
Message-ID	<4f8b7cb6$0$293$14726298@news.sunsite.dk>
In reply to	#13554

On 4/15/2012 10:11 AM, FrenKy wrote:
> I have a huge file (~10GB) which I'm reading line by line. Each line has
> to be analyzed by many number of different analyzers. The problem I have
> is that to make it at least a bit performance optimized due to sometimes
> time consuming processing (usually because of delays due to external
> interfaces) i would need to make it heavily multithreaded.
> File should be read only once to reduce IO on disks.
>
> So I need "1 driver to many workers" pattern where workers are
> multithreaded.
>
> I have a solution now based on Observable/Observer that I use (and it
> works) but I'm not sure if it is the best way.

As I see it then you need 3 things:
* A single reader thread. That is relative simple just be sure to
   read big chunks of data
* N threads doing M analysis's. There are various ways of doing this.
   Manually started threads and thread pool. I think the best choice
   between those will depend on the solution for the next bullet.
* A way of moving data data from the reader to M analyzers.

The first two solutions that come to my mind are:

A1) Use a single java.util.concurrent blocking queue, use
     a custom thread pool, use command pattern, have
     the reader put M commands on the queue containing the
     same data and the analysis to perform, the N threads
     read the commands from the queue and analyze as instructed.
A2) Use the standard ExecutorService thread pool, use command
     pattern, have the reader submit M commands that are also tasks
     to the executor containing the same data and the analysis
     to perform, the N threads read the commands from the queue
     and analyze as instructed.
(A1 and A2 are really the same solution just slightingly different
implementation)
B) Use non persistent message queue and JMS, use publish subscribe
    pattern, have the reader publish the data to the queue, have a
    multipla of M custom treads each implementing a single analysis
    subscribing to the queue, reading and analyzing.

A has less overhead than B. A is more efficient than B if some
analysis's take longer time than others.

But B can be used in a clustered approach.

(I guess you could do A3 with commands on a message queue and
a thread pool on each cluster member as well)

Arne

[toc] | [prev] | [standalone]

csiph-web

Pattern suggestion

Contents

#13554 — Pattern suggestion

#13555

#13556

#13557

#13567

#13558

#13565

#13560

#13561

#13562

#13575

#13566