Groups > comp.lang.java.programmer > #10349 > unrolled thread

Thread question

Started by	Roedy Green <see_website@mindprod.com.invalid>
First post	2011-11-30 01:31 -0800
Last post	2011-11-30 13:23 -0800
Articles	18 — 7 participants

Back to article view | Back to comp.lang.java.programmer

  Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 01:31 -0800
    Re: Thread question Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com> - 2011-11-30 01:48 -0800
    Re: Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 02:32 -0800
      Re: Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 02:48 -0800
      Re: Thread question Patricia Shanahan <pats@acm.org> - 2011-11-30 03:11 -0800
        Re: Thread question Paul Cager <paul.cager@googlemail.com> - 2011-11-30 07:10 -0800
          Re: Thread question Patricia Shanahan <pats@acm.org> - 2011-11-30 09:34 -0800
      Re: Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 06:10 -0800
        Re: Thread question markspace <-@.> - 2011-11-30 06:13 -0800
        Re: Thread question Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2011-11-30 11:53 -0800
    Re: Thread question Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-11-30 07:28 -0500
      Re: Thread question markspace <-@.> - 2011-11-30 05:30 -0800
        Re: Thread question Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-11-30 21:04 -0500
          Re: Thread question markspace <-@.> - 2011-11-30 18:28 -0800
    Re: Thread question markspace <-@.> - 2011-11-30 05:50 -0800
      Re: Thread question markspace <-@.> - 2011-11-30 06:04 -0800
    Re: Thread question Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2011-11-30 11:48 -0800
    Re: Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 13:23 -0800

#10349 — Thread question

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-11-30 01:31 -0800
Subject	Thread question
Message-ID	<i9tbd7ppgalucct4i2t2agj9hldjumnmk4@4ax.com>

The application is I have a list of 1200 books and a list of 21 online
bookstores.  I want to find out which bookstores carry the book.  I
have code now that given a book and bookstore will find out if it
carries it and records the result.  It probes the appropriate page and
scans for clues, positive and negative.

The process is very slow because it mostly spends its time waiting for
the bookstore to respond.  I figured I could fairly easily make one
book-x-store probe into a Runnable. Happily there is only very simple
interactions between Runnables.

You could imagine setting all the Runnables loose at once.  

I need two features:

1. some sort of throttle on releasing them that I don't swamp the JVM.
2. some  way of knowing when the last one completed.

I could do this by having my Runnables increment and decrement global
counts, however I suspect there is something built in to handle this
flawlessly.

There are so many tools. I wonder if anyone would like to point me to
the most appropriate one for this task.

If I get this working, I would like to add similar logic to the
BrokenLinks link checker.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.

[toc] | [next] | [standalone]

#10350

From	Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com>
Date	2011-11-30 01:48 -0800
Message-ID	<zZidnTjhYohuZkjTnZ2dnUVZ_jidnZ2d@posted.palinacquisition>
In reply to	#10349

On 11/30/11 1:31 AM, Roedy Green wrote:
> [...]
> I need two features:
>
> 1. some sort of throttle on releasing them that I don't swamp the JVM.
> 2. some  way of knowing when the last one completed.

I'm not really sure precisely what you have in mind.  But it sounds like 
the Semaphore and CountDownLatch classes (both in java.util.concurrent) 
may be what you're looking for.  Semaphore allows you to set a limit on 
concurrent use of a resource, while CountDownLatch allows you to wait 
until the count reaches zero.

Note that an alternative to the Semaphore would be to queue the Runnable 
instances in a thread pool with the maximum number of threads set to the 
degree of concurrency you want.  Depending on the thread implementation, 
this may be a significantly more efficient approach than starting a new 
thread for each operation and having most of them wait (until they are 
able to acquire the semaphore).  Blocked threads don't cost you CPU 
time, but they still eat up memory.

Pete

[toc] | [prev] | [next] | [standalone]

#10351

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-11-30 02:32 -0800
Message-ID	<081cd7paqgsa8bbg5gjh8v1kinfp7asgt0@4ax.com>
In reply to	#10349

On Wed, 30 Nov 2011 01:31:23 -0800, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>There are so many tools. I wonder if anyone would like to point me to
>the most appropriate one for this task.

ThreadPoolExecutor is  looking promising, though overly elaborate.  In
my case threads can die as soon as there is no more work for them and
there will be no need to dynamically grow or shrink the pool.
Now I have to figure out what flavour of BlockingQueue to use.

I'll be curious to find out the optimal number of threads.  I don't
like to pester the book stores. I can debug this with 2 threads
without doing too much pestering, but repeated runs to binary search
the optimum number of threads...  maybe I should use the fancy
features to dynamically adjust it via a feedback mechanism.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.

[toc] | [prev] | [next] | [standalone]

#10352

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-11-30 02:48 -0800
Message-ID	<5a2cd7d5au1e5idj3hp178ie38nl4am6au@4ax.com>
In reply to	#10351

On Wed, 30 Nov 2011 02:32:14 -0800, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>Now I have to figure out what flavour of BlockingQueue to use.

It looks like a LinkedBlockingQueue will do well, with the advantage
it will scale well for my other project.  This is beginning to look a
heck of a lot simpler than I thought it would be.  Most of the work
will be refactoring to bundle up all that needs to be done in a
Runnable.

There are so many tools I think I got the idea to do anything you had
to use 90% of them in some complicated way.  But the actuality is you
can do something useful with just a couple.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.

[toc] | [prev] | [next] | [standalone]

#10353

From	Patricia Shanahan <pats@acm.org>
Date	2011-11-30 03:11 -0800
Message-ID	<YbKdnc-ylpHAkkvTnZ2dnUVZ_vudnZ2d@earthlink.com>
In reply to	#10351

On 11/30/2011 2:32 AM, Roedy Green wrote:
...
> I'll be curious to find out the optimal number of threads.  I don't
> like to pester the book stores. I can debug this with 2 threads
> without doing too much pestering, but repeated runs to binary search
> the optimum number of threads...  maybe I should use the fancy
> features to dynamically adjust it via a feedback mechanism.
>

Here's a basic approach to picking a thread count:

1. Measure a single thread and a small number of threads, and estimate
the usage for each critical resource as a function of the number of
threads.

The critical resources will be things like CPU utilization, memory
footprint, and average number of outstanding requests. For most
resources, N threads will have N times the utilization of a single
thread. Memory footprint will probably show an overhead for shared code
plus a component that scales with the number of threads.

2. Decide how much of each resource you want to permit the job to use.
This may change with e.g. the hardware configuration on which it is
running. It is rarely a good idea to allow one job to use the whole of
anything.

3. For each resource, calculate the maximum number of threads that will
not overload the resource.

4. Pick the smallest result from step 3. That is the largest number of
threads that will not overload any resource.

For example, suppose you are willing to have an average of 20 requests
outstanding, and each thread averages one request outstanding. Then that
resource sets a limit of 20 threads.

This approach needs only a couple of measurement runs and some decisions
about the total resource levels you want to dedicate to this job.

Patricia

[toc] | [prev] | [next] | [standalone]

#10364

From	Paul Cager <paul.cager@googlemail.com>
Date	2011-11-30 07:10 -0800
Message-ID	<2da33860-dc78-4e22-a53b-0b7e0631900a@y7g2000vbe.googlegroups.com>
In reply to	#10353

On Nov 30, 11:11 am, Patricia Shanahan <p...@acm.org> wrote:
> On 11/30/2011 2:32 AM, Roedy Green wrote:
> ...
>
> > I'll be curious to find out the optimal number of threads.  I don't
> > like to pester the book stores.
> > ...
>
> Here's a basic approach to picking a thread count:
> ...

In this case it's not just your own resources you have to take into
consideration. Some web sites won't like having 20 simultaneous
connections from the same IP scraping the site.

[toc] | [prev] | [next] | [standalone]

#10368

From	Patricia Shanahan <pats@acm.org>
Date	2011-11-30 09:34 -0800
Message-ID	<ba6dne3Q6tyq9EvTnZ2dnUVZ_qmdnZ2d@earthlink.com>
In reply to	#10364

On 11/30/2011 7:10 AM, Paul Cager wrote:
> On Nov 30, 11:11 am, Patricia Shanahan<p...@acm.org>  wrote:
>> On 11/30/2011 2:32 AM, Roedy Green wrote:
>> ...
>>
>>> I'll be curious to find out the optimal number of threads.  I don't
>>> like to pester the book stores.
>>> ...
>>
>> Here's a basic approach to picking a thread count:
>> ...
>
> In this case it's not just your own resources you have to take into
> consideration. Some web sites won't like having 20 simultaneous
> connections from the same IP scraping the site.

The number of connections, just like the number of outstanding
requests, should indeed be considered a critical resource, and will give
one of the bounds on the number of threads. I would not be surprised if
the external limits gave tighter bounds than the internal ones.

Patricia

[toc] | [prev] | [next] | [standalone]

#10361

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-11-30 06:10 -0800
Message-ID	<q5ecd7d0rnk86qeqab0r42qpjrff2007mc@4ax.com>
In reply to	#10351

On Wed, 30 Nov 2011 02:32:14 -0800, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>ThreadPoolExecutor is  looking promising, though overly elaborate. 

I got my code to compile. Then I discovered something quite idiotic
about the class,  Quoting Goetz et al "There is no predefined
saturation policy to make "execute" block when the work queue is
full."

Huh?  is that not the most common case?
So it looks like I have to compose a custom blocking policy handler.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.

[toc] | [prev] | [next] | [standalone]

#10362

From	markspace <-@.>
Date	2011-11-30 06:13 -0800
Message-ID	<jb5dlu$5g5$1@dont-email.me>
In reply to	#10361

On 11/30/2011 6:10 AM, Roedy Green wrote:

> Huh?  is that not the most common case?

No.

> So it looks like I have to compose a custom blocking policy handler.

Read it again, or just use a fixed size thread pool from Executors. 
It'll work fine.

[toc] | [prev] | [next] | [standalone]

#10374

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2011-11-30 11:53 -0800
Message-ID	<5RvBq.2561$8O1.1041@newsfe07.iad>
In reply to	#10361

On 11/30/11 6:10 AM, Roedy Green wrote:
> On Wed, 30 Nov 2011 02:32:14 -0800, Roedy Green
> <see_website@mindprod.com.invalid>  wrote, quoted or indirectly quoted
> someone who said :
>
>> ThreadPoolExecutor is  looking promising, though overly elaborate.
>
> I got my code to compile. Then I discovered something quite idiotic
> about the class,  Quoting Goetz et al "There is no predefined
> saturation policy to make "execute" block when the work queue is
> full."
>
> Huh?  is that not the most common case?
> So it looks like I have to compose a custom blocking policy handler.
>

There is a somewhat useful alternative, which is the CallerRunPolicy.

Then look at the implementations (in ThreadPoolExecutor) for

http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/RejectedExecutionHandler.html

<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/ThreadPoolExecutor.html>

[toc] | [prev] | [next] | [standalone]

#10354

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2011-11-30 07:28 -0500
Message-ID	<jb57hc$v6g$1@dont-email.me>
In reply to	#10349

On 11/30/2011 4:31 AM, Roedy Green wrote:
> The application is I have a list of 1200 books and a list of 21 online
> bookstores.  I want to find out which bookstores carry the book.  I
> have code now that given a book and bookstore will find out if it
> carries it and records the result.  It probes the appropriate page and
> scans for clues, positive and negative.
>
> The process is very slow because it mostly spends its time waiting for
> the bookstore to respond.  I figured I could fairly easily make one
> book-x-store probe into a Runnable. Happily there is only very simple
> interactions between Runnables.
>
> You could imagine setting all the Runnables loose at once.
>
> I need two features:
>
> 1. some sort of throttle on releasing them that I don't swamp the JVM.

     Use a java.util.concurrent.ExecutorService.  There's a large
number of ways to configure them, but I'd suggest starting out
simple, with a fixed-size thread pool plucking tasks from a queue.

> 2. some  way of knowing when the last one completed.

     A simple counter will do it.  Note that you'll likely want to
be able to proceed without waiting for every last probe to complete:
If you've got twenty answers but qcvfl.com[*] appears hung, it may be
best to abandon the lallygagger and proceed with what you've got.  So,
consider having a time-out as an alternative "completion" signal.

     [*]"Quaint and Curious Volumes of Forgotten Lore."

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#10358

From	markspace <-@.>
Date	2011-11-30 05:30 -0800
Message-ID	<jb5b6d$kbn$1@dont-email.me>
In reply to	#10354

On 11/30/2011 4:28 AM, Eric Sosman wrote:

> On 11/30/2011 4:31 AM, Roedy Green wrote:
>> 2. some way of knowing when the last one completed.

> A simple counter will do it.


I'd recommend the simple built-in method "ExecutorService::shutdown()". 
  It waits until all tasks have completed.

[toc] | [prev] | [next] | [standalone]

#10390

From	Eric Sosman <esosman@ieee-dot-org.invalid>
Date	2011-11-30 21:04 -0500
Message-ID	<jb6ncf$vae$1@dont-email.me>
In reply to	#10358

On 11/30/2011 8:30 AM, markspace wrote:
> On 11/30/2011 4:28 AM, Eric Sosman wrote:
>
>> On 11/30/2011 4:31 AM, Roedy Green wrote:
>>> 2. some way of knowing when the last one completed.
>
>> A simple counter will do it.
>
>
> I'd recommend the simple built-in method "ExecutorService::shutdown()".
> It waits until all tasks have completed.

     ITYM awaitTermination(), a method I hadn't noticed but which
does as you say *and* has a useful timeout.  shutdown(), however,
does not wait: It just "initiates an orderly shutdown" and returns
with the shutdown still (potentially) in progress.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid

[toc] | [prev] | [next] | [standalone]

#10392

From	markspace <-@.>
Date	2011-11-30 18:28 -0800
Message-ID	<jb6onj$58o$1@dont-email.me>
In reply to	#10390

On 11/30/2011 6:04 PM, Eric Sosman wrote:

> ITYM awaitTermination(), a method I hadn't noticed but which
> does as you say *and* has a useful timeout. shutdown(), however,
> does not wait: It just "initiates an orderly shutdown" and returns
> with the shutdown still (potentially) in progress.
>

You're probably right, it's been a while since I actually used an 
executor service.

[toc] | [prev] | [next] | [standalone]

#10359

From	markspace <-@.>
Date	2011-11-30 05:50 -0800
Message-ID	<jb5cbf$t1t$1@dont-email.me>
In reply to	#10349

On 11/30/2011 1:31 AM, Roedy Green wrote:

> 1. some sort of throttle on releasing them that I don't swamp the JVM.

I wouldn't bother.  It's actually well known that for fast efficient IO 
you should start as many threads as possible.  21 threads aren't going 
to swamp anything.

If you start to have in excess of say 100 to 1000 threads, then maybe 
you can think about a throttle.  Until there's a chance of that many 
threads, you're just gold-plating your software.

<http://en.wikipedia.org/wiki/Gold_plating_%28software_engineering%29>

[toc] | [prev] | [next] | [standalone]

#10360

From	markspace <-@.>
Date	2011-11-30 06:04 -0800
Message-ID	<jb5d4v$2k4$1@dont-email.me>
In reply to	#10359

On 11/30/2011 5:50 AM, markspace wrote:
> On 11/30/2011 1:31 AM, Roedy Green wrote:
>
>> 1. some sort of throttle on releasing them that I don't swamp the JVM.
>
> I wouldn't bother. It's actually well known that for fast efficient IO
> you should start as many threads as possible. 21 threads aren't going to
> swamp anything.
>
> If you start to have in excess of say 100 to 1000 threads, then maybe
> you can think about a throttle. Until there's a chance of that many
> threads, you're just gold-plating your software.
>
> <http://en.wikipedia.org/wiki/Gold_plating_%28software_engineering%29>
>

Reading the docs carefully, it looks like Executors::newFixedThreadPool 
might only allocate new threads as it needs them, up to a maximum. 
Rather than say allocate the maximum number of specified threads 
immediately.  If it does, this would be ideal for throttling your tasks.

Even if it doesn't, a relatively small number of threads, say 
"newFixedThreadPool(50)" is probably much much easier than trying to 
invent some throttling mechanism yourself.

<http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool%28int%29>

[toc] | [prev] | [next] | [standalone]

#10373

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2011-11-30 11:48 -0800
Message-ID	<ZLvBq.1201$X02.458@newsfe03.iad>
In reply to	#10349

On 11/30/11 1:31 AM, Roedy Green wrote:
> The application is I have a list of 1200 books and a list of 21 online
> bookstores.  I want to find out which bookstores carry the book.  I
> have code now that given a book and bookstore will find out if it
> carries it and records the result.  It probes the appropriate page and
> scans for clues, positive and negative.
>
> The process is very slow because it mostly spends its time waiting for
> the bookstore to respond.  I figured I could fairly easily make one
> book-x-store probe into a Runnable. Happily there is only very simple
> interactions between Runnables.
>
> You could imagine setting all the Runnables loose at once.
>
> I need two features:
>
> 1. some sort of throttle on releasing them that I don't swamp the JVM.
> 2. some  way of knowing when the last one completed.
>
> I could do this by having my Runnables increment and decrement global
> counts, however I suspect there is something built in to handle this
> flawlessly.
>
> There are so many tools. I wonder if anyone would like to point me to
> the most appropriate one for this task.
>
> If I get this working, I would like to add similar logic to the
> BrokenLinks link checker.
Look into ExecutorService and Executors in java.util.concurrency. They 
have exactly what you want.

<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/ExecutorService.html>

<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/Executors.html>

The idea is that you submit jobs to the ExecutorService (which is 
basically some sort of Thread Pool), and you can wait for the "Future" 
result of those jobs.

You might even consider turning your BookStoreProbe into a Callable:

<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/Callable.html>


More useful things in package summary:
<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/package-summary.html>

[toc] | [prev] | [next] | [standalone]

#10380

From	Roedy Green <see_website@mindprod.com.invalid>
Date	2011-11-30 13:23 -0800
Message-ID	<bahcd75d98jf5eog2qjit5dq9qvdnouqgq@4ax.com>
In reply to	#10349

On Wed, 30 Nov 2011 01:31:23 -0800, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

I used a semaphore to block when the queue and threads are full.  Even
with 5 threads it is way faster than before.

I see triples
finishing a
starting b
submitting c

pause

It does not dilly dally recycling a thread.

The big problem is the docs. They are aimed at advanced users. They
expect you to already understand the basic stuff.  I feel irritation
about the way you have kludge in blocking.  What were they thinking?
It is like selling a car without wheels.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.

[toc] | [prev] | [standalone]

csiph-web

Thread question

Contents

#10349 — Thread question

#10350

#10351

#10352

#10353

#10364

#10368

#10361

#10362

#10374

#10354

#10358

#10390

#10392

#10359

#10360

#10373

#10380