Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #10349 > unrolled thread
| Started by | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| First post | 2011-11-30 01:31 -0800 |
| Last post | 2011-11-30 13:23 -0800 |
| Articles | 18 — 7 participants |
Back to article view | Back to comp.lang.java.programmer
Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 01:31 -0800
Re: Thread question Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com> - 2011-11-30 01:48 -0800
Re: Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 02:32 -0800
Re: Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 02:48 -0800
Re: Thread question Patricia Shanahan <pats@acm.org> - 2011-11-30 03:11 -0800
Re: Thread question Paul Cager <paul.cager@googlemail.com> - 2011-11-30 07:10 -0800
Re: Thread question Patricia Shanahan <pats@acm.org> - 2011-11-30 09:34 -0800
Re: Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 06:10 -0800
Re: Thread question markspace <-@.> - 2011-11-30 06:13 -0800
Re: Thread question Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2011-11-30 11:53 -0800
Re: Thread question Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-11-30 07:28 -0500
Re: Thread question markspace <-@.> - 2011-11-30 05:30 -0800
Re: Thread question Eric Sosman <esosman@ieee-dot-org.invalid> - 2011-11-30 21:04 -0500
Re: Thread question markspace <-@.> - 2011-11-30 18:28 -0800
Re: Thread question markspace <-@.> - 2011-11-30 05:50 -0800
Re: Thread question markspace <-@.> - 2011-11-30 06:04 -0800
Re: Thread question Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2011-11-30 11:48 -0800
Re: Thread question Roedy Green <see_website@mindprod.com.invalid> - 2011-11-30 13:23 -0800
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2011-11-30 01:31 -0800 |
| Subject | Thread question |
| Message-ID | <i9tbd7ppgalucct4i2t2agj9hldjumnmk4@4ax.com> |
The application is I have a list of 1200 books and a list of 21 online bookstores. I want to find out which bookstores carry the book. I have code now that given a book and bookstore will find out if it carries it and records the result. It probes the appropriate page and scans for clues, positive and negative. The process is very slow because it mostly spends its time waiting for the bookstore to respond. I figured I could fairly easily make one book-x-store probe into a Runnable. Happily there is only very simple interactions between Runnables. You could imagine setting all the Runnables loose at once. I need two features: 1. some sort of throttle on releasing them that I don't swamp the JVM. 2. some way of knowing when the last one completed. I could do this by having my Runnables increment and decrement global counts, however I suspect there is something built in to handle this flawlessly. There are so many tools. I wonder if anyone would like to point me to the most appropriate one for this task. If I get this working, I would like to add similar logic to the BrokenLinks link checker. -- Roedy Green Canadian Mind Products http://mindprod.com For me, the appeal of computer programming is that even though I am quite a klutz, I can still produce something, in a sense perfect, because the computer gives me as many chances as I please to get it right.
[toc] | [next] | [standalone]
| From | Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com> |
|---|---|
| Date | 2011-11-30 01:48 -0800 |
| Message-ID | <zZidnTjhYohuZkjTnZ2dnUVZ_jidnZ2d@posted.palinacquisition> |
| In reply to | #10349 |
On 11/30/11 1:31 AM, Roedy Green wrote: > [...] > I need two features: > > 1. some sort of throttle on releasing them that I don't swamp the JVM. > 2. some way of knowing when the last one completed. I'm not really sure precisely what you have in mind. But it sounds like the Semaphore and CountDownLatch classes (both in java.util.concurrent) may be what you're looking for. Semaphore allows you to set a limit on concurrent use of a resource, while CountDownLatch allows you to wait until the count reaches zero. Note that an alternative to the Semaphore would be to queue the Runnable instances in a thread pool with the maximum number of threads set to the degree of concurrency you want. Depending on the thread implementation, this may be a significantly more efficient approach than starting a new thread for each operation and having most of them wait (until they are able to acquire the semaphore). Blocked threads don't cost you CPU time, but they still eat up memory. Pete
[toc] | [prev] | [next] | [standalone]
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2011-11-30 02:32 -0800 |
| Message-ID | <081cd7paqgsa8bbg5gjh8v1kinfp7asgt0@4ax.com> |
| In reply to | #10349 |
On Wed, 30 Nov 2011 01:31:23 -0800, Roedy Green <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted someone who said : >There are so many tools. I wonder if anyone would like to point me to >the most appropriate one for this task. ThreadPoolExecutor is looking promising, though overly elaborate. In my case threads can die as soon as there is no more work for them and there will be no need to dynamically grow or shrink the pool. Now I have to figure out what flavour of BlockingQueue to use. I'll be curious to find out the optimal number of threads. I don't like to pester the book stores. I can debug this with 2 threads without doing too much pestering, but repeated runs to binary search the optimum number of threads... maybe I should use the fancy features to dynamically adjust it via a feedback mechanism. -- Roedy Green Canadian Mind Products http://mindprod.com For me, the appeal of computer programming is that even though I am quite a klutz, I can still produce something, in a sense perfect, because the computer gives me as many chances as I please to get it right.
[toc] | [prev] | [next] | [standalone]
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2011-11-30 02:48 -0800 |
| Message-ID | <5a2cd7d5au1e5idj3hp178ie38nl4am6au@4ax.com> |
| In reply to | #10351 |
On Wed, 30 Nov 2011 02:32:14 -0800, Roedy Green <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted someone who said : >Now I have to figure out what flavour of BlockingQueue to use. It looks like a LinkedBlockingQueue will do well, with the advantage it will scale well for my other project. This is beginning to look a heck of a lot simpler than I thought it would be. Most of the work will be refactoring to bundle up all that needs to be done in a Runnable. There are so many tools I think I got the idea to do anything you had to use 90% of them in some complicated way. But the actuality is you can do something useful with just a couple. -- Roedy Green Canadian Mind Products http://mindprod.com For me, the appeal of computer programming is that even though I am quite a klutz, I can still produce something, in a sense perfect, because the computer gives me as many chances as I please to get it right.
[toc] | [prev] | [next] | [standalone]
| From | Patricia Shanahan <pats@acm.org> |
|---|---|
| Date | 2011-11-30 03:11 -0800 |
| Message-ID | <YbKdnc-ylpHAkkvTnZ2dnUVZ_vudnZ2d@earthlink.com> |
| In reply to | #10351 |
On 11/30/2011 2:32 AM, Roedy Green wrote: ... > I'll be curious to find out the optimal number of threads. I don't > like to pester the book stores. I can debug this with 2 threads > without doing too much pestering, but repeated runs to binary search > the optimum number of threads... maybe I should use the fancy > features to dynamically adjust it via a feedback mechanism. > Here's a basic approach to picking a thread count: 1. Measure a single thread and a small number of threads, and estimate the usage for each critical resource as a function of the number of threads. The critical resources will be things like CPU utilization, memory footprint, and average number of outstanding requests. For most resources, N threads will have N times the utilization of a single thread. Memory footprint will probably show an overhead for shared code plus a component that scales with the number of threads. 2. Decide how much of each resource you want to permit the job to use. This may change with e.g. the hardware configuration on which it is running. It is rarely a good idea to allow one job to use the whole of anything. 3. For each resource, calculate the maximum number of threads that will not overload the resource. 4. Pick the smallest result from step 3. That is the largest number of threads that will not overload any resource. For example, suppose you are willing to have an average of 20 requests outstanding, and each thread averages one request outstanding. Then that resource sets a limit of 20 threads. This approach needs only a couple of measurement runs and some decisions about the total resource levels you want to dedicate to this job. Patricia
[toc] | [prev] | [next] | [standalone]
| From | Paul Cager <paul.cager@googlemail.com> |
|---|---|
| Date | 2011-11-30 07:10 -0800 |
| Message-ID | <2da33860-dc78-4e22-a53b-0b7e0631900a@y7g2000vbe.googlegroups.com> |
| In reply to | #10353 |
On Nov 30, 11:11 am, Patricia Shanahan <p...@acm.org> wrote: > On 11/30/2011 2:32 AM, Roedy Green wrote: > ... > > > I'll be curious to find out the optimal number of threads. I don't > > like to pester the book stores. > > ... > > Here's a basic approach to picking a thread count: > ... In this case it's not just your own resources you have to take into consideration. Some web sites won't like having 20 simultaneous connections from the same IP scraping the site.
[toc] | [prev] | [next] | [standalone]
| From | Patricia Shanahan <pats@acm.org> |
|---|---|
| Date | 2011-11-30 09:34 -0800 |
| Message-ID | <ba6dne3Q6tyq9EvTnZ2dnUVZ_qmdnZ2d@earthlink.com> |
| In reply to | #10364 |
On 11/30/2011 7:10 AM, Paul Cager wrote: > On Nov 30, 11:11 am, Patricia Shanahan<p...@acm.org> wrote: >> On 11/30/2011 2:32 AM, Roedy Green wrote: >> ... >> >>> I'll be curious to find out the optimal number of threads. I don't >>> like to pester the book stores. >>> ... >> >> Here's a basic approach to picking a thread count: >> ... > > In this case it's not just your own resources you have to take into > consideration. Some web sites won't like having 20 simultaneous > connections from the same IP scraping the site. The number of connections, just like the number of outstanding requests, should indeed be considered a critical resource, and will give one of the bounds on the number of threads. I would not be surprised if the external limits gave tighter bounds than the internal ones. Patricia
[toc] | [prev] | [next] | [standalone]
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2011-11-30 06:10 -0800 |
| Message-ID | <q5ecd7d0rnk86qeqab0r42qpjrff2007mc@4ax.com> |
| In reply to | #10351 |
On Wed, 30 Nov 2011 02:32:14 -0800, Roedy Green <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted someone who said : >ThreadPoolExecutor is looking promising, though overly elaborate. I got my code to compile. Then I discovered something quite idiotic about the class, Quoting Goetz et al "There is no predefined saturation policy to make "execute" block when the work queue is full." Huh? is that not the most common case? So it looks like I have to compose a custom blocking policy handler. -- Roedy Green Canadian Mind Products http://mindprod.com For me, the appeal of computer programming is that even though I am quite a klutz, I can still produce something, in a sense perfect, because the computer gives me as many chances as I please to get it right.
[toc] | [prev] | [next] | [standalone]
| From | markspace <-@.> |
|---|---|
| Date | 2011-11-30 06:13 -0800 |
| Message-ID | <jb5dlu$5g5$1@dont-email.me> |
| In reply to | #10361 |
On 11/30/2011 6:10 AM, Roedy Green wrote: > Huh? is that not the most common case? No. > So it looks like I have to compose a custom blocking policy handler. Read it again, or just use a fixed size thread pool from Executors. It'll work fine.
[toc] | [prev] | [next] | [standalone]
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2011-11-30 11:53 -0800 |
| Message-ID | <5RvBq.2561$8O1.1041@newsfe07.iad> |
| In reply to | #10361 |
On 11/30/11 6:10 AM, Roedy Green wrote: > On Wed, 30 Nov 2011 02:32:14 -0800, Roedy Green > <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted > someone who said : > >> ThreadPoolExecutor is looking promising, though overly elaborate. > > I got my code to compile. Then I discovered something quite idiotic > about the class, Quoting Goetz et al "There is no predefined > saturation policy to make "execute" block when the work queue is > full." > > Huh? is that not the most common case? > So it looks like I have to compose a custom blocking policy handler. > There is a somewhat useful alternative, which is the CallerRunPolicy. Then look at the implementations (in ThreadPoolExecutor) for http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/RejectedExecutionHandler.html <http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/ThreadPoolExecutor.html>
[toc] | [prev] | [next] | [standalone]
| From | Eric Sosman <esosman@ieee-dot-org.invalid> |
|---|---|
| Date | 2011-11-30 07:28 -0500 |
| Message-ID | <jb57hc$v6g$1@dont-email.me> |
| In reply to | #10349 |
On 11/30/2011 4:31 AM, Roedy Green wrote:
> The application is I have a list of 1200 books and a list of 21 online
> bookstores. I want to find out which bookstores carry the book. I
> have code now that given a book and bookstore will find out if it
> carries it and records the result. It probes the appropriate page and
> scans for clues, positive and negative.
>
> The process is very slow because it mostly spends its time waiting for
> the bookstore to respond. I figured I could fairly easily make one
> book-x-store probe into a Runnable. Happily there is only very simple
> interactions between Runnables.
>
> You could imagine setting all the Runnables loose at once.
>
> I need two features:
>
> 1. some sort of throttle on releasing them that I don't swamp the JVM.
Use a java.util.concurrent.ExecutorService. There's a large
number of ways to configure them, but I'd suggest starting out
simple, with a fixed-size thread pool plucking tasks from a queue.
> 2. some way of knowing when the last one completed.
A simple counter will do it. Note that you'll likely want to
be able to proceed without waiting for every last probe to complete:
If you've got twenty answers but qcvfl.com[*] appears hung, it may be
best to abandon the lallygagger and proceed with what you've got. So,
consider having a time-out as an alternative "completion" signal.
[*]"Quaint and Curious Volumes of Forgotten Lore."
--
Eric Sosman
esosman@ieee-dot-org.invalid
[toc] | [prev] | [next] | [standalone]
| From | markspace <-@.> |
|---|---|
| Date | 2011-11-30 05:30 -0800 |
| Message-ID | <jb5b6d$kbn$1@dont-email.me> |
| In reply to | #10354 |
On 11/30/2011 4:28 AM, Eric Sosman wrote: > On 11/30/2011 4:31 AM, Roedy Green wrote: >> 2. some way of knowing when the last one completed. > A simple counter will do it. I'd recommend the simple built-in method "ExecutorService::shutdown()". It waits until all tasks have completed.
[toc] | [prev] | [next] | [standalone]
| From | Eric Sosman <esosman@ieee-dot-org.invalid> |
|---|---|
| Date | 2011-11-30 21:04 -0500 |
| Message-ID | <jb6ncf$vae$1@dont-email.me> |
| In reply to | #10358 |
On 11/30/2011 8:30 AM, markspace wrote:
> On 11/30/2011 4:28 AM, Eric Sosman wrote:
>
>> On 11/30/2011 4:31 AM, Roedy Green wrote:
>>> 2. some way of knowing when the last one completed.
>
>> A simple counter will do it.
>
>
> I'd recommend the simple built-in method "ExecutorService::shutdown()".
> It waits until all tasks have completed.
ITYM awaitTermination(), a method I hadn't noticed but which
does as you say *and* has a useful timeout. shutdown(), however,
does not wait: It just "initiates an orderly shutdown" and returns
with the shutdown still (potentially) in progress.
--
Eric Sosman
esosman@ieee-dot-org.invalid
[toc] | [prev] | [next] | [standalone]
| From | markspace <-@.> |
|---|---|
| Date | 2011-11-30 18:28 -0800 |
| Message-ID | <jb6onj$58o$1@dont-email.me> |
| In reply to | #10390 |
On 11/30/2011 6:04 PM, Eric Sosman wrote: > ITYM awaitTermination(), a method I hadn't noticed but which > does as you say *and* has a useful timeout. shutdown(), however, > does not wait: It just "initiates an orderly shutdown" and returns > with the shutdown still (potentially) in progress. > You're probably right, it's been a while since I actually used an executor service.
[toc] | [prev] | [next] | [standalone]
| From | markspace <-@.> |
|---|---|
| Date | 2011-11-30 05:50 -0800 |
| Message-ID | <jb5cbf$t1t$1@dont-email.me> |
| In reply to | #10349 |
On 11/30/2011 1:31 AM, Roedy Green wrote: > 1. some sort of throttle on releasing them that I don't swamp the JVM. I wouldn't bother. It's actually well known that for fast efficient IO you should start as many threads as possible. 21 threads aren't going to swamp anything. If you start to have in excess of say 100 to 1000 threads, then maybe you can think about a throttle. Until there's a chance of that many threads, you're just gold-plating your software. <http://en.wikipedia.org/wiki/Gold_plating_%28software_engineering%29>
[toc] | [prev] | [next] | [standalone]
| From | markspace <-@.> |
|---|---|
| Date | 2011-11-30 06:04 -0800 |
| Message-ID | <jb5d4v$2k4$1@dont-email.me> |
| In reply to | #10359 |
On 11/30/2011 5:50 AM, markspace wrote: > On 11/30/2011 1:31 AM, Roedy Green wrote: > >> 1. some sort of throttle on releasing them that I don't swamp the JVM. > > I wouldn't bother. It's actually well known that for fast efficient IO > you should start as many threads as possible. 21 threads aren't going to > swamp anything. > > If you start to have in excess of say 100 to 1000 threads, then maybe > you can think about a throttle. Until there's a chance of that many > threads, you're just gold-plating your software. > > <http://en.wikipedia.org/wiki/Gold_plating_%28software_engineering%29> > Reading the docs carefully, it looks like Executors::newFixedThreadPool might only allocate new threads as it needs them, up to a maximum. Rather than say allocate the maximum number of specified threads immediately. If it does, this would be ideal for throttling your tasks. Even if it doesn't, a relatively small number of threads, say "newFixedThreadPool(50)" is probably much much easier than trying to invent some throttling mechanism yourself. <http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool%28int%29>
[toc] | [prev] | [next] | [standalone]
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2011-11-30 11:48 -0800 |
| Message-ID | <ZLvBq.1201$X02.458@newsfe03.iad> |
| In reply to | #10349 |
On 11/30/11 1:31 AM, Roedy Green wrote: > The application is I have a list of 1200 books and a list of 21 online > bookstores. I want to find out which bookstores carry the book. I > have code now that given a book and bookstore will find out if it > carries it and records the result. It probes the appropriate page and > scans for clues, positive and negative. > > The process is very slow because it mostly spends its time waiting for > the bookstore to respond. I figured I could fairly easily make one > book-x-store probe into a Runnable. Happily there is only very simple > interactions between Runnables. > > You could imagine setting all the Runnables loose at once. > > I need two features: > > 1. some sort of throttle on releasing them that I don't swamp the JVM. > 2. some way of knowing when the last one completed. > > I could do this by having my Runnables increment and decrement global > counts, however I suspect there is something built in to handle this > flawlessly. > > There are so many tools. I wonder if anyone would like to point me to > the most appropriate one for this task. > > If I get this working, I would like to add similar logic to the > BrokenLinks link checker. Look into ExecutorService and Executors in java.util.concurrency. They have exactly what you want. <http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/ExecutorService.html> <http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/Executors.html> The idea is that you submit jobs to the ExecutorService (which is basically some sort of Thread Pool), and you can wait for the "Future" result of those jobs. You might even consider turning your BookStoreProbe into a Callable: <http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/Callable.html> More useful things in package summary: <http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/package-summary.html>
[toc] | [prev] | [next] | [standalone]
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2011-11-30 13:23 -0800 |
| Message-ID | <bahcd75d98jf5eog2qjit5dq9qvdnouqgq@4ax.com> |
| In reply to | #10349 |
On Wed, 30 Nov 2011 01:31:23 -0800, Roedy Green <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted someone who said : I used a semaphore to block when the queue and threads are full. Even with 5 threads it is way faster than before. I see triples finishing a starting b submitting c pause It does not dilly dally recycling a thread. The big problem is the docs. They are aimed at advanced users. They expect you to already understand the basic stuff. I feel irritation about the way you have kludge in blocking. What were they thinking? It is like selling a car without wheels. -- Roedy Green Canadian Mind Products http://mindprod.com For me, the appeal of computer programming is that even though I am quite a klutz, I can still produce something, in a sense perfect, because the computer gives me as many chances as I please to get it right.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.lang.java.programmer
csiph-web