Groups > comp.lang.python > #11091 > unrolled thread
| Started by | Tim Arnold <Tim.Arnold@sas.com> |
|---|---|
| First post | 2011-08-09 13:07 -0400 |
| Last post | 2011-08-10 22:26 -0700 |
| Articles | 4 — 3 participants |
multiprocessing timing issue Tim Arnold <Tim.Arnold@sas.com> - 2011-08-09 13:07 -0400
Re: multiprocessing timing issue Philip Semanchuk <philip@semanchuk.com> - 2011-08-10 23:36 -0400
Re: multiprocessing timing issue Tim Arnold <Tim.Arnold@sas.com> - 2011-08-11 12:58 -0400
Re: multiprocessing timing issue Tim Roberts <timr@probo.com> - 2011-08-10 22:26 -0700
| From | Tim Arnold <Tim.Arnold@sas.com> |
|---|---|
| Date | 2011-08-09 13:07 -0400 |
| Subject | multiprocessing timing issue |
| Message-ID | <j1rpfm$v6l$1@foggy.unx.sas.com> |
Hi, I'm having problems with an empty Queue using multiprocessing.
The task:
I have a bunch of chapters that I want to gather data on individually
and then update a report database with the results.
I'm using multiprocessing to do the data-gathering simultaneously.
Each chapter report gets put on a Queue in their separate processes.
Then each report gets picked off the queue and the report database is
updated with the results.
My problem is that sometimes the Queue is empty and I guess it's
because the get_data() method takes a lot of time.
I've used multiprocessing before, but never with a Queue like this.
Any notes or suggestions are very welcome.
The task starts off with:
Reporter(chapters).report()
thanks,
--Tim Arnold
from Queue import Empty
from multiprocessing import Process, Queue

def run_mp(objects, fn):
    # Start one worker process per object; every worker shares the same queue.
    q = Queue()
    procs = dict()
    for obj in objects:
        procs[obj['name']] = Process(target=fn, args=(obj, q))
        procs[obj['name']].start()
    return q

class Reporter(object):
    def __init__(self, chapters):
        self.chapters = chapters

    def report(self):
        q = run_mp(self.chapters, self.get_data)
        # Expect exactly one result per chapter.
        for i in range(len(self.chapters)):
            try:
                data = q.get(timeout=30)
            except Empty:
                print 'Report queue empty at %s' % (i)
            else:
                self.update_report(data)

    def get_data(self, chapter, q):
        data = expensive_calculations()
        q.put(data)

    def update_report(self, data):
        # db connection, etc.
        pass
| From | Philip Semanchuk <philip@semanchuk.com> |
|---|---|
| Date | 2011-08-10 23:36 -0400 |
| Message-ID | <mailman.2144.1313033824.1164.python-list@python.org> |
| In reply to | #11091 |
On Aug 9, 2011, at 1:07 PM, Tim Arnold wrote:

> Hi, I'm having problems with an empty Queue using multiprocessing.
>
> The task:
> I have a bunch of chapters that I want to gather data on individually and then update a report database with the results.
> I'm using multiprocessing to do the data-gathering simultaneously.
>
> Each chapter report gets put on a Queue in their separate processes. Then each report gets picked off the queue and the report database is updated with the results.
>
> My problem is that sometimes the Queue is empty and I guess it's
> because the get_data() method takes a lot of time.
>
> I've used multiprocessing before, but never with a Queue like this.
> Any notes or suggestions are very welcome.

Hi Tim,
This might be a dumb question, but... why is it a problem if the queue is empty? It sounds like you figured out already that get_data() sometimes takes longer than your timeout. So either increase your timeout or learn to live with the fact that the queue is sometimes empty. I don't mean to be rude, I just don't understand the problem.

Cheers
Philip
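
One way to "live with" an occasionally empty queue, sketched under assumptions: treat Empty as "not ready yet" and keep polling until the expected number of reports has arrived. The names collect_with_retries and slow_producer are hypothetical. The sketch is written for Python 3 (where the Queue module is named queue) and uses a thread with queue.Queue as a stand-in for the worker processes; multiprocessing.Queue.get accepts the same timeout argument and raises the same Empty exception.

```python
import queue
import threading
import time


def collect_with_retries(q, expected, timeout):
    """Poll q until `expected` items arrive, counting each timed-out wait."""
    results, misses = [], 0
    while len(results) < expected:
        try:
            results.append(q.get(timeout=timeout))
        except queue.Empty:
            # Nothing arrived within `timeout` seconds; not fatal, try again.
            misses += 1
    return results, misses


def slow_producer(q, items, delay):
    """Hypothetical stand-in for a slow get_data(): one item every `delay` s."""
    for item in items:
        time.sleep(delay)
        q.put(item)


if __name__ == "__main__":
    q = queue.Queue()
    worker = threading.Thread(target=slow_producer, args=(q, ["ch1", "ch2"], 0.3))
    worker.start()
    print(collect_with_retries(q, expected=2, timeout=0.05))
    worker.join()
```

This loop never gives up, so a worker that dies without calling put() would leave it polling forever; a real version would probably cap the number of misses.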
| From | Tim Arnold <Tim.Arnold@sas.com> |
|---|---|
| Date | 2011-08-11 12:58 -0400 |
| Message-ID | <j211mn$ks1$1@foggy.unx.sas.com> |
| In reply to | #11176 |
On 8/10/2011 11:36 PM, Philip Semanchuk wrote:
>
> On Aug 9, 2011, at 1:07 PM, Tim Arnold wrote:
>
>> Hi, I'm having problems with an empty Queue using multiprocessing.
>>
>> The task:
>> I have a bunch of chapters that I want to gather data on individually and then update a report database with the results.
>> I'm using multiprocessing to do the data-gathering simultaneously.
>>
>> Each chapter report gets put on a Queue in their separate processes. Then each report gets picked off the queue and the report database is updated with the results.
>>
>> My problem is that sometimes the Queue is empty and I guess it's
>> because the get_data() method takes a lot of time.
>>
>> I've used multiprocessing before, but never with a Queue like this.
>> Any notes or suggestions are very welcome.
>
>
> Hi Tim,
> This might be a dumb question, but... why is it a problem if the queue is empty? It sounds like you figured out already that get_data() sometimes takes longer than your timeout. So either increase your timeout or learn to live with the fact that the queue is sometimes empty. I don't mean to be rude, I just don't understand the problem.
>
> Cheers
> Philip
>
Hi Philip,
Not a dumb or rude question at all, thanks for thinking about it. When
the queue is empty the report cannot be updated, so that's why I was
concerned--I couldn't figure out how to block. Now that's dumb!
From your response, and Tim Roberts's too, I see that it's possible to
block until the data comes back. I just should never have put that
timeout in there. I must have assumed it would not block with no timeout
given. Wrong....
From the docs on q.get():
If optional args 'block' is true and 'timeout' is None (the default),
block if necessary until an item is available. If 'timeout' is
a positive number, it blocks at most 'timeout' seconds and raises
the Empty exception if no item was available within that time.
Otherwise ('block' is false), return an item if one is immediately
available, else raise the Empty exception ('timeout' is ignored
in that case).
thanks,
--Tim Arnold
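
The three behaviours described in that docstring can be seen side by side in a small sketch. It assumes Python 3 (module queue rather than Queue) and uses a thread with queue.Queue so that it stays deterministic; multiprocessing.Queue.get takes the same block and timeout arguments. The function name describe_get_modes and the delays are made up for illustration.

```python
import queue
import threading
import time


def describe_get_modes():
    """Exercise get() in non-blocking, timed, and fully blocking modes."""
    q = queue.Queue()
    outcome = {}

    # block=False: raises Empty immediately when nothing is queued.
    try:
        q.get(block=False)
    except queue.Empty:
        outcome["nonblocking"] = "Empty"

    # timeout=N: waits at most N seconds, then raises Empty.
    try:
        q.get(timeout=0.05)
    except queue.Empty:
        outcome["timeout"] = "Empty"

    # Defaults (block=True, timeout=None): waits until an item shows up.
    def slow_producer():
        time.sleep(0.2)  # stands in for a slow get_data()
        q.put("chapter report")

    producer = threading.Thread(target=slow_producer)
    producer.start()
    outcome["blocking"] = q.get()
    producer.join()
    return outcome


if __name__ == "__main__":
    print(describe_get_modes())
```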
| From | Tim Roberts <timr@probo.com> |
|---|---|
| Date | 2011-08-10 22:26 -0700 |
| Message-ID | <hup647dntprqgegidcfc1huur5nhm0qpl6@4ax.com> |
| In reply to | #11091 |
Tim Arnold <Tim.Arnold@sas.com> wrote:
>
>The task:
>I have a bunch of chapters that I want to gather data on individually
>and then update a report database with the results.
>I'm using multiprocessing to do the data-gathering simultaneously.
>
>Each chapter report gets put on a Queue in their separate processes.
>Then each report gets picked off the queue and the report database is
>updated with the results.
>
>My problem is that sometimes the Queue is empty and I guess it's
>because the get_data() method takes a lot of time.
>
>I've used multiprocessing before, but never with a Queue like this.
>Any notes or suggestions are very welcome.

The obvious implication is that your timeout is simply not long enough for
your common cases. If you know how many chapters to expect, why have a
timeout at all? Why not just wait forever?
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
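
A minimal sketch of the "just wait forever" approach, under stated assumptions: Python 3, a POSIX system where the "fork" start method is available (which is how the original Python 2 code would have started its workers on Unix), and a trivial get_data that stands in for the expensive calculations. The function collect_reports and the chapter fields are hypothetical.

```python
import multiprocessing as mp


def get_data(chapter, q):
    # Hypothetical stand-in for expensive_calculations(): report text length.
    q.put((chapter["name"], len(chapter["text"])))


def collect_reports(chapters):
    """Start one worker per chapter and block until every report is in."""
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    q = ctx.Queue()
    procs = [ctx.Process(target=get_data, args=(ch, q)) for ch in chapters]
    for p in procs:
        p.start()

    # One blocking get() per chapter: no timeout, so Empty is never raised.
    reports = dict(q.get() for _ in chapters)

    # Join only after draining the queue, so no worker is left waiting
    # to flush data it has already put.
    for p in procs:
        p.join()
    return reports


if __name__ == "__main__":
    chapters = [
        {"name": "intro", "text": "abc"},
        {"name": "usage", "text": "hello"},
    ]
    print(collect_reports(chapters))
```

The trade-off is that a worker that crashes before calling put() leaves the parent blocked indefinitely, so this fits best when every worker is known to report exactly once.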