Newsgroup: comp.lang.python, thread #11091

multiprocessing timing issue

Started by: Tim Arnold <Tim.Arnold@sas.com>
First post: 2011-08-09 13:07 -0400
Last post: 2011-08-10 22:26 -0700
Articles: 4 (3 participants)



Contents

  multiprocessing timing issue Tim Arnold <Tim.Arnold@sas.com> - 2011-08-09 13:07 -0400
    Re: multiprocessing timing issue Philip Semanchuk <philip@semanchuk.com> - 2011-08-10 23:36 -0400
      Re: multiprocessing timing issue Tim Arnold <Tim.Arnold@sas.com> - 2011-08-11 12:58 -0400
    Re: multiprocessing timing issue Tim Roberts <timr@probo.com> - 2011-08-10 22:26 -0700

#11091 — multiprocessing timing issue

From: Tim Arnold <Tim.Arnold@sas.com>
Date: 2011-08-09 13:07 -0400
Subject: multiprocessing timing issue
Message-ID: <j1rpfm$v6l$1@foggy.unx.sas.com>
Hi, I'm having problems with an empty Queue using multiprocessing.

The task:
I have a bunch of chapters that I want to gather data on individually 
and then update a report database with the results.
I'm using multiprocessing to do the data-gathering simultaneously.

Each chapter's report is put on a Queue by its own process.
Then each report is taken off the queue and the report database is
updated with the results.

My problem is that the Queue is sometimes empty when I read from it, and
I suspect that's because the get_data() method takes a long time.

I've used multiprocessing before, but never with a Queue like this.
Any notes or suggestions are very welcome.

The task starts off with:
Reporter(chapters).report()

thanks,
--Tim Arnold

from Queue import Empty
from multiprocessing import Process, Queue

def run_mp(objects, fn):
    q = Queue()
    procs = dict()
    for obj in objects:
        procs[obj['name']] = Process(target=fn, args=(obj, q))
        procs[obj['name']].start()

    return q

class Reporter(object):
    def __init__(self, chapters):
        self.chapters = chapters

    def report(self):
        q = run_mp(self.chapters, self.get_data)

        for i in range(len(self.chapters)):
            try:
                data = q.get(timeout=30)
            except Empty:
                print 'Report queue empty at %s' % (i)
            else:
                self.update_report(data)

    def get_data(self, chapter, q):
        data = expensive_calculations()  # per-chapter work, elided in the post
        q.put(data)

    def update_report(self, data):
        pass  # db connection, etc. (elided in the post)
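The symptom described above can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 (where the exception is `queue.Empty` rather than Python 2's `Queue.Empty`); `slow_chapter` and `collect_with_timeout` are hypothetical names, and the sleep stands in for the elided `expensive_calculations()`. When the worker needs longer than the `get()` timeout, `get()` raises `Empty` even though a result is still on its way.

```python
import time
from multiprocessing import Process, Queue
from queue import Empty


def slow_chapter(name, delay, q):
    """Hypothetical stand-in for get_data(): works for `delay` seconds, then reports."""
    time.sleep(delay)
    q.put((name, "report for %s" % name))


def collect_with_timeout(delay, timeout):
    """Run one worker and make a single get() with a timeout.

    Returns the worker's result, or None if get() gave up first.
    """
    q = Queue()
    p = Process(target=slow_chapter, args=("ch1", delay, q))
    p.start()
    try:
        item = q.get(timeout=timeout)
    except Empty:
        item = None  # the "Report queue empty" case from the post
        q.get()      # the result still arrives later; consume it before join()
    p.join()
    return item


if __name__ == "__main__":
    print(collect_with_timeout(delay=1.0, timeout=0.1))  # None: the timeout expired first
    print(collect_with_timeout(delay=0.1, timeout=5.0))  # ('ch1', 'report for ch1')
```

Raising the timeout only moves the threshold; a slower chapter can still outlast it.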



#11176

From: Philip Semanchuk <philip@semanchuk.com>
Date: 2011-08-10 23:36 -0400
Message-ID: <mailman.2144.1313033824.1164.python-list@python.org>
In reply to: #11091
On Aug 9, 2011, at 1:07 PM, Tim Arnold wrote:

> Hi, I'm having problems with an empty Queue using multiprocessing.
> 
> The task:
> I have a bunch of chapters that I want to gather data on individually and then update a report database with the results.
> I'm using multiprocessing to do the data-gathering simultaneously.
> 
> Each chapter report gets put on a Queue in their separate processes. Then each report gets picked off the queue and the report database is updated with the results.
> 
> My problem is that sometimes the Queue is empty and I guess it's
> because the get_data() method takes a lot of time.
> 
> I've used multiprocessing before, but never with a Queue like this.
> Any notes or suggestions are very welcome.


Hi Tim,
This might be a dumb question, but...why is it a problem if the queue is empty? It sounds like you figured out already that get_data() sometimes takes longer than your timeout. So either increase your timeout or learn to live with the fact that the queue is sometimes empty. I don't mean to be rude, I just don't understand the problem. 

Cheers
Philip
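One way to "live with" an occasionally empty queue without dropping late results is to treat `Empty` as "not yet" and keep polling while any worker is still alive, giving up only once every worker has exited. This is a minimal sketch that assumes Python 3; `work`, `collect` and `run_demo` are hypothetical names, the 0.1 s poll interval is arbitrary, and the `n == 0` worker deliberately exits without reporting to exercise the give-up path.

```python
import time
from multiprocessing import Process, Queue
from queue import Empty


def work(n, q):
    """Hypothetical worker: reports n * n, except that n == 0 exits without reporting."""
    time.sleep(0.05 * n)
    if n != 0:
        q.put(n * n)


def collect(procs, q, expected, poll=0.1):
    """Gather up to `expected` results, waiting as long as any worker is alive."""
    results = []
    while len(results) < expected:
        try:
            results.append(q.get(timeout=poll))
        except Empty:
            if any(p.is_alive() for p in procs):
                continue  # "not yet": some worker is still busy
            # Every worker has exited, so anything they put has already been
            # flushed to the queue's pipe; take it without blocking, else stop.
            try:
                results.append(q.get(block=False))
            except Empty:
                break  # at least one worker finished without reporting
    return results


def run_demo():
    q = Queue()
    procs = [Process(target=work, args=(n, q)) for n in range(4)]
    for p in procs:
        p.start()
    results = collect(procs, q, expected=len(procs))
    for p in procs:
        p.join()  # join only after the queue has been drained
    return sorted(results)


if __name__ == "__main__":
    print(run_demo())  # [1, 4, 9]
```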



#11250

From: Tim Arnold <Tim.Arnold@sas.com>
Date: 2011-08-11 12:58 -0400
Message-ID: <j211mn$ks1$1@foggy.unx.sas.com>
In reply to: #11176
On 8/10/2011 11:36 PM, Philip Semanchuk wrote:
>
> On Aug 9, 2011, at 1:07 PM, Tim Arnold wrote:
>
>> Hi, I'm having problems with an empty Queue using multiprocessing.
>>
>> The task:
>> I have a bunch of chapters that I want to gather data on individually and then update a report database with the results.
>> I'm using multiprocessing to do the data-gathering simultaneously.
>>
>> Each chapter report gets put on a Queue in their separate processes. Then each report gets picked off the queue and the report database is updated with the results.
>>
>> My problem is that sometimes the Queue is empty and I guess it's
>> because the get_data() method takes a lot of time.
>>
>> I've used multiprocessing before, but never with a Queue like this.
>> Any notes or suggestions are very welcome.
>
>
> Hi Tim,
> This might be a dumb question, but...why is it a problem if the queue is empty? It sounds like you figured out already that get_data() sometimes takes longer than your timeout. So either increase your timeout or learn to live with the fact that the queue is sometimes empty. I don't mean to be rude, I just don't understand the problem.
>
> Cheers
> Philip
>

Hi Philip,
Not a dumb or rude question at all; thanks for thinking about it. When
the queue is empty the report cannot be updated, which is why I was
concerned: I couldn't figure out how to block. Now that's dumb!

From your response and Tim Roberts's, I see that it's possible to
block until the data comes back. I should never have put that
timeout in there; I must have assumed get() would not block if no
timeout was given. Wrong....

From the docs on q.get():
If optional args 'block' is true and 'timeout' is None (the default),
block if necessary until an item is available. If 'timeout' is
a positive number, it blocks at most 'timeout' seconds and raises
the Empty exception if no item was available within that time.
Otherwise ('block' is false), return an item if one is immediately
available, else raise the Empty exception ('timeout' is ignored
in that case).

thanks,
--Tim Arnold
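The three cases in that docstring can be checked directly against a `multiprocessing.Queue`. A small sketch, assuming Python 3 (where the exception is `queue.Empty`); `late_put` and `get_modes_demo` are hypothetical helpers, and the 0.2 s and 0.5 s values are arbitrary.

```python
import time
from multiprocessing import Process, Queue
from queue import Empty


def late_put(q, delay):
    """Hypothetical worker: puts one item after `delay` seconds."""
    time.sleep(delay)
    q.put("done")


def get_modes_demo():
    q = Queue()
    outcomes = {}

    # block=False: return immediately; nothing is queued, so Empty is raised.
    try:
        q.get(block=False)
    except Empty:
        outcomes["non_blocking"] = "Empty"

    # Positive timeout: wait at most 0.2 s, then raise Empty.
    try:
        q.get(timeout=0.2)
    except Empty:
        outcomes["timeout"] = "Empty"

    # Defaults (block=True, timeout=None): wait as long as it takes.
    p = Process(target=late_put, args=(q, 0.5))
    p.start()
    outcomes["blocking"] = q.get()
    p.join()
    return outcomes


if __name__ == "__main__":
    print(get_modes_demo())
    # {'non_blocking': 'Empty', 'timeout': 'Empty', 'blocking': 'done'}
```

Only the last mode waits for a slow producer, which is the behaviour the report loop needs.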



#11185

From: Tim Roberts <timr@probo.com>
Date: 2011-08-10 22:26 -0700
Message-ID: <hup647dntprqgegidcfc1huur5nhm0qpl6@4ax.com>
In reply to: #11091
Tim Arnold <Tim.Arnold@sas.com> wrote:
>
>The task:
>I have a bunch of chapters that I want to gather data on individually 
>and then update a report database with the results.
>I'm using multiprocessing to do the data-gathering simultaneously.
>
>Each chapter report gets put on a Queue in their separate processes. 
>Then each report gets picked off the queue and the report database is 
>updated with the results.
>
>My problem is that sometimes the Queue is empty and I guess it's
>because the get_data() method takes a lot of time.
>
>I've used multiprocessing before, but never with a Queue like this.
>Any notes or suggestions are very welcome.

The obvious implication is that your timeout is simply not long enough for
your common cases.  If you know how many chapters to expect, why have a
timeout at all?  Why not just wait forever?
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
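Applied to the original `Reporter.report()` loop, the "wait forever" suggestion means one blocking `q.get()` per chapter with no timeout, followed by `join()` on the workers once the queue has been drained (the multiprocessing documentation warns that joining a process before the items it put on a queue have been consumed can deadlock). A minimal Python 3 sketch; `gather_chapter` is a hypothetical stand-in for the post's elided `expensive_calculations()`, and the chapter dicts are invented for illustration.

```python
from multiprocessing import Process, Queue


def gather_chapter(chapter, q):
    """Hypothetical stand-in for get_data(): one (name, word count) result per chapter."""
    q.put((chapter["name"], len(chapter["text"].split())))


def report(chapters):
    q = Queue()
    procs = [Process(target=gather_chapter, args=(ch, q)) for ch in chapters]
    for p in procs:
        p.start()

    # Exactly one result per chapter is expected, so block with no timeout.
    results = dict(q.get() for _ in chapters)

    # Join only after every result has been taken off the queue.
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    chapters = [
        {"name": "intro", "text": "hello world"},
        {"name": "body", "text": "one two three"},
    ]
    print(report(chapters))
```

The trade-off is that a worker which dies before calling `put()` leaves `report()` blocked forever; if that can happen, the blocking `get()` needs to be paired with some check on the workers, such as `Process.is_alive()`.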


