Date: Tue, 17 Dec 2013 09:09:34 -0800
Subject: Multiprocessing pool with custom process class
From: Sergey Fedorov
To: python-list@python.org
Newsgroups: comp.lang.python

Hi All,

I have a web service that needs to handle a bunch of work requests. Each job involves IO calls (DB, external web services to fetch some data), so part of the time is spent on blocking IO. On the other hand, after getting the data, the job has a computational part (using numpy/pandas on time series dataframes).

The service runs on a multicore machine, so I want to use as much parallelism as possible (especially considering Python's GIL), and because of the decent amount of IO, I want to use multiple threads inside each process so that none of the CPUs will stall due to IO delays.

The best scenario would be to use a pool of processes, each with its own thread pool (because each worker will need to keep some state, like DB connections). I already have my own thread pool implementation, which uses some load-balancing and fair-scheduling techniques that are specific to my problem domain.

I'm curious whether there is any multiprocessing module that I missed and can reuse. As it turned out, the pool in the multiprocessing module doesn't support a custom Process class (if it did, I would be able to derive from it and add the functionality I need) (http://stackoverflow.com/questions/740844/python-multiprocessing-pool-of-custom-processes). Is there any alternative module that I can reuse?

If not, what's the best way to notify the caller that a task has finished its execution (i.e. the behavior of multiprocessing.Pool's apply() function)? Which primitives are better to use for that purpose (in case I have to go with my own implementation of a multiprocessing pool)? Any reference to a good blog or educational resource will be highly appreciated!

If you believe that my solution is not optimal and have a better/easier one (I hope I specified my problem well enough), please share your thoughts.

Thanks in advance!
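
For concreteness, the layout described above (a pool of processes, each keeping its own state and its own thread pool, with the caller notified when a job finishes) could be sketched roughly as follows. This is only a sketch under assumptions, not a recommendation: it assumes Python 3.7+ on a POSIX system where the "fork" start method is available, and it uses the standard-library concurrent.futures executors in place of a custom thread pool. The names _init_worker, _fetch, run_job and submit_jobs, and the "db" placeholder string, are hypothetical and exist only for illustration.

```python
import concurrent.futures as cf
import functools
import multiprocessing
import os
import threading
import time

# Per-process state, filled in once per worker by _init_worker().
_state = {}


def _init_worker(n_threads):
    """Runs once in each worker process: create the state to be reused."""
    _state["threads"] = cf.ThreadPoolExecutor(max_workers=n_threads)
    # Placeholder only: a real service would open its DB connection here.
    _state["db"] = "connection-for-pid-%d" % os.getpid()


def _fetch(source):
    """Stand-in for a blocking IO call (DB query, web-service request)."""
    time.sleep(0.05)
    return len(source)


def run_job(sources):
    """One job: overlap the blocking fetches on threads, then compute."""
    fetched = list(_state["threads"].map(_fetch, sources))
    return sum(fetched)  # stand-in for the numpy/pandas step


def submit_jobs(jobs, n_procs=2, n_threads=4):
    """Submit every job; record each result from a completion callback."""
    results = {}
    lock = threading.Lock()

    def on_done(job_id, future):
        # Invoked in the parent process once the job has finished.
        # If the job raised, future.result() re-raises here and the
        # executor logs the error instead of storing a result.
        with lock:
            results[job_id] = future.result()

    # Assumption: "fork" exists on this platform (it does not on Windows).
    ctx = multiprocessing.get_context("fork")
    with cf.ProcessPoolExecutor(
        max_workers=n_procs,
        mp_context=ctx,
        initializer=_init_worker,
        initargs=(n_threads,),
    ) as pool:
        for job_id, sources in jobs.items():
            future = pool.submit(run_job, sources)
            future.add_done_callback(functools.partial(on_done, job_id))
    # Leaving the with-block waits for all submitted jobs to finish.
    return results
```

Here add_done_callback plays the role of the "notify the caller" primitive, and the initializer is where per-worker state such as DB connections would live. Domain-specific load balancing and fair scheduling are not covered by this sketch and would still need a custom thread pool inside each worker.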