


Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2014-03-25 11:19 +1100
Last post: 2014-03-26 01:00 +1100
Articles 5 — 2 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager Chris Angelico <rosuav@gmail.com> - 2014-03-25 11:19 +1100
    Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager matt.newville@gmail.com - 2014-03-24 20:27 -0700
      Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager Chris Angelico <rosuav@gmail.com> - 2014-03-25 14:44 +1100
        Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager matt.newville@gmail.com - 2014-03-25 06:34 -0700
          Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager Chris Angelico <rosuav@gmail.com> - 2014-03-26 01:00 +1100

#68920 — Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-03-25 11:19 +1100
Subject: Re: advice on sub-classing multiprocessing.Process and multiprocessing.BaseManager
Message-ID: <mailman.8474.1395707125.18130.python-list@python.org>
On Tue, Mar 25, 2014 at 7:24 AM, Matt Newville
<newville@cars.uchicago.edu> wrote:
> I'm maintaining a python interface to a C library for a distributed
> control system (EPICS, sort of a SCADA system) that does a large
> amount of relatively light-weight network I/O.   In order to keep many
> connections open and responsive, and to provide a simple interface,
> the python library keeps a global store of connection state.
>
> This works well for single processes and threads, but not so well for
> multiprocessing, where the global state causes trouble.

From the sound of things, a single process is probably what you want
here. Is there something you can't handle with one process?

ChrisA



#68938

From: matt.newville@gmail.com
Date: 2014-03-24 20:27 -0700
Message-ID: <80cf8fb7-d0c5-43a9-bc6f-c61ce6214f98@googlegroups.com>
In reply to: #68920
On Monday, March 24, 2014 7:19:56 PM UTC-5, Chris Angelico wrote:
> On Tue, Mar 25, 2014 at 7:24 AM, Matt Newville
> 
> > I'm maintaining a python interface to a C library for a distributed
> > control system (EPICS, sort of a SCADA system) that does a large
> > amount of relatively light-weight network I/O.   In order to keep many
> > connections open and responsive, and to provide a simple interface,
> > the python library keeps a global store of connection state.
> >
> > This works well for single processes and threads, but not so well for
> > multiprocessing, where the global state causes trouble.
> 
> 
> From the sound of things, a single process is probably what you want
> here. Is there something you can't handle with one process?

Thanks for the reply.  I find that appreciation is greatly (perhaps infinitely) delayed whenever I reply "X is probably not what you want to do" without further explanation to a question of "can I get some advice on how to do X?". So, I do thank you for your willingness to reply, even such a guaranteed-to-be-under-appreciated reply. 

There are indeed operations that can't be handled with a single process, such as simultaneously using multiple cores.  This is why we want to use multiprocessing instead of (or, in addition to) threading.  We're trying to do real-time collection of scientific data from a variety of data sources, generally within a LAN. The data can get largish and fast, and intermediate processing occasionally requires non-trivial computation time.  So being able to launch worker processes that can run independently on separate cores would be very helpful.  Ideally, we'd like to let sub-processes make calls to the control system too, say, read new data.

I wasn't really asking "is multiprocessing appropriate?" but whether there was a cleaner way to subclass multiprocessing.BaseManager() to use a subclass of Process().  I can believe the answer is No, but thought I'd ask.

Thanks again,

--Matt



#68943

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-03-25 14:44 +1100
Message-ID: <mailman.8484.1395719089.18130.python-list@python.org>
In reply to: #68938
On Tue, Mar 25, 2014 at 2:27 PM,  <matt.newville@gmail.com> wrote:
> Thanks for the reply.  I find that appreciation is greatly (perhaps infinitely) delayed whenever I reply "X is probably not what you want to do" without further explanation to a question of "can I get some advice on how to do X?". So, I do thank you for your willingness to reply, even such a guaranteed-to-be-under-appreciated reply.
>

Heh. I do see that side of it, but the problem is that sometimes a
question will be asked that implies a completely wrong approach. Take
this example:

"I'm having trouble passing a global variable to a function, how can I do it?"

This exact question came up recently (I may have the wording wrong),
and some of the solutions offered were horrendously convoluted messes
involving passing the name of a global to the function which then used
'exec' or 'eval'. While technically that answers the question, it's
much more helpful to take a step back - no, let's take a step forward
- now another step back - and we're cha-cha'ing! - well, unless you're
a real genius, just take the step back, and look at what you're
actually trying to achieve.

I wasn't trying to imply that you absolutely ought to use a single
process, but rather that your exact reasons for not using one matter
for how you structure the multi-process code.

> There are indeed operations that can't be handled with a single process, such as simultaneously using multiple cores.  This is why we want to use multiprocessing instead of (or, in addition to) threading.  We're trying to do real-time collection of scientific data from a variety of data sources, generally within a LAN. The data can get largish and fast, and intermediate processing occasionally requires non-trivial computation time.  So being able to launch worker processes that can run independently on separate cores would be very helpful.  Ideally, we'd like to let sub-processes make calls to the control system too, say, read new data.
>
> I wasn't really asking "is multiprocessing appropriate?" but whether there was a cleaner way to subclass multiprocessing.BaseManager() to use a subclass of Process().  I can believe the answer is No, but thought I'd ask.
>

I've never subclassed BaseManager like this. It might be simpler to
spin off one or more workers and not have them do any network
communication at all; that way, you don't need to worry about the
cache. Set up a process tree with one at the top doing only networking
and process management (so it's always fast), and then use a
multiprocessing.Queue or somesuch to pass info to a subprocess and
back. Then your global connection state is all stored within the top
process, and none of the others need care about it. You might have a
bit of extra effort to pass info back to the parent rather than simply
writing it to the connection, but that's a common requirement in other
areas (eg GUI handling - it's common to push all GUI manipulation onto
the main thread), so it's a common enough model.

But if subclassing and tweaking is the easiest way, and if you don't
mind your solution being potentially fragile (which subclassing like
that is), then you could look into monkey-patching Process. Inject
your code into it and then use the original. It's not perfect, but it
may turn out easier than the "subclass everything" technique.

ChrisA



#69023

From: matt.newville@gmail.com
Date: 2014-03-25 06:34 -0700
Message-ID: <8d59e3a4-e6af-4633-869b-53568f6091cd@googlegroups.com>
In reply to: #68943
ChrisA -

>> I wasn't really asking "is multiprocessing appropriate?" but whether
>> there was a cleaner way to subclass multiprocessing.BaseManager() to 
>> use a subclass of Process().  I can believe the answer is No, but 
>> thought I'd ask.
> 
> I've never subclassed BaseManager like this. It might be simpler to
> spin off one or more workers and not have them do any network
> communication at all; that way, you don't need to worry about the
> cache. Set up a process tree with one at the top doing only networking
> and process management (so it's always fast), and then use a
> multiprocessing.Queue or somesuch to pass info to a subprocess and
> back. Then your global connection state is all stored within the top
> process, and none of the others need care about it. You might have a
> bit of extra effort to pass info back to the parent rather than simply
> writing it to the connection, but that's a common requirement in other
> areas (eg GUI handling - it's common to push all GUI manipulation onto
> the main thread), so it's a common enough model.
> 
> But if subclassing and tweaking is the easiest way, and if you don't
> mind your solution being potentially fragile (which subclassing like
> that is), then you could look into monkey-patching Process. Inject
> your code into it and then use the original. It's not perfect, but it
> may turn out easier than the "subclass everything" technique.
> 
> ChrisA

Thanks, I agree that restricting network communications to a parent process would be a good recommended solution, but such a recommendation is hard to enforce and easy to forget.  It seems better to provide lightweight library-specific subclasses of Process (and Pool) and explain why they should be used.  This library (pyepics) already does similar things for interaction with other libraries (notably providing decorators to avoid issues with wxPython).

Monkey-patching multiprocessing.Process seems more fragile than subclassing it.  It turned out that multiprocessing.pool.Pool was also very easy to subclass.  But cleanly subclassing the Managers in multiprocessing.managers looks much harder.  I'm not sure if this is intentional or not, or if it should be filed as an issue for multiprocessing.   For now, I'm willing to say that the multiprocessing managers are not yet available with the pyepics library.

Thanks again,

--Matt



#69030

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-03-26 01:00 +1100
Message-ID: <mailman.8522.1395756056.18130.python-list@python.org>
In reply to: #69023
On Wed, Mar 26, 2014 at 12:34 AM,  <matt.newville@gmail.com> wrote:
> Monkey-patching multiprocessing.Process seems more fragile than subclassing it.  It turned out that multiprocessing.pool.Pool was also very easy to subclass.  But cleanly subclassing the Managers in multiprocessing.managers looks much harder.  I'm not sure if this is intentional or not, or if it should be filed as an issue for multiprocessing.   For now, I'm willing to say that the multiprocessing managers are not yet available with the pyepics library.
>

Subclassing is actually more fragile than you might think. As you've
found, you need to fidget with more and more classes to make your
change "stick", and also, any small change to implementation details
in the superclass could suddenly break things. It's not really any
safer than monkeypatching, despite all the OO fanatics saying how easy
it is to rework by subclassing. At least when you monkeypatch, you
*know* you're fiddling with internals.

ChrisA




