
Re: Digging into multiprocessing

Date Wed, 14 Aug 2013 07:23:26 -0700
Subject Re: Digging into multiprocessing
From Demian Brecht <demianbrecht@gmail.com>
To Chris Angelico <rosuav@gmail.com>
Cc Python <python-list@python.org>



Awesome, thanks for the detailed response, Chris.

On Tue, Aug 13, 2013 at 8:03 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Aug 13, 2013 at 12:17 AM, Demian Brecht <demianbrecht@gmail.com> wrote:
>> Hi all,
>>
>> Some work that I'm doing atm is in some serious need of
>> parallelization. As such, I've been digging into the multiprocessing
>> module more than I've had to before and I had a few questions come up
>> as a result:
>>
>> (Running 2.7.5+ on OSX)
>>
>> 1. From what I've read, a new Python interpreter instance is kicked
>> off for every worker. My immediate assumption was that the file that
>> the code was in would be reloaded for every instance. After some
>> digging, this is obviously not the case (print __name__ at the top of
>> the file only yields a single output line). So, I'm assuming that
>> there's some optimization that passes off the bytecode within the
>> interpreter? How exactly does this work? (I couldn't really find much
>> in the docs about it, or am I just not looking in the right place?)
>
> I don't know about OSX specifically, but I believe it forks, same as
> on Linux. That means all your initialization code is done once. Be
> aware that this is NOT the case on Windows.
>
> http://en.wikipedia.org/wiki/Fork_(operating_system)
>
> Effectively, code execution proceeds down a single thread until the
> point of forking, and then the fork call returns twice. Can be messy
> to explain but it makes great sense once you grok it!
>
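
A minimal sketch of the "fork returns twice" behaviour described above,
assuming a Unix-like system and Python 3 syntax (the thread itself is about
2.7.5). The names INIT_PID and fork_demo are made up for illustration. The
child reports the module-level value it inherited, which suggests that the
initialization code was not run a second time:

```python
import os

# Module-level "initialization": executed once, before any fork.
INIT_PID = os.getpid()


def fork_demo():
    """Fork once and check what the child inherited from the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # returns 0 in the child, the child's pid in the parent
    if pid == 0:
        # Child branch: report the inherited INIT_PID and our own pid.
        os.close(read_fd)
        report = "%d %d" % (INIT_PID, os.getpid())
        os.write(write_fd, report.encode("ascii"))
        os._exit(0)  # leave at once so the child never runs the parent's code
    # Parent branch: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        init_seen, child_pid = map(int, reader.read().split())
    os.waitpid(pid, 0)
    return init_seen == os.getpid(), child_pid == pid
```

Both values come back true when the child saw the parent's INIT_PID and
the pid returned by fork() matches the pid the child saw for itself.
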
>> 2. For cases using methods such as map_async/wait, once the bytecode
>> has been passed into the child process, `target` is called `n` times
>> until the current queue is empty. Is this correct?
>
> That would be about right, yes. The intention is that it's equivalent
> to map(), only it splits the work across multiple processes; so the
> expectation is that it will call target for each yielded item in the
> iterable.
>
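
A small sketch of the map()/map_async() equivalence mentioned above. It
assumes Python 3.4+ (for get_context) and explicitly requests the "fork"
start method so it matches the Unix behaviour under discussion; square and
parallel_squares are hypothetical names, not anything from the original code:

```python
import multiprocessing


def square(n):
    """The 'target': called once per item in the iterable."""
    return n * n


def parallel_squares(values, workers=4):
    """Like list(map(square, values)), with the work split across processes."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-style fork
    with ctx.Pool(workers) as pool:
        pending = pool.map_async(square, values)
        pending.wait()        # block until every item has been processed
        return pending.get()  # results come back in input order
```

The result should compare equal to list(map(square, values)); only the
place where each call runs differs.
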
>> 3. Because __main__ is only run when the root process imports, if
>> using global, READ-ONLY objects, such as, say, a database connection,
>> then it might be better from a performance standpoint to initialize
>> that at main, relying on the interpreter references to be passed
>> around correctly. I've read some blogs and such that suggest that you
>> should create a new database connection within your child process
>> targets (or code called into by the targets). This seems to be less
>> than optimal to me if my assumption is correct.
>
> This depends hugely on the objects you're working with. If your
> database connection uses a TCP socket, for instance, all forked
> processes will share the same socket, which will most likely result in
> interleaved writes and messed-up reads. But with a log file, that
> might be okay (especially if you have some kind of atomicity guarantee
> that ensures that individual log entries don't interleave). The
> problem isn't really the Python objects (which will have been happily
> cloned by the fork() procedure), but the OS-level resources used.
>
> With a good database like PostgreSQL, and reasonable numbers of
> workers (say, 10-50, rather than 1000-5000), you should be able to
> simply establish separate connections for each subprocess without
> worrying about performance. If you really need billions of worker
> processes, it might be best to use one of the multiprocessing module's
> queueing/semaphoring facilities and either have one process that does
> all databasing, or let them all use it but serially. But if you can
> manage with separate connections, that would be the easiest, safest,
> and simplest to debug.
>
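
One way the "separate connection per subprocess" advice might look, as a
sketch only. It uses the standard library's sqlite3 as a stand-in for a real
database server such as PostgreSQL, and the table, the file, and the names
init_worker, lookup and run_lookups are all invented for the example. The
point is that each worker opens its own connection after the fork rather than
inheriting one from the parent:

```python
import multiprocessing
import os
import sqlite3
import tempfile

_conn = None  # one connection per worker process, set by init_worker()


def init_worker(db_path):
    """Runs once in each worker, after the fork: open a private connection."""
    global _conn
    _conn = sqlite3.connect(db_path)


def lookup(key):
    """Task body: queries through this worker's own connection."""
    row = _conn.execute(
        "SELECT value FROM items WHERE key = ?", (key,)
    ).fetchone()
    return key, row[0]


def run_lookups(keys, workers=4):
    """Create a throwaway database, then query it from a pool of workers."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE items (key INTEGER PRIMARY KEY, value TEXT)")
        setup.executemany(
            "INSERT INTO items VALUES (?, ?)", [(k, "item-%d" % k) for k in keys]
        )
        setup.commit()
        setup.close()  # the parent's connection is closed before any fork
        ctx = multiprocessing.get_context("fork")  # assumption: Unix-style fork
        with ctx.Pool(workers, initializer=init_worker, initargs=(db_path,)) as pool:
            return pool.map(lookup, keys)
    finally:
        os.remove(db_path)
```

With a networked database the initializer would open a socket-backed
connection instead, but the shape stays the same: nothing OS-level is shared
across the fork.
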
>> 4. Related to 3, read-only objects that are initialized prior to being
>> passed into a sub-process are safe to reuse as long as they are
>> treated as being immutable. Any other objects should use one of the
>> shared memory features.
>>
>> Is this more or less correct, or am I just off my rocker?
>
> When you fork, each process will get its own clone of the objects in
> the parent. For read-only objects (module-level constants and such),
> this is fine, as you say. The issue is if you want another process to
> "see" the change you made. That's when you need some form of shared
> data.
>
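
To illustrate the difference between a cloned object and shared data, here is
a hedged sketch (Python 3.4+, "fork" start method assumed; counter, bump and
demo are hypothetical names). Each child changes both an ordinary module-level
list and a multiprocessing Value; only the latter is visible to the parent:

```python
import multiprocessing

counter = [0]  # ordinary object: every forked process gets its own clone


def bump(shared):
    counter[0] += 1            # changes this child's private copy only
    with shared.get_lock():    # Value carries a lock by default
        shared.value += 1      # changes memory shared with the parent


def demo(children=3):
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-style fork
    shared = ctx.Value("i", 0)                 # a C int in shared memory
    procs = [ctx.Process(target=bump, args=(shared,)) for _ in range(children)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    # The parent's counter is untouched; the shared Value saw every increment.
    return counter[0], shared.value
```

With three children, demo() returns (0, 3): the parent never "sees" the
children's changes to the plain list.
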
> So, yes, more or less correct; at least, what you've said is mostly
> right for Unix - there may be some additional caveats for OSX
> specifically that I'm not aware of. But I expect they'll be minor;
> it's mainly Windows, which doesn't *have* fork(2), where there are
> major differences.
>
> ChrisA



-- 
Demian Brecht
http://demianbrecht.github.com
