| Started by | Heiko Wundram <modelnine@modelnine.org> |
|---|---|
| First post | 2011-04-08 19:26 +0200 |
| Last post | 2011-04-12 14:13 -0700 |
| Articles | 4 — 2 participants |
This discussion began before the indexed window, so earlier articles aren't shown. The article listed under "Started by" above is the oldest one visible, not the original post.
Re: Copy-on-write when forking a python process Heiko Wundram <modelnine@modelnine.org> - 2011-04-08 19:26 +0200
Re: Copy-on-write when forking a python process jac <john.theman.connor@gmail.com> - 2011-04-08 11:34 -0700
Re: Copy-on-write when forking a python process Heiko Wundram <modelnine@modelnine.org> - 2011-04-08 23:29 +0200
Re: Copy-on-write when forking a python process jac <john.theman.connor@gmail.com> - 2011-04-12 14:13 -0700
| From | Heiko Wundram <modelnine@modelnine.org> |
|---|---|
| Date | 2011-04-08 19:26 +0200 |
| Subject | Re: Copy-on-write when forking a python process |
| Message-ID | <mailman.156.1302284140.9059.python-list@python.org> |
On 08.04.2011 18:14, John Connor wrote:
> Has anyone else looked into the COW problem? Are there workarounds
> and/or other plans to fix it? Does the solution I am proposing sound
> reasonable, or does it seem like overkill? Does anyone foresee any
> problems with it?

Why'd you need a "fix" like this for something that isn't broken? COW doesn't just apply to the object's reference count, but to the object itself, too. _All_ memory of the parent (and, as such, all objects, too) becomes unrelated to memory in the child once the fork is complete.

The initial reference-count state of every object in the child is guaranteed to be sound, because it is exactly the parent's final reference-count state from before the process image got cloned. (Remember, COW is just an optimization of a complete clone, and it's up to the operating system to make sure that you don't notice any semantic difference from a complete copy.) What you're proposing (opting in or out of reference counting) breaks that.

--
--- Heiko.
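Heiko's point that fork gives the child an independent copy, whatever the kernel does with pages underneath, can be checked directly. The following is a minimal POSIX-only sketch (the helper name `child_view_after_mutation` is hypothetical): the child mutates a dict it inherited and pickles its view back through a pipe, while the parent's dict is left unchanged.

```python
import os
import pickle


def child_view_after_mutation(data):
    """Fork; mutate `data` in the child and report the child's view back.

    Returns (child_view, parent_view). POSIX-only because it uses os.fork().
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: its memory is a logical copy of the parent's, so this
        # mutation is private to the child (the kernel may copy pages lazily).
        try:
            os.close(read_fd)
            data["added_by_child"] = True
            with os.fdopen(write_fd, "wb") as w:
                pickle.dump(dict(data), w)
        finally:
            os._exit(0)  # never fall through into the parent's code path
    # Parent: read what the child saw, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as r:
        child_view = pickle.load(r)
    os.waitpid(pid, 0)
    return child_view, dict(data)


if __name__ == "__main__":
    child, parent = child_view_after_mutation({"a": 1})
    print("child :", child)
    print("parent:", parent)
```

The pipe is read before `waitpid()` so that a large pickle cannot block the child while the parent waits on it.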
| From | jac <john.theman.connor@gmail.com> |
|---|---|
| Date | 2011-04-08 11:34 -0700 |
| Message-ID | <9c3a5a09-4fdd-4ef1-ab45-713762b86cec@z31g2000vbs.googlegroups.com> |
| In reply to | #2860 |
Hi Heiko,

I just realized I should probably have put a clearer use case in my previous message.

An example use case would be a parent process that creates a large dictionary (say, several gigabytes) and then forks several worker processes that access this dictionary. The worker processes do not add or remove objects from the dictionary, nor do they alter its individual elements. They simply perform lookups on the dictionary and perform calculations whose results are then written to files.

If I wrote this program in C, neither the "dictionary" nor its contents would be copied into the memory of the child processes. In Python, however, as soon as you pass the dictionary itself or any of its contents into a function as an argument, its reference count changes, and the page of memory on which that reference count resides is copied into the child process's memory. What I am proposing is to allow the parent process to disable reference counting for this dictionary and its contents, so that the child processes can access them in a read-only fashion without them having to be copied.

I disagree with your statement that COW is an optimization for a complete clone; it is an optimization that works at the memory-page level, not at the memory-image level. In other words, if I write to a copy-on-write page, only that page is copied into my process's address space, not the entire parent image. To the best of my knowledge, by preventing the child process from altering an object's reference count you can prevent the object from being copied (assuming, of course, that the object is not altered explicitly).

Hopefully this clarifies my previous post,
--jac
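jac's premise is that in CPython even a "read-only" use of an object writes to that object, because its reference count (`ob_refcnt`) lives in the object's own header. A minimal CPython-specific sketch of that effect follows; the helper name is hypothetical, and `sys.getrefcount()` is an implementation detail rather than a language guarantee. Holding the looked-up value anywhere (here, in a list) raises its count by one, and in a forked child that write is what makes the kernel copy the page holding the object.

```python
import sys


def refcount_delta_from_read(container, key):
    """Return how much merely holding a looked-up value changes its refcount.

    sys.getrefcount() includes the temporary reference held by its own
    argument; that extra reference appears in both readings and cancels out.
    """
    value = container[key]
    before = sys.getrefcount(value)
    kept = [value]  # any new reference (name, list slot, call argument) bumps ob_refcnt
    after = sys.getrefcount(value)
    del kept
    return after - before


if __name__ == "__main__":
    print(refcount_delta_from_read({"k": [1, 2, 3]}, "k"))
```

The example stores a list rather than, say, `None` because some built-in objects are immortal in CPython 3.12 and later, and their reported counts no longer change.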
| From | Heiko Wundram <modelnine@modelnine.org> |
|---|---|
| Date | 2011-04-08 23:29 +0200 |
| Message-ID | <mailman.163.1302298203.9059.python-list@python.org> |
| In reply to | #2868 |
On 08.04.2011 20:34, jac wrote:
> I disagree with your statement that COW is an optimization for a
> complete clone, it is an optimization that works at the memory page
> level, not at the memory image level. In other words, if I write to a
> copy-on-write page, only that page is copied into my process' address
> space, not the entire parent image. To the best of my knowledge by
> preventing the child process from altering an object's reference count
> you can prevent the object from being copied (assuming the object is
> not altered explicitly of course.)

As I said before: COW for "sharing" a process's forked memory is simply an implementation detail and an _optimization_ for fork (and of course a sensible one at that). Nothing in the semantics of fork requires an operating system to use COW memory pages to implement the copying (and early UNIXes didn't; they explicitly copied the complete process image for the child). The only semantics specified for fork are that the parent and the child have independent process images, which are equivalent copies (except for some details) immediately after the fork call has returned successfully (see SUSv4).

What you're thinking of (and what's generally useful in the context you're describing) is shared memory. Python supports putting objects into shared memory using, e.g., POSH, an extension that lets you place Python objects in shared memory using the SysV IPC feature set that most UNIXes implement today.

--
--- Heiko.
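For contrast with fork's copy semantics, here is shared memory in the sense Heiko describes. This sketch does not use POSH or SysV IPC; it substitutes the standard library's `mmap` module, where `mmap.mmap(-1, n)` creates an anonymous mapping that is shared (`MAP_SHARED`) by default on Unix, so a forked child's write is visible to the parent. It is POSIX-only, and the helper name is hypothetical.

```python
import mmap
import os


def child_write_visible_to_parent(payload: bytes) -> bytes:
    """Fork; the child writes `payload` into a shared anonymous mapping.

    Unlike ordinary process memory, a shared mapping refers to the same
    physical pages in parent and child, so the parent sees the child's write.
    `payload` must be non-empty (mmap cannot map zero bytes here).
    """
    shared = mmap.mmap(-1, len(payload))  # fileno -1: anonymous, shared on Unix
    try:
        pid = os.fork()
        if pid == 0:
            try:
                shared[:] = payload  # slice assignment must match the mapping length
            finally:
                os._exit(0)
        os.waitpid(pid, 0)  # wait until the child's write is done
        return bytes(shared[:])
    finally:
        shared.close()


if __name__ == "__main__":
    print(child_write_visible_to_parent(b"hello"))
```

A mapping like this holds raw bytes; placing actual Python objects in shared memory is the harder problem that an extension such as POSH addresses.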
| From | jac <john.theman.connor@gmail.com> |
|---|---|
| Date | 2011-04-12 14:13 -0700 |
| Message-ID | <3488dd0b-f394-43cb-b660-1a370cb4b022@w21g2000yqm.googlegroups.com> |
| In reply to | #2876 |
Heiko,

Thank you for pointing out POSH. I have used some of Python's other shared-memory facilities, but was completely unaware of POSH; it seems nice.

I agree that shared memory would solve the use case I outlined above, but it is not hard to imagine a slightly different case where the child processes do want to mutate the dictionary but do not want the changes to show up in the parent process's dictionary. If shared memory is used, a more complicated algorithm would be needed to get that behavior.

Long story short: shared memory is great for some things, and copy-on-write is great for others. Sometimes they are easily interchangeable; sometimes they are not. Many, if not most, languages let the operating system perform optimizations when a program forks that Python does not allow, because of the way it counts references. Many programs would be easier to write if Python allowed the OS to use COW.

However, since I have gotten no other feedback on this list, I think I will post this to python-ideas as well.

Thanks,
--jac
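CPython later gained a partial answer for the read-only dictionary workload discussed in this thread: `gc.freeze()` (Python 3.7+). Its documentation recommends calling `gc.disable()` early in the parent, `gc.freeze()` right before `fork()`, and `gc.enable()` early in each child, so that the cyclic garbage collector stops writing to inherited objects. It does not stop reference-count updates, which is jac's actual complaint. The following is a minimal sketch assuming POSIX `os.fork()`; the helper `fork_readonly_workers` is hypothetical and, for brevity, leaves the `gc.disable()` step to the caller before it builds the data.

```python
import gc
import os


def fork_readonly_workers(table, keys_per_worker):
    """Fork one worker per key list; each sums its lookups in the inherited `table`.

    Returns the per-worker totals in the order of `keys_per_worker`.
    """
    children = []
    gc.freeze()  # move all currently tracked objects into the permanent generation
    try:
        for keys in keys_per_worker:
            r, w = os.pipe()
            pid = os.fork()
            if pid == 0:
                status = 1
                try:
                    os.close(r)
                    gc.enable()  # the gc docs advise re-enabling GC early in the child
                    total = sum(table[k] for k in keys)  # lookups still touch refcounts
                    os.write(w, str(total).encode())
                    status = 0
                finally:
                    os._exit(status)
            os.close(w)  # parent keeps only the read end
            children.append((pid, r))
    finally:
        gc.unfreeze()  # parent: return frozen objects to normal collection
    results = []
    for pid, r in children:
        with os.fdopen(r, "rb") as f:
            results.append(int(f.read().decode()))
        os.waitpid(pid, 0)
    return results


if __name__ == "__main__":
    data = {"a": 1, "b": 2, "c": 3}
    print(fork_readonly_workers(data, [["a", "b"], ["c"]]))
```

All children are forked before any result is read, so the workers run concurrently; each child's copy of the frozen state is independent of the parent's later `gc.unfreeze()`.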