| Started by | Chris Angelico <rosuav@gmail.com> |
|---|---|
| First post | 2011-04-06 11:59 +1000 |
| Last post | 2011-04-08 04:59 +1000 |
| Articles | 8 — 3 participants |
Sandboxed Python: memory limits? Chris Angelico <rosuav@gmail.com> - 2011-04-06 11:59 +1000
Re: Sandboxed Python: memory limits? "Martin v. Loewis" <martin@v.loewis.de> - 2011-04-06 22:38 +0200
Re: Sandboxed Python: memory limits? Chris Angelico <rosuav@gmail.com> - 2011-04-07 10:06 +1000
Re: Sandboxed Python: memory limits? "Martin v. Loewis" <martin@v.loewis.de> - 2011-04-07 10:01 +0200
Re: Sandboxed Python: memory limits? "Martin v. Loewis" <martin@v.loewis.de> - 2011-04-07 10:01 +0200
Re: Sandboxed Python: memory limits? Chris Angelico <rosuav@gmail.com> - 2011-04-07 10:10 +1000
Re: Sandboxed Python: memory limits? David Bolen <db3l.net@gmail.com> - 2011-04-07 14:36 -0400
Re: Sandboxed Python: memory limits? Chris Angelico <rosuav@gmail.com> - 2011-04-08 04:59 +1000
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-04-06 11:59 +1000 |
| Subject | Sandboxed Python: memory limits? |
| Message-ID | <mailman.62.1302055157.9059.python-list@python.org> |
Is it possible, and if so is it easy, to limit the amount of memory an
embedded Python interpreter is allowed to allocate? I don't want to
ulimit/rlimit the process if I don't have to (or rather, I want the
process's limit to be high, and the Python limit much lower), but just
to force Python to throw MemoryError sooner than it otherwise would
(my code can then gracefully deal with the exception).

Google turned up this thread:
http://stackoverflow.com/questions/1760025/limit-python-vm-memory

The answers given include resource.setrlimit (which presumably goes
straight back to the API, which will affect the whole process), and a
simple counter (invasive to the code). But I want something that I can
impose from the outside.

I have a vague memory of reading somewhere that it's possible to
replace the Python memory allocator. This would be an option, if
there's no simple way to say "your maximum is now 16MB", but I now
can't find it back. Was I hallucinating?

Hoping not to reinvent any wheels today!

Thanks!

Chris Angelico
| From | "Martin v. Loewis" <martin@v.loewis.de> |
|---|---|
| Date | 2011-04-06 22:38 +0200 |
| Message-ID | <inij0j$v6v$1@online.de> |
| In reply to | #2682 |
> I have a vague memory of reading somewhere that it's possible to
> replace the Python memory allocator. This would be an option, if
> there's no simple way to say "your maximum is now 16MB", but I now
> can't find it back. Was I hallucinating?

You can adjust the implementations of PyMem_Malloc and PyObject_Malloc.
This would catch many allocations, but not all of them. If you adjust
PyMem_MALLOC instead of PyMem_Malloc, you catch even more allocations,
but extension modules which directly call malloc() would still bypass
this accounting.

Regards,
Martin
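Editor's sketch: the accounting Martin describes lives at the C level, but its effect can be observed from Python. This assumes Python 3.4 or later (much newer than the 2.6 discussed in this thread), where the standard tracemalloc module hooks Python's own allocators. It only measures and does not enforce a cap, and memory that an extension module obtains by calling malloc() directly is not counted, which is the same gap Martin points out. The helper names peak_python_allocation and doubling are made up for illustration.

```python
import tracemalloc


def peak_python_allocation(func):
    """Run func() and return the peak number of bytes that Python's
    own allocators handed out while it ran."""
    tracemalloc.start()
    try:
        func()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def doubling(rounds=20):
    """A bounded version of the doubling loop from this thread."""
    a = 'a'
    for _ in range(rounds):
        a += a
    return a


peak = peak_python_allocation(doubling)
print("peak bytes seen by the Python allocators:", peak)
```

A real limit would still have to be imposed in the allocator itself, as Martin says; this only shows that string growth of this kind is visible at that layer.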
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-04-07 10:06 +1000 |
| Message-ID | <mailman.91.1302134778.9059.python-list@python.org> |
| In reply to | #2721 |
On Thu, Apr 7, 2011 at 6:38 AM, Martin v. Loewis <martin@v.loewis.de> wrote:
> You can adjust the implementations of PyMem_Malloc and PyObject_Malloc.
> This would catch many allocations, but not all of them. If you adjust
> PyMem_MALLOC instead of PyMem_Malloc, you catch even more allocations -
> but extension modules which directly call malloc() still would bypass
> this accounting.
I'm not too concerned about extensions, here; in any case, I lock most
of them off. I just want to prevent stupid stuff like this:
    a='a'
    while True:
        a+=a
from bringing the entire node to its knees. Obviously that will
eventually bomb with MemoryError, but I'd rather it be some time
*before* the poor computer starts thrashing virtual memory.
(Hmm. I tried the above code in Python 2.6.6 on my scratch box, with
3GB of memory, and it actually died with "OverflowError: strings are
too large to concat" at 1GB. Must be the 32-bit Python on there, heh.
But repeating the exercise in the same Python with a second variable
produces the expected MemoryError.)
If it's too difficult, I'll probably just tell my boss that we need
8GB of physical memory in these things, and then disable virtual
memory. That'll ensure that MemoryError happens before the hard disk
starts grinding performance into dust :)
Chris Angelico
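Editor's sketch: a bit of arithmetic shows how quickly the doubling loop above reaches any cap. It assumes one byte per character (as with a Python 2 str) and a starting length of 1; the function name doublings_to_reach is made up for illustration.

```python
def doublings_to_reach(limit_bytes, start_len=1):
    """Count how many `a += a` steps it takes for len(a) to reach
    limit_bytes, assuming one byte per character."""
    length, steps = start_len, 0
    while length < limit_bytes:
        length *= 2
        steps += 1
    return steps


steps_16mb = doublings_to_reach(16 * 2**20)  # the 16MB cap from the first post
steps_1gb = doublings_to_reach(2**30)        # the 1GB string mentioned above
print(steps_16mb, steps_1gb)
```

Since each step doubles the length, going from a 16MB cap to a 1GB one buys only a handful of extra loop iterations.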
| From | "Martin v. Loewis" <martin@v.loewis.de> |
|---|---|
| Date | 2011-04-07 10:01 +0200 |
| Message-ID | <4D9D6F63.9080004@v.loewis.de> |
| In reply to | #2730 |
On 07.04.2011 02:06, Chris Angelico wrote:
> On Thu, Apr 7, 2011 at 6:38 AM, Martin v. Loewis <martin@v.loewis.de> wrote:
>> You can adjust the implementations of PyMem_Malloc and PyObject_Malloc.
>> This would catch many allocations, but not all of them. If you adjust
>> PyMem_MALLOC instead of PyMem_Malloc, you catch even more allocations -
>> but extensions modules which directly call malloc() still would bypass
>> this accounting.
>
> I'm not too concerned about extensions, here; in any case, I lock most
> of them off. I just want to prevent stupid stuff like this:
>
> a='a'
> while True:
>     a+=a

That would certainly be caught by instrumenting PyObject_MALLOC. More
generally, I believe that if you instrument the functions I mentioned,
your use case is likely covered.

Regards,
Martin
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-04-07 10:10 +1000 |
| Message-ID | <mailman.92.1302135041.9059.python-list@python.org> |
| In reply to | #2721 |
On Thu, Apr 7, 2011 at 10:06 AM, Chris Angelico <rosuav@gmail.com> wrote:
> I'm not too concerned about extensions, here; in any case, I lock most
> of them off. I just want to prevent stupid stuff like this:
>
> a='a'
> while True:
>     a+=a
>
> from bringing the entire node to its knees. Obviously that will
> eventually bomb with MemoryError, but I'd rather it be some time
> *before* the poor computer starts thrashing virtual memory.

To clarify: One node will be hosting multiple clients' code, and if it
runs out of physical memory, performance for everyone else will be
severely impacted. So I'm hoping to restrict the script's ability to
consume all of memory, without (preferably) ulimit/rlimiting the
entire process (which does other things as well). But if it can't be,
it can't be.

Chris Angelico
| From | David Bolen <db3l.net@gmail.com> |
|---|---|
| Date | 2011-04-07 14:36 -0400 |
| Message-ID | <m2mxk129jf.fsf@valheru.db3l.homeip.net> |
| In reply to | #2731 |
Chris Angelico <rosuav@gmail.com> writes:
> So I'm hoping to restrict the script's ability to
> consume all of memory, without (preferably) ulimit/rlimiting the
> entire process (which does other things as well). But if it can't be,
> it can't be.

Just wondering, but rather than spending the energy to cap Python's
allocations internally, could similar effort instead be directed at
separating the "other things" the same process is doing? How tightly
coupled is it? If you could split off just the piece you need to
limit into its own process, then you get all the OS tools at your
disposal to restrict the resources of that process.

Depending on what the "other" things are, it might not be too hard to
split apart, even if you have to utilize some IPC mechanism to
coordinate between the two pieces. It might well be of the same order
of magnitude of effort as tweaking Python to limit memory internally.

-- David
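Editor's sketch of the split-process idea, assuming a Unix-like host where RLIMIT_AS is enforced (Linux is; some other platforms may ignore it) and Python 3.7 or later. The parent keeps its own limits, and only the child that runs the client script gets a low address-space cap. The names SCRIPT and run_limited and the 256MB figure are made up for illustration. The doubling loop is bounded at 1GB so the child stops by itself if the limit is not enforced.

```python
import resource
import subprocess
import sys

# The doubling loop from this thread, bounded so that it ends on its own
# on a platform that does not enforce the address-space limit.
SCRIPT = """
a = 'a'
try:
    while len(a) < 2**30:
        a += a
    print('finished without hitting a limit')
except MemoryError:
    print('MemoryError')
"""


def run_limited(code, max_bytes, timeout=60):
    """Run `code` in a child interpreter whose address space is capped
    at max_bytes. The parent process is not affected."""
    def apply_limit():
        # Runs in the child, after fork() and before exec().
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


result = run_limited(SCRIPT, 256 * 2**20)
print(result.stdout.strip())
```

The cost is the one Chris raises in his reply: whatever the child generates has to be passed back to the parent over some IPC channel instead of staying in the same address space.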
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-04-08 04:59 +1000 |
| Message-ID | <mailman.119.1302202803.9059.python-list@python.org> |
| In reply to | #2796 |
On Fri, Apr 8, 2011 at 4:36 AM, David Bolen <db3l.net@gmail.com> wrote:
> Just wondering, but rather than spending the energy to cap Python's
> allocations internally, could similar effort instead be directed at
> separating the "other things" the same process is doing? How tightly
> coupled is it? If you could split off just the piece you need to
> limit into its own process, then you get all the OS tools at your
> disposal to restrict the resources of that process.

Well, what happens is roughly this: Process begins doing a lengthy
operation. Python is called upon to generate data to use in that. C
collects the data Python generated, reformats it, stores it in the
database (on another machine). C then proceeds to use the data,
further manipulating it, lots of processing that culminates in another
thing going into the database.

The obvious way to split it would be to send it to the database twice,
separately, as described above (the current code optimizes it down to
a single INSERT at the bottom, keeping it in RAM until then). This
would work, but it seems like a fair amount of extra effort (including
extra load on our central database server) to achieve what I'd have
thought would be fairly simple.

I think it's going to be simplest to use a hardware solution - throw
heaps of RAM at the boxes and then just let them do what they like. We
already have measures to ensure that one client's code can't "be evil"
repeatedly in a loop, so I'll just not worry too much about this
check. (The project's already well past its deadlines - mainly not my
fault! - and if I tell my boss "We'd have to tinker with Python's
internals to do this", he's going to put the kibosh on it in two
seconds flat.)

Thanks for the information, all!

Chris Angelico