

comp.lang.python, thread #2682

Sandboxed Python: memory limits?

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2011-04-06 11:59 +1000
Last post: 2011-04-08 04:59 +1000
Articles: 8 (3 participants)



Contents

  Sandboxed Python: memory limits? Chris Angelico <rosuav@gmail.com> - 2011-04-06 11:59 +1000
    Re: Sandboxed Python: memory limits? "Martin v. Loewis" <martin@v.loewis.de> - 2011-04-06 22:38 +0200
      Re: Sandboxed Python: memory limits? Chris Angelico <rosuav@gmail.com> - 2011-04-07 10:06 +1000
        Re: Sandboxed Python: memory limits? "Martin v. Loewis" <martin@v.loewis.de> - 2011-04-07 10:01 +0200
      Re: Sandboxed Python: memory limits? Chris Angelico <rosuav@gmail.com> - 2011-04-07 10:10 +1000
        Re: Sandboxed Python: memory limits? David Bolen <db3l.net@gmail.com> - 2011-04-07 14:36 -0400
          Re: Sandboxed Python: memory limits? Chris Angelico <rosuav@gmail.com> - 2011-04-08 04:59 +1000

#2682 — Sandboxed Python: memory limits?

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-04-06 11:59 +1000
Subject: Sandboxed Python: memory limits?
Message-ID: <mailman.62.1302055157.9059.python-list@python.org>

Is it possible, and if so is it easy, to limit the amount of memory an
embedded Python interpreter is allowed to allocate? I don't want to
ulimit/rlimit the process if I don't have to (or rather, I want the
process's limit to be high and the Python limit much lower); I just
want to force Python to raise MemoryError sooner than it otherwise
would (my code can then deal with the exception gracefully).

Google turned up this thread:
http://stackoverflow.com/questions/1760025/limit-python-vm-memory

The answers given include resource.setrlimit (which presumably maps
straight onto the OS-level setrlimit call, and so affects the whole
process) and a simple counter (invasive to the code). But I want
something that I can impose from the outside.

I have a vague memory of reading somewhere that it's possible to
replace the Python memory allocator. This would be an option if
there's no simple way to say "your maximum is now 16MB", but now I
can't find it again. Was I hallucinating?

Hoping not to reinvent any wheels today!

Thanks!

Chris Angelico



#2721

From: "Martin v. Loewis" <martin@v.loewis.de>
Date: 2011-04-06 22:38 +0200
Message-ID: <inij0j$v6v$1@online.de>
In reply to: #2682

> I have a vague memory of reading somewhere that it's possible to
> replace the Python memory allocator. This would be an option, if
> there's no simple way to say "your maximum is now 16MB", but I now
> can't find it back. Was I hallucinating?

You can adjust the implementations of PyMem_Malloc and PyObject_Malloc.
This would catch many allocations, but not all of them. If you adjust
PyMem_MALLOC instead of PyMem_Malloc, you catch even more allocations,
but extension modules that call malloc() directly would still bypass
this accounting.

Regards,
Martin
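A Python-level way to see which allocations such hooks observe is the standard tracemalloc module. It was added in Python 3.4, well after this thread, and works by installing its own hooks on Python's allocator domains through PyMem_SetAllocator. The sketch only measures; it does not enforce a limit, and memory an extension obtains with plain malloc() stays invisible to it, exactly as described above. The helper names peak_traced_bytes and make_string are hypothetical.

```python
import tracemalloc


def peak_traced_bytes(func, *args, **kwargs):
    """Call func and return (result, peak bytes seen by tracemalloc's allocator hooks)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def make_string(n):
    # A single large str allocation, made through Python's object allocator.
    return "a" * n


if __name__ == "__main__":
    s, peak = peak_traced_bytes(make_string, 10_000_000)
    print(f"built {len(s)} chars; peak traced allocation {peak} bytes")
```

Counting is the easy half; enforcing a cap at this level means making the hooked allocator return NULL once a budget is spent, which has to be done from C.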



#2730

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-04-07 10:06 +1000
Message-ID: <mailman.91.1302134778.9059.python-list@python.org>
In reply to: #2721

On Thu, Apr 7, 2011 at 6:38 AM, Martin v. Loewis <martin@v.loewis.de> wrote:
> You can adjust the implementations of PyMem_Malloc and PyObject_Malloc.
> This would catch many allocations, but not all of them. If you adjust
> PyMem_MALLOC instead of PyMem_Malloc, you catch even more allocations -
> but extensions modules which directly call malloc() still would bypass
> this accounting.

I'm not too concerned about extensions here; in any case, I lock most
of them out. I just want to prevent stupid stuff like this:

a='a'
while True:
    a+=a

from bringing the entire node to its knees. Obviously that will
eventually bomb with MemoryError, but I'd rather it be some time
*before* the poor computer starts thrashing virtual memory.

(Hmm. I tried the above code in Python 2.6.6 on my scratch box, with
3GB of memory, and it actually died with "OverflowError: strings are
too large to concat" at 1GB. Must be the 32-bit Python on there, heh.
But repeating the exercise in the same Python with a second variable
produces the expected MemoryError.)

If it's too difficult, I'll probably just tell my boss that we need
8GB of physical memory in these things, and then disable swap. That'll
ensure that MemoryError happens before the hard disk starts grinding
performance into dust :)

Chris Angelico
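The 1GB cut-off is consistent with the length limit on a 32-bit build: sizes there are held in a signed 32-bit Py_ssize_t (sys.maxsize == 2**31 - 1), so a string that keeps doubling from one character can reach 2**30 bytes but never 2**31. The short sketch works through that arithmetic, plus the number of doublings needed to pass a 16MB budget. The helper doublings_within is hypothetical, and it models only the length cap, not whether memory actually runs out first.

```python
import sys


def doublings_within(limit):
    """Return (steps, length): how many times a 1-char string can double
    while its length stays <= limit, and the final length reached."""
    length, steps = 1, 0
    while length * 2 <= limit:
        length *= 2
        steps += 1
    return steps, length


if __name__ == "__main__":
    # sys.maxsize is the largest Py_ssize_t on the running build.
    print("this build:", doublings_within(sys.maxsize))
    print("32-bit build:", doublings_within(2**31 - 1))   # (30, 1073741824): stops at 1 GiB
    print("16 MiB budget:", doublings_within(16 * 2**20))  # (24, 16777216)
```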



#2759

From: "Martin v. Loewis" <martin@v.loewis.de>
Date: 2011-04-07 10:01 +0200
Message-ID: <4D9D6F63.9080004@v.loewis.de>
In reply to: #2730

On 07.04.2011 02:06, Chris Angelico wrote:
> On Thu, Apr 7, 2011 at 6:38 AM, Martin v. Loewis <martin@v.loewis.de> wrote:
>> You can adjust the implementations of PyMem_Malloc and PyObject_Malloc.
>> This would catch many allocations, but not all of them. If you adjust
>> PyMem_MALLOC instead of PyMem_Malloc, you catch even more allocations -
>> but extensions modules which directly call malloc() still would bypass
>> this accounting.
> 
> I'm not too concerned about extensions, here; in any case, I lock most
> of them off. I just want to prevent stupid stuff like this:
> 
> a='a'
> while True:
>     a+=a

That would certainly be caught by instrumenting PyObject_MALLOC. More
generally, I believe that if you instrument the functions I mentioned,
your use case is likely covered.

Regards,
Martin
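Martin's suggestion operates at the C level: a replacement allocator can simply refuse an allocation once a budget is spent, and Python turns that refusal into MemoryError. As a rough, pure-Python stand-in (a different technique, not what Martin describes), the sketch combines tracemalloc (Python 3.4+) with a sys.settrace hook that raises once the traced total passes a budget. It only checks between Python lines, so a single huge allocation such as "a" * 10**10 is attempted before any check runs; tracing also slows code considerably and covers only the calling thread. BudgetExceeded, run_with_budget and runaway are hypothetical names.

```python
import sys
import tracemalloc


class BudgetExceeded(MemoryError):
    """Raised when traced Python allocations exceed the configured budget."""


def run_with_budget(func, budget_bytes):
    """Run func(), raising BudgetExceeded once tracemalloc's current total
    exceeds budget_bytes. Checks happen only at Python trace events."""
    def tracer(frame, event, arg):
        current, _peak = tracemalloc.get_traced_memory()
        if current > budget_bytes:
            # An exception raised here propagates into the traced code,
            # and CPython removes the trace function.
            raise BudgetExceeded(f"{current} bytes traced; budget is {budget_bytes}")
        return tracer  # keep receiving line events inside this frame

    tracemalloc.start()
    sys.settrace(tracer)
    try:
        return func()
    finally:
        sys.settrace(None)
        tracemalloc.stop()


def runaway():
    # The loop from this thread.
    a = "a"
    while True:
        a += a


if __name__ == "__main__":
    try:
        run_with_budget(runaway, 16 * 2**20)
    except MemoryError as exc:
        print("caught:", exc)
```

Because BudgetExceeded subclasses MemoryError, code that already handles MemoryError gracefully needs no changes.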





#2731

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-04-07 10:10 +1000
Message-ID: <mailman.92.1302135041.9059.python-list@python.org>
In reply to: #2721

On Thu, Apr 7, 2011 at 10:06 AM, Chris Angelico <rosuav@gmail.com> wrote:
> I'm not too concerned about extensions, here; in any case, I lock most
> of them off. I just want to prevent stupid stuff like this:
>
> a='a'
> while True:
>    a+=a
>
> from bringing the entire node to its knees. Obviously that will
> eventually bomb with MemoryError, but I'd rather it be some time
> *before* the poor computer starts thrashing virtual memory.

To clarify: One node will be hosting multiple clients' code, and if it
runs out of physical memory, performance for everyone else will be
severely impacted. So I'm hoping to restrict the script's ability to
consume all of memory, without (preferably) ulimit/rlimiting the
entire process (which does other things as well). But if it can't be,
it can't be.

Chris Angelico



#2796

From: David Bolen <db3l.net@gmail.com>
Date: 2011-04-07 14:36 -0400
Message-ID: <m2mxk129jf.fsf@valheru.db3l.homeip.net>
In reply to: #2731

Chris Angelico <rosuav@gmail.com> writes:

> So I'm hoping to restrict the script's ability to
> consume all of memory, without (preferably) ulimit/rlimiting the
> entire process (which does other things as well). But if it can't be,
> it can't be.

Just wondering, but rather than spending the energy to cap Python's
allocations internally, could similar effort instead be directed at
separating the "other things" the same process is doing?  How tightly
coupled is it?  If you could split off just the piece you need to
limit into its own process, then you get all the OS tools at your
disposal to restrict the resources of that process.

Depending on what the "other" things are, it might not be too hard to
split them apart, even if you have to use some IPC mechanism to
coordinate the two pieces. It might well be of the same order of
magnitude as tweaking Python to limit memory internally.

-- David
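David's approach can be sketched with the standard library alone: start the untrusted piece as a separate interpreter, cap its address space with resource.setrlimit(RLIMIT_AS) in the child via subprocess's preexec_fn, and pass results back over a pipe (JSON on stdout here). This is a minimal sketch that is Unix-only and assumes Linux semantics for RLIMIT_AS; run_limited, GENERATE and RUNAWAY are hypothetical names, and 256MB is an arbitrary example limit.

```python
import json
import resource
import subprocess
import sys


def run_limited(code, limit_bytes, timeout=60):
    """Run `code` in a fresh interpreter whose address space is capped at
    limit_bytes; returns the subprocess.CompletedProcess."""
    def cap_address_space():
        # Runs in the child after fork() and before exec(); the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=cap_address_space,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Hypothetical client script: generates data and hands it back as JSON on stdout.
GENERATE = "import json; print(json.dumps({'rows': [i * i for i in range(5)]}))"

# The runaway loop from this thread, reporting MemoryError instead of thrashing.
RUNAWAY = """
try:
    a = 'a'
    while True:
        a += a
except MemoryError:
    print('MemoryError')
"""

if __name__ == "__main__":
    ok = run_limited(GENERATE, 256 * 2**20)
    print(json.loads(ok.stdout))
    bad = run_limited(RUNAWAY, 256 * 2**20)
    print(bad.stdout.strip())
```

With the cap in place the runaway loop ends in MemoryError inside the child while the parent's own limits stay untouched, and the same pipe can carry the generated data straight back to the parent without an extra database round trip.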



#2799

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-04-08 04:59 +1000
Message-ID: <mailman.119.1302202803.9059.python-list@python.org>
In reply to: #2796

On Fri, Apr 8, 2011 at 4:36 AM, David Bolen <db3l.net@gmail.com> wrote:
> Just wondering, but rather than spending the energy to cap Python's
> allocations internally, could similar effort instead be directed at
> separating the "other things" the same process is doing?  How tightly
> coupled is it?  If you could split off just the piece you need to
> limit into its own process, then you get all the OS tools at your
> disposal to restrict the resources of that process.

Well, what happens is roughly this:

1. The process begins a lengthy operation.
2. Python is called upon to generate data to use in that operation.
3. C code collects the data Python generated, reformats it, and stores
   it in the database (on another machine).
4. C then proceeds to use the data, manipulating it further with lots
   of processing that culminates in another thing going into the
   database.

The obvious way to split it would be to send it to the database twice,
separately, as described above (the current code optimizes it down to
a single INSERT at the bottom, keeping it in RAM until then). This
would work, but it seems like a fair amount of extra effort (including
extra load on our central database server) to achieve what I'd have
thought would be fairly simple.

I think it's going to be simplest to use a hardware solution - throw
heaps of RAM at the boxes and then just let them do what they like. We
already have measures to ensure that one client's code can't "be evil"
repeatedly in a loop, so I'll just not worry too much about this
check. (The project's already well past its deadlines, mainly not my
fault! And if I tell my boss "We'd have to tinker with Python's
internals to do this", he's going to put the kibosh on it in two
seconds flat.)

Thanks for the information, all!

Chris Angelico


