Groups > comp.lang.python > #67959 > unrolled thread

Re: script uses up all memory

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2014-03-07 09:28 +1100
Last post: 2014-03-07 02:31 +0200
Articles: 8 (2 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: script uses up all memory Chris Angelico <rosuav@gmail.com> - 2014-03-07 09:28 +1100
    Re: script uses up all memory Marko Rauhamaa <marko@pacujo.net> - 2014-03-07 00:34 +0200
      Re: script uses up all memory Chris Angelico <rosuav@gmail.com> - 2014-03-07 09:43 +1100
        Re: script uses up all memory Marko Rauhamaa <marko@pacujo.net> - 2014-03-07 01:12 +0200
          Re: script uses up all memory Chris Angelico <rosuav@gmail.com> - 2014-03-07 10:31 +1100
            Re: script uses up all memory Marko Rauhamaa <marko@pacujo.net> - 2014-03-07 01:53 +0200
              Re: script uses up all memory Chris Angelico <rosuav@gmail.com> - 2014-03-07 11:11 +1100
                Re: script uses up all memory Marko Rauhamaa <marko@pacujo.net> - 2014-03-07 02:31 +0200

#67959 — Re: script uses up all memory

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-03-07 09:28 +1100
Subject: Re: script uses up all memory
Message-ID: <mailman.7879.1394144925.18130.python-list@python.org>
On Fri, Mar 7, 2014 at 9:21 AM, Larry Martell <larry.martell@gmail.com> wrote:
> First I added del(self.tools) before the Django call. That did not
> stop the memory consumption. Then I added a call to gc.collect() after
> the del and that did solve it. gc.collect() returns 0 each time, so
> I'm going to declare victory and move on. No time to dig into the
> Django code. Thanks.

Not all problems need to be solved perfectly :) But at very least, I
would put a comment against your collect() call explaining what
happens: that self.tools is involved in a refloop. Most Python code
shouldn't have to call gc.collect(), so it's worth explaining why you
are here.
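A minimal sketch of what such a commented call might look like. The original code was not posted, so `Tool`, `Checker`, and `release_tools` are hypothetical names, and the refloop is assumed to be between the tools themselves:

```python
import gc
import weakref


class Tool:
    """Hypothetical stand-in for one of the objects held in self.tools."""


class Checker:
    """Hypothetical owner of the tools; not the original poster's class."""

    def __init__(self):
        a, b = Tool(), Tool()
        a.peer, b.peer = b, a      # the tools refer to each other: a refloop
        self.tools = [a, b]

    def release_tools(self):
        del self.tools
        # The objects in self.tools reference each other, so deleting the
        # attribute does not free them through reference counting alone.
        # Force a collection here instead of waiting for the cyclic GC.
        gc.collect()


checker = Checker()
probe = weakref.ref(checker.tools[0])   # lets us observe when a tool is freed
checker.release_tools()
print(probe() is None)
```

On CPython the weak reference should be dead after `release_tools()`, which shows the collect call did the work that `del` alone could not.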

ChrisA



#67960

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-03-07 00:34 +0200
Message-ID: <87ha7bq7ls.fsf@elektro.pacujo.net>
In reply to: #67959
Chris Angelico <rosuav@gmail.com>:

> Not all problems need to be solved perfectly :) But at very least, I
> would put a comment against your collect() call explaining what
> happens: that self.tools is involved in a refloop. Most Python code
> shouldn't have to call gc.collect(), so it's worth explaining why you
> are here.

Refloops also are nothing to be avoided. Let GC do its job and forget
about it.


Marko



#67961

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-03-07 09:43 +1100
Message-ID: <mailman.7880.1394145824.18130.python-list@python.org>
In reply to: #67960
On Fri, Mar 7, 2014 at 9:34 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> Not all problems need to be solved perfectly :) But at very least, I
>> would put a comment against your collect() call explaining what
>> happens: that self.tools is involved in a refloop. Most Python code
>> shouldn't have to call gc.collect(), so it's worth explaining why you
>> are here.
>
> Refloops also are nothing to be avoided. Let GC do its job and forget
> about it.

I think this thread is proof that they are to be avoided. The GC
wasn't doing its job unless explicitly called on. The true solution is
to break the refloop; the quick fix is to call gc.collect(). I stand
by the recommendation to put an explanatory comment against the
collect call. [1]
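Breaking a refloop means finding it first. One way to look for it, sketched here on a hypothetical `Tool` class rather than the real Django objects, is the standard `gc.DEBUG_SAVEALL` flag, which makes the collector park everything it finds unreachable in `gc.garbage` instead of freeing it:

```python
import gc


class Tool:
    """Hypothetical stand-in for an object caught in a refloop."""


def make_cycle():
    a, b = Tool(), Tool()
    a.peer, b.peer = b, a          # a <-> b reference loop, then both go out of scope


gc.collect()                       # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)     # keep unreachable objects in gc.garbage
make_cycle()
gc.collect()
cycle_types = sorted({type(obj).__name__ for obj in gc.garbage})

gc.set_debug(0)                    # restore normal behaviour
gc.garbage.clear()
gc.collect()                       # now really free the saved objects
print(cycle_types)
```

The type names in `cycle_types` point at the classes whose instances were only reachable through a loop; the exact list depends on the Python version.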

ChrisA

[1] Here in Australia, that should be gc.reverse_charges().



#67962

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-03-07 01:12 +0200
Message-ID: <87d2hyrkf4.fsf@elektro.pacujo.net>
In reply to: #67961
Chris Angelico <rosuav@gmail.com>:

> I think this thread is proof that they are to be avoided. The GC
> wasn't doing its job unless explicitly called on. The true solution is
> to break the refloop; the quick fix is to call gc.collect(). I stand
> by the recommendation to put an explanatory comment against the
> collect call.

What I'm saying is that under most circumstances you shouldn't care if
the memory consumption goes up and down. The true solution is to not do
anything about temporary memory consumption. Also, you shouldn't worry
about breaking circular references. That is also often almost impossible
to accomplish as so much modern code builds on closures, which generate
all kinds of circular references under the hood, for your benefit, of
course.
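A small sketch of the "let the GC do its job" position, assuming CPython with the cyclic collector enabled and default thresholds. `Link` and `make_loop` are made-up names. No explicit `gc.collect()` is issued; the loop just creates enough cyclic garbage for an automatic collection to be triggered:

```python
import gc
import weakref


class Link:
    """Hypothetical object that gets tied into a two-element refloop."""


def make_loop():
    a, b = Link(), Link()
    a.other, b.other = b, a        # unreachable as soon as we return
    return weakref.ref(a)          # weak reference, so we can watch it die


gc.enable()                        # make sure automatic collection is on
first = make_loop()
for _ in range(100_000):           # far more allocations than the threshold
    make_loop()

print(gc.get_threshold())          # allocation thresholds that trigger collection
print(first() is None)             # has the very first loop been reclaimed?
```

Memory use rises between automatic collections and falls again afterwards, which is the "goes up and down" behaviour described above.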


Marko



#67963

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-03-07 10:31 +1100
Message-ID: <mailman.7881.1394148716.18130.python-list@python.org>
In reply to: #67962
On Fri, Mar 7, 2014 at 10:12 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> I think this thread is proof that they are to be avoided. The GC
>> wasn't doing its job unless explicitly called on. The true solution is
>> to break the refloop; the quick fix is to call gc.collect(). I stand
>> by the recommendation to put an explanatory comment against the
>> collect call.
>
> What I'm saying is that under most circumstances you shouldn't care if
> the memory consumption goes up and down. The true solution is to not do
> anything about temporary memory consumption. Also, you shouldn't worry
> about breaking circular references. That is also often almost impossible
> to accomplish as so much modern code builds on closures, which generate
> all kinds of circular references under the hood, for your benefit, of
> course.

This isn't a temporary issue, though - see the initial post. After two
hours of five-minutely checks, the computer was wedged. That's a
problem to be solved.

Most of what I do with closures can't create refloops, because the
function isn't referenced from inside itself. You'd need something
like this:

>>> import gc
>>> def foo():
    x=1
    y=lambda: (x,y)
    return y
>>> len([foo() for _ in range(1000)])
1000
>>> gc.collect()
4000
>>> len([foo() for _ in range(1000)])
1000
>>> gc.collect()
4000
>>> len([foo() for _ in range(1000)])
1000
>>> gc.collect()
4000

That's repeatably creating garbage. But change the function to not
return itself, and there's no loop:

>>> def foo():
    x=1
    y=lambda: x
    return y
>>> gc.collect()
0
>>> len([foo() for _ in range(1000)])
1000
>>> gc.collect()
0
>>> len([foo() for _ in range(1000)])
1000
>>> gc.collect()
0

The only even reasonably common case that I can think of is a
recursive nested function:

>>> def foo(x):
    def y(f,x=x):
        f()
        for _ in range(x): y(f,x-1)
    return y

It's a function that returns a function that calls its argument some
number of times, where the number is derived in a stupid way from the
argument to the first function. The whole function is garbage, so it's
not surprising that the GC has to collect it.

>>> len([foo(5) for _ in range(1000)])
1000
>>> gc.collect()
3135
>>> len([foo(5) for _ in range(1000)])
1000
>>> gc.collect()
3135
>>> len([foo(5) for _ in range(1000)])
1000
>>> gc.collect()
3135

Can you give a useful example of a closure that does create a refloop?

ChrisA



#67964

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-03-07 01:53 +0200
Message-ID: <878usmrijk.fsf@elektro.pacujo.net>
In reply to: #67963
Chris Angelico <rosuav@gmail.com>:

> Can you give a useful example of a closure that does create a refloop?

Just the other day, I mentioned the state pattern:

   class MyStateMachine:
       def __init__(self):
           sm = self

           class IDLE:
               def ding(self):
                   sm.open_door()
                   sm.state = AT_DOOR()

           class AT_DOOR:
               ...

           self.state = IDLE()

       def ding(self):
           self.state.ding()


So we have:

    MyStateMachine instance
      -> MyStateMachine instance.ding
         -> IDLE instance
            -> IDLE instance.ding
               -> MyStateMachine instance

plus numerous others in this example alone.
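The loop in that chain can be observed directly. The sketch below fills in the elided parts of the example with placeholders (`open_door`, the body of `AT_DOOR.ding`), which are assumptions, and uses a weak reference to watch the state machine's lifetime on CPython:

```python
import gc
import weakref


class MyStateMachine:
    def __init__(self):
        sm = self

        class IDLE:
            def ding(self):
                sm.open_door()
                sm.state = AT_DOOR()

        class AT_DOOR:
            def ding(self):
                pass               # placeholder; elided in the original

        self.state = IDLE()

    def open_door(self):
        pass                       # placeholder; not shown in the original

    def ding(self):
        self.state.ding()


gc.disable()                       # rule out an automatic collection
machine = MyStateMachine()
probe = weakref.ref(machine)
del machine
alive_after_del = probe() is not None   # still held via IDLE.ding's closure
found = gc.collect()                    # the cyclic collector reclaims it
gc.enable()
print(alive_after_del, found > 0, probe() is None)
```

Dropping the last outside reference is not enough to free the instance; only the cyclic collector reclaims it.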

In general, event-driven programming produces circular references left
and right, and that might come into wider use with asyncio.

I suspect generators might create circular references as well.

Any tree data structure with parent references creates cycles.
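For illustration, a sketch of a two-node tree with a strong parent reference next to a variant that holds its parent through `weakref.ref`. Both classes are made up for this example, and the behaviour shown relies on CPython's reference counting:

```python
import gc
import weakref


class Node:
    """Child holds a strong reference to its parent: parent <-> child loop."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class WeakParentNode:
    """Child holds only a weak reference to its parent: no loop."""

    def __init__(self, parent=None):
        self._parent = None if parent is None else weakref.ref(parent)
        self.children = []
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        return None if self._parent is None else self._parent()


def survives_drop(node_cls):
    """Build a root with one child, drop it, report if the root is still alive."""
    root = node_cls()
    node_cls(root)
    probe = weakref.ref(root)
    del root
    return probe() is not None


gc.disable()                       # observe reference counting only
try:
    strong = survives_drop(Node)
    weak = survives_drop(WeakParentNode)
finally:
    gc.enable()
gc.collect()                       # clean up the cyclic tree left behind
print(strong, weak)
```

The strong-parent tree lingers until a collection runs, while the weak-parent tree is freed as soon as the root is dropped.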

In fact, I would imagine most OO designs create a pretty tight mesh of
back-and-forth references.


Marko



#67966

From: Chris Angelico <rosuav@gmail.com>
Date: 2014-03-07 11:11 +1100
Message-ID: <mailman.7883.1394151084.18130.python-list@python.org>
In reply to: #67964
On Fri, Mar 7, 2014 at 10:53 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> Can you give a useful example of a closure that does create a refloop?
>
> Just the other day, I mentioned the state pattern:
>
>    class MyStateMachine:
>        def __init__(self):
>            sm = self
>
>            class IDLE:
>                def ding(self):
>                    sm.open_door()
>                    sm.state = AT_DOOR()

Yeah, that's an extremely unusual way to do things. Why keep on
instantiating objects when you could just reference functions?

> In general, event-driven programming produces circular references left
> and right, and that might come into wider use with asyncio.

Nope; certainly not with closures. I do a whole lot of event-driven
programming (usually in Pike rather than Python, but they work the
same way in this), and there's no reference loop. Properly-done
event-driven programming should have two basic states: a reference
from some invisible thing that can trigger the event (eg a GUI widget)
to a callable, and a reference from that callable to its state. Once
the trigger is gone, the callable is dropped, its state is dropped,
and everything's cleaned up. You don't usually need a reference inside
the function to that function.
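A sketch of that trigger -> callable -> state chain, using a hypothetical `Button` class in place of a real GUI widget. With the cyclic collector switched off, CPython's reference counting alone should clean everything up once the trigger goes away:

```python
import gc
import weakref


class Button:
    """Hypothetical event source; stands in for a GUI widget."""

    def __init__(self):
        self._handlers = []

    def on_click(self, handler):
        self._handlers.append(handler)     # trigger -> callable

    def click(self):
        for handler in self._handlers:
            handler()


class Counter:
    """Hypothetical state object used by the handler."""

    def __init__(self):
        self.clicks = 0


def make_handler(state):
    def handler():                         # callable -> state
        state.clicks += 1
    return handler


gc.disable()                               # reference counting only
button = Button()
state = Counter()
button.on_click(make_handler(state))
button.click()
clicks = state.clicks
probe = weakref.ref(state)
del state
alive_while_registered = probe() is not None   # the closure keeps it alive
del button
freed_with_trigger = probe() is None           # trigger gone, state gone
gc.enable()
print(clicks, alive_while_registered, freed_with_trigger)
```

Nothing in the chain points back at the button or at the handler itself, so there is no loop for the collector to find.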

Don't forget, a closure need only hang onto the things it actually
uses. It doesn't need all its locals.
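This can be checked by inspecting the function object. In the sketch below (names are illustrative), only the variable the inner function actually reads ends up in its closure:

```python
def outer():
    used = "kept alive by the closure"
    unused = "not captured"        # never read by inner(), so not retained

    def inner():
        return used

    return inner


f = outer()
print(f.__code__.co_freevars)      # ('used',)
print(len(f.__closure__))          # 1
```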

> I suspect generators might create circular references as well.

I doubt it.

>>> def foo(x):
    return ("x"*i for i in range(x))
>>> len([foo(5) for _ in range(1000)])
1000
>>> gc.collect()
0
>>> len([foo(5) for _ in range(1000)])
1000
>>> gc.collect()
0

Again, unless it keeps a reference to itself, there's no loop. It'll
need to hang onto some of its locals, but that's all.

> Any tree data structure with parent references creates cycles.

Yes, but how many of those do you actually have and drop? If you
create a GUI, you generally hold your entire widget tree stably. The
only issue is if you create a parent-child subtree and then drop it.
That shouldn't be being done in a tight loop. Most of the classic data
structures like trees are implemented at the C level, so again, your
code shouldn't be concerning itself with that.

> In fact, I would imagine most OO designs create a pretty tight mesh of
> back-and-forth references.

Examples, please? I can think of a handful of situations where I've
created reference loops, and they're sufficiently rare that I can put
comments against them and explicitly break them. For instance, I have
a "Subwindow" that has a "Connection". My window can have multiple
subwindows, a subwindow may or may not have a connection, and the
connection always references its subwindow. The subw->connection->subw
loop is explicitly broken when the connection is terminated. If the
window chooses to drop a subw, it first checks if there's a connection
(and prompts the user to confirm), and then will explicitly
disconnect, which breaks the refloop (as the connection's terminated).
I did a similar thing at work, again with explicit refloop breakage to
ensure clean removal. Apart from those two cases, I can't think of
anything in the last ten years where I've had a data structure with a
loop in it, where the whole loop could be dropped. (My MUD has a loop,
in that a character exists in a room, and the room keeps track of its
contents; but it's not logical to drop a room with characters in it,
and dropping a character is done by moving it to no-room, which breaks
the refloop.)
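A rough sketch of the subwindow/connection arrangement described above. The class and method names are guesses made for illustration, not the actual code, and the user confirmation step is left out:

```python
import gc
import weakref


class Connection:
    def __init__(self, subw):
        self.subw = subw                       # connection -> subwindow


class Subwindow:
    def __init__(self):
        self.connection = None

    def connect(self):
        self.connection = Connection(self)     # subw -> connection -> subw

    def disconnect(self):
        if self.connection is not None:
            self.connection.subw = None        # explicitly break the refloop
            self.connection = None


class Window:
    def __init__(self):
        self.subwindows = []

    def drop(self, subw):
        subw.disconnect()                      # terminate the connection first
        self.subwindows.remove(subw)


gc.disable()                                   # reference counting only
window = Window()
subw = Subwindow()
subw.connect()
window.subwindows.append(subw)
probe = weakref.ref(subw)
window.drop(subw)
del subw
freed = probe() is None
gc.enable()
print(freed)
```

Because `disconnect()` clears both directions of the loop, the subwindow is freed on CPython without any help from the cyclic collector.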

ChrisA



#67969

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2014-03-07 02:31 +0200
Message-ID: <874n3argry.fsf@elektro.pacujo.net>
In reply to: #67966
Chris Angelico <rosuav@gmail.com>:

> On Fri, Mar 7, 2014 at 10:53 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
>>    class MyStateMachine:
>>        def __init__(self):
>>            sm = self
>>
>>            class IDLE:
>>                def ding(self):
>>                    sm.open_door()
>>                    sm.state = AT_DOOR()
>
> Yeah, that's an extremely unusual way to do things. Why keep on
> instantiating objects when you could just reference functions?

That's not crucial. Even if the state objects were instantiated only
once and inner classes were not used, you'd get the same circularity:

   class State:
       def __init__(self, sm):
           self.sm = sm

   class Idle(State):
       def ding(self):
           self.sm.open_door()
           self.sm.state = self.sm.AT_DOOR

   class AtDoor(State):
       ...

   class MyStateMachine:
       def __init__(self):
           self.IDLE = Idle(self)
           self.AT_DOOR = AtDoor(self)
           ...
           self.state = self.IDLE


The closure style is more concise and to the point and might perform no
worse.

> Nope; certainly not with closures. I do a whole lot of event-driven
> programming (usually in Pike rather than Python, but they work the
> same way in this), and there's no reference loop. Properly-done
> event-driven programming should have two basic states: a reference
> from some invisible thing that can trigger the event (eg a GUI widget)
> to a callable, and a reference from that callable to its state. Once
> the trigger is gone, the callable is dropped, its state is dropped,
> and everything's cleaned up. You don't usually need a reference inside
> the function to that function.

I'm more familiar with networking. If you need a timer, you need to be
able to start it so you need a reference to it. Ok, maybe you
instantiate a new timer each time, but you may need to cancel the timer
so starting the timer gives you a ticket you can use for canceling.
Similarly, you need a socket (wrapper) to signal an I/O state change,
and you also need to be able to close the socket at a bare minimum.

The task scheduling service (asyncio has one) collects thunks that refer
to your objects and your objects have a reference to the task scheduling
service to be able to schedule new tasks.
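A toy version of that arrangement, for illustration only. `Scheduler` and `Client` are hypothetical and `call_soon` here is not asyncio's API. The object refers to the scheduler, and the scheduler refers back to the object through the queued thunk:

```python
import gc
import weakref
from collections import deque


class Scheduler:
    """Hypothetical minimal task scheduler."""

    def __init__(self):
        self._queue = deque()

    def call_soon(self, thunk):
        self._queue.append(thunk)          # scheduler -> thunk

    def run_pending(self):
        while self._queue:
            self._queue.popleft()()        # pop the thunk, then call it


class Client:
    def __init__(self, scheduler):
        self.scheduler = scheduler         # object -> scheduler
        self.ticks = 0

    def start(self):
        self.scheduler.call_soon(self.tick)    # thunk (bound method) -> object

    def tick(self):
        self.ticks += 1


gc.disable()                               # reference counting only
scheduler = Scheduler()
client = Client(scheduler)
client.start()
probe = weakref.ref(client)
del client
alive_while_queued = probe() is not None   # the queued thunk holds the client
scheduler.run_pending()
freed_after_run = probe() is None          # thunk consumed, reference gone
gc.enable()
print(alive_while_queued, freed_after_run)
```

In this toy version the circular reference lasts only while a task is queued; a real scheduler that keeps long-lived callbacks would hold the loop for longer.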

> Don't forget, a closure need only hang onto the things it actually
> uses. It doesn't need all its locals.

More importantly, there's nothing bad in circularity. No need to avoid
it. No need to cut cords.


Marko


