Self healthcheck

Started by	Asaf Las <roegltd@gmail.com>
First post	2014-01-21 18:51 -0800
Last post	2014-01-23 07:36 +0200
Articles	9 — 5 participants

  Self healthcheck Asaf Las <roegltd@gmail.com> - 2014-01-21 18:51 -0800
    Re: Self healthcheck Chris Angelico <rosuav@gmail.com> - 2014-01-22 14:08 +1100
      Re: Self healthcheck Asaf Las <roegltd@gmail.com> - 2014-01-22 00:18 -0800
        Re: Self healthcheck Nicholas Cole <nicholas.cole@gmail.com> - 2014-01-22 08:43 +0000
          Re: Self healthcheck Asaf Las <roegltd@gmail.com> - 2014-01-22 07:51 -0800
        Re: Self healthcheck "Frank Millman" <frank@chagford.com> - 2014-01-22 10:56 +0200
          Re: Self healthcheck Asaf Las <roegltd@gmail.com> - 2014-01-22 08:03 -0800
            Re: Self healthcheck Dave Angel <davea@davea.name> - 2014-01-22 13:40 -0500
            Re: Self healthcheck "Frank Millman" <frank@chagford.com> - 2014-01-23 07:36 +0200

#64475 — Self healthcheck

From	Asaf Las <roegltd@gmail.com>
Date	2014-01-21 18:51 -0800
Subject	Self healthcheck
Message-ID	<0d1fc1a7-c585-45ba-8c1a-0cc468712a48@googlegroups.com>

Hi 

When designing a long-running background process,
is it feasible to monitor object/memory leakage caused
by programming mistakes?
If a module could monitor and record trends in the number
of live objects, an event could be generated and logged
whenever the number of "zombie" objects keeps
increasing over the long run.

Would gc.get_count() serve such a purpose?

Thanks

Asaf

#64478

From	Chris Angelico <rosuav@gmail.com>
Date	2014-01-22 14:08 +1100
Message-ID	<mailman.5829.1390360114.18130.python-list@python.org>
In reply to	#64475

On Wed, Jan 22, 2014 at 1:51 PM, Asaf Las <roegltd@gmail.com> wrote:
> When designing long running background process
> is it feasible to monitor object/memory leakage due
> to improper programming?

I assume you're talking about pure Python code, running under CPython.
(If you're writing an extension module, say in C, there are completely
different ways to detect reference leaks; and other Pythons will
behave slightly differently.) There's no way to detect truly
unreferenced objects, because they simply won't exist - not after a
garbage collection run, and usually sooner than that. But if you want
to find objects that you're somehow not using and yet still have live
references to, you'll need to define "using" in a way that makes
sense. Generally there aren't many ways that that can happen, so those
few places are candidates for a weak reference system (maybe you map a
name to the "master object" representing that thing, and you can
recreate the master object from the disk, so when nothing else is
referring to it, you can happily flush it out - that mapping is a good
candidate for weak references).

But for most programs, don't bother. CPython is pretty good at keeping
track of its own references, so chances are you don't need to - and if
you're seeing the process's memory usage going up, it's entirely
possible you can neither detect nor correct the problem in Python code
(eg heap fragmentation).

ChrisA

#64482

From	Asaf Las <roegltd@gmail.com>
Date	2014-01-22 00:18 -0800
Message-ID	<58c541ab-c6e1-45a8-b03a-8597ed7ecb48@googlegroups.com>
In reply to	#64478

On Wednesday, January 22, 2014 5:08:25 AM UTC+2, Chris Angelico wrote:
> I assume you're talking about pure Python code, running under CPython.
> (If you're writing an extension module, say in C, there are completely
> different ways to detect reference leaks; and other Pythons will
> behave slightly differently.) There's no way to detect truly
> unreferenced objects, because they simply won't exist - not after a
> garbage collection run, and usually sooner than that. But if you want
> to find objects that you're somehow not using and yet still have live
> references to, you'll need to define "using" in a way that makes
> sense. Generally there aren't many ways that that can happen, so those
> few places are candidates for a weak reference system (maybe you map a
> name to the "master object" representing that thing, and you can
> recreate the master object from the disk, so when nothing else is
> referring to it, you can happily flush it out - that mapping is a good
> candidate for weak references).
> 
> But for most programs, don't bother. CPython is pretty good at keeping
> track of its own references, so chances are you don't need to - and if
> you're seeing the process's memory usage going up, it's entirely
> possible you can neither detect nor correct the problem in Python code
> (eg heap fragmentation).
> ChrisA

Hi Chris

Yes, the question was about CPython. I am not after leaks in CPython
itself, though detecting those would be good, but after my own mistakes
that lead to data accumulating in mutable structures.
There will be a few processes running standalone Python code and
communicating across servers. Every activity will be spread over time,
so I have to keep a persistent record of each activity and remove it
later, when the activity is finished. In addition to checking objects
directly, I would also like to analyze the app's health indirectly by
checking the amount of data it holds. Let's say there are constantly
100 activities per second and the typical object count is 1000 (in
abstract units, averaged over a long enough time window). I would then
check throughput and memory to see whether my program is healthy in
terms of leaking resources, and generate a log entry if it is not.
The input to such a module would be traffic events (whatever events are
significant to object creation).
So I am looking for a proper way to measure the memory held by a CPython
app. It would also be good if the memory could be attributed to an
object/class name, so that the culprit could be identified and reported.
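
One possible way to attribute growth to a class name is sketched below. Note that gc.get_count() only reports the collector's per-generation counters, so this sketch uses gc.get_objects() instead, which lists the container objects the cyclic collector tracks. The helper names snapshot and growth, the Activity class and the threshold value are all assumptions made for illustration.

```python
import gc
from collections import Counter


def snapshot():
    """Count the objects the cyclic collector currently tracks, keyed by type name."""
    gc.collect()  # drop collectable garbage first so only live objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, threshold=100):
    """Return {type name: increase} for types that grew by at least `threshold`."""
    return {
        name: after[name] - before[name]
        for name in after
        if after[name] - before[name] >= threshold
    }


class Activity:
    """Hypothetical record of one in-flight activity."""

    def __init__(self, ident):
        self.ident = ident


pending = {}                      # a mutable structure that is never cleaned up
baseline = snapshot()
for i in range(500):
    pending[i] = Activity(i)      # the "mistake": finished activities are not removed
suspects = growth(baseline, snapshot())
print(suspects)                   # includes an entry for 'Activity'
```

Taking such snapshots periodically and logging the types that keep growing gives a trend per class name. On Python 3.4 and later, the standard library's tracemalloc module can attribute allocated memory to source lines instead of counting objects.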

Thanks 

Asaf

#64486

From	Nicholas Cole <nicholas.cole@gmail.com>
Date	2014-01-22 08:43 +0000
Message-ID	<mailman.5831.1390380226.18130.python-list@python.org>
In reply to	#64482

On Wednesday, 22 January 2014, Asaf Las <roegltd@gmail.com> wrote:

> On Wednesday, January 22, 2014 5:08:25 AM UTC+2, Chris Angelico wrote:
> > I assume you're talking about pure Python code, running under CPython.
> > (If you're writing an extension module, say in C, there are completely
> > different ways to detect reference leaks; and other Pythons will
> > behave slightly differently.) There's no way to detect truly
> > unreferenced objects, because they simply won't exist - not after a
> > garbage collection run, and usually sooner than that. But if you want
> > to find objects that you're somehow not using and yet still have live
> > references to, you'll need to define "using" in a way that makes
> > sense. Generally there aren't many ways that that can happen, so those
> > few places are candidates for a weak reference system (maybe you map a
> > name to the "master object" representing that thing, and you can
> > recreate the master object from the disk, so when nothing else is
> > referring to it, you can happily flush it out - that mapping is a good
> > candidate for weak references).
> >
> > But for most programs, don't bother. CPython is pretty good at keeping
> > track of its own references, so chances are you don't need to - and if
> > you're seeing the process's memory usage going up, it's entirely
> > possible you can neither detect nor correct the problem in Python code
> > (eg heap fragmentation).
> > ChrisA
>
> Hi Chris
>
> Yes the question was about CPython. But i am not after CPython leaks
> though detecting these would be good, but my own mistakes leading to
> accumulation of data in mutable structures.
> there will be few processes running python code standalone communicating
> across servers and every activity will be spread over time so
> i have to persistently keep record of activity and remove it later when
> activity is finished. In addition to checking objects directly i would
> like to analyze also app health indirectly via checking amount of data
> it holds. let say there is permanently 100 activities per second and
> typical object count figure is 1000 (in abstract units averaged over long
> enough time window), so i would check throughput and memory to see if my
> program is healthy in terms of leaking resources and generate log if it
> is not.
> Input to such module will be traffic events (whatever event significant
> to object creation).
> So i am looking for proper way to detect memory held by CPython app. And
> it would be good if memory can be deduced down to object/class name so
> blamed one could be identified and reported.
>
>
There are some good tools recommended here:

http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended

But in general, my advice would be to use weak references wherever possible.
They not only prevent cycles but also highlight the kinds of bugs in your
code that are likely to cause the sort of problem you are worried about.
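
As an illustration of how a weak reference avoids a cycle, here is a small sketch of a parent/child structure in which the child points back to its parent only weakly. The Node class is hypothetical, and the immediate reclamation shown relies on CPython's reference counting.

```python
import gc
import weakref


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        # Weak back-reference: the child does not keep its parent alive.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Returns None once the parent has been reclaimed.
        return self._parent() if self._parent is not None else None


gc.disable()                       # show that no cycle collection is needed
root = Node("root")
leaf = Node("leaf", parent=root)
probe = weakref.ref(root)
print(leaf.parent is root)
del root                           # last strong reference to the parent
print(probe() is None, leaf.parent is None)
gc.enable()
```

A stale back-reference shows up as None rather than silently keeping the parent alive, which is the kind of bug-highlighting effect mentioned above.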

#64505

From	Asaf Las <roegltd@gmail.com>
Date	2014-01-22 07:51 -0800
Message-ID	<a3f15bda-af5b-430d-a376-e0c8aacdab47@googlegroups.com>
In reply to	#64486

On Wednesday, January 22, 2014 10:43:39 AM UTC+2, Nicholas wrote:
> There are some good tools recommended here: 
> http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended
>  But in general: use weak references wherever possible would be 
> my advice. They not only prevent cycles but will highlight the 
> kinds of bug in your code that is likely to cause the sort of 
> problem you are worried about.

Thanks! I will look into these!

#64487

From	"Frank Millman" <frank@chagford.com>
Date	2014-01-22 10:56 +0200
Message-ID	<mailman.5832.1390381001.18130.python-list@python.org>
In reply to	#64482

"Asaf Las" <roegltd@gmail.com> wrote in message 
news:58c541ab-c6e1-45a8-b03a-8597ed7ecb48@googlegroups.com...
>
> Yes the question was about CPython. But i am not after CPython leaks
> though detecting these would be good, but my own mistakes leading to
> accumulation of data in mutable structures.
> there will be few processes running python code standalone communicating
> across servers and every activity will be spread over time so
> i have to persistently keep record of activity and remove it later when
> activity is finished.

I had a similar concern. My main worry, which turned out to be well-founded, 
was that I would create an object as a result of some user input, but when 
the user had finished with it, and in theory it could be garbage-collected, 
in practice it would not be due to some obscure circular reference 
somewhere.

For short-running tasks this is not a cause for concern, but for a 
long-running server these can build up over time and end up causing a 
problem.

My solution was to log every time an object was created, with some 
self-identifying piece of information, and then log when it was deleted, 
with the same identifier. After running the program for a while I could then 
analyse the log and ensure that each creation had a corresponding deletion.

The tricky bit was logging the deletion. It is a known gotcha in Python that 
you cannot rely on the __del__ method, and indeed it can cause a circular 
reference in itself which prevents the object from being garbage-collected. 
I found a solution somewhere which explained the use of a 'delwatcher' 
class. This is how it works -

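import logging

# Assumption: any callable that accepts one string can serve as log().
log = logging.getLogger(__name__).info

class MainObject:
    def __init__(self, identifier):
        # The watcher lives exactly as long as this object does.
        self._del = delwatcher('MainObject', identifier)

class delwatcher:
    # Stores only plain identifying data, never a reference to the main object.
    def __init__(self, obj_type, identifier):
        self.obj_type = obj_type
        self.identifier = identifier
        log('{}: id={} created'.format(self.obj_type, self.identifier))
    def __del__(self):
        # Runs when the main object releases its reference to the watcher.
        log('{}: id={} deleted'.format(self.obj_type, self.identifier))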

In this case calling __del__() is safe, as no reference to the main object 
is held.
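
For comparison, a similar effect can be sketched without __del__ by using weakref.finalize, which is available in the standard library from Python 3.4 onwards. This is a different technique from the delwatcher class above, not a restatement of it; the watch helper, the events list and the logger name are assumptions made for illustration.

```python
import logging
import weakref

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("lifecycle").info
events = []   # kept only so the example can be inspected


def _record(message):
    events.append(message)
    log(message)


def watch(obj, identifier):
    """Log creation now and deletion when `obj` is reclaimed."""
    obj_type = type(obj).__name__
    _record('{}: id={} created'.format(obj_type, identifier))
    # The callback receives only a plain string, so it holds no reference to obj.
    weakref.finalize(obj, _record, '{}: id={} deleted'.format(obj_type, identifier))


class MainObject:
    def __init__(self, identifier):
        self.identifier = identifier
        watch(self, identifier)


obj = MainObject(42)
del obj          # on CPython the finalizer runs as soon as the last reference goes
print(events)
```

As with the delwatcher, the important property is that the logging callback never refers back to the watched object.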

If you do find that an object is not being deleted, it is then a matter of 
trial and error to find the problem and fix it. It is probably a circular 
reference.

HTH

Frank Millman

#64506

From	Asaf Las <roegltd@gmail.com>
Date	2014-01-22 08:03 -0800
Message-ID	<9729ddaa-5976-4e53-8584-6198b47b6789@googlegroups.com>
In reply to	#64487

On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
> 
> class MainObject:
>     def __init__(self, identifier):
>          self._del = delwatcher('MainObject', identifier)
> class delwatcher:
>     def __init__(self, obj_type, identifier):
>         self.obj_type = obj_type
>         self.identifier = identifier
>         log('{}: id={} created'.format(self.obj_type, self.identifier))
>     def __del__(self):
>         log('{}: id={} deleted'.format(self.obj_type, self.identifier))
> If you do find that an object is not being deleted, it is then 
> trial-and-error to find the problem and fix it. It is probably a circular 
> reference
> 
> Frank Millman

Thanks Frank. Good approach! 

One question - You could do:
class MainObject:
    def __init__(self, identifier):
         self._del = delwatcher(self)
then later 

class delwatcher:
    def __init__(self, tobject):
        self.obj_type = type(tobject)
        self.identifier = id(tobject)
        ...

when creating the delwatcher. Was there a special reason not to use type() 
and id()? Is it because memory is reused when objects are deleted 
and created again, so the same id could refer to objects created 
at different times?

Thanks 

Asaf

#64525

From	Dave Angel <davea@davea.name>
Date	2014-01-22 13:40 -0500
Message-ID	<mailman.5847.1390415890.18130.python-list@python.org>
In reply to	#64506

 Asaf Las <roegltd@gmail.com> Wrote in message:
> On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
>> 
>> class MainObject:
>>     def __init__(self, identifier):
>>          self._del = delwatcher('MainObject', identifier)
>> class delwatcher:
>>     def __init__(self, obj_type, identifier):
>>         self.obj_type = obj_type
>>         self.identifier = identifier
>>         log('{}: id={} created'.format(self.obj_type, self.identifier))
>>     def __del__(self):
>>         log('{}: id={} deleted'.format(self.obj_type, self.identifier))
>> If you do find that an object is not being deleted, it is then 
>> trial-and-error to find the problem and fix it. It is probably a circular 
>> reference
>> 
>> Frank Millman
> 
> Thanks Frank. Good approach! 
> 
> One question - You could do:
> class MainObject:
>     def __init__(self, identifier):
>          self._del = delwatcher(self)
> then later 
> 
> class delwatcher:
>     def __init__(self, tobject):
>         self.obj_type = type(tobject)
>         self.identifier = id(tobject)
>         ...
> 
> when creating delwatcher. Was there special reason to not to use them?
> is this because of memory is reused when objects are deleted 
> and created again so same reference could be for objects created 
> in different time slots?
> 

I couldn't make sense of most of that.  But an ID only uniquely
corresponds to an object while that object still exists.  The
system may, and will, reuse IDs constantly.

-- 
DaveA

#64572

From	"Frank Millman" <frank@chagford.com>
Date	2014-01-23 07:36 +0200
Message-ID	<mailman.5871.1390455427.18130.python-list@python.org>
In reply to	#64506

"Asaf Las" <roegltd@gmail.com> wrote in message 
news:9729ddaa-5976-4e53-8584-6198b47b6789@googlegroups.com...
> On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
>>
>> class MainObject:
>>     def __init__(self, identifier):
>>          self._del = delwatcher('MainObject', identifier)
>> class delwatcher:
>>     def __init__(self, obj_type, identifier):
>>         self.obj_type = obj_type
>>         self.identifier = identifier
>>         log('{}: id={} created'.format(self.obj_type, self.identifier))
>>     def __del__(self):
>>         log('{}: id={} deleted'.format(self.obj_type, self.identifier))
>> If you do find that an object is not being deleted, it is then
>> trial-and-error to find the problem and fix it. It is probably a circular
>> reference
>>
>> Frank Millman
>
> Thanks Frank. Good approach!
>
> One question - You could do:
> class MainObject:
>    def __init__(self, identifier):
>         self._del = delwatcher(self)
> then later
>
> class delwatcher:
>    def __init__(self, tobject):
>        self.obj_type = type(tobject)
>        self.identifier = id(tobject)
>        ...
>
> when creating delwatcher. Was there special reason to not to use them?
> is this because of memory is reused when objects are deleted
> and created again so same reference could be for objects created
> in different time slots?
>

I read Dave's reply, and he is correct in saying that ids are frequently 
reused in Python.

However, in this particular case, I think you are right, it is safe to use 
the id to identify the object. An id can only be re-used if the original 
object is deleted, and that is the whole point of this exercise. We expect 
to see the id come up in a 'created' message, and then the same id appear in 
a 'deleted' message. If this happens, we are not concerned if the same id 
reappears in a subsequent 'created' message.
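
The log analysis described here could look roughly like the sketch below. It assumes log lines in the 'Type: id=N created' / 'Type: id=N deleted' format used by the delwatcher earlier in the thread; the unmatched helper and the sample lines are made up for illustration.

```python
import re

LINE = re.compile(r'^(?P<type>\w+): id=(?P<ident>\S+) (?P<event>created|deleted)$')


def unmatched(lines):
    """Return the set of (type, id) pairs that were created but never deleted."""
    alive = set()
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue                      # ignore unrelated log lines
        key = (match.group('type'), match.group('ident'))
        if match.group('event') == 'created':
            alive.add(key)
        else:
            alive.discard(key)            # a later 'created' may reuse the same id
    return alive


sample = [
    'MainObject: id=140001 created',
    'MainObject: id=140002 created',
    'MainObject: id=140001 deleted',
    'MainObject: id=140001 created',      # id reused by a new object
    'MainObject: id=140001 deleted',
]
print(unmatched(sample))
```

Because the lines are processed in order, a reused id is simply treated as a new creation that needs its own deletion.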

Frank
