Groups > comp.lang.python > #64475 > unrolled thread
| Started by | Asaf Las <roegltd@gmail.com> |
|---|---|
| First post | 2014-01-21 18:51 -0800 |
| Last post | 2014-01-23 07:36 +0200 |
| Articles | 9 — 5 participants |
Self healthcheck Asaf Las <roegltd@gmail.com> - 2014-01-21 18:51 -0800
Re: Self healthcheck Chris Angelico <rosuav@gmail.com> - 2014-01-22 14:08 +1100
Re: Self healthcheck Asaf Las <roegltd@gmail.com> - 2014-01-22 00:18 -0800
Re: Self healthcheck Nicholas Cole <nicholas.cole@gmail.com> - 2014-01-22 08:43 +0000
Re: Self healthcheck Asaf Las <roegltd@gmail.com> - 2014-01-22 07:51 -0800
Re: Self healthcheck "Frank Millman" <frank@chagford.com> - 2014-01-22 10:56 +0200
Re: Self healthcheck Asaf Las <roegltd@gmail.com> - 2014-01-22 08:03 -0800
Re: Self healthcheck Dave Angel <davea@davea.name> - 2014-01-22 13:40 -0500
Re: Self healthcheck "Frank Millman" <frank@chagford.com> - 2014-01-23 07:36 +0200
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-01-21 18:51 -0800 |
| Subject | Self healthcheck |
| Message-ID | <0d1fc1a7-c585-45ba-8c1a-0cc468712a48@googlegroups.com> |
Hi,

When designing a long-running background process, is it feasible to monitor object/memory leakage caused by programming mistakes? If a module could monitor and record trends in the number of live objects, an event could be generated and logged whenever the number of "zombie" objects keeps increasing over the long run. Would gc.count() serve this purpose?

Thanks,
Asaf
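A minimal sketch of the trend-monitoring idea, assuming CPython. Note that the `gc` module has no `count()` function; `gc.get_count()` exists, but it returns the collector's per-generation counters rather than the number of live objects, so this sketch uses `gc.get_objects()` instead. The helper names `live_object_counts` and `growth`, and the `Leaky` class, are hypothetical.

```python
import gc
from collections import Counter


def live_object_counts():
    """Count the objects currently tracked by the cyclic GC, grouped by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, min_delta=1):
    """Return {type_name: increase} for types that grew by at least min_delta."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] - before.get(name, 0) >= min_delta
    }


class Leaky:
    """Hypothetical class used only to demonstrate the idea."""


baseline = live_object_counts()
hoard = [Leaky() for _ in range(500)]  # simulate objects that are never released
grown = growth(baseline, live_object_counts(), min_delta=100)
print(grown)
```

In a real daemon the two snapshots would be taken minutes or hours apart and the result logged. Only objects tracked by the collector (mostly containers and class instances) are counted, so this is a rough indicator rather than a full memory account.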
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-01-22 14:08 +1100 |
| Message-ID | <mailman.5829.1390360114.18130.python-list@python.org> |
| In reply to | #64475 |
On Wed, Jan 22, 2014 at 1:51 PM, Asaf Las <roegltd@gmail.com> wrote:
> When designing long running background process
> is it feasible to monitor object/memory leakage due
> to improper programming?

I assume you're talking about pure Python code, running under CPython. (If you're writing an extension module, say in C, there are completely different ways to detect reference leaks; and other Pythons will behave slightly differently.) There's no way to detect truly unreferenced objects, because they simply won't exist - not after a garbage collection run, and usually sooner than that. But if you want to find objects that you're somehow not using and yet still have live references to, you'll need to define "using" in a way that makes sense. Generally there aren't many ways that that can happen, so those few places are candidates for a weak reference system (maybe you map a name to the "master object" representing that thing, and you can recreate the master object from the disk, so when nothing else is referring to it, you can happily flush it out - that mapping is a good candidate for weak references).

But for most programs, don't bother. CPython is pretty good at keeping track of its own references, so chances are you don't need to - and if you're seeing the process's memory usage going up, it's entirely possible you can neither detect nor correct the problem in Python code (e.g. heap fragmentation).

ChrisA
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-01-22 00:18 -0800 |
| Message-ID | <58c541ab-c6e1-45a8-b03a-8597ed7ecb48@googlegroups.com> |
| In reply to | #64478 |
On Wednesday, January 22, 2014 5:08:25 AM UTC+2, Chris Angelico wrote:
> I assume you're talking about pure Python code, running under CPython.

Hi Chris,

Yes, the question was about CPython. I am not after leaks in CPython itself, though detecting those would be good; I am after my own mistakes that lead to data accumulating in mutable structures.

There will be a few processes running Python code standalone and communicating across servers. Every activity will be spread over time, so I have to keep a persistent record of each activity and remove it later when the activity is finished. In addition to checking objects directly, I would also like to analyze app health indirectly by checking the amount of data it holds.

Let's say there are constantly 100 activities per second and the typical object count is 1000 (in abstract units, averaged over a long enough time window). I would check throughput and memory to see whether my program is healthy in terms of leaking resources, and generate a log entry if it is not. Input to such a module would be traffic events (whatever events are significant to object creation).

So I am looking for a proper way to detect the memory held by a CPython app. It would be good if the memory could be broken down to object/class name so the one to blame could be identified and reported.

Thanks,
Asaf
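One possible way to attribute memory growth, sketched under the assumption that Python 3.4 or later is available: the standard-library `tracemalloc` module compares two snapshots and reports growth per source line. It points at the allocating line rather than at a class name, so it only partly answers the question. The `pending` dictionary is a hypothetical stand-in for activity records that are never removed.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical "activity records" that the program forgets to remove.
pending = {i: bytearray(256) for i in range(2000)}

after = tracemalloc.take_snapshot()

# One entry per source line, with the largest changes first.
top = after.compare_to(before, "lineno")
total_growth = sum(stat.size_diff for stat in top)

for stat in top[:3]:
    print(stat)  # shows file, line number, size difference and block count

tracemalloc.stop()
```

A health check could take a snapshot periodically, compare it with the previous one, and write a log entry when the growth per processed activity stays above some chosen threshold.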
| From | Nicholas Cole <nicholas.cole@gmail.com> |
|---|---|
| Date | 2014-01-22 08:43 +0000 |
| Message-ID | <mailman.5831.1390380226.18130.python-list@python.org> |
| In reply to | #64482 |
On Wednesday, 22 January 2014, Asaf Las <roegltd@gmail.com> wrote:
> So i am looking for proper way to detect memory held by CPython app. And
> it would be good if memory can be deduced down to object/class name so
> blamed one could be identified and reported.

There are some good tools recommended here:
http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended

But in general, my advice would be to use weak references wherever possible. They not only prevent cycles but also highlight the kinds of bugs in your code that are likely to cause the sort of problem you are worried about.
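A small sketch of how a weak back-reference avoids a cycle, using the standard-library `weakref.ref`. The `Node` class is hypothetical; the point is that the child never keeps its parent alive, and a stale access shows up as `None` instead of as silently retained memory.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node: strong links down, weak link up."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        # A weak reference to the parent avoids a parent <-> child cycle.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        """Return the parent, or None if it no longer exists."""
        return self._parent() if self._parent is not None else None


root = Node("root")
child = Node("child", root)
parent_name = child.parent.name

del root      # the only strong reference to the root is gone
gc.collect()  # not needed on CPython, harmless elsewhere
orphaned = child.parent is None
```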
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-01-22 07:51 -0800 |
| Message-ID | <a3f15bda-af5b-430d-a376-e0c8aacdab47@googlegroups.com> |
| In reply to | #64486 |
On Wednesday, January 22, 2014 10:43:39 AM UTC+2, Nicholas wrote:
> There are some good tools recommended here:
> http://stackoverflow.com/questions/110259/which-python-memory-profiler-is-recommended
> But in general: use weak references wherever possible would be
> my advice. They not only prevent cycles but will highlight the
> kinds of bug in your code that is likely to cause the sort of
> problem you are worried about.

Thanks! I will look into these!
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2014-01-22 10:56 +0200 |
| Message-ID | <mailman.5832.1390381001.18130.python-list@python.org> |
| In reply to | #64482 |
"Asaf Las" <roegltd@gmail.com> wrote in message
news:58c541ab-c6e1-45a8-b03a-8597ed7ecb48@googlegroups.com...
>
> Yes the question was about CPython. But i am not after CPython leaks
> though detecting these would be good, but my own mistakes leading to
> accumulation of data in mutable structures.
> there will be few processes running python code standalone communicating
> across servers and every activity will be spread over time so
> i have to persistently keep record of activity and remove it later when
> activity is finished.
I had a similar concern. My main worry, which turned out to be well-founded,
was that I would create an object as a result of some user input, but when
the user had finished with it, and in theory it could be garbage-collected,
in practice it would not be due to some obscure circular reference
somewhere.
For short-running tasks this is not a cause for concern, but for a
long-running server these can build up over time and end up causing a
problem.
My solution was to log every time an object was created, with some
self-identifying piece of information, and then log when it was deleted,
with the same identifier. After running the program for a while I could then
analyse the log and ensure that each creation had a corresponding deletion.
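The log analysis could look roughly like the sketch below. It assumes log lines of the form `<type>: id=<identifier> created` and `<type>: id=<identifier> deleted`, which is the format the delwatcher class in this post writes; the function name `unmatched_creations` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "MainObject: id=42 created"
LINE = re.compile(r"^(?P<type>\w+): id=(?P<id>\S+) (?P<event>created|deleted)$")


def unmatched_creations(lines):
    """Return the sorted (type, id) pairs that were created more often than deleted."""
    open_counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # ignore unrelated log lines
        key = (match.group("type"), match.group("id"))
        if match.group("event") == "created":
            open_counts[key] += 1
        else:
            open_counts[key] -= 1
    return sorted(key for key, count in open_counts.items() if count > 0)


sample = [
    "MainObject: id=1 created",
    "MainObject: id=2 created",
    "MainObject: id=1 deleted",
]
leaked = unmatched_creations(sample)
print(leaked)
```

Counting rather than using a set keeps the check correct even when an identifier is created, deleted and then created again later in the same log.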
The tricky bit was logging the deletion. It is a known gotcha in Python that
you cannot rely on the __del__ method, and indeed it can cause a circular
reference in itself which prevents the object from being garbage-collected.
I found a solution somewhere which explained the use of a 'delwatcher'
class. This is how it works -
class MainObject:
    def __init__(self, identifier):
        self._del = delwatcher('MainObject', identifier)

class delwatcher:
    def __init__(self, obj_type, identifier):
        self.obj_type = obj_type
        self.identifier = identifier
        log('{}: id={} created'.format(self.obj_type, self.identifier))
    def __del__(self):
        log('{}: id={} deleted'.format(self.obj_type, self.identifier))
In this case calling __del__() is safe, as no reference to the main object
is held.
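A self-contained version of the pattern, for experimenting. The `log` function here just appends to a list, which is an assumption made for this sketch (the post does not show how `log` is implemented), and the class is renamed `DelWatcher` only to follow the usual naming style.

```python
import gc

events = []  # stands in for a real log file (assumption for this sketch)


def log(message):
    events.append(message)


class DelWatcher:
    """Logs its own creation and deletion; holds no reference to the watched object."""

    def __init__(self, obj_type, identifier):
        self.obj_type = obj_type
        self.identifier = identifier
        log('{}: id={} created'.format(self.obj_type, self.identifier))

    def __del__(self):
        log('{}: id={} deleted'.format(self.obj_type, self.identifier))


class MainObject:
    def __init__(self, identifier):
        self._del = DelWatcher('MainObject', identifier)


a = MainObject('a')
del a  # no cycle: CPython frees it as soon as the last reference goes

b = MainObject('b')
b.me = b      # deliberate reference cycle
del b         # the cycle keeps the object alive for now
gc.collect()  # the cycle collector reclaims it, and the watcher with it

print(events)
```

The watcher sits outside the cycle, so its `__del__` runs once the main object is finally reclaimed, whether by reference counting or by the cycle collector.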
If you do find that an object is not being deleted, it is then
trial-and-error to find the problem and fix it. It is probably a circular
reference.
HTH
Frank Millman
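One way to shorten the trial and error, offered as an assumption rather than something the post describes: the standard-library `gc.get_referrers()` lists the objects that directly refer to a suspect. `Session` and `registry` are hypothetical names.

```python
import gc


class Session:
    """Hypothetical object that should have been released."""


registry = []  # hypothetical container that was never cleaned up

suspect = Session()
registry.append(suspect)

# Every object that directly refers to `suspect`; the namespace holding the
# name `suspect` itself may show up here as well.
referrers = gc.get_referrers(suspect)
held_by_registry = any(r is registry for r in referrers)

for r in referrers:
    print(type(r).__name__)
```

In a long-running server the suspect would be picked from the objects whose "deleted" message never appeared, and the referrers would be inspected from a debugging hook.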
| From | Asaf Las <roegltd@gmail.com> |
|---|---|
| Date | 2014-01-22 08:03 -0800 |
| Message-ID | <9729ddaa-5976-4e53-8584-6198b47b6789@googlegroups.com> |
| In reply to | #64487 |
On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
>
> class MainObject:
>     def __init__(self, identifier):
>         self._del = delwatcher('MainObject', identifier)
> class delwatcher:
>     def __init__(self, obj_type, identifier):
>         self.obj_type = obj_type
>         self.identifier = identifier
>         log('{}: id={} created'.format(self.obj_type, self.identifier))
>     def __del__(self):
>         log('{}: id={} deleted'.format(self.obj_type, self.identifier))
> If you do find that an object is not being deleted, it is then
> trial-and-error to find the problem and fix it. It is probably a circular
> reference
>
> Frank Millman
Thanks Frank. Good approach!
One question - You could do:
class MainObject:
    def __init__(self, identifier):
        self._del = delwatcher(self)

then later

class delwatcher:
    def __init__(self, tobject):
        self.obj_type = type(tobject)
        self.identifier = id(tobject)
        ...
when creating delwatcher. Was there a special reason not to use them?
Is it because memory is reused when objects are deleted and created
again, so the same id could belong to objects created at different
times?
Thanks
Asaf
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2014-01-22 13:40 -0500 |
| Message-ID | <mailman.5847.1390415890.18130.python-list@python.org> |
| In reply to | #64506 |
Asaf Las <roegltd@gmail.com> Wrote in message:
> On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
>>
>> class MainObject:
>>     def __init__(self, identifier):
>>         self._del = delwatcher('MainObject', identifier)
>> class delwatcher:
>>     def __init__(self, obj_type, identifier):
>>         self.obj_type = obj_type
>>         self.identifier = identifier
>>         log('{}: id={} created'.format(self.obj_type, self.identifier))
>>     def __del__(self):
>>         log('{}: id={} deleted'.format(self.obj_type, self.identifier))
>> If you do find that an object is not being deleted, it is then
>> trial-and-error to find the problem and fix it. It is probably a circular
>> reference
>>
>> Frank Millman
>
> Thanks Frank. Good approach!
>
> One question - You could do:
> class MainObject:
>     def __init__(self, identifier):
>         self._del = delwatcher(self)
> then later
>
> class delwatcher:
>     def __init__(self, tobject):
>         self.obj_type = type(tobject)
>         self.identifier = id(tobject)
>         ...
>
> when creating delwatcher. Was there special reason to not to use them?
> is this because of memory is reused when objects are deleted
> and created again so same reference could be for objects created
> in different time slots?
>
I couldn't make sense of most of that. But an ID only uniquely
corresponds to an object while that object still exists. The
system may, and will, reuse IDs constantly.
--
DaveA
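A short demonstration of the point about ids, assuming CPython. The `Thing` class is hypothetical. Distinct ids are guaranteed only for objects that are alive at the same time; whether a freed id is handed out again immediately is an implementation detail, so the second part prints the outcome rather than relying on it.

```python
class Thing:
    """Hypothetical placeholder class."""


# While objects are alive at the same time, their ids are all different.
live = [Thing() for _ in range(1000)]
distinct_while_alive = len({id(t) for t in live}) == len(live)

# Once an object is gone, its id is free to be handed out again.
first = Thing()
old_id = id(first)
del first
second = Thing()
print("id reused:", id(second) == old_id)  # often True on CPython, never guaranteed
```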
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2014-01-23 07:36 +0200 |
| Message-ID | <mailman.5871.1390455427.18130.python-list@python.org> |
| In reply to | #64506 |
"Asaf Las" <roegltd@gmail.com> wrote in message
news:9729ddaa-5976-4e53-8584-6198b47b6789@googlegroups.com...
> On Wednesday, January 22, 2014 10:56:30 AM UTC+2, Frank Millman wrote:
>>
>> class MainObject:
>>     def __init__(self, identifier):
>>         self._del = delwatcher('MainObject', identifier)
>> class delwatcher:
>>     def __init__(self, obj_type, identifier):
>>         self.obj_type = obj_type
>>         self.identifier = identifier
>>         log('{}: id={} created'.format(self.obj_type, self.identifier))
>>     def __del__(self):
>>         log('{}: id={} deleted'.format(self.obj_type, self.identifier))
>> If you do find that an object is not being deleted, it is then
>> trial-and-error to find the problem and fix it. It is probably a circular
>> reference
>>
>> Frank Millman
>
> Thanks Frank. Good approach!
>
> One question - You could do:
> class MainObject:
>     def __init__(self, identifier):
>         self._del = delwatcher(self)
> then later
>
> class delwatcher:
>     def __init__(self, tobject):
>         self.obj_type = type(tobject)
>         self.identifier = id(tobject)
>         ...
>
> when creating delwatcher. Was there special reason to not to use them?
> is this because of memory is reused when objects are deleted
> and created again so same reference could be for objects created
> in different time slots?
>
I read Dave's reply, and he is correct in saying that ids are frequently
re-used in Python.
However, in this particular case, I think you are right: it is safe to use
the id to identify the object. An id can only be re-used if the original
object is deleted, and that is the whole point of this exercise. We expect
to see the id come up in a 'created' message, and then the same id appear in
a 'deleted' message. If this happens, we are not concerned if the same id
reappears in a subsequent 'created' message.
Frank
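If reused ids make the log harder to read, one alternative (an assumption for illustration, not something proposed in the thread) is to label each object with a serial number from `itertools.count`, which never repeats within a process. The `_serial` counter and the list-based `log` are hypothetical.

```python
import itertools

_serial = itertools.count(1)  # hypothetical process-wide counter
events = []                   # stands in for a real log file


def log(message):
    events.append(message)


class DelWatcher:
    def __init__(self, tobject):
        # Only the type name and a serial number are kept; storing `tobject`
        # itself would create a cycle with the watched object.
        self.obj_type = type(tobject).__name__
        self.identifier = next(_serial)
        log('{}: id={} created'.format(self.obj_type, self.identifier))

    def __del__(self):
        log('{}: id={} deleted'.format(self.obj_type, self.identifier))


class MainObject:
    def __init__(self):
        self._del = DelWatcher(self)


a = MainObject()
del a
b = MainObject()  # may get the same id() as `a`, but a new serial number
del b

print(events)
```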