Groups > comp.lang.python > #48223 > unrolled thread

Memory usage steadily going up while pickling objects

Started by: Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com>
First post: 2013-06-14 23:04 +0000
Last post: 2013-06-15 09:37 +0000
Articles: 8 (4 participants)


Contents

  Memory usage steadily going up while pickling objects Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-14 23:04 +0000
    Re: Memory usage steadily going up while pickling objects Dave Angel <davea@davea.name> - 2013-06-14 21:52 -0400
      Re: Memory usage steadily going up while pickling objects Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-15 09:41 +0000
        Re: Memory usage steadily going up while pickling objects Peter Otten <__peter__@web.de> - 2013-06-15 12:18 +0200
          Re: Memory usage steadily going up while pickling objects Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-15 11:06 +0000
        Re: Memory usage steadily going up while pickling objects dieter <dieter@handshake.de> - 2013-06-16 08:26 +0200
    Re: Memory usage steadily going up while pickling objects Peter Otten <__peter__@web.de> - 2013-06-15 08:37 +0200
      Re: Memory usage steadily going up while pickling objects Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-15 09:37 +0000

#48223 — Memory usage steadily going up while pickling objects

From: Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com>
Date: 2013-06-14 23:04 +0000
Subject: Memory usage steadily going up while pickling objects
Message-ID: <slrnkrn8ki.6mc.giorgos.tzampanakis@brilliance.eternal-september.org>

I have a program that saves lots (about 800k) objects into a shelve
database (I'm using sqlite3dbm for this since all the default python dbm
packages seem to be unreliable and effectively unusable, but this is
another discussion).

The process takes about 10-15 minutes. During that time I see memory usage
steadily rising, sometimes resulting in a MemoryError. Now, there is a
chance that my code is keeping unneeded references to the stored objects,
but I have debugged it thoroughly and haven't found any.

So I'm beginning to suspect that the pickle module might be keeping an
internal cache of objects being pickled. Is this true?

-- 
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/ 



#48241

From: Dave Angel <davea@davea.name>
Date: 2013-06-14 21:52 -0400
Message-ID: <mailman.3352.1371261146.3114.python-list@python.org>
In reply to: #48223

On 06/14/2013 07:04 PM, Giorgos Tzampanakis wrote:
> I have a program that saves lots (about 800k) objects into a shelve
> database (I'm using sqlite3dbm for this since all the default python dbm
> packages seem to be unreliable and effectively unusable, but this is
> another discussion).
>
> The process takes about 10-15 minutes. During that time I see memory usage
> steadily rising, sometimes resulting in a MemoryError. Now, there is a
> chance that my code is keeping unneeded references to the stored objects,
> but I have debugged it thoroughly and haven't found any.
>
> So I'm beginning to suspect that the pickle module might be keeping an
> internal cache of objects being pickled. Is this true?
>

You can learn quite a bit by using the  sys.getrefcount() function.  If 
you think a variable has only one reference (if it had none, it'd be 
very hard to test), and you call sys.getrefcount(), you can check if 
your assumption is right.

Note that if the object is part of a complex object, there may be 
several mutual references, so the count may be more than you expect. 
But you can still check the count before and after calling the pickle 
stuff, and see if it has increased.

Note that even if it has not, that doesn't prove you don't have a problem.
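
The before/after check described above might look like the following minimal sketch. The helper name `refcount_delta_after_pickling` and the sample `record` dict are hypothetical, and plain `pickle.dumps()` stands in for whatever the shelve backend calls internally. A delta of 0 suggests pickling left no lingering reference to the object, though, as noted, it does not rule out a leak elsewhere.

```python
import pickle
import sys


def refcount_delta_after_pickling(obj):
    """Return how much obj's reference count changed across pickle.dumps()."""
    before = sys.getrefcount(obj)
    pickle.dumps(obj)  # the serialized bytes are discarded on purpose
    after = sys.getrefcount(obj)
    return after - before


# Hypothetical stand-in for one of the stored objects.
record = {"name": "example", "scores": [1, 2, 3]}
delta = refcount_delta_after_pickling(record)
print("refcount change:", delta)
```

The absolute value reported by `sys.getrefcount()` includes temporary references made by the call itself, so only the difference between the two readings is meaningful here.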

Could the problem be the sqlite stuff?  Can you disable that part of the 
logic, and see whether just creating the data still produces the leak?


-- 
DaveA



#48275

From: Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com>
Date: 2013-06-15 09:41 +0000
Message-ID: <slrnkrodvp.6mc.giorgos.tzampanakis@brilliance.eternal-september.org>
In reply to: #48241

On 2013-06-15, Dave Angel wrote:

> On 06/14/2013 07:04 PM, Giorgos Tzampanakis wrote:
>> I have a program that saves lots (about 800k) objects into a shelve
>> database (I'm using sqlite3dbm for this since all the default python dbm
>> packages seem to be unreliable and effectively unusable, but this is
>> another discussion).
>>
>> The process takes about 10-15 minutes. During that time I see memory usage
>> steadily rising, sometimes resulting in a MemoryError. Now, there is a
>> chance that my code is keeping unneeded references to the stored objects,
>> but I have debugged it thoroughly and haven't found any.
>>
>> So I'm beginning to suspect that the pickle module might be keeping an
>> internal cache of objects being pickled. Is this true?
>>
>
> You can learn quite a bit by using the  sys.getrefcount() function.  If 
> you think a variable has only one reference (if it had none, it'd be 
> very hard to test), and you call sys.getrefcount(), you can check if 
> your assumption is right.
>
> Note that if the object is part of a complex object, there may be 
> several mutual references, so the count may be more than you expect. 
> But you can still check the count before and after calling the pickle 
> stuff, and see if it has increased.
>
> Note that even if it has not, that doesn't prove you don't have a problem.
>
> Could the problem be the sqlite stuff?  Can you disable that part of the 
> logic, and see whether just creating the data still produces the leak?

I tried both with the standard shelve and with sqlite3dbm and
sys.getrefcount() of the stored object (and any of the objects it
references) does not seem to go up after it's stored... I also tried
closing the shelve after storing each object and re-opening it right away
with the "n" flag (which instructs it to start with a new, empty database)
and the memory still rises with the same rate.

So it seems that the pickle module does keep some internal cache or
something like that. I don't want to resort to reading the pickle source
code, but it seems I will have to...

-- 
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/ 



#48279

From: Peter Otten <__peter__@web.de>
Date: 2013-06-15 12:18 +0200
Message-ID: <mailman.3369.1371291493.3114.python-list@python.org>
In reply to: #48275

Giorgos Tzampanakis wrote:

> So it seems that the pickle module does keep some internal cache or
> something like that. 

I don't think there's a global cache. The Pickler/Unpickler has a per-
instance cache (the memo dict) that you can clear with the clear_memo() 
method, but that doesn't matter here.
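
For readers who have not met the memo before, here is a small sketch of the per-instance behaviour, assuming a single long-lived `pickle.Pickler` is reused (which, per this thread, is not what the standard `shelve` does). The variable names are illustrative only. While an object is in the memo, dumping it again writes just a short back-reference, and the memo keeps the object alive until `clear_memo()` is called or the pickler is discarded.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)  # one pickler instance, reused on purpose

shared = list(range(1000))

pickler.dump(shared)              # full serialization; the list enters the memo
size_first = buffer.tell()

pickler.dump(shared)              # memo hit: only a back-reference is written
size_second = buffer.tell() - size_first

pickler.clear_memo()              # forget everything pickled so far
pickler.dump(shared)              # full serialization again
size_third = buffer.tell() - size_first - size_second

print(size_first, size_second, size_third)
```

The second dump should be far smaller than the first, and the third should match the first again once the memo has been cleared.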

> I don't want to resort to reading the pickle source
> code, but it seems I will have to...
 
I'd look somewhere else...



#48283

From: Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com>
Date: 2013-06-15 11:06 +0000
Message-ID: <slrnkroiuk.6mc.giorgos.tzampanakis@brilliance.eternal-september.org>
In reply to: #48279

On 2013-06-15, Peter Otten wrote:

> Giorgos Tzampanakis wrote:
>
>> So it seems that the pickle module does keep some internal cache or
>> something like that. 
>
> I don't think there's a global cache. The Pickler/Unpickler has a per-
> instance cache (the memo dict) that you can clear with the clear_memo() 
> method, but that doesn't matter here.
>
>> I don't want to resort to reading the pickle source
>> code, but it seems I will have to...
>  
> I'd look somewhere else...
>

Indeed. The problem was in my code after all. Still, thanks to all for the
memory debugging tips!


-- 
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/ 



#48415

From: dieter <dieter@handshake.de>
Date: 2013-06-16 08:26 +0200
Message-ID: <mailman.3429.1371364011.3114.python-list@python.org>
In reply to: #48275

Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> writes:
> ...
> So it seems that the pickle module does keep some internal cache or
> something like that.

This is highly unlikely: the "ZODB" (Zope object database)
uses pickle (actually, it is "cPickle", the "C" implementation
of the "pickle" module) for serialization. The "ZODB" is
used in long running Zope processes. Should pickling cause
significant memory leakage, this would have been observed
(and reported).
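
One way to test this claim on one's own machine is to pickle a large number of short-lived objects and watch the traced memory. The sketch below uses `tracemalloc` (standard library since Python 3.4, so newer than this thread); the function name `measure_pickle_growth`, the round count, and the sample payload are all assumptions made for illustration.

```python
import pickle
import tracemalloc


def measure_pickle_growth(rounds=20_000):
    """Pickle many short-lived objects; return net traced memory growth in bytes."""
    tracemalloc.start()
    # Warm-up pass so one-time allocations are not counted as growth.
    for i in range(1000):
        pickle.dumps({"id": i, "values": list(range(20))})
    baseline, _peak = tracemalloc.get_traced_memory()
    for i in range(rounds):
        pickle.dumps({"id": i, "values": list(range(20))})
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current - baseline


growth = measure_pickle_growth()
print(f"net growth after pickling: {growth} bytes")
```

If the reported growth stays small no matter how many rounds are run, pickling itself is an unlikely culprit and the retained references are probably in the calling code.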



#48255

From: Peter Otten <__peter__@web.de>
Date: 2013-06-15 08:37 +0200
Message-ID: <mailman.3360.1371278248.3114.python-list@python.org>
In reply to: #48223

Giorgos Tzampanakis wrote:

> I have a program that saves lots (about 800k) objects into a shelve
> database (I'm using sqlite3dbm for this since all the default python dbm
> packages seem to be unreliable and effectively unusable, but this is
> another discussion).
> 
> The process takes about 10-15 minutes. During that time I see memory usage
> steadily rising, sometimes resulting in a MemoryError. Now, there is a
> chance that my code is keeping unneeded references to the stored objects,
> but I have debugged it thoroughly and haven't found any.
> 
> So I'm beginning to suspect that the pickle module might be keeping an
> internal cache of objects being pickled. Is this true?

Pickler/Unpickler objects use a cache to maintain object identity, but at 
least shelve in the standard library uses a new Pickler/Unpickler for each 
set/get operation. 

I don't have sqlite3dbm, but you can try the following:

>>> import shelve
>>> class A: pass
... 
>>> a = A()
>>> s = shelve.open("tmp.shelve")
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
False

If you are getting True there must be a cache. One way to enable a cache 
yourself is writeback:

>>> s = shelve.open("tmp.shelve", writeback=True)
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
True

You didn't do that, I guess?
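
To illustrate why writeback matters for memory, here is a rough sketch of how the writeback cache grows with every stored entry until `sync()` (or `close()`) is called. It reads `shelf.cache`, which is an implementation detail of CPython's `shelve.Shelf` rather than a documented attribute, so treat that part as an assumption; the path and the stored values are made up.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tmp.shelve")
    with shelve.open(path, writeback=True) as shelf:
        for i in range(1000):
            shelf[str(i)] = [i] * 10   # each value is also kept in the cache
        # shelf.cache is a CPython implementation detail, inspected for illustration.
        cached_before_sync = len(shelf.cache)
        shelf.sync()                   # writes entries back and empties the cache
        cached_after_sync = len(shelf.cache)

print(cached_before_sync, cached_after_sync)
```

With 800k objects and writeback enabled, every stored object would stay referenced in memory until the next `sync()`, which would look much like the steady growth described in the original post.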



#48273

From: Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com>
Date: 2013-06-15 09:37 +0000
Message-ID: <slrnkrodmu.6mc.giorgos.tzampanakis@brilliance.eternal-september.org>
In reply to: #48255

On 2013-06-15, Peter Otten wrote:

> Giorgos Tzampanakis wrote:
>
>> I have a program that saves lots (about 800k) objects into a shelve
>> database (I'm using sqlite3dbm for this since all the default python dbm
>> packages seem to be unreliable and effectively unusable, but this is
>> another discussion).
>> 
>> The process takes about 10-15 minutes. During that time I see memory usage
>> steadily rising, sometimes resulting in a MemoryError. Now, there is a
>> chance that my code is keeping unneeded references to the stored objects,
>> but I have debugged it thoroughly and haven't found any.
>> 
>> So I'm beginning to suspect that the pickle module might be keeping an
>> internal cache of objects being pickled. Is this true?
>
> Pickler/Unpickler objects use a cache to maintain object identity, but at 
> least shelve in the standard library uses a new Pickler/Unpickler for each 
> set/get operation. 
>
> I don't have sqlite3dbm, but you can try the following:
>
>>>> import shelve
>>>> class A: pass
> ... 
>>>> a = A()
>>>> s = shelve.open("tmp.shelve")
>>>> s["x"] = s["y"] = a
>>>> s["x"] is s["y"]
> False

This returns False in my case.

> If you are getting True there must be a cache. One way to enable a cache 
> yourself is writeback:

No, I haven't enabled writeback.


-- 
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/ 


