Groups > comp.lang.python > #48223 > unrolled thread
| Started by | Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> |
|---|---|
| First post | 2013-06-14 23:04 +0000 |
| Last post | 2013-06-15 09:37 +0000 |
| Articles | 8 — 4 participants |
Memory usage steadily going up while pickling objects Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-14 23:04 +0000
Re: Memory usage steadily going up while pickling objects Dave Angel <davea@davea.name> - 2013-06-14 21:52 -0400
Re: Memory usage steadily going up while pickling objects Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-15 09:41 +0000
Re: Memory usage steadily going up while pickling objects Peter Otten <__peter__@web.de> - 2013-06-15 12:18 +0200
Re: Memory usage steadily going up while pickling objects Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-15 11:06 +0000
Re: Memory usage steadily going up while pickling objects dieter <dieter@handshake.de> - 2013-06-16 08:26 +0200
Re: Memory usage steadily going up while pickling objects Peter Otten <__peter__@web.de> - 2013-06-15 08:37 +0200
Re: Memory usage steadily going up while pickling objects Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> - 2013-06-15 09:37 +0000
| From | Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> |
|---|---|
| Date | 2013-06-14 23:04 +0000 |
| Subject | Memory usage steadily going up while pickling objects |
| Message-ID | <slrnkrn8ki.6mc.giorgos.tzampanakis@brilliance.eternal-september.org> |
I have a program that saves lots (about 800k) objects into a shelve
database (I'm using sqlite3dbm for this since all the default python dbm
packages seem to be unreliable and effectively unusable, but this is
another discussion).

The process takes about 10-15 minutes. During that time I see memory usage
steadily rising, sometimes resulting in a MemoryError. Now, there is a
chance that my code is keeping unneeded references to the stored objects,
but I have debugged it thoroughly and haven't found any.

So I'm beginning to suspect that the pickle module might be keeping an
internal cache of objects being pickled. Is this true?

--
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-06-14 21:52 -0400 |
| Message-ID | <mailman.3352.1371261146.3114.python-list@python.org> |
| In reply to | #48223 |
On 06/14/2013 07:04 PM, Giorgos Tzampanakis wrote:
> I have a program that saves lots (about 800k) objects into a shelve
> database (I'm using sqlite3dbm for this since all the default python dbm
> packages seem to be unreliable and effectively unusable, but this is
> another discussion).
>
> The process takes about 10-15 minutes. During that time I see memory usage
> steadily rising, sometimes resulting in a MemoryError. Now, there is a
> chance that my code is keeping unneeded references to the stored objects,
> but I have debugged it thoroughly and haven't found any.
>
> So I'm beginning to suspect that the pickle module might be keeping an
> internal cache of objects being pickled. Is this true?
>

You can learn quite a bit by using the sys.getrefcount() function. If
you think a variable has only one reference (if it had none, it'd be
very hard to test), and you call sys.getrefcount(), you can check if
your assumption is right.

Note that if the object is part of a complex object, there may be
several mutual references, so the count may be more than you expect.
But you can still check the count before and after calling the pickle
stuff, and see if it has increased.

Note that even if it has not, that doesn't prove you don't have a problem.

Could the problem be the sqlite stuff? Can you disable that part of the
logic, and see whether just creating the data still produces the leak?

--
DaveA
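
Editorial sketch of the before/after check described above. It is only an illustration: the `record` dict and the helper name `refcount_delta_after_pickling` are hypothetical stand-ins, not code from the thread, and sys.getrefcount() is specific to CPython.

```python
import pickle
import sys


def refcount_delta_after_pickling(obj):
    """Return the change in obj's reference count across one pickle.dumps() call."""
    before = sys.getrefcount(obj)
    pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    after = sys.getrefcount(obj)
    return after - before


# Hypothetical stand-in for one of the ~800k stored objects.
record = {"id": 1, "payload": list(range(1000))}

delta = refcount_delta_after_pickling(record)
print("refcount change after pickling:", delta)
```

A positive change would suggest that something kept a reference to the object during pickling. As noted above, an unchanged count does not prove there is no leak elsewhere.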
| From | Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> |
|---|---|
| Date | 2013-06-15 09:41 +0000 |
| Message-ID | <slrnkrodvp.6mc.giorgos.tzampanakis@brilliance.eternal-september.org> |
| In reply to | #48241 |
On 2013-06-15, Dave Angel wrote:
> On 06/14/2013 07:04 PM, Giorgos Tzampanakis wrote:
>> I have a program that saves lots (about 800k) objects into a shelve
>> database (I'm using sqlite3dbm for this since all the default python dbm
>> packages seem to be unreliable and effectively unusable, but this is
>> another discussion).
>>
>> The process takes about 10-15 minutes. During that time I see memory usage
>> steadily rising, sometimes resulting in a MemoryError. Now, there is a
>> chance that my code is keeping unneeded references to the stored objects,
>> but I have debugged it thoroughly and haven't found any.
>>
>> So I'm beginning to suspect that the pickle module might be keeping an
>> internal cache of objects being pickled. Is this true?
>>
>
> You can learn quite a bit by using the sys.getrefcount() function. If
> you think a variable has only one reference (if it had none, it'd be
> very hard to test), and you call sys.getrefcount(), you can check if
> your assumption is right.
>
> Note that if the object is part of a complex object, there may be
> several mutual references, so the count may be more than you expect.
> But you can still check the count before and after calling the pickle
> stuff, and see if it has increased.
>
> Note that even if it has not, that doesn't prove you don't have a problem.
>
> Could the problem be the sqlite stuff? Can you disable that part of the
> logic, and see whether just creating the data still produces the leak?

I tried both with the standard shelve and with sqlite3dbm and
sys.getrefcount() of the stored object (and any of the objects it
references) does not seem to go up after it's stored...

I also tried closing the shelve after storing each object and re-opening
it right away with the "n" flag (which instructs it to start with a new,
empty database) and the memory still rises with the same rate.

So it seems that the pickle module does keep some internal cache or
something like that. I don't want to resort to reading the pickle source
code, but it seems I will have to...

--
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/
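
Editorial sketch of another way to separate "the storage layer leaks" from "my code keeps references". It assumes the standard-library tracemalloc module, which was added in Python 3.4 and so was not available when this thread was written. The functions `store_objects` and `traced_growth`, the object shape, and the count of 2000 are all hypothetical; the standard shelve module stands in for sqlite3dbm.

```python
import os
import shelve
import tempfile
import tracemalloc


def store_objects(path, count, leak_into=None):
    """Store `count` small objects; optionally keep references (a simulated leak)."""
    with shelve.open(path, flag="n") as shelf:
        for i in range(count):
            obj = {"id": i, "payload": [i] * 50}
            shelf[str(i)] = obj
            if leak_into is not None:
                leak_into.append(obj)  # the kind of stray reference to hunt for


def traced_growth(func, *args, **kwargs):
    """Return the net bytes still allocated after func() returns, per tracemalloc."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    func(*args, **kwargs)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


with tempfile.TemporaryDirectory() as tmp:
    clean = traced_growth(store_objects, os.path.join(tmp, "clean"), 2000)
    kept = []
    leaky = traced_growth(store_objects, os.path.join(tmp, "leaky"), 2000,
                          leak_into=kept)

print("net growth without stray references:", clean)
print("net growth with stray references:   ", leaky)
```

tracemalloc only sees allocations made through Python's allocator, so memory held inside a C database library would not show up here. For locating the source of growth, tracemalloc.take_snapshot() and Snapshot.compare_to() report allocations by source line.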
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-06-15 12:18 +0200 |
| Message-ID | <mailman.3369.1371291493.3114.python-list@python.org> |
| In reply to | #48275 |
Giorgos Tzampanakis wrote:
> So it seems that the pickle module does keep some internal cache or
> something like that.

I don't think there's a global cache. The Pickler/Unpickler has a
per-instance cache (the memo dict) that you can clear with the clear_memo()
method, but that doesn't matter here.

> I don't want to resort to reading the pickle source
> code, but it seems I will have to...

I'd look somewhere else...
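
Editorial sketch of the per-instance memo mentioned above. It only applies when a single Pickler is reused for many dump() calls; the helper `dumped_size` and the `data` list are hypothetical, and the effect is observed indirectly through the number of bytes written.

```python
import io
import pickle


def dumped_size(pickler, buffer, obj):
    """Dump obj with the given pickler and return how many bytes were written."""
    start = buffer.tell()
    pickler.dump(obj)
    return buffer.tell() - start


buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)   # one long-lived Pickler
data = list(range(1000))           # hypothetical payload

first = dumped_size(pickler, buffer, data)    # full pickle; data enters the memo
second = dumped_size(pickler, buffer, data)   # memo hit: only a short back-reference
pickler.clear_memo()                          # forget previously pickled objects
third = dumped_size(pickler, buffer, data)    # full pickle again

print(first, second, third)
```

While an object is in the memo, the Pickler keeps it alive, so a reused Pickler that is never cleared can hold on to every object it has dumped. pickle.dump() and pickle.dumps() create a fresh Pickler per call, so they do not accumulate a memo across calls.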
| From | Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> |
|---|---|
| Date | 2013-06-15 11:06 +0000 |
| Message-ID | <slrnkroiuk.6mc.giorgos.tzampanakis@brilliance.eternal-september.org> |
| In reply to | #48279 |
On 2013-06-15, Peter Otten wrote:
> Giorgos Tzampanakis wrote:
>
>> So it seems that the pickle module does keep some internal cache or
>> something like that.
>
> I don't think there's a global cache. The Pickler/Unpickler has a per-
> instance cache (the memo dict) that you can clear with the clear_memo()
> method, but that doesn't matter here.
>
>> I don't want to resort to reading the pickle source
>> code, but it seems I will have to...
>
> I'd look somewhere else...
>

Indeed. The problem was in my code after all. Still, thanks to all for the
memory debugging tips!

--
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/
| From | dieter <dieter@handshake.de> |
|---|---|
| Date | 2013-06-16 08:26 +0200 |
| Message-ID | <mailman.3429.1371364011.3114.python-list@python.org> |
| In reply to | #48275 |
Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> writes:
> ...
> So it seems that the pickle module does keep some internal cache or
> something like that.

This is highly unlikely: the "ZODB" (Zope object database) uses pickle
(actually, it is "cPickle", the "C" implementation of the "pickle" module)
for serialization. The "ZODB" is used in long running Zope processes.
Should pickling cause significant memory leakage, this would have been
observed (and reported).
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2013-06-15 08:37 +0200 |
| Message-ID | <mailman.3360.1371278248.3114.python-list@python.org> |
| In reply to | #48223 |
Giorgos Tzampanakis wrote:
> I have a program that saves lots (about 800k) objects into a shelve
> database (I'm using sqlite3dbm for this since all the default python dbm
> packages seem to be unreliable and effectively unusable, but this is
> another discussion).
>
> The process takes about 10-15 minutes. During that time I see memory usage
> steadily rising, sometimes resulting in a MemoryError. Now, there is a
> chance that my code is keeping unneeded references to the stored objects,
> but I have debugged it thoroughly and haven't found any.
>
> So I'm beginning to suspect that the pickle module might be keeping an
> internal cache of objects being pickled. Is this true?
Pickler/Unpickler objects use a cache to maintain object identity, but at
least shelve in the standard library uses a new Pickler/Unpickler for each
set/get operation.
I don't have sqlite3dbm, but you can try the following:
>>> import shelve
>>> class A: pass
...
>>> a = A()
>>> s = shelve.open("tmp.shelve")
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
False
If you are getting True there must be a cache. One way to enable a cache
yourself is writeback:
>>> s = shelve.open("tmp.shelve", writeback=True)
>>> s["x"] = s["y"] = a
>>> s["x"] is s["y"]
True
You didn't do that, I guess?
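
Editorial sketch extending the session above into a script, to show why writeback matters for memory. It uses plain dicts as values and a temporary directory, both chosen here for illustration; the variable names are hypothetical.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.shelve")

    # Default mode: every read unpickles a fresh copy, so nothing is cached.
    with shelve.open(path, flag="n") as plain:
        plain["x"] = {"value": 1}
        plain_same = plain["x"] is plain["x"]

    # writeback=True: accessed entries stay in an in-memory cache until sync().
    with shelve.open(path, flag="n", writeback=True) as cached:
        cached["x"] = {"value": 1}
        first = cached["x"]
        cached_same = cached["x"] is first
        cached.sync()                        # writes entries back, empties the cache
        after_sync_same = cached["x"] is first

print(plain_same, cached_same, after_sync_same)
```

With writeback enabled, every entry that is read or written stays in memory until sync() or close() is called, so storing a large number of objects without periodic sync() calls will make memory usage grow.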
| From | Giorgos Tzampanakis <giorgos.tzampanakis@gmail.com> |
|---|---|
| Date | 2013-06-15 09:37 +0000 |
| Message-ID | <slrnkrodmu.6mc.giorgos.tzampanakis@brilliance.eternal-september.org> |
| In reply to | #48255 |
On 2013-06-15, Peter Otten wrote:
> Giorgos Tzampanakis wrote:
>
>> I have a program that saves lots (about 800k) objects into a shelve
>> database (I'm using sqlite3dbm for this since all the default python dbm
>> packages seem to be unreliable and effectively unusable, but this is
>> another discussion).
>>
>> The process takes about 10-15 minutes. During that time I see memory usage
>> steadily rising, sometimes resulting in a MemoryError. Now, there is a
>> chance that my code is keeping unneeded references to the stored objects,
>> but I have debugged it thoroughly and haven't found any.
>>
>> So I'm beginning to suspect that the pickle module might be keeping an
>> internal cache of objects being pickled. Is this true?
>
> Pickler/Unpickler objects use a cache to maintain object identity, but at
> least shelve in the standard library uses a new Pickler/Unpickler for each
> set/get operation.
>
> I don't have sqlite3dbm, but you can try the following:
>
> >>> import shelve
> >>> class A: pass
> ...
> >>> a = A()
> >>> s = shelve.open("tmp.shelve")
> >>> s["x"] = s["y"] = a
> >>> s["x"] is s["y"]
> False
This returns False in my case.
> If you are getting True there must be a cache. One way to enable a cache
> yourself is writeback:
No, I haven't enabled writeback.
--
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/