RE: python reading file memory cost

Started by: 张彤 <tzhang@sinap.ac.cn>
First post: 2011-08-02 19:00 +0800
Last post: 2011-08-02 19:00 +0800
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#10715 — RE: python reading file memory cost

From: 张彤 <tzhang@sinap.ac.cn>
Date: 2011-08-02 19:00 +0800
Subject: RE: python reading file memory cost
Message-ID: <mailman.1761.1312282920.1164.python-list@python.org>

Thanks Peter! Your explanation is great!
One more question:
Why is the memory still held even after I del the large array in
interactive Python mode?

-----Original Message-----
From: Peter Otten [mailto:__peter__@web.de] 
Sent: Tuesday, August 02, 2011 4:26 PM
To: python-list@python.org
Subject: Re: python reading file memory cost

Chris Rebert wrote:

>> The result was that reading a 500 MB file consumed almost 2 GB of RAM.
>> I cannot figure it out. Can somebody help?
> 
> If you could store the floats themselves, rather than their string 
> representations, that would be more space-efficient. You could then 
> also use the `array` module, which is more space-efficient than lists 
> (http://docs.python.org/library/array.html ). Numpy would also be 
> worth investigating since multidimensional arrays are involved.
> 
> The next obvious question would then be: do you /really/ need /all/ of 
> the data in memory at once?

This is what you (OP) should think about really hard before resorting to the
optimizations mentioned above. Perhaps you can explain what you are doing
with the data once you've loaded it into memory?
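
If the computation can be done one row at a time, a minimal sketch of
avoiding the full load could look like this. The function name `column_sums`
and the whitespace-separated float layout are assumptions for illustration,
not the OP's actual code:

```python
import io

def column_sums(lines):
    """Accumulate per-column sums while holding only one row at a time.

    Assumes every non-blank line has the same number of columns.
    """
    totals = []
    for line in lines:
        values = [float(x) for x in line.split()]
        if not values:
            continue  # skip blank lines
        if not totals:
            totals = [0.0] * len(values)
        for i, v in enumerate(values):
            totals[i] += v
    return totals

# Stand-in for open("data.txt"); a real file object is iterated
# line by line in the same way, without reading the whole file.
sample = io.StringIO("1.23 4.56 7.89 0.12\n1.00 2.00 3.00 4.00\n")
print(column_sums(sample))
```

Whether this applies depends on what is done with the data afterwards; an
algorithm that needs random access to all rows would not fit this pattern.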

> Also, just so you're aware:
> http://docs.python.org/library/sys.html#sys.getsizeof

To give you an idea how memory usage explodes:

>>> import sys
>>> line = "1.23 4.56 7.89 0.12\n"
>>> len(line) # size in the file
20
>>> sys.getsizeof(line)
60
>>> formatted = ["%2.6E" % float(x) for x in line.split()]
>>> sys.getsizeof(formatted) + sum(sys.getsizeof(s) for s in formatted)
312
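
For comparison, here is a sketch of the same line stored as raw doubles with
the `array` module that Chris mentioned. Exact byte counts depend on the
Python version and platform, so none are shown here; the variable names are
only illustrative:

```python
import sys
from array import array

line = "1.23 4.56 7.89 0.12\n"

# List of formatted strings, as in the session above:
# one list object plus one string object per value.
formatted = ["%2.6E" % float(x) for x in line.split()]
as_strings = sys.getsizeof(formatted) + sum(sys.getsizeof(s) for s in formatted)

# The same four values packed as C doubles: one object header plus
# a contiguous buffer of `itemsize` bytes per value.
packed = array("d", (float(x) for x in line.split()))
as_doubles = sys.getsizeof(packed)

print("strings:", as_strings, "bytes")
print("doubles:", as_doubles, "bytes")
```

The gap widens with more values, because the list of strings pays a per-object
overhead for every number while the array pays its header only once.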


