


Re: python reading file memory cost

Date Mon, 01 Aug 2011 17:29:08 +0200
From Thomas Jollans <t@jollybox.de>
Newsgroups comp.lang.python



On 01/08/11 17:05, Tong Zhang wrote:
> Hello, everyone!
>
> I am trying to read a fairly large text file (~1 GB) with Python 2.7. I
> want to read the data into an array. While monitoring memory usage, I
> found that it takes more than 6 GB of RAM! So I have two questions:
>
> 1: How can I estimate the memory cost before running a Python script?
>
> 2: How can I save RAM without increasing execution time?

How are you reading the file? If you are using file_object.read(),
.readlines(), or similar to read the whole file at once: don't. This is
a tremendous waste of memory, and probably slows things down. Usually,
the best approach is to iterate over the file object itself
(for line in file_object: ...) and process each line as you go.
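
As a rough sketch of what iterating over the file object looks like: the
function name, the demo file, and the per-line processing below are all
made up for illustration, and the code is written for Python 3 rather
than the 2.7 mentioned in the question. The point is only that the loop
holds one line at a time instead of the whole file.

```python
import os
import tempfile


def count_lines_and_longest(path):
    """Stream a text file line by line.

    Only the current line is held in memory, so the cost does not grow
    with the size of the file. (Hypothetical helper for illustration.)
    """
    n_lines = 0
    longest = 0
    with open(path) as file_object:
        for line in file_object:  # the file object yields one line at a time
            n_lines += 1
            longest = max(longest, len(line.rstrip("\n")))
    return n_lines, longest


# Small demo file standing in for the real ~1 GB input.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma delta\n")
    demo_path = tmp.name

result = count_lines_and_longest(demo_path)
os.remove(demo_path)
print(result)
```

By contrast, file_object.read() or .readlines() would build the whole
contents in memory before the first line is processed.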

Without knowing what you're doing with the data (or, what "data" is
here), we can't really do much to help you. My best guess would be that
you're unnecessarily storing the data multiple times.
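
To make that guess a bit more concrete, here is a small sketch (Python 3,
CPython assumed) of one way the same numbers can cost several times more
memory than necessary. It assumes the data are floating-point values,
which the original post does not say. A list of float objects stores a
pointer plus a separate object per value, while array.array stores raw
8-byte doubles.

```python
import sys
from array import array

# Hypothetical data: 1000 floating-point values.
values = [float(i) for i in range(1000)]

# A list holds one pointer per element, and each element is a separate
# float object with its own overhead. getsizeof(list) alone only counts
# the pointers, so the per-object sizes are added explicitly.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('d') stores the same numbers as packed C doubles.
packed = array("d", values)
packed_bytes = sys.getsizeof(packed)

print("list of floats:", list_bytes, "bytes")
print("array('d'):    ", packed_bytes, "bytes")
```

The exact figures depend on the interpreter and platform, so treat this
as a way to measure on a small sample and extrapolate, not as a precise
estimate.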

Perhaps you can use the csv module? Do you really need to hold all the
data in memory all the time, or can you process the data in the order it
is in the file, never actually holding more than one (or a few) records
in memory? With generators, Python has excellent support for working
with streams of data like this, and it would save you a lot of RAM.
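
A minimal sketch of that idea, assuming a two-column "name,value" CSV
layout (an invented format, since the real one is unknown) and Python 3.
The function names are hypothetical. The generator yields one parsed
record at a time, and the consumer keeps only a running summary.

```python
import csv
import io


def read_records(stream):
    """Yield (name, value) pairs one at a time from a CSV stream."""
    for row in csv.reader(stream):
        yield row[0], float(row[1])


def total_by_name(records):
    """Consume records lazily, keeping only one running total per name."""
    totals = {}
    for name, value in records:
        totals[name] = totals.get(name, 0.0) + value
    return totals


# In-memory sample standing in for the real file; with a file on disk,
# pass an open file object instead (opened with newline="" for csv).
sample = io.StringIO("a,1.5\nb,2.0\na,0.5\n")
totals = total_by_name(read_records(sample))
print(totals)
```

Memory use here depends on the number of distinct names, not on the
number of rows in the file.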

 - Thomas
