From: Thomas Jollans
Date: Mon, 01 Aug 2011 17:29:08 +0200
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: python reading file memory cost

On 01/08/11 17:05, Tong Zhang wrote:
> Hello, everyone!
>
> I am trying to read a little big txt file (~1 GB) by python2.7, what I
> want to do is to read these data into a array, meanwhile, I monitor the
> memory cost, I found that it cost more than 6 GB RAM! So I have two
> questions:
>
> 1: How to estimate memory cost before exec python script?
>
> 2: How to save RAM while do not increase exec time?

How are you reading the file? If you are using file_object.read(),
.readlines(), or similar, to read the whole file at once: don't. This is
a tremendous waste of memory, and probably slows things down.

Usually, the best approach is to iterate over the file object itself:

    for line in file_object:
        # process line

Without knowing what you're doing with the data (or what "data" is
here), we can't really do much to help you. My best guess would be that
you're unnecessarily storing the data multiple times.

Perhaps you can use the csv module? Do you really need to hold all the
data in memory all the time, or can you process the data in the order it
is in the file, never actually holding more than one (or a few) records
in memory? With generators, Python has excellent support for working
with streams of data like this (and it would save you a lot of RAM).

- Thomas
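
For illustration only (this is not part of the original post): a minimal sketch of the streaming approach described above. It assumes each line of the file holds whitespace-separated numbers, which the thread never states, and the names read_records and column_sum are hypothetical. It is written for Python 3; the same pattern applies to Python 2.7.

```python
import os
import tempfile


def read_records(path):
    """Yield one parsed record at a time instead of building a full list."""
    with open(path) as file_object:
        for line in file_object:  # the file object hands out lines lazily
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: each line holds whitespace-separated numbers.
            yield [float(field) for field in line.split()]


def column_sum(path, column=0):
    """Sum one column while holding only a single record in memory."""
    return sum(record[column] for record in read_records(path))


# Tiny demonstration with a throwaway file (made-up data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1.0 2.0\n3.0 4.0\n\n5.0 6.0\n")
    demo_path = tmp.name

total_first = column_sum(demo_path, 0)
total_second = column_sum(demo_path, 1)
os.remove(demo_path)
```

Because read_records is a generator, only the current line and its parsed fields exist at any moment, so peak memory stays roughly constant regardless of file size. If the whole data set really must be held at once, this sketch does not help, and a more compact container would be the thing to look at instead.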