Re: python reading file memory cost

Subject Re: python reading file memory cost
From Tony Zhang <warriorlance@gmail.com>
To python-list@python.org
Date Tue, 02 Aug 2011 11:22:50 +0800
Cc Thomas 'PointedEars' Lahn <PointedEars@web.de>
Newsgroups comp.lang.python
Message-ID <mailman.1746.1312255378.1164.python-list@python.org> (permalink)

Thanks!

Actually, I used .readline() to parse the file line by line, because I
need to find the start position of each block to extract its data into a
list, and the end position where extraction should pause, then repeat
until the end of the file.
The file I am reading is formatted like this:

blabla...useless....
useless...

/sign/
data block(e.g. 10 cols x 1000 rows)
...
blank line
/sign/
data block(e.g. 10 cols x 1000 rows)
...
blank line
...
...
EOF
Let's call this file 'myfile'.
Here is my Python snippet:

f = open('myfile', 'r')
blocknum = 0  # number of data blocks read so far
data = []
while True:
	# find the beginning of the next block to extract
	while not f.readline().startswith('/a1/'): pass
	# create a nested list to store this data block
	data.append([])
	blocknum += 1
	line = f.readline()

	# a blank line marks the end of one block
	while line.strip():
		data[blocknum-1].append(["%2.6E" % float(x) for x in line.split()])
		line = f.readline()
	print("Read Block %d" % blocknum)
	if not f.readline(): break

When I ran it, reading a 500 MB file consumed almost 2 GB of RAM. I
cannot figure out why. Can somebody help?
Thanks very much!
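
One possible contributor, assuming CPython: the snippet keeps every value
as its own string object inside a list, and each such object carries
per-object overhead on top of its characters. The sketch below compares
the size of one row stored that way against the same row stored as packed
doubles. The sample row and the names row_as_strings, row_as_doubles and
deep_size are made up for illustration, array('d') is a suggested
alternative rather than what the snippet does, and the exact numbers
depend on the Python version and platform.

```python
import sys
from array import array

# One made-up 10-column row, standing in for a line of a data block.
line = "1.0 2.5 3.25 4.125 5.0 6.0 7.0 8.0 9.0 10.0\n"

# What the snippet stores: a list of formatted strings per row.
row_as_strings = ["%2.6E" % float(x) for x in line.split()]

# Alternative (an assumption, not the original approach):
# the same values as packed 8-byte doubles.
row_as_doubles = array("d", (float(x) for x in line.split()))


def deep_size(strings):
    """Size of the list object plus every string object it references."""
    return sys.getsizeof(strings) + sum(sys.getsizeof(s) for s in strings)


strings_bytes = deep_size(row_as_strings)
doubles_bytes = sys.getsizeof(row_as_doubles)

print("text in file   : %d bytes" % len(line))
print("list of strings: %d bytes" % strings_bytes)
print("array('d')     : %d bytes" % doubles_bytes)
```

If the per-row ratio seen here holds across a whole file, a list-of-strings
layout can take several times the file size, which would be in line with
500 MB growing to about 2 GB.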

--Tony
