Subject: Re: python reading file memory cost
From: Tony Zhang
To: python-list@python.org
Cc: Thomas 'PointedEars' Lahn
Date: Tue, 02 Aug 2011 11:22:50 +0800
Newsgroups: comp.lang.python

Thanks!

Actually, I used .readline() to parse the file line by line, because I need to find the start position at which to begin extracting data into a list and the end point at which to pause extracting, then repeat until the end of the file.

The file I am reading is formatted like this:

    blabla...useless....
    useless...

    /sign/
    data block (e.g. 10 cols x 1000 rows)
    ...
    blank line
    /sign/
    data block (e.g. 10 cols x 1000 rows)
    ...
    blank line
    ...
    ...
    EOF

Let's call this file 'myfile'. My Python snippet:

    f = open('myfile', 'r')
    blocknum = 0  # number of data blocks read so far
    data = []
    while True:
        # find where extraction begins
        while not f.readline().startswith('/a1/'):
            pass
        # create a nested list to store the data block
        data.append([])
        blocknum += 1
        line = f.readline()
        # a blank line marks the end of one block
        while line.strip():
            data[blocknum-1].append(["%2.6E" % float(x) for x in line.split()])
            line = f.readline()
        print "Read Block %d" % blocknum
        if not f.readline():
            break

The result was that reading a 500M file consumed almost 2GB of RAM. I cannot figure it out; could somebody help?

Thanks very much!

--Tony
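
For reference, the same block-parsing idea can be sketched as a generator that hands back one block at a time instead of accumulating everything in a single list. This is only a sketch under assumptions: the function name read_blocks is hypothetical, the sign is assumed to sit on its own line (as in the snippet above), and values are kept as floats rather than formatted strings, which only makes sense if the numbers themselves are what is needed. In CPython a float object is generally smaller than a formatted string such as '1.000000E+00'.

```python
import io


def read_blocks(lines, sign='/a1/'):
    """Yield each data block as a list of rows of floats.

    `lines` is any iterable of text lines (e.g. an open file).
    `sign` is the marker line that precedes a block (assumed here).
    """
    block = None  # None while we are outside a data block
    for line in lines:
        if block is None:
            if line.startswith(sign):
                block = []  # sign line found: start a new block
        elif line.strip():
            # non-blank line inside a block: one row of numbers
            block.append([float(x) for x in line.split()])
        else:
            yield block  # a blank line closes the block
            block = None
    if block is not None:
        yield block  # file ended without a trailing blank line


# Small in-memory stand-in for 'myfile' (made-up sample data).
sample = io.StringIO(
    u"useless header\n"
    u"/a1/\n"
    u"1.0 2.0\n"
    u"3.0 4.0\n"
    u"\n"
    u"junk\n"
    u"/a1/\n"
    u"5.0 6.0\n"
)
blocks = list(read_blocks(sample))
```

With a real file, iterating `for block in read_blocks(open('myfile'))` and processing each block before moving on would keep only one block in memory at a time; collecting the blocks with list(), as in the sample, keeps them all.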