From: Terry Reedy
Subject: Re: memory usage multi value hash
Date: Thu, 14 Apr 2011 13:28:47 -0400
References: <9e79c6fe-ea6c-4849-bf7a-1b596ff37ecc@r35g2000prj.googlegroups.com>
Newsgroups: comp.lang.python

On 4/14/2011 12:55 PM, Peter Otten wrote:

> I don't expect that it matters much, but you don't need to sort your data if
> you use a dictionary anyway:

Which means that one can build the dict line by line, as each is read,
instead of reading the entire file into memory. So it does matter for
intermediate memory use.

> import csv
> import sys
>
> infile, outfile = sys.argv[1:]
>
> d = {}
> with open(infile, "rb") as instream:
>     for key, value in csv.reader(instream, delimiter=';'):
>         d.setdefault(key, [key]).append(value)
>
> with open(outfile, "wb") as outstream:
>     csv.writer(outstream).writerows(d.itervalues())

--
Terry Jan Reedy
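
A minimal Python 3 sketch of the line-by-line approach discussed in this post, assuming a two-column, semicolon-delimited input like the one in the quoted script. The names group_rows and write_groups and the sample data are hypothetical, chosen only for illustration. The quoted code targets Python 2 ("rb"/"wb" modes, itervalues); under Python 3 the csv module expects text-mode files, typically opened with newline="".

```python
import csv
import io


def group_rows(instream, delimiter=";"):
    """Collect values per key while reading one row at a time.

    Only the current row and the growing dict are held in memory;
    the input is never loaded as a whole and never sorted.
    """
    groups = {}
    for key, value in csv.reader(instream, delimiter=delimiter):
        # The first time a key is seen, start its output row with the key.
        groups.setdefault(key, [key]).append(value)
    return groups


def write_groups(groups, outstream):
    """Write one output row per key: the key followed by its values."""
    csv.writer(outstream).writerows(groups.values())


# In-memory stand-ins for real files (assumed sample data).
# With real files, use open(path, newline="") for both reading and writing.
sample = io.StringIO("a;1\nb;2\na;3\n")
groups = group_rows(sample)

out = io.StringIO(newline="")
write_groups(groups, out)
print(out.getvalue())
```

The dict still grows with the number of distinct keys and values, so the saving is only in the intermediate step: no full list of rows is built before grouping.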