Re: memory usage multi value hash

From	Peter Otten <__peter__@web.de>
Subject	Re: memory usage multi value hash
Date	Thu, 14 Apr 2011 18:55:09 +0200
Newsgroups	comp.lang.python
Message-ID	<mailman.365.1302800084.9059.python-list@python.org>

christian wrote:

> Hello,
> 
> I'm not very experienced in Python. Is there a way to do the following more
> memory-efficiently, and maybe faster?
> I import a 2-column file and then, for every unique value in
> the first column (the key), concatenate the values from the second
> column.
> 
> So the output is something like this:
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
> 
> 
> Thanks in advance & regards,
> Christian
> 
> 
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
> 
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
>            for k, it in groupby(z, itemgetter(0)))
> del(z)
> 
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
> 
> f.close()

I don't expect that it matters much, but you don't need to sort your data if 
you use a dictionary anyway:

import csv
import sys

infile, outfile = sys.argv[1:]

d = {}
with open(infile, "rb") as instream:
    for key, value in csv.reader(instream, delimiter=';'):
        d.setdefault(key, [key]).append(value)

with open(outfile, "wb") as outstream:
    csv.writer(outstream).writerows(d.itervalues())
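
The script above targets Python 2 (binary file modes, itervalues()). For readers on Python 3, here is a minimal sketch of the same setdefault() approach. The function names group_rows and convert are made up for illustration and are not part of the original post; the only assumption about the input is that every row has exactly two columns.

```python
import csv


def group_rows(rows):
    """Collect the values for each key; keys stay in first-seen order."""
    groups = {}
    for key, value in rows:
        # Each list starts with its key, so it can be written out
        # directly as one CSV row: key, value1, value2, ...
        groups.setdefault(key, [key]).append(value)
    return groups


def convert(infile, outfile, delimiter=";"):
    """Read a two-column file and write one row per unique key."""
    # In Python 3 the csv module expects text-mode files opened with
    # newline="" instead of the "rb"/"wb" modes used in Python 2.
    with open(infile, newline="") as instream:
        groups = group_rows(csv.reader(instream, delimiter=delimiter))
    with open(outfile, "w", newline="") as outstream:
        csv.writer(outstream).writerows(groups.values())
```

Calling convert(sys.argv[1], sys.argv[2]) would mirror the command-line use of the script above. As there, no sort is needed, but the whole dictionary still lives in memory until it is written out.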
