From: Algis Kabaila
Organization: PCUG - Users Helping Users
To: python-list@python.org
Subject: Re: memory usage multi value hash
Date: Fri, 15 Apr 2011 18:01:45 +1000
Newsgroups: comp.lang.python

On Friday 15 April 2011 02:13:51 christian wrote:
> Hello,
>
> I'm not very experienced in Python. Is there a way to do the
> below more memory-efficiently, and maybe faster?
> I import a 2-column file and then, for every unique value in
> the first column (the key), concatenate the values from the
> second column.
>
> So the output is something like this:
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
>
>
> Thanks in advance & regards,
> Christian
>
>
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
>
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
>               for k, it in groupby(z, itemgetter(0)))
> del(z)
>
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
>
> f.close()

Here are two alternative solutions. The second one, with generators, is
probably the more economical as far as RAM usage is concerned, because it
never builds a list of the values for a key; both of them, however, reread
the whole file once for every unique key.

For your example, data1.txt is taken as follows:

A, 1
B, 3
C, 9
A, 2
B, 4
C, 10
A, 3
C, 11
C, 12
C, 90
C, 34
C, 322
C, 21

The "two in one" program is:

#!/usr/bin/env python
'''generate.py - Example of reading a long two-column csv list and sorting.
Thread "memory usage multi value hash"
'''

# Determine a set of unique column 1 values
unique_set = set()
with open('data1.txt') as f:
    for line in f:
        unique_set.add(line.split(',')[0])
print(unique_set)

# First solution: build a list of the values for each key
with open('data1.txt') as f:
    for x in unique_set:
        ls = [line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x]
        print(x.rstrip(), ','.join(ls))
        f.seek(0)  # rewind the file for the next key

print('\n Alternative solution with generators')
with open('data1.txt') as f:
    for x in unique_set:
        gs = (line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x)
        s = ''
        for ds in gs:
            s = s + ds
        print(x.rstrip(), s)
        f.seek(0)  # rewind the file for the next key

The output is:

{'A', 'C', 'B'}
A 1, 2, 3
C 9, 10, 11, 12, 90, 34, 322, 21
B 3, 4

 Alternative solution with generators
A 1 2 3
C 9 10 11 12 90 34 322 21
B 3 4

Notice that the input lines need not be sorted or grouped by key: the
order of the data lines has no effect on which values end up under each
key.

OldAl.
-- 
Algis
http://akabaila.pcug.org.au/StructuralAnalysis.pdf
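For comparison, a different technique from the two solutions in this reply is a single pass over the input with a dictionary of lists (collections.defaultdict). The sketch below is a minimal illustration under stated assumptions: the input is comma-separated "key, value" lines like data1.txt, and the helper name group_values() and the inline sample are hypothetical. It keeps every value in memory once (without the intermediate sorted list z from the original question) and reads the input a single time instead of once per unique key. The lines do not need to be sorted first.

```python
"""Single-pass grouping of a two-column file (illustrative sketch).

Assumption: each line looks like "key, value", as in data1.txt.
group_values() is a hypothetical helper name, not from the thread.
"""
import io
from collections import defaultdict


def group_values(lines):
    """Map each key to the list of its values, in order of appearance."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only at the first comma: key on the left, value on the right
        key, value = line.split(',', 1)
        groups[key.strip()].append(value.strip())
    return groups


if __name__ == '__main__':
    # Same data as data1.txt, held in memory for a self-contained demo
    sample = io.StringIO(
        "A, 1\nB, 3\nC, 9\nA, 2\nB, 4\nC, 10\nA, 3\n"
        "C, 11\nC, 12\nC, 90\nC, 34\nC, 322\nC, 21\n"
    )
    for key, values in sorted(group_values(sample).items()):
        # prints A,1,2,3 then B,3,4 then C,9,10,11,12,90,34,322,21
        print(key + ',' + ','.join(values))
```

To process a real file, pass an open file object instead of the StringIO sample; sorted() only orders the output lines and can be dropped if key order does not matter.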