Re: memory usage multi value hash

From	Algis Kabaila <akabaila@pcug.org.au>
Organization	PCUG - Users Helping Users
To	python-list@python.org
Subject	Re: memory usage multi value hash
Date	Fri, 15 Apr 2011 18:01:45 +1000
User-Agent	KMail/1.13.5 (Linux/2.6.35-25-generic-pae; KDE/4.5.1; i686; ; )
References	<9e79c6fe-ea6c-4849-bf7a-1b596ff37ecc@r35g2000prj.googlegroups.com>
In-Reply-To	<9e79c6fe-ea6c-4849-bf7a-1b596ff37ecc@r35g2000prj.googlegroups.com>
MIME-Version	1.0
Content-Type	Text/Plain; charset="iso-8859-1"
Content-Transfer-Encoding	7bit
Cc	christian <ozric@web.de>
Newsgroups	comp.lang.python
Message-ID	<mailman.384.1302854925.9059.python-list@python.org>


On Friday 15 April 2011 02:13:51 christian wrote:
> Hello,
> 
> I'm not very experienced in Python. Is there a way to do the
> following more memory-efficiently, and maybe faster?
> I import a 2-column file and then, for every unique value in
> the first column (the key), concatenate the values from the
> second column.
> 
> So the output is something like this:
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
> 
> 
> Thanks in advance & regards,
> Christian
> 
> 
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
> 
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
>            for k, it in groupby(z, itemgetter(0)))
> del(z)
> 
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
> 
> f.close()
Two alternative solutions - the second one, with generators, is
probably the most economical as far as RAM usage is concerned.

For your example, data1.txt is taken as follows:
A, 1
B, 3
C, 9
A, 2
B, 4
C, 10
A, 3
C, 11
C, 12
C, 90
C, 34
C, 322
C, 21

The "two in one" program is:
#!/usr/bin/env python3
'''generate.py - Example of reading long two column csv list and
sorting. Thread "memory usage multi value hash"
'''

# Determine the set of unique column 1 values
unique_set = set()
with open('data1.txt') as f:
    for line in f:
        unique_set.add(line.split(',')[0])
    print(unique_set)

# Solution 1: for each key, build a list of its column 2 values.
# The file is re-read from the start once per key.
with open('data1.txt') as f:
    for x in unique_set:
        ls = [line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x]
        print(x.rstrip(), ','.join(ls))
        f.seek(0)  # rewind for the next key

print('\n Alternative solution with generators')
# Solution 2: same scan, but a generator yields the values one at a time
with open('data1.txt') as f:
    for x in unique_set:
        gs = (line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x)
        s = ''
        for ds in gs:
            s = s + ds
        print(x.rstrip(), s)
        f.seek(0)  # rewind for the next key

The output is:
{'A', 'C', 'B'}
A  1, 2, 3
C  9, 10, 11, 12, 90, 34, 322, 21
B  3, 4

 Alternative solution with generators
A  1 2 3
C  9 10 11 12 90 34 322 21
B  3 4
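
The RAM remark about generators can be checked roughly with a small
sketch like the one below. It is not part of the program above: the
`values` list is a made-up stand-in for the second column of a large
file, and `sys.getsizeof` only measures the container object itself,
not the strings it refers to, so treat the numbers as indicative only.

```python
import sys

# Hypothetical stand-in for the second column of a large file.
values = [str(n) + "\n" for n in range(100000)]

# A list comprehension materialises every stripped value at once.
as_list = [v.rstrip() for v in values]

# A generator expression produces the stripped values one at a time.
as_gen = (v.rstrip() for v in values)

# getsizeof reports the size of the container only (for the list,
# its array of references), not the strings themselves.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)

print("list container:", list_bytes, "bytes")
print("generator:     ", gen_bytes, "bytes")
```

The generator stays small no matter how many lines there are, but the
saving only holds while its items are consumed one by one, as in the
`for ds in gs` loop above.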

Note that the input lines need not be sorted or grouped by key:
each key's values are collected wherever they appear in the file.
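
For comparison, here is a minimal single-pass sketch that also does
not care about the order of the input lines. It is a different
technique from the program above (a dictionary of lists via
`collections.defaultdict` instead of re-reading the file once per
key), and the function name `group_values` and the `sample` data are
only illustrative assumptions.

```python
from collections import defaultdict

def group_values(lines):
    """Collect the second-column values per first-column key in one pass."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, value = line.split(',', 1)
        groups[key.strip()].append(value.strip())
    return groups

# Illustrative data in the same "key, value" layout as data1.txt.
sample = ["A, 1\n", "B, 3\n", "C, 9\n", "A, 2\n", "B, 4\n", "C, 10\n", "A, 3\n"]

groups = group_values(sample)
for key in sorted(groups):
    print(key + ',' + ','.join(groups[key]))
# prints:
# A,1,2,3
# B,3,4
# C,9,10
```

The trade-off is that this keeps every value in memory at once but
reads the input only once, whereas the program above holds little in
memory but scans the file once per unique key.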

OldAl.

-- 
Algis
http://akabaila.pcug.org.au/StructuralAnalysis.pdf

