| From | Algis Kabaila <akabaila@pcug.org.au> |
|---|---|
| To | python-list@python.org |
| Cc | christian <ozric@web.de> |
| Subject | Re: memory usage multi value hash |
| Date | Fri, 15 Apr 2011 18:01:45 +1000 |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.384.1302854925.9059.python-list@python.org> |
| In-Reply-To | <9e79c6fe-ea6c-4849-bf7a-1b596ff37ecc@r35g2000prj.googlegroups.com> |
On Friday 15 April 2011 02:13:51 christian wrote:
> Hello,
>
> I'm not very experienced in Python. Is there a way of doing the
> below that is more memory efficient and maybe faster?
> I import a 2-column file and then, for every unique value in the
> first column (the key), concatenate the values from the second
> column.
>
> So the output is something like this:
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
>
>
> Thanks in advance & regards,
> Christian
>
>
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
>
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
>               for k, it in groupby(z, itemgetter(0)))
> del(z)
>
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
>
> f.close()
Two alternative solutions - the second one, with generators, is
probably the more economical as far as RAM usage is concerned.
For your example, data1.txt is taken as follows:
A, 1
B, 3
C, 9
A, 2
B, 4
C, 10
A, 3
C, 11
C, 12
C, 90
C, 34
C, 322
C, 21
The "two in one" program is:
#!/usr/bin/env python3
'''generate.py - Example of reading a long two-column csv list and
grouping the values by key. Thread "memory usage multi value hash"
'''
# Determine the set of unique column 1 values
unique_set = set()
with open('data1.txt') as f:
    for line in f:
        unique_set.add(line.split(',')[0])
print(unique_set)

# Solution 1: for each key, re-read the file and collect its values in a list
with open('data1.txt') as f:
    for x in unique_set:
        ls = [line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x]
        print(x.rstrip(), ','.join(ls))
        f.seek(0)  # rewind the file for the next key

print('\n Alternative solution with generators')
# Solution 2: same scan, but the generator yields one value at a time
with open('data1.txt') as f:
    for x in unique_set:
        gs = (line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x)
        s = ''
        for ds in gs:
            s = s + ds
        print(x.rstrip(), s)
        f.seek(0)  # rewind the file for the next key
The output is:
{'A', 'C', 'B'}
A 1, 2, 3
C 9, 10, 11, 12, 90, 34, 322, 21
B 3, 4
Alternative solution with generators
A 1 2 3
C 9 10 11 12 90 34 322 21
B 3 4
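To illustrate the RAM remark, here is a rough sketch that compares the size of a list comprehension with that of the equivalent generator expression. The rows are made-up sample data, not the poster's file, and sys.getsizeof reports only the container's own size, not the strings it refers to, so treat the numbers as indicative only.

```python
import sys

# Made-up sample rows in the same "key, value" shape as data1.txt.
rows = ["C, %d" % n for n in range(100000)]

# List comprehension: every extracted value is held at the same time.
as_list = [row.split(",")[1].strip() for row in rows]

# Generator expression: values are produced one at a time, on demand.
as_gen = (row.split(",")[1].strip() for row in rows)

print("list object size:     ", sys.getsizeof(as_list))
print("generator object size:", sys.getsizeof(as_gen))

# Consuming the generator still visits every value exactly once.
count = sum(1 for _ in as_gen)
print("values seen:", count)
```

The saving applies only to the intermediate list. If all values for a key are concatenated into one string, as in the program above, that string still has to fit in memory.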
Notice that the input does not need to be sorted: rows for
different keys can be interleaved in any order without affecting
how the values are grouped.
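For comparison, a single-pass variant is sketched below. It is not the program above but a different trade-off: it reads the input once and keeps all grouped values in a dictionary, so it uses more RAM but avoids re-reading the file for every key. The function name group_pairs and the inline sample rows are assumptions made for illustration.

```python
from collections import defaultdict

def group_pairs(lines, sep=","):
    """Group second-column values by first-column key in one pass."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(sep, 1)
        groups[key.strip()].append(value.strip())
    return dict(groups)

# Illustrative rows; with a real file, pass the open file object instead.
sample = ["A, 1", "B, 3", "C, 9", "A, 2", "B, 4", "C, 10", "A, 3"]

for key, values in sorted(group_pairs(sample).items()):
    print(key + "," + ",".join(values))
# A,1,2,3
# B,3,4
# C,9,10
```

With a file on disk, something like `with open('data1.txt') as f: groups = group_pairs(f)` should work, since a file object iterates over its lines.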
OldAl.
--
Algis
http://akabaila.pcug.org.au/StructuralAnalysis.pdf