| Started by | christian <ozric@web.de> |
|---|---|
| First post | 2011-04-14 09:13 -0700 |
| Last post | 2011-04-15 18:01 +1000 |
| Articles | 5 — 4 participants |
memory usage multi value hash christian <ozric@web.de> - 2011-04-14 09:13 -0700
Re: memory usage multi value hash Peter Otten <__peter__@web.de> - 2011-04-14 18:55 +0200
Re: memory usage multi value hash Terry Reedy <tjreedy@udel.edu> - 2011-04-14 13:28 -0400
Re: memory usage multi value hash Peter Otten <__peter__@web.de> - 2011-04-15 10:15 +0200
Re: memory usage multi value hash Algis Kabaila <akabaila@pcug.org.au> - 2011-04-15 18:01 +1000
| From | christian <ozric@web.de> |
|---|---|
| Date | 2011-04-14 09:13 -0700 |
| Subject | memory usage multi value hash |
| Message-ID | <9e79c6fe-ea6c-4849-bf7a-1b596ff37ecc@r35g2000prj.googlegroups.com> |
Hello,
I'm not very experienced in Python. Is there a way to do the following
more memory-efficiently, and maybe faster?
I import a two-column file and then, for every unique value in the
first column (the key), concatenate the values from the second
column.
The output looks something like this:
A,1,2,3
B,3,4
C,9,10,11,12,90,34,322,21
Thanks in advance & regards,
Christian
import csv
import random
import sys
from itertools import groupby
from operator import itemgetter

f=csv.reader(open(sys.argv[1]),delimiter=';')
z=[[i[0],i[1]] for i in f]
z.sort(key=itemgetter(0))
mydict = dict((k,','.join(map(itemgetter(1), it)))
              for k, it in groupby(z, itemgetter(0)))
del(z)

f = open(sys.argv[2], 'w')
for k,v in mydict.iteritems():
    f.write(v + "\n")

f.close()
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2011-04-14 18:55 +0200 |
| Message-ID | <mailman.365.1302800084.9059.python-list@python.org> |
| In reply to | #3202 |
christian wrote:
> Hello,
>
> i'm not very experienced in python. Is there a way doing below more
> memory efficient and maybe faster.
> I import a 2-column file and then concat for every unique value in
> the first column ( key) the value from the second
> columns.
>
> So The ouptut is something like that.
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
>
>
> Thanks for advance & regards,
> Christian
>
>
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
>
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
> for k, it in groupby(z, itemgetter(0)))
> del(z)
>
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
>
> f.close()
I don't expect that it matters much, but you don't need to sort your data if
you use a dictionary anyway:
import csv
import sys

infile, outfile = sys.argv[1:]

d = {}
with open(infile, "rb") as instream:
    for key, value in csv.reader(instream, delimiter=';'):
        d.setdefault(key, [key]).append(value)

with open(outfile, "wb") as outstream:
    csv.writer(outstream).writerows(d.itervalues())
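
The script above is written for Python 2 (binary file modes, `itervalues()`). As a minimal sketch of the same idea under Python 3, assuming the `;`-delimited two-column input from the original post: the helper names `group_rows` and `convert` are hypothetical, and `collections.defaultdict` is swapped in for `setdefault`. The demo uses in-memory streams so it runs without any files.

```python
import csv
import io
from collections import defaultdict


def group_rows(rows):
    """Collect second-column values under each first-column key, one row at a time."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def convert(instream, outstream):
    """Read ';'-delimited pairs and write one 'key,v1,v2,...' row per key."""
    groups = group_rows(csv.reader(instream, delimiter=';'))
    writer = csv.writer(outstream, lineterminator='\n')
    for key, values in groups.items():
        writer.writerow([key] + values)


# In-memory demo; with real files, open them with newline='' as the csv docs advise.
source = io.StringIO("A;1\nB;3\nA;2\nB;4\nA;3\n")
target = io.StringIO()
convert(source, target)
print(target.getvalue())
```

Rows are never collected into an intermediate list here; only the dictionary of grouped values is held in memory.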
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2011-04-14 13:28 -0400 |
| Message-ID | <mailman.366.1302802138.9059.python-list@python.org> |
| In reply to | #3202 |
On 4/14/2011 12:55 PM, Peter Otten wrote:
> I don't expect that it matters much, but you don't need to sort your data if
> you use a dictionary anyway:
Which means that one can build the dict line by line, as each is read,
instead of reading the entire file into memory. So it does matter for
intermediate memory use.
> import csv
> import sys
>
> infile, outfile = sys.argv[1:]
>
> d = {}
> with open(infile, "rb") as instream:
>     for key, value in csv.reader(instream, delimiter=';'):
>         d.setdefault(key, [key]).append(value)
>
> with open(outfile, "wb") as outstream:
>     csv.writer(outstream).writerows(d.itervalues())
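
One rough way to check the point about intermediate memory is to measure peak allocations with the standard `tracemalloc` module. The following is only a sketch in Python 3 on synthetic data: the row and key counts are arbitrary assumptions, the function names are hypothetical, and both variants keep values as lists (rather than joined strings) so the results can be compared directly. Actual numbers depend on the Python version and on the data.

```python
import tracemalloc
from itertools import groupby
from operator import itemgetter


def make_rows(n_rows=20000, n_keys=50):
    """Yield synthetic (key, value) pairs; the sizes are arbitrary assumptions."""
    for i in range(n_rows):
        yield "K%d" % (i % n_keys), str(i)


def sort_then_group(rows):
    """Sort-based approach: hold every row in a list, sort it, then group."""
    z = sorted(rows, key=itemgetter(0))
    return {k: [v for _, v in grp] for k, grp in groupby(z, itemgetter(0))}


def build_incrementally(rows):
    """Dictionary approach: consume rows one by one, no intermediate list."""
    d = {}
    for key, value in rows:
        d.setdefault(key, []).append(value)
    return d


def peak_bytes(func):
    """Return (result, peak traced allocation in bytes) for func(make_rows())."""
    tracemalloc.start()
    result = func(make_rows())
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


sorted_result, sorted_peak = peak_bytes(sort_then_group)
stream_result, stream_peak = peak_bytes(build_incrementally)
print("sort then group:", sorted_peak, "bytes peak")
print("incremental dict:", stream_peak, "bytes peak")
```

Both functions return the same grouping; only the amount of data held at once differs.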
--
Terry Jan Reedy
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2011-04-15 10:15 +0200 |
| Message-ID | <io8uq5$5id$1@solani.org> |
| In reply to | #3208 |
Terry Reedy wrote:
> On 4/14/2011 12:55 PM, Peter Otten wrote:
>
>> I don't expect that it matters much, but you don't need to sort your data
>> if you use a dictionary anyway:
>
> Which means that one can build the dict line by line, as each is read,
> instead of reading the entire file into memory. So it does matter for
> intermediate memory use.

Yes, sorry, that was a bit too much handwaving.
| From | Algis Kabaila <akabaila@pcug.org.au> |
|---|---|
| Date | 2011-04-15 18:01 +1000 |
| Message-ID | <mailman.384.1302854925.9059.python-list@python.org> |
| In reply to | #3202 |
On Friday 15 April 2011 02:13:51 christian wrote:
> Hello,
>
> i'm not very experienced in python. Is there a way doing
> below more memory efficient and maybe faster.
> I import a 2-column file and then concat for every unique
> value in the first column ( key) the value from the second
> columns.
>
> So The ouptut is something like that.
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
>
>
> Thanks for advance & regards,
> Christian
>
>
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
>
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
> for k, it in groupby(z, itemgetter(0)))
> del(z)
>
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
>
> f.close()
Two alternative solutions - the second one with generators is
probably the most economical as far as RAM usage is concerned.
For your example, data1.txt is taken as follows:
A, 1
B, 3
C, 9
A, 2
B, 4
C, 10
A, 3
C, 11
C, 12
C, 90
C, 34
C, 322
C, 21
The "two in one" program is:
#!/usr/bin/env python
'''generate.py - Example of reading long two column csv list and
sorting. Thread "memory usage multi value hash"
'''
# Determine a set of unique column 1 values
unique_set = set()
with open('data1.txt') as f:
    for line in f:
        unique_set.add(line.split(',')[0])
print(unique_set)

# For each key, rescan the file and collect its values in a list
with open('data1.txt') as f:
    for x in unique_set:
        ls = [line.split(',')[1].rstrip() for line in f if
              line.split(',')[0].rstrip() == x]
        print(x.rstrip(), ','.join(ls))
        f.seek(0)  # rewind so the next key can scan the file again

print ('\n Alternative solution with generators')
with open('data1.txt') as f:
    for x in unique_set:
        gs = (line.split(',')[1].rstrip() for line in f if
              line.split(',')[0].rstrip() == x)
        s = ''
        for ds in gs:
            s = s + ds
        print(x.rstrip(), s)
        f.seek(0)  # rewind so the next key can scan the file again
The output is:
{'A', 'C', 'B'}
A 1, 2, 3
C 9, 10, 11, 12, 90, 34, 322, 21
B 3, 4
Alternative solution with generators
A 1 2 3
C 9 10 11 12 90 34 322 21
B 3 4
Notice that data sequence could be different, without any effect
on output.
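
A small sketch (Python 3) of that last remark, using a few lines in the same "key, value" format as data1.txt. The helper `group_lines` is hypothetical and groups in a single pass rather than rescanning per key; the fixed shuffle seed is only there to make the demo repeatable.

```python
import random

SAMPLE = ["A, 1", "B, 3", "C, 9", "A, 2", "B, 4", "C, 10", "A, 3", "C, 11"]


def group_lines(lines):
    """Group 'key, value' lines by key in a single pass."""
    groups = {}
    for line in lines:
        key, value = (part.strip() for part in line.split(','))
        groups.setdefault(key, []).append(value)
    return groups


original = group_lines(SAMPLE)

shuffled_lines = SAMPLE[:]
random.Random(0).shuffle(shuffled_lines)  # fixed seed only to make the demo repeatable
shuffled = group_lines(shuffled_lines)

# Same keys and same values per key, whatever the line order.
same_content = all(sorted(original[k]) == sorted(shuffled[k]) for k in original)
print(sorted(original) == sorted(shuffled), same_content)
```

The set of keys and the values collected for each key do not depend on line order, although within one key the values appear in the order they were read.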
OldAl.
--
Algis
http://akabaila.pcug.org.au/StructuralAnalysis.pdf