| From | Jon Clements <joncle@googlemail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Finding empty columns. Is there a faster way? |
| Date | 2011-04-21 13:32 -0700 |
| Organization | http://groups.google.com |
| Message-ID | <1d1fb6bb-0f09-42f6-b543-fed032efe8cb@z31g2000vbs.googlegroups.com> |
| References | <e6f7d142-0691-4a6f-91fe-401dcbe291c9@u12g2000vbf.googlegroups.com> |
On Apr 21, 5:40 pm, nn <prueba...@latinmail.com> wrote:
> time head -1000000 myfile >/dev/null
>
> real 0m4.57s
> user 0m3.81s
> sys 0m0.74s
>
> time ./repnullsalt.py '|' myfile
> 0 1 Null columns:
> 11, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 33, 45, 50, 68
>
> real 1m28.94s
> user 1m28.11s
> sys 0m0.72s
>
> import sys
> def main():
>     with open(sys.argv[2],'rb') as inf:
>         limit = sys.argv[3] if len(sys.argv)>3 else 1
>         dlm = sys.argv[1].encode('latin1')
>         nulls = [x==b'' for x in next(inf)[:-1].split(dlm)]
>         enum = enumerate
>         split = bytes.split
>         out = sys.stdout
>         prn = print
>         for j, r in enum(inf):
>             if j%1000000==0:
>                 prn(j//1000000,end=' ')
>                 out.flush()
>                 if j//1000000>=limit:
>                     break
>             for i, cur in enum(split(r[:-1],dlm)):
>                 nulls[i] |= cur==b''
>     print('Null columns:')
>     print(', '.join(str(i+1) for i,val in enumerate(nulls) if val))
>
> if not (len(sys.argv)>2):
>     sys.exit("Usage: "+sys.argv[0]+
>              " <delimiter> <filename> <limit>")
>
> main()
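In the quoted script, `limit` is an int only when the third argument is omitted. If a value is passed, it stays a `str`. Under Python 3, `j//1000000>=limit` then raises `TypeError` on the very first line. A fix for the script would be `int(sys.argv[3])`. The sketch below wraps the same bytes-splitting loop in a function so it can be checked without a file. The name `null_columns` is hypothetical. Here `limit` counts lines rather than millions of lines. `rstrip(b'\n')` replaces `[:-1]` so a final line without a newline keeps its last character. The sketch assumes no line has more fields than the first.

```python
def null_columns(lines, delimiter=b'|', limit=None):
    """Return 1-based indices of fields that are empty (b'') in any line.

    `lines` is any iterable of bytes lines (e.g. a file opened in 'rb' mode).
    `limit`, if given, is the number of lines to scan after the first one.
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return []
    # One flag per column, seeded from the first line.
    nulls = [field == b'' for field in first.rstrip(b'\n').split(delimiter)]
    for j, line in enumerate(it):
        if limit is not None and j >= limit:
            break
        for i, field in enumerate(line.rstrip(b'\n').split(delimiter)):
            # OR in emptiness; assumes no line has more fields than the first.
            nulls[i] |= field == b''
    return [i + 1 for i, is_null in enumerate(nulls) if is_null]
```

Used on a real file, this would be `with open(path, 'rb') as f: print(null_columns(f, b'|'))`.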
What's with aliasing enumerate and print? With heavy disk I/O, I can
hardly see name lookups being any problem at all. And why the time
stats with /dev/null?
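The aliasing question can be tested directly. Below is a minimal sketch using the standard `timeit` module. It compares calling a builtin by its global name with calling it through a local alias in a tight loop. The function names are hypothetical. The absolute numbers depend on the machine and the Python version, and newer CPython releases cache global lookups, which may narrow the gap. Timing this also says nothing about how much of the real script's runtime goes to I/O.

```python
import timeit

def builtin_lookup(n=100_000):
    # `len` is looked up in module globals, then builtins, on each call
    total = 0
    for _ in range(n):
        total += len("abc")
    return total

def local_alias(n=100_000):
    # binding `len` to a local name makes each lookup a local-variable access
    ln = len
    total = 0
    for _ in range(n):
        total += ln("abc")
    return total

if __name__ == "__main__":
    for fn in (builtin_lookup, local_alias):
        secs = timeit.timeit(fn, number=10)
        print(f"{fn.__name__}: {secs:.3f}s for 10 runs")
```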
I'd probably go for something like:
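import csv
with open('somefile', newline='') as fin:  # newline='' as the csv docs advise
    nulls = set()
    for row in csv.reader(fin, delimiter='|'):
        # collect 1-based indices of fields that are empty in this row
        nulls.update(idx for idx, val in enumerate(row, start=1) if not val)
print('nulls =', sorted(nulls))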
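To try the csv approach without a file, it can be wrapped in a function. `csv.reader` accepts any iterable of strings, such as a list or an `io.StringIO`. Both function names below are hypothetical. The second variant is an assumption that does not come from the thread. After the first row, it re-examines only columns not yet found empty, which may reduce per-row work once many columns are known to be null. Every line is still parsed in full, so parsing cost is unchanged. It also assumes every row has at least as many fields as the first.

```python
import csv

def empty_columns(lines, delimiter='|'):
    """Sorted 1-based indices of columns that are empty in at least one row."""
    nulls = set()
    for row in csv.reader(lines, delimiter=delimiter):
        nulls.update(idx for idx, val in enumerate(row, start=1) if not val)
    return sorted(nulls)

def empty_columns_pending(lines, delimiter='|'):
    """Same result, but only columns not yet known to be empty are re-checked.

    Assumes every row has at least as many fields as the first row.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    first = next(reader, None)
    if first is None:
        return []
    nulls = {idx for idx, val in enumerate(first, start=1) if not val}
    pending = [idx for idx in range(1, len(first) + 1) if idx not in nulls]
    for row in reader:
        if not pending:
            break  # every column is already known to be empty somewhere
        still_pending = []
        for idx in pending:
            if row[idx - 1]:
                still_pending.append(idx)
            else:
                nulls.add(idx)
        pending = still_pending
    return sorted(nulls)
```

Both functions return `[2, 3]` for the lines `"a|b|c"`, `"1||3"`, `"4|5|"`.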
hth
Jon
Finding empty columns. Is there a faster way? nn <pruebauno@latinmail.com> - 2011-04-21 09:40 -0700
Re: Finding empty columns. Is there a faster way? Jon Clements <joncle@googlemail.com> - 2011-04-21 13:32 -0700
Re: Finding empty columns. Is there a faster way? nn <pruebauno@latinmail.com> - 2011-04-22 07:46 -0700