Re: Finding empty columns. Is there a faster way?

From Jon Clements <joncle@googlemail.com>
Newsgroups comp.lang.python
Subject Re: Finding empty columns. Is there a faster way?
Date Thu, 21 Apr 2011 13:32:47 -0700 (PDT)
Message-ID <1d1fb6bb-0f09-42f6-b543-fed032efe8cb@z31g2000vbs.googlegroups.com> (permalink)
References <e6f7d142-0691-4a6f-91fe-401dcbe291c9@u12g2000vbf.googlegroups.com>



On Apr 21, 5:40 pm, nn <prueba...@latinmail.com> wrote:
> time head -1000000 myfile  >/dev/null
>
> real    0m4.57s
> user    0m3.81s
> sys     0m0.74s
>
> time ./repnullsalt.py '|' myfile
> 0 1 Null columns:
> 11, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 33, 45, 50, 68
>
> real    1m28.94s
> user    1m28.11s
> sys     0m0.72s
>
> import sys
> def main():
>     with open(sys.argv[2],'rb') as inf:
>         limit = int(sys.argv[3]) if len(sys.argv)>3 else 1
>         dlm = sys.argv[1].encode('latin1')
>         nulls = [x==b'' for x in next(inf)[:-1].split(dlm)]
>         enum = enumerate
>         split = bytes.split
>         out = sys.stdout
>         prn = print
>         for j, r in enum(inf):
>             if j%1000000==0:
>                 prn(j//1000000,end=' ')
>                 out.flush()
>                 if j//1000000>=limit:
>                     break
>             for i, cur in enum(split(r[:-1],dlm)):
>                 nulls[i] |= cur==b''
>     print('Null columns:')
>     print(', '.join(str(i+1) for i,val in enumerate(nulls) if val))
>
> if not (len(sys.argv)>2):
>     sys.exit("Usage: "+sys.argv[0]+
>          " <delimiter> <filename> <limit>")
>
> main()


Why alias enumerate and print? With heavy disk I/O, I can hardly see
name lookups being any problem at all. And why the timing stats
against /dev/null?
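
One way to check the name-lookup point rather than guess is to time the same per-row loop with and without the local aliases. This is only a rough sketch: the in-memory sample rows and the function names `plain` and `aliased` are made up for illustration, no disk I/O is involved, and the numbers will vary by machine and Python version.

```python
import timeit

# Hypothetical in-memory rows standing in for lines read from the file.
rows = [b"a||c|d" for _ in range(10000)]


def plain(rows=rows):
    """Count empty fields using ordinary global/builtin name lookups."""
    count = 0
    for r in rows:
        for i, cur in enumerate(r.split(b"|")):
            count += cur == b""
    return count


def aliased(rows=rows):
    """Same loop, with enumerate and bytes.split bound to local names."""
    enum = enumerate
    split = bytes.split
    count = 0
    for r in rows:
        for i, cur in enum(split(r, b"|")):
            count += cur == b""
    return count


if __name__ == "__main__":
    # Best of three runs for each variant; compare the two figures.
    for fn in (plain, aliased):
        print(fn.__name__, min(timeit.repeat(fn, number=20, repeat=3)))
```

Whatever difference shows up here is for a loop that does nothing but split in memory; once the file read is added back in, it should matter even less.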


I'd probably go for something like:

import csv

with open('somefile', newline='') as fin:
    nulls = set()
    for row in csv.reader(fin, delimiter='|'):
        nulls.update(idx for idx, val in enumerate(row, start=1) if not val)
    print('nulls =', sorted(nulls))
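
The same idea, wrapped in a function so it can be tried without a real file. This is a minimal sketch for Python 3: the function name `columns_with_empty_values` and the sample data are assumptions for illustration, not anything from the original script. Like the code in the original post, it reports columns that contain at least one empty field.

```python
import csv
import io


def columns_with_empty_values(lines, delimiter='|'):
    """Return sorted 1-based indexes of columns with at least one empty field."""
    nulls = set()
    for row in csv.reader(lines, delimiter=delimiter):
        # enumerate(..., start=1) gives 1-based column numbers directly.
        nulls.update(idx for idx, val in enumerate(row, start=1) if not val)
    return sorted(nulls)


# Hypothetical sample standing in for the real file.
sample = io.StringIO("a||c\nd|e|\n")
print(columns_with_empty_values(sample))  # prints [2, 3]
```

For a real file, pass the open file object instead of the StringIO, e.g. one opened with open('somefile', newline='').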

hth
Jon


Thread

Finding empty columns. Is there a faster way? nn <pruebauno@latinmail.com> - 2011-04-21 09:40 -0700
  Re: Finding empty columns. Is there a faster way? Jon Clements <joncle@googlemail.com> - 2011-04-21 13:32 -0700
    Re: Finding empty columns. Is there a faster way? nn <pruebauno@latinmail.com> - 2011-04-22 07:46 -0700