Re: Finding Blank Columns in CSV

Date 2015-10-05 09:06 -0500
From Tim Chase <python.list@tim.thechases.com>
Subject Re: Finding Blank Columns in CSV
References <muttuu$pc5$1@ger.gmane.org> <CAPTjJmrdE-n=VazddU7PZS6E=rbBf1XVoz_g2ArxK6454D5qxw@mail.gmail.com> <CAPTjJmqYwD0=KNLaHu_w_sOjUoacQRXFhb4Zxdv6n++5A-Y8iQ@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.393.1444054086.28679.python-list@python.org>


On 2015-10-06 00:51, Chris Angelico wrote:
> fn = "tmp1.csv"
> fin = open(fn, 'rb')
> rdr = csv.DictReader(fin, delimiter=',')
> # all the same down to here
> blanks = set(rdr.fieldnames)
> for row in rdr:
>     blanks = {col for col in blanks if not row[col]}
> mt = [col for col in rdr.fieldnames if col not in blanks]
> print mt

My only other modification would be to add a check that bails out of
the loop early once no candidate blank columns remain:

  from cStringIO import StringIO
  import csv

  s = StringIO("""Name,Surname,Age,Sex
  abc,def,,M
  ,ghi,,F
  jkl,mno,,
  pqr,,,F
  """)

  dr = csv.DictReader(s)
  header_set = set(dr.fieldnames)
  for row in dr:
    header_set = set(h for h in header_set if not row[h])
    if not header_set:
      # no candidate blank columns remain, bail early
      break
  ordered_headers = [h for h in dr.fieldnames if h in header_set]
  print(header_set)
  print(ordered_headers)

That way, if you determine by line 3 that your million-row CSV file
has no blank columns, you can get away with not processing all
million rows.
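
For anyone on Python 3, here is a rough sketch of the same idea wrapped
in a function. The names blank_columns and rows_read are made up for
illustration and are not from the code above; the row counter is only
there to show how early the loop stops. It assumes the first line of
the input is the header row.

```python
import csv
import io


def blank_columns(lines):
    """Return (blank column names in header order, data rows read)."""
    reader = csv.DictReader(lines)
    fieldnames = reader.fieldnames or []  # None when the input is empty
    candidates = set(fieldnames)
    rows_read = 0
    for row in reader:
        rows_read += 1
        # keep only the columns that are still empty in this row
        candidates = {name for name in candidates if not row[name]}
        if not candidates:
            # every column has held a value at least once, stop reading
            break
    return [name for name in fieldnames if name in candidates], rows_read


sample = "Name,Surname,Age,Sex\nabc,def,,M\n,ghi,,F\njkl,mno,,\npqr,,,F\n"
print(blank_columns(io.StringIO(sample)))                 # (['Age'], 4)
print(blank_columns(io.StringIO("a,b\n1,2\n3,4\n5,6\n")))  # ([], 1)
```

The second call reads only one data row, because both columns already
hold a value there. Note that a row shorter than the header counts as
blank in the missing columns, since DictReader fills those with None.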

-tkc

