| Started by | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| First post | 2015-10-05 09:06 -0500 |
| Last post | 2015-10-05 09:06 -0500 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Finding Blank Columns in CSV Tim Chase <python.list@tim.thechases.com> - 2015-10-05 09:06 -0500
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-10-05 09:06 -0500 |
| Subject | Re: Finding Blank Columns in CSV |
| Message-ID | <mailman.393.1444054086.28679.python-list@python.org> |
On 2015-10-06 00:51, Chris Angelico wrote:
> fn = "tmp1.csv"
> fin = open(fn, 'rb')
> rdr = csv.DictReader(fin, delimiter=',')
> # all the same down to here
> blanks = set(rdr.fieldnames)
> for row in rdr:
>     blanks = {col for col in blanks if not row[col]}
> mt = [col for col in rdr.fieldnames if col not in blanks]
> print mt
My only other modification would be to add a check that bails out of
the loop early once no blank columns remain:
from cStringIO import StringIO
import csv
s = StringIO("""Name,Surname,Age,Sex
abc,def,,M
,ghi,,F
jkl,mno,,
pqr,,,F
""")
dr = csv.DictReader(s)
header_set = set(dr.fieldnames)
for row in dr:
    header_set = set(h for h in header_set if not row[h])
    if not header_set:
        # we no longer have any headers, bail early
        break
ordered_headers = [h for h in dr.fieldnames if h in header_set]
print(header_set)
print(ordered_headers)
That way, if you determine by line 3 that your million-row CSV file
has no blank columns, you can get away with not processing all
million rows.
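For anyone on Python 3, the same idea can be wrapped in a function. This is only a sketch, not part of the original post: the name blank_columns is made up for illustration, and io.StringIO stands in for the Python 2 cStringIO used above.

```python
import csv
import io


def blank_columns(lines):
    """Return the headers of columns that are empty in every data row.

    `lines` is any iterable of CSV text lines (an open file, a StringIO, ...).
    The name `blank_columns` is hypothetical, chosen for this sketch.
    """
    reader = csv.DictReader(lines)
    fieldnames = reader.fieldnames or []
    candidates = set(fieldnames)
    for row in reader:
        # Keep only the columns that are still blank in this row.
        candidates = {h for h in candidates if not row[h]}
        if not candidates:
            # No column can be entirely blank any more, so stop reading.
            break
    # Report the survivors in their original header order.
    return [h for h in fieldnames if h in candidates]


sample = io.StringIO("""Name,Surname,Age,Sex
abc,def,,M
,ghi,,F
jkl,mno,,
pqr,,,F
""")
print(blank_columns(sample))  # ['Age']
```

Because csv.DictReader pulls lines lazily, breaking out of the loop means the rest of the input is never read, which is where the saving on a large file would come from.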
-tkc