Re: Finding Blank Columns in CSV

Started by: MRAB <python@mrabarnett.plus.com>
First post: 2015-10-06 16:03 +0100
Last post: 2015-10-06 16:03 +0100
Articles: 1 (1 participant)

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


#97449 — Re: Finding Blank Columns in CSV

From: MRAB <python@mrabarnett.plus.com>
Date: 2015-10-06 16:03 +0100
Subject: Re: Finding Blank Columns in CSV
Message-ID: <mailman.428.1444144027.28679.python-list@python.org>

On 2015-10-06 12:24, Jaydip Chakrabarty wrote:
> On Tue, 06 Oct 2015 01:34:17 +1100, Chris Angelico wrote:
>
>> On Tue, Oct 6, 2015 at 1:06 AM, Tim Chase
>> <python.list@tim.thechases.com> wrote:
>>> That way, if you determine by line 3 that your million-row CSV file has
>>> no blank columns, you can get away with not processing all million
>>> rows.
>>
>> Sure, although that effectively means the entire job is moot. I kinda
>> assume that the OP knows that there are some blank columns (maybe lots
>> of them). The extra check is unnecessary unless it's actually plausible
>> that there'll be no blanks whatsoever.
>>
>> Incidentally, you have an ordered_headers list which is the blank
>> columns in order; I think the OP was looking for a list of the
>> _non_blank columns. But that's a trivial difference, easy to tweak.
>>
>> ChrisA
>
> Thanks to you all. I got it this far. But while writing back to another
> csv file, I got this error - "ValueError: dict contains fields not in
> fieldnames: None". Here is my code.
>
> rdr = csv.DictReader(fin, delimiter=',')
> header_set = set(rdr.fieldnames)

Initially, header_set contains all of the field names.

> for r in rdr:
>      header_set = set(h for h in header_set if not r[h])

This keeps a field name only if that field is empty in the current row.

>      if not header_set:
>          break
>
At this point, header_set will contain the field names whose values are
empty in every row read so far, i.e. the blank columns.

Wasn't the original question about excluding columns where all of the
values are empty? You're keeping exactly those columns and dropping
every column that has _any_ non-empty value.
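
For what it's worth, here is a minimal sketch of how the non-blank columns
could be collected instead. It assumes Python 3; the function name
find_nonblank_columns and the sample data are made up for illustration and
are not from the original code.

```python
import csv
import io

def find_nonblank_columns(rows, fieldnames):
    """Return the field names that have at least one non-empty value."""
    blank = set(fieldnames)  # candidates: no value seen in these columns yet
    for row in rows:
        # A column stays a candidate only while it is empty in every row.
        blank = {h for h in blank if not row[h]}
        if not blank:
            break  # every column has a value somewhere, so stop early
    # Keep the original header order, which a set would lose.
    return [h for h in fieldnames if h not in blank]

# Hypothetical sample: column "b" is empty in every row.
sample = "a,b,c\n1,,\n,,3\n"
rdr = csv.DictReader(io.StringIO(sample))
nonblank = find_nonblank_columns(rdr, rdr.fieldnames)
print(nonblank)  # ['a', 'c']
```

Returning a list rather than a set also gives the writer a stable column
order.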

> for r in rdr:
>      data = list(r[i] for i in header_set)

data will contain each processed row in turn. Because of the
indentation, only the final data (row) would be written out.
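
One way to write every row is to call writerow inside the loop. The sketch
below assumes Python 3; write_columns, keep and the sample data are
hypothetical names chosen for illustration. It builds a fresh reader,
because a reader that has already been iterated will not start again from
the first row. It also passes extrasaction='ignore', on the assumption that
the "None" in the error comes from rows with more fields than the header:
DictReader stores such surplus fields under the key None by default.

```python
import csv
import io

def write_columns(fin, fout, keep):
    """Copy only the columns named in `keep` from fin to fout."""
    rdr = csv.DictReader(fin)
    # extrasaction='ignore' skips keys that are not in `keep`, including
    # the None key DictReader uses for surplus fields in over-long rows.
    dw = csv.DictWriter(fout, fieldnames=keep, extrasaction='ignore')
    dw.writeheader()
    for row in rdr:
        dw.writerow(row)  # inside the loop, so every row is written

# Hypothetical sample: the second data row has one field too many.
fin = io.StringIO("a,b,c\n1,,\n,,3,extra\n")
fout = io.StringIO(newline="")
write_columns(fin, fout, ["a", "c"])
result = fout.getvalue()
```

With real files, both would normally be opened with newline='' as the csv
module documentation recommends.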

>
> dw = csv.DictWriter(fout, header_set)
> dw.writeheader()
> dw.writerows(data)
>
> Also, there is difference between len(header_set) and len(data[0].keys).
> Why is so?
> Thanks again for all your help.
>
> Thanks.
>
