| Subject | Re: Finding Blank Columns in CSV |
|---|---|
| References | (1 earlier) <CAPTjJmrdE-n=VazddU7PZS6E=rbBf1XVoz_g2ArxK6454D5qxw@mail.gmail.com> <CAPTjJmqYwD0=KNLaHu_w_sOjUoacQRXFhb4Zxdv6n++5A-Y8iQ@mail.gmail.com> <20151005090652.1c9faed7@bigbox.christie.dr> <CAPTjJmpv1OOEomjjaqQ5qiGQDHpxAR4RpxLxn1Rt+j=TQBz4Yg@mail.gmail.com> <mv0b1h$5rd$1@ger.gmane.org> |
| From | MRAB <python@mrabarnett.plus.com> |
| Date | 2015-10-06 16:03 +0100 |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.428.1444144027.28679.python-list@python.org> |
On 2015-10-06 12:24, Jaydip Chakrabarty wrote:
> On Tue, 06 Oct 2015 01:34:17 +1100, Chris Angelico wrote:
>
>> On Tue, Oct 6, 2015 at 1:06 AM, Tim Chase
>> <python.list@tim.thechases.com> wrote:
>>> That way, if you determine by line 3 that your million-row CSV file has
>>> no blank columns, you can get away with not processing all million
>>> rows.
>>
>> Sure, although that effectively means the entire job is moot. I kinda
>> assume that the OP knows that there are some blank columns (maybe lots
>> of them). The extra check is unnecessary unless it's actually plausible
>> that there'll be no blanks whatsoever.
>>
>> Incidentally, you have an ordered_headers list which is the blank
>> columns in order; I think the OP was looking for a list of the
>> _non_blank columns. But that's a trivial difference, easy to tweak.
>>
>> ChrisA
>
> Thanks to you all. I got it this far. But while writing back to another
> csv file, I got this error - "ValueError: dict contains fields not in
> fieldnames: None". Here is my code.
>
> rdr = csv.DictReader(fin, delimiter=',')
> header_set = set(rdr.fieldnames)

Initially, header_set contains all of the field names.

> for r in rdr:
>     header_set = set(h for h in header_set if not r[h])

This keeps a field name only if the field is empty in the current row.

>     if not header_set:
>         break
>

At this point, header_set will contain the field names where all of the
values are empty.

Wasn't the original question about excluding columns where all of the
values are empty? You're keeping only those columns, and excluding every
column where _any_ of the values is non-empty.

> for r in rdr:
>     data = list(r[i] for i in header_set)

data will contain each processed row in turn.

Because of the indentation, only the final data (row) would be written
out.

>
> dw = csv.DictWriter(fout, header_set)
> dw.writeheader()
> dw.writerows(data)
>
> Also, there is difference between len(header_set) and len(data[0].keys).
> Why is so?
> Thanks again for all your help.
>
> Thanks.
>
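For reference, here is a minimal sketch of one way to do the whole job as the original question is read in this thread, that is, writing out only the columns that have a value in at least one row. It is not the code from any post above. The function names nonblank_columns and drop_blank_columns are made up for illustration, and the sketch assumes the file is small enough to hold every row in memory. It keeps the early exit Tim Chase suggested, collects all rows before calling writerows() once, and passes extrasaction='ignore' so that keys outside the kept field names (including the None key that DictReader uses for surplus fields) do not raise the ValueError quoted above.

```python
import csv
import io


def nonblank_columns(rows, fieldnames):
    """Return the field names with at least one non-empty value, in file order.

    Hypothetical helper, named here for illustration only.
    """
    blank = set(fieldnames)  # candidates that have looked empty in every row so far
    for row in rows:
        blank = {h for h in blank if not row[h]}
        if not blank:
            break  # no blank columns left, so the remaining rows need not be checked
    return [h for h in fieldnames if h not in blank]


def drop_blank_columns(fin, fout):
    """Copy CSV data from fin to fout, leaving out columns that are empty in every row.

    Hypothetical helper; assumes all rows fit in memory.
    """
    rdr = csv.DictReader(fin)
    rows = list(rdr)  # keep the rows: the reader is exhausted after one pass
    fieldnames = rdr.fieldnames or []  # fieldnames is None for an empty file
    keep = nonblank_columns(rows, fieldnames)
    # extrasaction='ignore' skips keys that are not in keep, including the
    # None key DictReader uses for surplus fields in over-long rows.
    dw = csv.DictWriter(fout, keep, extrasaction='ignore')
    dw.writeheader()
    dw.writerows(rows)  # called once, after the loop, with every row
    return keep


if __name__ == '__main__':
    fin = io.StringIO('a,b,c\n1,,x\n2,,\n')
    fout = io.StringIO()
    print(drop_blank_columns(fin, fout))
    print(fout.getvalue())
```

With the sample input in the last few lines, column b is empty in every row, so only a and c are written. Returning a list built from rdr.fieldnames rather than a set keeps the output columns in their original order.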