From: Tim Chase
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: Finding Blank Columns in CSV
Date: Mon, 5 Oct 2015 09:06:52 -0500

On 2015-10-06 00:51, Chris Angelico wrote:
> fn = "tmp1.csv"
> fin = open(fn, 'rb')
> rdr = csv.DictReader(fin, delimiter=',')
> # all the same down to here
> blanks = set(rdr.fieldnames)
> for row in rdr:
>     blanks = {col for col in blanks if not row[col]}
> mt = [col for col in rdr.fieldnames if col not in blanks]
> print mt

My only other modification would be to add a check that, if you no
longer have any blank columns, bails early from the loop:

from cStringIO import StringIO
import csv

s = StringIO("""Name,Surname,Age,Sex
abc,def,,M
,ghi,,F
jkl,mno,,
pqr,,,F
""")

dr = csv.DictReader(s)
header_set = set(dr.fieldnames)
for row in dr:
    header_set = set(h for h in header_set if not row[h])
    if not header_set:
        # we no longer have any headers, bail early
        break
ordered_headers = [h for h in dr.fieldnames if h in header_set]
print(header_set)
print(ordered_headers)

That way, if you determine by line 3 that your million-row CSV file
has no blank columns, you can get away with not processing all
million rows.

-tkc
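
P.S. For anyone on Python 3, where cStringIO no longer exists and the csv module expects text rather than files opened with 'rb', the same early-exit idea might look roughly like the sketch below. The function name blank_columns is made up for illustration (it is not part of the csv module), and the sample data is the same four rows as above.

```python
import csv
from io import StringIO


def blank_columns(reader):
    """Return the headers that are empty in every row, in file order.

    `reader` is assumed to be a csv.DictReader; `blank_columns` is a
    hypothetical helper name used only for this sketch.
    """
    candidates = set(reader.fieldnames)
    for row in reader:
        # keep only the columns that are still blank in this row
        candidates = {h for h in candidates if not row[h]}
        if not candidates:
            # no column can be entirely blank any more, bail early
            break
    return [h for h in reader.fieldnames if h in candidates]


sample = StringIO("""Name,Surname,Age,Sex
abc,def,,M
,ghi,,F
jkl,mno,,
pqr,,,F
""")
result = blank_columns(csv.DictReader(sample))
print(result)  # ['Age']
```

With a real file you would pass csv.DictReader(open(fn, newline='')) instead of the StringIO. Note that a row with fewer fields than the header gets None for the missing ones, which this sketch also counts as blank.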