From: MRAB
Subject: Re: Finding Blank Columns in CSV
Date: Tue, 6 Oct 2015 19:25:12 +0100
To: python-list@python.org
Newsgroups: comp.lang.python

On 2015-10-06 18:23, Jaydip Chakrabarty wrote:
> On Tue, 06 Oct 2015 14:33:51 +0200, Peter Otten wrote:
> [snip]
>
> I downloaded gmail contacts in google csv format. There are so many
> columns. So I was trying to create another csv with the required columns.
> Now when I tried to open the gmail csv file with csv DictReader, it said
> the file contained NULL characters.

Why would there be nulls in a CSV file?

> So first I did -
>
> data = open(fn, 'rb').read()
> fout = open(ofn, 'wb')
> fout.write(data.replace('\x00', ''))
> fout.close()
> shutil.move(ofn, fn)
>
> Then I found, there were some special characters in the file. So, once
> again I opened the file and did -
>
> data = open(fn, 'rb').read()
> fout = open(ofn, 'wb')
> fout.write(data.replace('\xff\xfe', ''))
> fout.close()
> shutil.move(ofn, fn)
>
b'\xff\xfe' looks like a BOM. If it's at the start of the file, it
indicates that the file is encoded in 'UTF-16-LE'.

So, apparently, the original file was CSV encoded in 'UTF-16-LE'.

You _do_ still have the original file, don't you?

[snip]
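
For illustration, here is a minimal Python 3 sketch of reading such a file by decoding it rather than deleting bytes from it. It is not code from the thread: the helper names sniff_bom_encoding and read_rows, the sample column names, and the fallback to UTF-8 when no BOM is present are all assumptions. It also shows where the NULs come from: in UTF-16-LE every ASCII character is stored as two bytes, the second of which is b'\x00'.

```python
import codecs
import csv
import io


def sniff_bom_encoding(raw: bytes) -> str:
    """Guess a codec from a leading BOM (hypothetical helper, UTF-8 fallback assumed)."""
    # Test UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM and then drops it
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "utf-8"


def read_rows(raw: bytes) -> list:
    """Decode the raw bytes, then parse the resulting text as CSV."""
    text = raw.decode(sniff_bom_encoding(raw))
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Made-up sample data standing in for the exported contacts file.
sample_text = "Name,E-mail\r\nAda,ada@example.com\r\n"
raw = codecs.BOM_UTF16_LE + sample_text.encode("utf-16-le")

rows = read_rows(raw)
```

With a real file the same idea would be open(fn, newline='', encoding='utf-16') passed straight to csv.DictReader, assuming the BOM check confirms the encoding.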