From: Tim Chase
Date: Thu, 19 Jul 2012 08:04:55 -0500
To: Python
Subject: Re: Odd csv column-name truncation with only one column
In-Reply-To: <5007EDD6.5020903@tim.thechases.com>
Newsgroups: comp.lang.python

On 07/19/12 06:21, Tim Chase wrote:
> tim@laptop:~/tmp$ python
> Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import csv
>>>> from cStringIO import StringIO
>>>> s = StringIO('Email\nfoo@example.com\nbar@example.org\n')
>>>> s.seek(0)
>>>> d = csv.Sniffer().sniff(s.read())
>>>> s.seek(0)
>>>> r = csv.DictReader(s, dialect=d)
>>>> r.fieldnames
> ['Emai', '']

I think I may have stumbled across the "what the heck is happening"
factor:

>>> import csv
>>> from cStringIO import StringIO
>>> s = StringIO('Email\nfoo@example.org\nbar@test.test\n')
>>> d = csv.Sniffer().sniff(s.read())
>>> s.seek(0)
>>> r = csv.DictReader(s, dialect=d)
>>> r.fieldnames
['Em', 'il']

It appears that it's finding something repeated [ed: Peter's &
Steven's replies came in as I finished typing this]. In my first
example, it was the "l" appearing on each line, and in the 2nd
example here, it's the "a" on each line, so the csv module thinks
that's the delimiter.
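
To make the "something repeated" observation concrete, here is a rough sketch that lists the characters occurring the same number of times on every line of the two samples. It is only a simplification of the idea, not the csv module's actual algorithm; the helper name chars_with_constant_count is made up for illustration, and the code assumes Python 3 rather than the 2.6 session above.

```python
def chars_with_constant_count(text):
    """Hypothetical helper: map each character of the first line to its
    per-line count, keeping only characters whose count is identical on
    every non-empty line."""
    lines = [line for line in text.split("\n") if line]
    candidates = {}
    for ch in set(lines[0]):
        counts = {line.count(ch) for line in lines}
        if len(counts) == 1:
            # Same count on every line: looks like a plausible delimiter.
            candidates[ch] = counts.pop()
    return candidates


# The two single-column samples from the interpreter sessions above.
first = chars_with_constant_count('Email\nfoo@example.com\nbar@example.org\n')
second = chars_with_constant_count('Email\nfoo@example.org\nbar@test.test\n')
print(first, second)
```

In each sample exactly one character survives the filter, and it matches the delimiter that split the header in the corresponding session.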
The source file was generated with the csv module's default (Excel)
dialect:

>>> s = StringIO()
>>> w = csv.writer(s)
>>> w.writerows([["email"], ["foo@example.com"], ["bar@example.org"]])
>>> s.seek(0)
>>> d = csv.Sniffer().sniff(s.read())
>>> d.delimiter
'l'
>>> s.seek(0)
>>> r = csv.DictReader(s, dialect=d)
>>> r.fieldnames
['emai', '']

I guess it then takes the Python community to make the call on
whether the csv module is doing the right thing in the degenerate
case. I.e., you can't get back out what you put in when you try to
sniff.

-tkc
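
One possible workaround, sketched under a few assumptions: csv.Sniffer.sniff() accepts an optional delimiters string that limits the candidate characters, and it raises csv.Error when none of them fits. The helper name sniff_or_default, the candidate set ",;\t|", and the fallback to csv.excel are illustrative choices, not anything the csv module prescribes. The code is written for Python 3 (io.StringIO instead of cStringIO).

```python
import csv
from io import StringIO


def sniff_or_default(sample, candidates=",;\t|"):
    """Hypothetical helper: sniff using only plausible delimiters and
    fall back to the Excel dialect when none of them fits the sample."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        # Single-column data contains none of the candidates, so the
        # sniffer cannot pick a delimiter; assume the default dialect.
        return csv.excel


sample = "Email\nfoo@example.com\nbar@example.org\n"
dialect = sniff_or_default(sample)
reader = csv.DictReader(StringIO(sample), dialect=dialect)
fieldnames = reader.fieldnames
rows = [row["Email"] for row in reader]
print(fieldnames, rows)
```

The trade-off is that a file using a delimiter outside the candidate set is read as a single column, so the set should cover whatever delimiters the input can realistically contain.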