
Re: Odd csv column-name truncation with only one column

Date Thu, 19 Jul 2012 08:04:55 -0500
From Tim Chase <python.list@tim.thechases.com>



On 07/19/12 06:21, Tim Chase wrote:
> tim@laptop:~/tmp$ python
> Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import csv
> >>> from cStringIO import StringIO
> >>> s = StringIO('Email\nfoo@example.com\nbar@example.org\n')
> >>> s.seek(0)
> >>> d = csv.Sniffer().sniff(s.read())
> >>> s.seek(0)
> >>> r = csv.DictReader(s, dialect=d)
> >>> r.fieldnames
> ['Emai', '']

I think I may have stumbled across the "what the heck is happening"
factor:

>>> import csv
>>> from cStringIO import StringIO
>>> s = StringIO('Email\nfoo@example.org\nbar@test.test\n')
>>> d = csv.Sniffer().sniff(s.read())
>>> s.seek(0)
>>> r = csv.DictReader(s, dialect=d)
>>> r.fieldnames
['Em', 'il']

It appears that the sniffer is latching onto a repeated character
[ed: Peter's & Steven's replies came in as I finished typing this].
In my first example, it was the "l" appearing on each line, and in
the second example here, it's the "a" on each line, so the csv
module thinks that character is the delimiter.  The source file was
generated with the default Excel dialect:

>>> s = StringIO()
>>> w = csv.writer(s)
>>> w.writerows([["email"], ["foo@example.com"], ["bar@example.org"]])
>>> s.seek(0)
>>> d = csv.Sniffer().sniff(s.read())
>>> d.delimiter
'l'
>>> s.seek(0)
>>> r = csv.DictReader(s, dialect=d)
>>> r.fieldnames
['emai', '']
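
One possible workaround, as a rough sketch rather than a recommended fix: Sniffer.sniff() accepts a `delimiters` argument that restricts which characters it may pick. The whitelist `CANDIDATE_DELIMITERS` and the helper `sniff_or_default` below are names I made up for illustration, and I'm assuming Python 3 (io.StringIO in place of cStringIO). If none of the candidates can be determined, sniff() raises csv.Error, and the helper falls back to the Excel dialect.

```python
import csv
from io import StringIO  # Python 3 stand-in for cStringIO (assumption)

# Hypothetical whitelist: the only characters we accept as a delimiter.
CANDIDATE_DELIMITERS = ",;\t|"


def sniff_or_default(sample, fallback=csv.excel):
    """Sniff a dialect using only CANDIDATE_DELIMITERS, else return fallback."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATE_DELIMITERS)
    except csv.Error:
        # The sniffer could not settle on any candidate delimiter,
        # as one would expect for a single-column file.
        return fallback


s = StringIO('Email\nfoo@example.com\nbar@example.org\n')
d = sniff_or_default(s.read())
s.seek(0)
r = csv.DictReader(s, dialect=d)
fieldnames = r.fieldnames
print(fieldnames)
```

This only helps when you can enumerate plausible delimiters up front; a file that really is delimited by a letter would no longer be detected.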


I guess it's up to the Python community to make the call on whether
the csv module is doing the right thing in the degenerate
single-column case.  I.e., you can't get back out what you put in
when you sniff the output the module itself wrote.
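
For contrast, here is a minimal sketch of the round trip when the dialect is passed explicitly instead of sniffed. It assumes Python 3 and that you already know the file was written with the Excel dialect, which is exactly the knowledge sniffing is meant to replace, so treat it as an illustration of the asymmetry rather than a fix.

```python
import csv
from io import StringIO  # Python 3 stand-in for cStringIO (assumption)

# Write a single-column file with the Excel dialect.
s = StringIO(newline='')
w = csv.writer(s, dialect='excel')
w.writerows([["email"], ["foo@example.com"], ["bar@example.org"]])

# Read it back with the same dialect named explicitly, no sniffing.
s.seek(0)
r = csv.DictReader(s, dialect='excel')
header = r.fieldnames
rows = [row["email"] for row in r]
print(header, rows)
```

With the dialect stated, the header comes back intact as ['email'].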

-tkc


