


Re: RE Help splitting CVS data

Date 2013-01-20 17:14 -0500
From Mitya Sirenef <msirenef@lightbird.net>
Subject Re: RE Help splitting CVS data
References <3e1e8567-b9f4-446a-8a59-75f45367d2ac@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.707.1358720081.2939.python-list@python.org> (permalink)



On 01/20/2013 05:04 PM, Garry wrote:
> I'm trying to manipulate family tree data using Python.
> I'm using linux and Python 2.7.3 and have data files saved as Linux formatted cvs files
> The data appears in this format:
>
> Marriage,Husband,Wife,Date,Place,Source,Note0x0a
> Note: the Source field or the Note field can contain quoted data (same as the Place field)
>
> Actual data:
> [F0244],[I0690],[I0354],1916-06-08,"Neely's Landing, Cape Gir. Co, MO",,0x0a
> [F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,0x0a
>
> code snippet follows:
>
> import os
> import re
> #I'm using the following regex in an attempt to decode the data:
> RegExp2 = "^(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\d{,4}\-\d{,2}\-\d{,2})\,(.*|\".*\")\,(.*|\".*\")\,(.*|\".*\")"
> #
> line = "[F0244],[I0690],[I0354],1916-06-08,\"Neely's Landing, Cape Gir. Co, MO\",,"
> #
> (Marriage,Husband,Wife,Date,Place,Source,Note) = re.split(RegExp2,line)
> #
> #However, this does not decode the 7 fields.
> # The following error is displayed:
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> ValueError: too many values to unpack
> #
> # When I use xx the fields apparently get unpacked.
> xx = re.split(RegExp2,line)
> #
>>>> print xx[0]
>>>> print xx[1]
> [F0244]
>>>> print xx[5]
> "Neely's Landing, Cape Gir. Co, MO"
>>>> print xx[6]
>>>> print xx[7]
>>>> print xx[8]
> Why is there an extra NULL field before and after my record contents?
> I'm stuck, comments and solutions greatly appreciated.
>
> Garry
>


Gosh, you really don't want to use a regex to split CSV lines like that.
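As for the extra empty fields: when the pattern contains capturing groups, re.split returns the text before the match, then every group, then the text after the match. Here the pattern matches the whole line, so both leftover pieces are empty strings and the list has 9 items rather than 7. A rough sketch of that behaviour, using a simplified version of the pattern (my own shortening, not the exact regex from the post):

```python
import re

# Simplified stand-in for the pattern in the original post (an assumption,
# not the poster's exact regex): three bracketed IDs, a date, three free fields.
pattern = r'^(\[[A-Z]\d+\]),(\[[A-Z]\d+\]),(\[[A-Z]\d+\]),(\d{4}-\d{2}-\d{2}),(.*),(.*),(.*)'
line = '[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,'

# re.split keeps the text on either side of the match as well as the groups.
parts = re.split(pattern, line)

# re.match(...).groups() returns only the captured groups.
fields = re.match(pattern, line).groups()

print(len(parts), repr(parts[0]), repr(parts[-1]))
print(len(fields))
```

So re.match(...).groups() would give exactly the 7 captured fields. The regex approach is still fragile, though: the greedy .* can assign fields wrongly once a later quoted field also contains commas.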

Use the csv module instead:

 >>> s
'[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,0x0a'
 >>> import csv
 >>> r = csv.reader([s])
 >>> for l in r: print(l)
...
['[F0244]', '[I0690]', '[I0354]', '1916-06-08', "Neely's Landing, Cape Gir. Co, MO", '', '0x0a']


The argument to csv.reader can be a file object (or a list of lines).
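A minimal sketch of the file-object case, unpacking each row into the seven names from the original post. The io.StringIO object is only a stand-in for an open file so the example is self-contained. With a real file, the csv docs suggest open(path, newline="") on Python 3 and binary mode ('rb') on Python 2.7.

```python
import csv
import io

# Stand-in for an open data file (assumption: two sample rows from the post,
# each terminated by a real newline rather than the literal text "0x0a").
sample = io.StringIO(
    '[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,\n'
    '[F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,\n'
)

records = []
for row in csv.reader(sample):
    # Each row is a list of 7 strings; quoted commas stay inside their field.
    marriage, husband, wife, date, place, source, note = row
    records.append((marriage, husband, wife, date, place, source, note))

for record in records:
    print(record)
```

The unpacking raises ValueError on any row that does not have exactly seven fields, which is a cheap way to spot malformed lines.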

  - mitya


-- 
Lark's Tongue Guide to Python: http://lightbird.net/larks/



