| Header | Value |
|---|---|
| Date | Sun, 20 Jan 2013 17:14:30 -0500 |
| From | Mitya Sirenef <msirenef@lightbird.net> |
| Subject | Re: RE Help splitting CVS data |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.707.1358720081.2939.python-list@python.org> |
On 01/20/2013 05:04 PM, Garry wrote:
> I'm trying to manipulate family tree data using Python.
> I'm using Linux and Python 2.7.3 and have data files saved as Linux-formatted CSV files.
> The data appears in this format:
>
> Marriage,Husband,Wife,Date,Place,Source,Note0x0a
> Note: the Source field or the Note field can contain quoted data (same as the Place field)
>
> Actual data:
> [F0244],[I0690],[I0354],1916-06-08,"Neely's Landing, Cape Gir. Co, MO",,0x0a
> [F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,0x0a
>
> code snippet follows:
>
> import os
> import re
> #I'm using the following regex in an attempt to decode the data:
> RegExp2 = "^(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\[[A-Z]\d{1,}\])\,(\d{,4}\-\d{,2}\-\d{,2})\,(.*|\".*\")\,(.*|\".*\")\,(.*|\".*\")"
> #
> line = "[F0244],[I0690],[I0354],1916-06-08,\"Neely's Landing, Cape Gir. Co, MO\",,"
> #
> (Marriage,Husband,Wife,Date,Place,Source,Note) = re.split(RegExp2,line)
> #
> #However, this does not decode the 7 fields.
> # The following error is displayed:
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: too many values to unpack
> #
> # When I use xx the fields apparently get unpacked.
> xx = re.split(RegExp2,line)
> #
> >>> print xx[0]
> >>> print xx[1]
> [F0244]
> >>> print xx[5]
> "Neely's Landing, Cape Gir. Co, MO"
> >>> print xx[6]
> >>> print xx[7]
> >>> print xx[8]
> Why is there an extra NULL field before and after my record contents?
> I'm stuck, comments and solutions greatly appreciated.
>
> Garry
>
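As for the extra empty fields: re.split() returns the text before and after each match, with the captured groups in between. Your pattern matches the whole line, so the text on both sides of the match is the empty string, and you get two more items than there are fields. If a regex were the right tool, match.groups() would return only the captures. A minimal sketch, assuming Python 3 and using a shortened three-field pattern and line of my own rather than your full regex:

```python
import re

# Shortened, illustrative pattern and line (not the full 7-field regex).
pattern = r"^(\[[A-Z]\d+\]),(\[[A-Z]\d+\]),(\d{4}-\d{2}-\d{2})$"
line = "[F0244],[I0690],1916-06-08"

# split() keeps the (empty) text on either side of the match.
parts = re.split(pattern, line)
print(parts)

# match().groups() returns only the captured fields.
m = re.match(pattern, line)
fields = m.groups() if m else ()
print(fields)
```

Here parts has an empty string at each end, while fields holds just the three captures.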
Gosh, you really don't want to use a regex to split CSV lines like that...
Use the csv module:
>>> s
'[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,0x0a'
>>> import csv
>>> r = csv.reader([s])
>>> for l in r: print(l)
...
['[F0244]', '[I0690]', '[I0354]', '1916-06-08', "Neely's Landing, Cape Gir. Co, MO", '', '0x0a']
The argument to csv.reader can be a file object (or a list of lines).
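For a whole file, something along these lines should work. This is only a sketch: it assumes Python 3, uses io.StringIO as a stand-in for the real file, and the file name in the comment is made up. The rows are the two from your post, without the 0x0a marker.

```python
import csv
import io

# Stand-in for the real file; with a file on disk you would use something like
#   with open("marriages.csv", newline="") as f:   # hypothetical file name
sample = io.StringIO(
    '[F0244],[I0690],[I0354],1916-06-08,"Neely\'s Landing, Cape Gir. Co, MO",,\n'
    '[F0245],[I0692],[I0355],1919-09-04,"Cape Girardeau Co, MO",,\n'
)

records = []
for row in csv.reader(sample):
    # Each row is a list of seven strings; commas inside quotes stay in one field.
    marriage, husband, wife, date, place, source, note = row
    records.append((marriage, husband, wife, date, place, source, note))

for rec in records:
    print(rec)
```

The tuple unpacking raises ValueError on any row that does not have exactly seven fields, which is a cheap way to spot malformed lines.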
- mitya
--
Lark's Tongue Guide to Python: http://lightbird.net/larks/