| From | "F.R." <anthra.norell@bluewin.ch> |
|---|---|
| Date | Fri, 04 Jul 2014 15:24:31 +0200 |
| Subject | Re: fixing an horrific formatted csv file. |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.11491.1404480345.18130.python-list@python.org> |
On 07/04/2014 12:28 PM, flebber wrote:
> On Friday, 4 July 2014 14:12:15 UTC+10, flebber wrote:
>> I have taken the code and gone a little further, but I need to protect myself against commas and single quotes in names.
>>
>> What is the best way to do this?
>>
>> In my file, line 44 has this trainer name:
>>
>> "Michael, Wayne & John Hawkes"
>>
>> and line 95 has this horse name:
>>
>> Inz'n'out
>>
>> This throws off my capturing of the correct item 9. How do I protect against this?
>>
>> Here is the current code.
>>
>> import re
>> from sys import argv
>>
>> SCRIPT, FILENAME = argv
>>
>>
>> def out_file_name(file_name):
>>     """take an input file and keep the name with appended _clean"""
>>     file_parts = file_name.split(".",)
>>     output_file = file_parts[0] + '_clean.' + file_parts[1]
>>     return output_file
>>
>>
>> def race_table(text_file):
>>     """utility to reorganise poorly made csv entry"""
>>     input_table = [[item.strip(' "') for item in record.split(',')]
>>                    for record in text_file.splitlines()]
>>     # At this point look at input_table to find the record indices
>>     output_table = []
>>     for record in input_table:
>>         if record[0] == 'Meeting':
>>             meeting = record[3]
>>         elif record[0] == 'Race':
>>             date = record[13]
>>             race = record[1]
>>         elif record[0] == 'Horse':
>>             number = record[1]
>>             name = record[2]
>>             results = record[9]
>>             res_split = re.split('[- ]', results)
>>             starts = res_split[0]
>>             wins = res_split[1]
>>             seconds = res_split[2]
>>             thirds = res_split[3]
>>             prizemoney = res_split[4]
>>             trainer = record[4]
>>             location = record[5]
>>             print(name, wins, seconds)
>>             output_table.append((meeting, date, race, number, name,
>>                                  starts, wins, seconds, thirds, prizemoney,
>>                                  trainer, location))
>>     return output_table
>>
>> MY_FILE = out_file_name(FILENAME)
>>
>> # with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
>> #     for line in race_table(f_in.readline()):
>> #         new_row = line
>> with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
>>     CONTENT = f_in.read()
>>     # print(content)
>>     FILE_CONTENTS = race_table(CONTENT)
>>     # print new_name
>>     f_out.write(str(FILE_CONTENTS))
>>
>>
>> if __name__ == '__main__':
>>     pass
> So I found this on Stack Overflow:
>
> In [2]: import string
>
> In [3]: identity = string.maketrans("", "")
>
> In [4]: x = ['+5556', '-1539', '-99', '+1500']
>
> In [5]: x = [s.translate(identity, "+-") for s in x]
>
> In [6]: x
> Out[6]: ['5556', '1539', '99', '1500']
>
> but it fails on my file, I believe because mine is a list of lists. Is there an easy way to iterate over the sublists without flattening them?
>
> Current code.
>
>     input_table = [[item.strip(' "') for item in record.split(',')]
>                    for record in text_file.splitlines()]
>     # At this point look at input_table to find the record indices
>     identity = string.maketrans("", "")
>     print(input_table)
>     input_table = [s.translate(identity, ",'") for s
>                    in input_table]
>
> Sayth
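On the narrower question of applying a string operation to every item in a list of lists: a nested comprehension keeps the row structure intact. What follows is only a sketch, assuming Python 3, where `string.maketrans` no longer exists and `str.maketrans` takes the characters to delete as its third argument. The sample rows are made up for illustration. Note that this cannot repair a field that `split(',')` has already cut in two, such as "Michael, Wayne & John Hawkes".

```python
# Hypothetical sample rows standing in for input_table.
input_table = [
    ["Horse", "1", "Inz'n'out"],
    ["Horse", "2", "Plain, Name"],
]

# Translation table that deletes commas and single quotes (Python 3).
strip_chars = str.maketrans("", "", ",'")

# The outer loop walks the rows, the inner loop walks the items of each
# row, so the result is still a list of lists.
cleaned = [[item.translate(strip_chars) for item in row]
           for row in input_table]

print(cleaned)
```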
Take Gregory's advice and use the csv module. Don't reinvent a csv
parser. My "csv" splitter was the simplest approach possible, which I
tend to use with undocumented formats, tweaking for unexpected features
as they come along.
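
A minimal sketch of what that could look like, assuming the file puts double quotes around any field that contains a comma (as in the trainer line quoted above). The sample line, its column positions and the location value are invented for illustration and are not taken from the real file.

```python
import csv
import io

# Hypothetical sample line in the spirit of the file discussed above.
sample = (
    'Horse,1,Inz\'n\'out,x,"Michael, Wayne & John Hawkes",Rosehill\n'
)

# csv.reader honours the double quotes, so the comma inside the trainer
# name does not start a new field. Apostrophes are ordinary characters
# under the default dialect and need no special handling.
# skipinitialspace drops blanks that directly follow a delimiter.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
rows = list(reader)

print(rows[0][2])  # horse name
print(rows[0][4])  # trainer
```

With a real file, the same reader can be given the open file object instead of the `io.StringIO` wrapper; the csv documentation recommends opening the file with `newline=''` in that case.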
Frederic