Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #73936

Re: fixing an horrific formatted csv file.

X-Received by 10.66.169.79 with SMTP id ac15mr4488698pac.48.1404447135762; Thu, 03 Jul 2014 21:12:15 -0700 (PDT)
X-Received by 10.50.131.198 with SMTP id oo6mr318508igb.1.1404447135592; Thu, 03 Jul 2014 21:12:15 -0700 (PDT)
Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!hn18no2751372igb.0!news-out.google.com!a8ni7663qaq.1!nntp.google.com!hn18no2751370igb.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups comp.lang.python
Date Thu, 3 Jul 2014 21:12:15 -0700 (PDT)
In-Reply-To <mailman.11411.1404316334.18130.python-list@python.org>
Complaints-To groups-abuse@google.com
Injection-Info glegroupsg2000goo.googlegroups.com; posting-host=101.161.167.40; posting-account=5Cd8QAoAAAC6AxpkrISTgUBJ9ktgwNBm
NNTP-Posting-Host 101.161.167.40
References <47e2e29d-b5c3-4aa6-abf9-3b1e46eb0dec@googlegroups.com> <mailman.11385.1404247829.18130.python-list@python.org> <0d3871c6-81d4-4168-9408-ad85299b0955@googlegroups.com> <mailman.11392.1404264061.18130.python-list@python.org> <a84826ea-4018-40bc-88c1-812be5417e6b@googlegroups.com> <mailman.11411.1404316334.18130.python-list@python.org>
User-Agent G2/1.0
MIME-Version 1.0
Message-ID <11ecf009-6f81-4fa5-bee9-b52b9407f0af@googlegroups.com> (permalink)
Subject Re: fixing an horrific formatted csv file.
From flebber <flebber.crue@gmail.com>
Injection-Date Fri, 04 Jul 2014 04:12:15 +0000
Content-Type text/plain; charset=ISO-8859-1
Xref csiph.com comp.lang.python:73936

Show key headers only | View raw


I have taken the code and gone a little further, but I need to be able to protect myself against commas and single quotes in names.

How is it the best to do this?

so in my file I had on line 44 this trainer name.

"Michael, Wayne & John Hawkes" 

and in line 95 this horse name.
Inz'n'out

this throws of my capturing correct item 9. How do I protect against this?

Here is current code.

import re
from sys import argv
SCRIPT, FILENAME = argv


def out_file_name(file_name):
    """take an input file and keep the name with appended _clean"""
    file_parts = file_name.split(".",)
    output_file = file_parts[0] + '_clean.' + file_parts[1]
    return output_file


def race_table(text_file):
    """utility to reorganise poorly made csv entry"""
    input_table = [[item.strip(' "') for item in record.split(',')]
                   for record in text_file.splitlines()]
    # At this point look at input_table to find the record indices
    output_table = []
    for record in input_table:
        if record[0] == 'Meeting':
            meeting = record[3]
        elif record[0] == 'Race':
            date = record[13]
            race = record[1]
        elif record[0] == 'Horse':
            number = record[1]
            name = record[2]
            results = record[9]
            res_split = re.split('[- ]', results)
            starts = res_split[0]
            wins = res_split[1]
            seconds = res_split[2]
            thirds = res_split[3]
            prizemoney = res_split[4]
            trainer = record[4]
            location = record[5]
            print(name, wins, seconds)
            output_table.append((meeting, date, race, number, name,
                                 starts, wins, seconds, thirds, prizemoney,
                                 trainer, location))
    return output_table

MY_FILE = out_file_name(FILENAME)

# with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
#     for line in race_table(f_in.readline()):
#         new_row = line
with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
    CONTENT = f_in.read()
    # print(content)
    FILE_CONTENTS = race_table(CONTENT)
    # print new_name
    f_out.write(str(FILE_CONTENTS))


if __name__ == '__main__':
    pass

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-01 07:04 -0700
  Re: fixing an horrific formatted csv file. MRAB <python@mrabarnett.plus.com> - 2014-07-01 15:32 +0100
  Re: fixing an horrific formatted csv file. "F.R." <anthra.norell@bluewin.ch> - 2014-07-01 22:49 +0200
    Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-01 14:41 -0700
      Re: fixing an horrific formatted csv file. Chris Angelico <rosuav@gmail.com> - 2014-07-02 11:20 +1000
        Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-02 02:13 -0700
          Re: fixing an horrific formatted csv file. "F.R." <anthra.norell@bluewin.ch> - 2014-07-02 17:51 +0200
            Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-03 21:12 -0700
              Re: fixing an horrific formatted csv file. Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-07-04 18:19 +1200
                Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-04 03:48 -0700
              Re: fixing an horrific formatted csv file. flebber <flebber.crue@gmail.com> - 2014-07-04 03:28 -0700
                Re: fixing an horrific formatted csv file. "F.R." <anthra.norell@bluewin.ch> - 2014-07-04 15:24 +0200

csiph-web