Re: csv read clean up and write out to csv

From Neil Cerutti <neilc@norwich.edu>
Newsgroups comp.lang.python
Subject Re: csv read clean up and write out to csv
Date 2012-11-02 19:33 +0000
Organization Norwich University
Message-ID <afilglFndcnU1@mid.individual.net>
References <10133786-7f4c-4b19-8678-a50781ff453d@googlegroups.com>

On 2012-11-02, Sacha Rook <sacharook@gmail.com> wrote:
> Hi
>
> I have a problem with a csv file from a supplier, so they
> export data to csv however the last column in the record is a
> description which is marked up with html.
>
> trying to automate the processing of this csv to upload
> elsewhere in a useable format. If i open the csv with csved it
> looks like all the records aren't escaped correctly as after a
> while i find html tags and text on the next line/record.

Maybe compose a simple parser to disambiguate the lines from the
file.

Something like the following. You'll have to write is_html
yourself, and my Python 2 is mighty rusty, so you may have to fix
it up. Note that infile doesn't have to be in binary mode with
this scheme, but it would fail on bizarre newlines in the file:

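import csv

def parse_records(lines):
    # Tag each input line as either an HTML fragment or a parsed CSV row.
    for line in lines:
        if is_html(line):
            yield ('html', line)
        else:
            # Parse the single line as one CSV record.
            yield ('csv', next(csv.reader([line.strip()])))

# Raw strings keep the backslashes in the Windows paths literal.
infile = open(r'c:\data\input.csv')
outfile = open(r'c:\data\output.csv', 'wb')

writer = csv.writer(outfile)

for tag, rec in parse_records(infile):
    if tag == 'html':
        print rec
    elif tag == 'csv':
        writer.writerow(rec)
    else:
        raise ValueError("Unknown record type %s" % tag)

infile.close()
outfile.close()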
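
As for is_html, which the code above leaves for you to write, here
is one possible starting point. It is only a sketch and rests on an
assumption I can't check against your data: that a stray HTML line
begins (after any leading whitespace) with an opening or closing
tag. The name _TAG_AT_START and the regular expression are my own
invention, not anything from the csv module.

```python
import re

# Hypothetical heuristic (an assumption about the supplier's data):
# a line counts as HTML if, after optional leading whitespace, it
# starts with an opening or closing tag such as <p>, </div> or <br/>.
_TAG_AT_START = re.compile(r'\s*</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?>')

def is_html(line):
    """Return True if the line appears to start with an HTML tag."""
    return _TAG_AT_START.match(line) is not None
```

This will miss continuation lines that are plain text with no
leading tag, and it will misclassify a genuine CSV record whose
first field happens to start with an unquoted tag, so check it
against a sample of the real file before trusting it.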

-- 
Neil Cerutti
