| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: csv read clean up and write out to csv |
| Date | 2012-11-02 19:33 +0000 |
| Organization | Norwich University |
| Message-ID | <afilglFndcnU1@mid.individual.net> |
| References | <10133786-7f4c-4b19-8678-a50781ff453d@googlegroups.com> |
On 2012-11-02, Sacha Rook <sacharook@gmail.com> wrote:
> Hi
>
> I have a problem with a csv file from a supplier, so they
> export data to csv however the last column in the record is a
> description which is marked up with html.
>
> trying to automate the processing of this csv to upload
> elsewhere in a useable format. If i open the csv with csved it
> looks like all the records aren't escaped correctly as after a
> while i find html tags and text on the next line/record.
Maybe compose a simple parser to disambiguate the lines from the
file.
Something like this (you'll have to write is_html yourself, and my
Python 2 is mighty rusty, so you'll have to fix it up. Note that
infile doesn't have to be in binary mode with this scheme, but it
would fail on bizarre newlines in the file, such as a line break
inside a quoted field):
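    import csv

    def parse_records(lines):
        # Tag each line as a stray HTML fragment or a parsed CSV record.
        for line in lines:
            if is_html(line):
                yield ('html', line)
            else:
                yield ('csv', next(csv.reader([line.strip()])))

    # Raw strings keep the backslashes in the Windows paths literal.
    infile = open(r'c:\data\input.csv')
    outfile = open(r'c:\data\output.csv', 'wb')  # Python 2 csv wants 'wb'
    writer = csv.writer(outfile)
    for tag, rec in parse_records(infile):
        if tag == 'html':
            print rec
        elif tag == 'csv':
            writer.writerow(rec)
        else:
            raise ValueError("Unknown record type %s" % tag)
    infile.close()
    outfile.close()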
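
Since is_html is left for you to write, here is one rough sketch of how it might look, together with the same generator in Python 3 syntax. The regular expression is only a guess at what a stray HTML line looks like (it assumes such a line starts with a tag), and the sample lines are made up for illustration; neither comes from the supplier's actual file.

```python
import csv
import re

# Assumed heuristic: a stray HTML line begins with an opening or closing tag.
# A genuine CSV record whose first field starts with "<" would be misclassified.
_TAG_AT_START = re.compile(r'^\s*</?[A-Za-z][^>]*>')


def is_html(line):
    """Guess whether a line is a stray HTML fragment rather than a CSV record."""
    return bool(_TAG_AT_START.match(line))


def parse_records(lines):
    """Yield ('html', text) or ('csv', fields) for each input line."""
    for line in lines:
        text = line.strip()
        if is_html(text):
            yield ('html', text)
        else:
            # Parse the single line as one CSV record.
            yield ('csv', next(csv.reader([text])))


# Made-up sample: the last record's description spills onto the next line.
sample = [
    'id,name,description\n',
    '1,Widget,"<p>A fine widget</p>"\n',
    '2,Gadget,<p>First paragraph\n',
    '<p>Second paragraph</p>\n',
]

for tag, rec in parse_records(sample):
    print(tag, rec)
```

A line that merely contains HTML after a comma is still treated as CSV here; only lines that start with a tag are set aside. If the real file has continuation lines that begin with plain text, the heuristic would need something stronger, such as checking the field count against the header row.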
--
Neil Cerutti