| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: csv read clean up and write out to csv |
| Date | 2 Nov 2012 19:33:41 GMT |
| Organization | Norwich University |
| Message-ID | <afilglFndcnU1@mid.individual.net> |
| References | <10133786-7f4c-4b19-8678-a50781ff453d@googlegroups.com> |
On 2012-11-02, Sacha Rook <sacharook@gmail.com> wrote:
> Hi
>
> I have a problem with a csv file from a supplier, so they
> export data to csv however the last column in the record is a
> description which is marked up with html.
>
> trying to automate the processing of this csv to upload
> elsewhere in a useable format. If i open the csv with csved it
> looks like all the records aren't escaped correctly as after a
> while i find html tags and text on the next line/record.
Maybe compose a simple parser to disambiguate the lines from the
file.
Something like the following. You'll have to write is_html, and
my Python 2 is mighty rusty, so you'll have to fix it up. Note
that infile doesn't have to be opened in binary mode with this
scheme, but it would fail on bizarre newlines in the file:
import csv

def parse_records(lines):
    # Tag each line as stray HTML or as a parsed CSV row.
    for line in lines:
        if is_html(line):
            yield ('html', line)
        else:
            yield ('csv', csv.reader([line.strip()]).next())

# Raw strings keep the backslashes in the Windows paths literal.
infile = open(r'c:\data\input.csv')
outfile = open(r'c:\data\output.csv', 'wb')  # 'wb' suits csv.writer in Python 2
writer = csv.writer(outfile)
for tag, rec in parse_records(infile):
    if tag == 'html':
        print rec
    elif tag == 'csv':
        writer.writerow(rec)
    else:
        raise ValueError("Unknown record type %s" % tag)
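The is_html helper is left unwritten above. As a rough sketch only, here is one way it might look, together with a Python 3 rendition of the same loop run on in-memory data. The regular expression, the sample rows, and the names _TAG_AT_START, sample, out, and spilled are all assumptions made up for illustration; the real supplier file may need a different test.

```python
import csv
import io
import re

# Hypothetical heuristic: treat a line as stray HTML if, after optional
# leading whitespace, it begins with something shaped like a tag.
_TAG_AT_START = re.compile(r"\s*</?[A-Za-z][^>]*>")

def is_html(line):
    """Guess whether a line is spilled HTML rather than a CSV record."""
    return _TAG_AT_START.match(line) is not None

def parse_records(lines):
    # Tag each line as stray HTML or as a parsed CSV row.
    for line in lines:
        if is_html(line):
            yield ("html", line)
        else:
            # Python 3 spells the iterator step next(...), not .next().
            yield ("csv", next(csv.reader([line.strip()])))

# Invented sample standing in for the supplier's file.
sample = io.StringIO(
    "id,name,description\n"
    "1,Widget,<p>First paragraph\n"
    "<p>spilled onto the next line</p>\n"
    "2,Gadget,plain text\n"
)

out = io.StringIO()   # stands in for the output file
writer = csv.writer(out)
spilled = []          # HTML lines set aside for inspection

for tag, rec in parse_records(sample):
    if tag == "html":
        spilled.append(rec)
    elif tag == "csv":
        writer.writerow(rec)
    else:
        raise ValueError("Unknown record type %s" % tag)
```

This heuristic only catches continuation lines that start with a tag. A spilled line that begins with plain text would still be parsed as CSV, so the test would need tightening (for example, by checking the expected column count) for data like that. With real files in Python 3, the output would be opened with open(path, 'w', newline='') rather than 'wb'.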
--
Neil Cerutti