| Date | 2013-04-23 09:30 -0500 |
|---|---|
| From | Tim Chase <python.list@tim.thechases.com> |
| Subject | Re: There must be a better way |
| References | (3 earlier) <mailman.869.1366506610.3114.python-list@python.org> <kl0opb$pcr$1@theodyn.ncf.ca> <atl0i2Fto6uU2@mid.individual.net> <kl3stb$5ck$1@theodyn.ncf.ca> <atnh2jFgv8iU1@mid.individual.net> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.973.1366727321.3114.python-list@python.org> |
On 2013-04-23 13:36, Neil Cerutti wrote:
> On 2013-04-22, Colin J. Williams <cjw@ncf.ca> wrote:
> > Since I'm only interested in one or two columns, the simpler
> > approach is probably better.
>
> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
>
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)
>     majr_index = header.index('MAJR')
>     div_index = header.index('DIV')
>     for rec in reader:
>         major = rec[majr_index]
>         rec[div_index] = DIVISION_TABLE[major]
>
> But a csv.DictReader might still be more efficient. I never
> tested. This is the only place I've used this "optimization".
> It's fast enough. ;)
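Neil's quoted sketch depends on a DIVISION_TABLE and a today.csv from his own project, neither of which appears in the post. The following is a minimal, self-contained sketch, assuming a hypothetical DIVISION_TABLE and in-memory sample data, that puts his index-variable version next to the csv.DictReader alternative he mentions so the two can be compared directly:

```python
import csv
import io

# Hypothetical lookup table and sample data; the real project's
# DIVISION_TABLE and today.csv are not shown in the post.
DIVISION_TABLE = {"MATH": "Sciences", "HIST": "Humanities"}
SAMPLE = "NAME,MAJR,DIV\r\nAnn,MATH,\r\nBob,HIST,\r\n"


def fill_divisions_by_index(csv_file):
    """Index-variable approach: find the column positions once, up front."""
    reader = csv.reader(csv_file)
    header = next(reader)
    majr_index = header.index("MAJR")  # raises ValueError if the column is missing
    div_index = header.index("DIV")
    records = []
    for rec in reader:
        rec[div_index] = DIVISION_TABLE[rec[majr_index]]
        records.append(rec)
    return header, records


def fill_divisions_by_dict(csv_file):
    """csv.DictReader approach: each row is a dict keyed by header name."""
    records = []
    for rec in csv.DictReader(csv_file):
        rec["DIV"] = DIVISION_TABLE[rec["MAJR"]]
        records.append(rec)
    return records


# newline="" mirrors the newline='' that the csv docs recommend for real files.
header, rows = fill_divisions_by_index(io.StringIO(SAMPLE, newline=""))
dict_rows = fill_divisions_by_dict(io.StringIO(SAMPLE, newline=""))
```

The index version keeps rows as plain lists, while the DictReader version builds a dict per row; which is faster for a given file is worth measuring rather than assuming.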
I believe the csv module does all the work at the C level, rather than
in pure Python, so it should be notably faster. The only times I've
had to do things by hand like that are when there are header
peculiarities that I can't control, such as mismatched case or
added/removed punctuation (client files are notorious for this). So I
often end up doing something like:

    import csv

    def normalize(header):
        return header.strip().upper()  # other cleanup as needed

    reader = csv.reader(f)  # f: an already-open CSV file object
    headers = next(reader)
    header_map = dict(
        (normalize(header), i)
        for i, header
        in enumerate(headers)
        )
    # `row` is looked up each time item() is called, so it always
    # refers to the current row of the loop below
    item = lambda col: row[header_map[col]].strip()
    for row in reader:
        major = item("MAJR").upper()
        division = item("DIV")
        # ...

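A self-contained sketch of the same header-normalizing idea, assuming a hypothetical client file whose headers differ in case, whitespace and punctuation. The punctuation stripping in normalize() is an assumption standing in for "other cleanup as needed", and the hypothetical read_columns() helper passes each row explicitly instead of relying on the lambda's late binding of `row`:

```python
import csv
import io
import string


def normalize(header):
    # Strip whitespace and upper-case; dropping punctuation is an
    # assumed example of the "other cleanup" a client file may need.
    cleaned = header.strip().upper()
    return "".join(ch for ch in cleaned if ch not in string.punctuation)


# Hypothetical client file with inconsistent header case and punctuation.
SAMPLE = " Name ,majr,Div.\r\nAnn, math ,Sciences\r\n"


def read_columns(csv_file, *wanted):
    """Yield a tuple of the requested (normalized) columns for each row."""
    reader = csv.reader(csv_file)
    header_map = {normalize(h): i for i, h in enumerate(next(reader))}
    for row in reader:
        yield tuple(row[header_map[col]].strip() for col in wanted)


result = list(read_columns(io.StringIO(SAMPLE, newline=""), "MAJR", "DIV"))
```

Here " Name ", "majr" and "Div." map to NAME, MAJR and DIV, so the lookups work regardless of how the client spelled the headers; a missing column still raises KeyError, which is usually what you want.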
The extra function calls might add some overhead, in which case one
could just use explicit indirect indexing for each value assignment:

    major = row[header_map["MAJR"]].strip().upper()

but in practice I usually find that processing CSV files leaves me
I/O bound rather than CPU bound.
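One way to check the overhead question for a particular workload is a rough timing sketch like the one below. It assumes hypothetical in-memory data (10,000 identical rows), which deliberately takes disk I/O out of the picture; with a real file on disk, reading the data may well dominate, as the post suggests. No timings are claimed here, since they depend on the machine and Python version:

```python
import csv
import io
import timeit

# Hypothetical sample: 10,000 rows held in memory, so only CPU cost is measured.
SAMPLE = "MAJR,DIV\r\n" + "math,Sciences\r\n" * 10_000


def via_helper():
    reader = csv.reader(io.StringIO(SAMPLE, newline=""))
    header_map = {h.strip().upper(): i for i, h in enumerate(next(reader))}

    def item(row, col):  # one extra Python-level call per value
        return row[header_map[col]].strip()

    return [item(row, "MAJR").upper() for row in reader]


def via_indexing():
    reader = csv.reader(io.StringIO(SAMPLE, newline=""))
    header_map = {h.strip().upper(): i for i, h in enumerate(next(reader))}
    # explicit indirect indexing, no helper call
    return [row[header_map["MAJR"]].strip().upper() for row in reader]


if __name__ == "__main__":
    for fn in (via_helper, via_indexing):
        print(fn.__name__, timeit.timeit(fn, number=20))
```

Both functions produce the same values, so any difference in the printed times reflects only the cost of the helper call.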
-tkc
There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-20 19:46 -0400
Re: There must be a better way Chris Rebert <clp2@rebertia.com> - 2013-04-20 16:57 -0700
Re: There must be a better way Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-21 00:06 +0000
Re: There must be a better way Tim Chase <python.list@tim.thechases.com> - 2013-04-20 19:34 -0500
Re: There must be a better way Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-20 21:07 -0400
Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 09:15 -0400
Re: There must be a better way Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-04-21 16:39 +0300
Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:17 -0400
Re: There must be a better way Peter Otten <__peter__@web.de> - 2013-04-21 15:43 +0200
Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:30 -0400
Re: There must be a better way Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 15:32 +0100
Re: There must be a better way Neil Cerutti <neilc@norwich.edu> - 2013-04-22 14:42 +0000
Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-22 13:44 -0400
Re: There must be a better way Neil Cerutti <neilc@norwich.edu> - 2013-04-23 13:36 +0000
Re: There must be a better way Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 15:15 +0100
Re: There must be a better way Tim Chase <python.list@tim.thechases.com> - 2013-04-23 09:30 -0500
Re: There must be a better way Skip Montanaro <skip@pobox.com> - 2013-04-23 09:36 -0500
Re: There must be a better way (correction) Tim Chase <python.list@tim.thechases.com> - 2013-04-23 10:02 -0500