Re: There must be a better way

Date	2013-04-23 09:30 -0500
From	Tim Chase <python.list@tim.thechases.com>
Subject	Re: There must be a better way
References	(3 earlier) <mailman.869.1366506610.3114.python-list@python.org> <kl0opb$pcr$1@theodyn.ncf.ca> <atl0i2Fto6uU2@mid.individual.net> <kl3stb$5ck$1@theodyn.ncf.ca> <atnh2jFgv8iU1@mid.individual.net>
Newsgroups	comp.lang.python
Message-ID	<mailman.973.1366727321.3114.python-list@python.org> (permalink)

Show all headers | View raw

On 2013-04-23 13:36, Neil Cerutti wrote:
> On 2013-04-22, Colin J. Williams <cjw@ncf.ca> wrote:
> > Since I'm only interested in one or two columns, the simpler
> > approach is probably better.
> 
> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
> 
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)
>     majr_index = header.index('MAJR')
>     div_index = header.index('DIV')
>     for rec in reader:
>         major = rec[majr_index]
>         rec[div_index] = DIVISION_TABLE[major]
> 
> But a csv.DictReader might still be more efficient. I never
> tested. This is the only place I've used this "optimization".
> It's fast enough. ;)

I believe the csv module does all the work at c-level, rather than
as  pure Python, so it should be notably faster.  The only times I've
had to do things by hand like that are when there are header
peculiarities that I can't control, such as mismatched case or
added/remove punctuation (client files are notorious for this).  So I
often end up doing something like

  def normalize(header):
    return header.strip().upper() # other cleanup as needed

  reader = csv.reader(f)
  headers = next(reader)
  header_map = dict(
    (normalize(header), i)
    for i, header
    in enumerate(headers)
    )
  item = lambda col: row[header_map[col]].strip()
  for row in reader:
    major = item("MAJR").upper()
    division = item("DIV")
    # ...

The function calling might add overhead (in which case one could
just use explicit indirect indexing for each value assignment:

  major = row[header_map["MAJR"]].strip().upper()

but I usually find that processing CSV files leaves me I/O bound
rather than CPU bound.

-tkc

Thread

There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-20 19:46 -0400
  Re: There must be a better way Chris Rebert <clp2@rebertia.com> - 2013-04-20 16:57 -0700
  Re: There must be a better way Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-21 00:06 +0000
    Re: There must be a better way Tim Chase <python.list@tim.thechases.com> - 2013-04-20 19:34 -0500
    Re: There must be a better way Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-20 21:07 -0400
      Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 09:15 -0400
        Re: There must be a better way Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-04-21 16:39 +0300
          Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:17 -0400
        Re: There must be a better way Peter Otten <__peter__@web.de> - 2013-04-21 15:43 +0200
          Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:30 -0400
          Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:30 -0400
        Re: There must be a better way Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 15:32 +0100
        Re: There must be a better way Neil Cerutti <neilc@norwich.edu> - 2013-04-22 14:42 +0000
          Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-22 13:44 -0400
            Re: There must be a better way Neil Cerutti <neilc@norwich.edu> - 2013-04-23 13:36 +0000
              Re: There must be a better way Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 15:15 +0100
              Re: There must be a better way Tim Chase <python.list@tim.thechases.com> - 2013-04-23 09:30 -0500
              Re: There must be a better way Skip Montanaro <skip@pobox.com> - 2013-04-23 09:36 -0500
              Re: There must be a better way (correction) Tim Chase <python.list@tim.thechases.com> - 2013-04-23 10:02 -0500

csiph-web