Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #44184

Re: There must be a better way

References (3 earlier) <mailman.869.1366506610.3114.python-list@python.org> <kl0opb$pcr$1@theodyn.ncf.ca> <atl0i2Fto6uU2@mid.individual.net> <kl3stb$5ck$1@theodyn.ncf.ca> <atnh2jFgv8iU1@mid.individual.net>
Date 2013-04-23 09:36 -0500
Subject Re: There must be a better way
From Skip Montanaro <skip@pobox.com>
Newsgroups comp.lang.python
Message-ID <mailman.975.1366727824.3114.python-list@python.org> (permalink)

Show all headers | View raw


> But a csv.DictReader might still be more efficient.

Depends on what efficiency you care about.  The DictReader class is
implemented in Python, and builds a dict for every row.  It will never
be more efficient CPU-wise than instantiating the csv.reader type
directly and only doing what you need.

OTOH, the DictReader class "just works" and its usage is more obvious
when you come back later to modify your code.  It also makes the code
insensitive to column ordering (though yours seems to be as well, if
I'm reading it correctly).  On the programmer efficiency axis, I score
the DictReader class higher than the reader type.

A simple test:

##########################
import csv
from timeit import Timer

setup = '''import csv
lst = ["""a,b,c,d,e,f,g"""]
lst.extend(["""05:38:24,0.6326,1,0,1.0,0.0,0.0"""] * 1000000)
reader = csv.reader(lst)
dreader = csv.DictReader(lst)
'''

t1 = Timer("for row in reader: pass", setup)
t2 = Timer("for row in dreader: pass", setup)

print(min(t1.repeat(number=10)))
print(min(t2.repeat(number=10)))
###############################

demonstrates that the raw reader is, indeed, much faster than the DictReader:

0.972723007202
8.29047989845

but that's for the basic iteration.  Whatever you need to add to the
raw reader to insulate yourself from changes to the structure of the
CSV file and improve readability will slow it down, while the
DictReader will never be worse than the above.

Skip

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-20 19:46 -0400
  Re: There must be a better way Chris Rebert <clp2@rebertia.com> - 2013-04-20 16:57 -0700
  Re: There must be a better way Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-21 00:06 +0000
    Re: There must be a better way Tim Chase <python.list@tim.thechases.com> - 2013-04-20 19:34 -0500
    Re: There must be a better way Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-20 21:07 -0400
      Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 09:15 -0400
        Re: There must be a better way Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-04-21 16:39 +0300
          Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:17 -0400
        Re: There must be a better way Peter Otten <__peter__@web.de> - 2013-04-21 15:43 +0200
          Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:30 -0400
          Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:30 -0400
        Re: There must be a better way Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 15:32 +0100
        Re: There must be a better way Neil Cerutti <neilc@norwich.edu> - 2013-04-22 14:42 +0000
          Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-22 13:44 -0400
            Re: There must be a better way Neil Cerutti <neilc@norwich.edu> - 2013-04-23 13:36 +0000
              Re: There must be a better way Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 15:15 +0100
              Re: There must be a better way Tim Chase <python.list@tim.thechases.com> - 2013-04-23 09:30 -0500
              Re: There must be a better way Skip Montanaro <skip@pobox.com> - 2013-04-23 09:36 -0500
              Re: There must be a better way (correction) Tim Chase <python.list@tim.thechases.com> - 2013-04-23 10:02 -0500

csiph-web