Path: csiph.com!usenet.pasdenom.info!gegeweb.org!newsfeed.kamp.net!newsfeed.kamp.net!newsfeed.freenet.ag!news2.euro.net!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
Sender: skip.montanaro@gmail.com
In-Reply-To: <atnh2jFgv8iU1@mid.individual.net>
References: <kkv9bt$9bm$1@theodyn.ncf.ca> <51732d81$0$29977$c3e8da3$5496439d@news.astraweb.com> <20130420193422.25255e98@bigbox.christie.dr> <mailman.869.1366506610.3114.python-list@python.org> <kl0opb$pcr$1@theodyn.ncf.ca> <atl0i2Fto6uU2@mid.individual.net> <kl3stb$5ck$1@theodyn.ncf.ca> <atnh2jFgv8iU1@mid.individual.net>
Date: Tue, 23 Apr 2013 09:36:55 -0500
Subject: Re: There must be a better way
From: Skip Montanaro <skip@pobox.com>
To: Neil Cerutti <neilc@norwich.edu>
Content-Type: text/plain; charset=UTF-8
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.975.1366727824.3114.python-list@python.org>
Lines: 44
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:44184

> But a csv.DictReader might still be more efficient.

Depends on what efficiency you care about.  The DictReader class is
implemented in Python, and builds a dict for every row.  It will never
be more efficient CPU-wise than instantiating the csv.reader type
directly and only doing what you need.

OTOH, the DictReader class "just works" and its usage is more obvious
when you come back later to modify your code.  It also makes the code
insensitive to column ordering (though yours seems to be as well, if
I'm reading it correctly).  On the programmer efficiency axis, I score
the DictReader class higher than the reader type.

A simple test:

##########################
import csv
from timeit import Timer

setup = '''import csv
lst = ["""a,b,c,d,e,f,g"""]
lst.extend(["""05:38:24,0.6326,1,0,1.0,0.0,0.0"""] * 1000000)
reader = csv.reader(lst)
dreader = csv.DictReader(lst)
'''

t1 = Timer("for row in reader: pass", setup)
t2 = Timer("for row in dreader: pass", setup)

print(min(t1.repeat(number=10)))
print(min(t2.repeat(number=10)))
###############################

demonstrates that the raw reader is, indeed, much faster than the DictReader:

0.972723007202
8.29047989845

but that's for the basic iteration.  Whatever you need to add to the
raw reader to insulate yourself from changes to the structure of the
CSV file and improve readability will slow it down, while the
DictReader will never be worse than the above.

Skip