From: Skip Montanaro
To: Neil Cerutti
Cc: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: There must be a better way
Date: Tue, 23 Apr 2013 09:36:55 -0500

> But a csv.DictReader might still be more efficient.

Depends on what efficiency you care about.  The DictReader class is
implemented in Python, and builds a dict for every row.  It will never
be more efficient CPU-wise than instantiating the csv.reader type
directly and only doing what you need.

OTOH, the DictReader class "just works" and its usage is more obvious
when you come back later to modify your code.  It also makes the code
insensitive to column ordering (though yours seems to be as well, if
I'm reading it correctly).  On the programmer efficiency axis, I score
the DictReader class higher than the reader type.

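To make the column-ordering point concrete, here is a minimal sketch.
The sample data and the function names are made up for illustration
and are not taken from the thread.  It reads the same record from two
inputs whose columns are in a different order, once by name through
csv.DictReader and once by position through csv.reader:

```python
import csv
import io

# Two hypothetical CSV inputs: same data, different column order.
data_a = "time,price,qty\n05:38:24,0.6326,1\n"
data_b = "qty,time,price\n1,05:38:24,0.6326\n"


def prices_with_dictreader(text):
    # DictReader takes its keys from the header row, so lookup is by name.
    return [row["price"] for row in csv.DictReader(io.StringIO(text))]


def prices_with_reader(text, index=1):
    # The raw reader yields plain lists, so the caller must know the position.
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [row[index] for row in reader]


dict_a = prices_with_dictreader(data_a)
dict_b = prices_with_dictreader(data_b)
raw_a = prices_with_reader(data_a)
raw_b = prices_with_reader(data_b)

print(dict_a, dict_b)  # same value from both inputs
print(raw_a, raw_b)    # the fixed index picks the wrong column for data_b
```

The by-name version survives the reordering untouched, while the
positional version silently returns the time column for the second
input.
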
A simple test:

##########################
import csv
from timeit import Timer

setup = '''import csv
lst = ["""a,b,c,d,e,f,g"""]
lst.extend(["""05:38:24,0.6326,1,0,1.0,0.0,0.0"""] * 1000000)
reader = csv.reader(lst)
dreader = csv.DictReader(lst)
'''

# Note: each reader is exhausted after its first pass, so with number=10
# only the first of the ten executions per repeat does any real work.
t1 = Timer("for row in reader: pass", setup)
t2 = Timer("for row in dreader: pass", setup)

print(min(t1.repeat(number=10)))
print(min(t2.repeat(number=10)))
###############################

demonstrates that the raw reader is, indeed, much faster than the
DictReader:

0.972723007202
8.29047989845

but that's for the basic iteration.  Whatever you need to add to the
raw reader to insulate yourself from changes to the structure of the
CSV file and improve readability will slow it down, while the
DictReader will never be worse than the above.

Skip
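
One way to picture the "whatever you need to add to the raw reader"
cost is the sketch below.  It is only an illustration under stated
assumptions: the function names, the choice of column "b", and the
reduced row count are all hypothetical.  The raw reader resolves the
column position from the header once, and each timed call builds a
fresh reader so that every run does a full pass over the data.  No
timings are quoted here because they depend on the machine and the
Python version.

```python
import csv
from timeit import timeit

# Hypothetical sample data, much smaller than the one-million-row test above.
lines = ["a,b,c,d,e,f,g"] + ["05:38:24,0.6326,1,0,1.0,0.0,0.0"] * 10000


def sum_b_reader(lines):
    reader = csv.reader(lines)
    header = next(reader)
    b = header.index("b")  # resolve the column position once, by name
    return sum(float(row[b]) for row in reader)


def sum_b_dictreader(lines):
    # DictReader consumes the header itself and yields one dict per row.
    return sum(float(row["b"]) for row in csv.DictReader(lines))


# Each call constructs a new reader, so every timed run is a full pass.
t_reader = timeit(lambda: sum_b_reader(lines), number=5)
t_dict = timeit(lambda: sum_b_dictreader(lines), number=5)
print(t_reader, t_dict)
```
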