From: Oscar Benjamin
To: Neil Cerutti
Cc: python-list@python.org
Date: Tue, 23 Apr 2013 15:15:21 +0100
Subject: Re: There must be a better way
Newsgroups: comp.lang.python

On 23 April 2013 14:36, Neil Cerutti wrote:
> On 2013-04-22, Colin J. Williams wrote:
>> Since I'm only interested in one or two columns, the simpler
>> approach is probably better.
>
> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
>
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)

I once had a bug that took a long time to track down. It was caused by
calling next() without an enclosing try/except StopIteration (or
without the optional default argument to next()). Here is a sketch of
how you can get the bug that I had:

$ cat next.py
#!/usr/bin/env python

def join(iterables):
    '''Join iterable of iterables, stripping first item'''
    for iterable in iterables:
        iterator = iter(iterable)
        header = next(iterator)  # Here's the problem
        for val in iterator:
            yield val

data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [],  # Whoops! Who put this empty iterable here?
    ['baz', 7, 8, 9],
]

for x in join(data):
    print(x)
$ ./next.py
1
2
3
4
5
6

The values 7, 8 and 9 are never printed, yet no error message is shown.
Calling next() on the iterator over the empty list raises a
StopIteration that is not caught inside the join() generator. The
StopIteration propagates out of join(), and the for loop that iterates
over join() "catches" it as the normal end-of-iteration signal, so the
loop terminates prematurely. Because the exception is caught and
cleared by the for loop, there's no practical way to get a debugger to
hook into the event that causes it.

In my case this happened somewhere in the middle of a long-running
process. It was difficult to pin down the cause because the iteration
was over non-constant data and I didn't know what I was looking for.
Ever since, I have been very cautious when calling next(): I stop to
think about what a StopIteration would do in that context.

In your snippet, a StopIteration is raised when reading from an empty
CSV file:

>>> import csv
>>> with open('test.csv', 'w'): pass
...
>>> with open('test.csv') as csvfile:
...     reader = csv.reader(csvfile)
...     header = next(reader)
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
StopIteration

If that code were called from a generator, it would most likely be
susceptible to the problem I'm describing. The fix is to use
next(reader, None) or try/except StopIteration.

Oscar
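Both fixes can be sketched like this, assuming Python 3. The names _MISSING and read_rows() are hypothetical, and the choice to skip an empty inner iterable in join() is also an assumption. Other applications might be better served by raising an explicit error with a useful message. The sketch uses a unique sentinel object rather than None for the default, so that a legitimate None as the first item cannot be mistaken for an empty iterable. Plain next(reader, None) works whenever None can never be a real first item.

```python
import csv
import io

# Hypothetical sentinel: a unique object that cannot collide with real data.
_MISSING = object()


def join(iterables):
    """Join an iterable of iterables, stripping the first item of each.

    Assumption: an empty inner iterable is skipped rather than ending the join.
    """
    for iterable in iterables:
        iterator = iter(iterable)
        header = next(iterator, _MISSING)  # default instead of StopIteration
        if header is _MISSING:
            continue  # empty iterable: nothing to strip, nothing to yield
        for val in iterator:
            yield val


def read_rows(csvfile):
    """Return (header, rows) from a CSV file object; header is None if empty."""
    reader = csv.reader(csvfile)
    try:
        header = next(reader)
    except StopIteration:
        # Empty file: handle it here instead of letting StopIteration escape.
        return None, []
    return header, list(reader)


data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [],  # the empty iterable no longer terminates the join
    ['baz', 7, 8, 9],
]
print(list(join(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(read_rows(io.StringIO("")))            # (None, [])
print(read_rows(io.StringIO("a,b\n1,2\n")))  # (['a', 'b'], [['1', '2']])
```

Since Python 3.7 (PEP 479), a StopIteration that escapes from a generator body is converted into a RuntimeError. On those versions the original next.py would print 1 to 6 and then fail loudly instead of stopping silently. That makes the bug easier to find. Only the explicit default or the try/except actually makes the generator correct, though.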