From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: using split for a string : error
Date: Thu, 24 Jan 2013 22:35:22 +1100

On Thu, Jan 24, 2013 at 10:16 PM, Tobias M. wrote:
> Chris Angelico wrote:
>> The other thing you may want to consider, if the values are supposed
>> to be integers, is to convert them to Python integers before
>> comparing.
>
> I thought of this too, and I wonder if there are any major differences
> in performance compared to using the strip() method when parsing
> large files.
>
> In addition, I guess one should catch the ValueError that might be raised
> by the cast if there is something other than a number in the file.

I'd worry less about performance than about correctness. If you're
expecting them to be integers, just cast them, and specifically _don't_
catch ValueError. Any non-integer value will then noisily abort the
script. (It may be worth checking for blank lines first, though,
depending on where the data comes from.)
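The "just cast them" approach might look roughly like the sketch below. The helper name parse_int_lines is hypothetical, and skipping whitespace-only lines is an assumption about the data. Note that int() already ignores surrounding whitespace, so int("42\n") == 42 and no separate strip() is needed before converting.

```python
def parse_int_lines(lines):
    """Convert each non-blank line to an int (hypothetical helper).

    Whitespace-only lines are skipped (an assumption about the data);
    any other non-integer line raises ValueError, which is deliberately
    NOT caught, so bad input aborts the script noisily.
    """
    values = []
    for line in lines:
        if not line.strip():
            continue  # blank line: skip it rather than fail
        values.append(int(line))  # int() tolerates leading/trailing whitespace
    return values


print(parse_int_lines(["10\n", "  20 \n", "\n", "30"]))  # [10, 20, 30]
```

Calling parse_int_lines(["10", "abc"]) raises ValueError from int(), which is the noisy failure described above.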
It's usually fine to have int() complain about any non-numerics in the string, but I must confess, I do sometimes yearn for atoi() semantics: atoi("123asd") == 123, and atoi("qqq") == 0. I've not seen a convenient Python function for doing that; usually it involves manually pulling the digits off the front. All I want is to suppress the error on finding a non-digit. Oh well.

ChrisA
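For what it's worth, the atoi() behaviour wished for above can be approximated with a regular expression that takes only the leading integer. Python has no built-in atoi, so the function below is a hypothetical helper, not a standard API. It follows C atoi() in skipping leading whitespace, accepting an optional sign, ignoring trailing junk and returning 0 when no digits are found. Unlike C, it cannot overflow, since Python ints are unbounded.

```python
import re

# Optional leading whitespace, optional sign, then ASCII digits.
_LEADING_INT = re.compile(r"\s*([+-]?[0-9]+)")


def atoi(s):
    """Approximate C atoi(): parse the leading integer, else return 0.

    Hypothetical helper; Python's standard library has no atoi().
    """
    m = _LEADING_INT.match(s)
    return int(m.group(1)) if m else 0


print(atoi("123asd"), atoi("qqq"), atoi("  -42xyz"))  # 123 0 -42
```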