From: Zachary Ware
Date: Thu, 13 Feb 2014 13:59:41 -0600
Subject: Re: A curious bit of code...
To: "python-list@python.org"
Newsgroups: comp.lang.python

On Thu, Feb 13, 2014 at 12:37 PM, wrote:
> I ran across this and I thought there must be a better way of doing it, but then after further consideration I wasn't so sure.
>
>   if key[:1] + key[-1:] == '<>': ...
>
>
> Some possibilities that occurred to me:
>
>   if key.startswith('<') and key.endswith('>'): ...
>
> and:
>
>   if (key[:1], key[-1:]) == ('<', '>'): ...
>
>
> I haven't run these through a profiler yet, but it seems like the original might be the fastest after all?

In a fit of curiosity, I did some timings:

'and'ed indexing:

C:\tmp>py -m timeit -s "key = '<test>'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.35 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.398 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.188 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key[0] == '<' and key[-1] == '>'"
10000000 loops, best of 3: 0.211 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key[0] == '<' and key[-1] == '>'"
Traceback (most recent call last):
  File "P:\Python34\lib\timeit.py", line 292, in main
    x = t.timeit(number)
  File "P:\Python34\lib\timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
    key[0] == '<' and key[-1] == '>'
IndexError: string index out of range

Slice concatenation:

C:\tmp>py -m timeit -s "key = '<test>'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.649 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.7 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.663 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.665 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key[:1] + key[-1:] == '<>'"
1000000 loops, best of 3: 0.456 usec per loop

String methods:

C:\tmp>py -m timeit -s "key = '<test>'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 1.03 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 1.02 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.504 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.502 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key.startswith('<') and key.endswith('>')"
1000000 loops, best of 3: 0.49 usec per loop

Tuple comparison:

C:\tmp>py -m timeit -s "key = '<test>'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.629 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.689 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.676 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.675 usec per loop

C:\tmp>py -m timeit -s "key = ''" "(key[:1], key[-1:]) == ('<', '>')"
1000000 loops, best of 3: 0.608 usec per loop

re.match():

C:\tmp>py -m timeit -s "import re;key = '<test>'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 3.39 usec per loop

C:\tmp>py -m timeit -s "import re;key = '<test'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 3.27 usec per loop

C:\tmp>py -m timeit -s "import re;key = 'test>'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.94 usec per loop

C:\tmp>py -m timeit -s "import re;key = 'test'" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.97 usec per loop

C:\tmp>py -m timeit -s "import re;key = ''" "re.match(r'^<.*>$', key)"
100000 loops, best of 3: 2.97 usec per loop

Pre-compiled re:

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = '<test>'" "r.match(key)"
1000000 loops, best of 3: 0.932 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = '<test'" "r.match(key)"
[result missing]

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = 'test>'" "r.match(key)"
1000000 loops, best of 3: 0.718 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = 'test'" "r.match(key)"
1000000 loops, best of 3: 0.755 usec per loop

C:\tmp>py -m timeit -s "import re;r = re.compile(r'^<.*>$');key = ''" "r.match(key)"
1000000 loops, best of 3: 0.731 usec per loop

Pre-compiled re with pre-fetched method:

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = '<test>'" "m(key)"
1000000 loops, best of 3: 0.777 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = '<test'" "m(key)"
[result missing]

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = 'test>'" "m(key)"
1000000 loops, best of 3: 0.652 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = 'test'" "m(key)"
1000000 loops, best of 3: 0.576 usec per loop

C:\tmp>py -m timeit -s "import re;m = re.compile(r'^<.*>$').match;key = ''" "m(key)"
1000000 loops, best of 3: 0.58 usec per loop

And the winner is:

C:\tmp>py -m timeit -s "key = '<test>'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.388 usec per loop

C:\tmp>py -m timeit -s "key = '<test'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.413 usec per loop

C:\tmp>py -m timeit -s "key = 'test>'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.219 usec per loop

C:\tmp>py -m timeit -s "key = 'test'" "key and key[0] == '<' and key[-1] == '>'"
1000000 loops, best of 3: 0.215 usec per loop

C:\tmp>py -m timeit -s "key = ''" "key and key[0] == '<' and key[-1] == '>'"
10000000 loops, best of 3: 0.0481 usec per loop

So, the moral of the story? Use short-circuit logic wherever you can, don't use re for simple stuff (because while it may be very fast, it's dominated by attribute lookup and function call overhead), and unless you expect to be doing this test many, many millions of times in a very short space of time, go for readability over performance.

-- 
Zach
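
For anyone who wants to rerun the comparison from a script rather than the command line, here is a rough sketch using the timeit module directly. The function names (by_index, by_slices and so on), the agree() and time_checks() helpers, and the choice of 100000 iterations are all made up for illustration; only the expressions themselves come from the post. It assumes Python 3.6 or later for the f-string. Wrapping each expression in a function adds call overhead, so the absolute numbers will come out higher than the bare-expression timings and will vary by machine; only the relative ordering is worth looking at. Note also that the regex is not strictly equivalent to the other checks for keys containing newlines, so agree() only claims agreement on the five sample keys.

```python
import re
import timeit

# Compiled once, as in the "pre-compiled re" timings.
_PATTERN = re.compile(r'^<.*>$')


# Hypothetical wrappers: each returns True when key starts with '<'
# and ends with '>'.
def by_index(key):
    # The guard on key avoids the IndexError for the empty string.
    return bool(key) and key[0] == '<' and key[-1] == '>'


def by_slices(key):
    return key[:1] + key[-1:] == '<>'


def by_methods(key):
    return key.startswith('<') and key.endswith('>')


def by_tuple(key):
    return (key[:1], key[-1:]) == ('<', '>')


def by_regex(key):
    return _PATTERN.match(key) is not None


CHECKS = [by_index, by_slices, by_methods, by_tuple, by_regex]
KEYS = ['<test>', '<test', 'test>', 'test', '']


def agree(keys=KEYS):
    """Return True if all checks give the same answer for every key."""
    return all(len({check(k) for check in CHECKS}) == 1 for k in keys)


def time_checks(number=100000):
    """Return {(check name, key): seconds per call} for every combination."""
    results = {}
    for check in CHECKS:
        for key in KEYS:
            total = timeit.timeit(lambda: check(key), number=number)
            results[(check.__name__, key)] = total / number
    return results


if __name__ == '__main__':
    assert agree()
    for (name, key), secs in time_checks().items():
        print(f'{name:10} {key!r:9} {secs * 1e6:.3f} usec per call')
```

Checking agree() first is cheap insurance that the variants being timed actually compute the same thing on the sample keys, which the unguarded indexing version does not (it raises on the empty string).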