Date: Tue, 4 Sep 2012 12:07:22 +1000
Subject: Re: Comparing strings from the back?
From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python

On Tue, Sep 4, 2012 at 11:54 AM, Roy Smith wrote:
> I'm wondering if it might be faster to start at the ends of the strings
> instead of at the beginning?
> I'm also not sure how this works with all the possible UCS/UTF encodings.
> With some of them, you may get the encoding semantics wrong if you don't
> start from the front.

No problem there; Python uses only fixed-width encodings internally. Also,
any canonical encoding can be safely compared byte-for-byte; two identical
Unicode strings will be bit-wise identical in (say) UTF-8.

There are issues of cache locality and such that quite possibly mean it's
not going to be faster overall, but it wouldn't be difficult to tweak the
Python sources, recompile, and run some tests. I'm sure python-dev or
python-list will be most happy to discuss some benchmark figures!

ChrisA
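
A rough sketch of the kind of quick test one could run before touching the
Python sources. It is pure Python, so it says nothing about cache locality
or the speed of the real C-level comparison; it only counts how many
character comparisons a naive scan needs from each end. The function names
and the sample paths are made up for illustration.

```python
def compare_steps(a, b, from_back=False):
    """Return (equal, steps): whether a == b, and how many character
    comparisons a naive scan needed before it could decide."""
    if len(a) != len(b):
        # Different lengths settle the question without scanning.
        return False, 0
    if from_back:
        indices = range(len(a) - 1, -1, -1)
    else:
        indices = range(len(a))
    steps = 0
    for i in indices:
        steps += 1
        if a[i] != b[i]:
            return False, steps
    return True, steps


def utf8_equal(a, b):
    """Compare two str objects by their UTF-8 bytes. Identical sequences
    of code points always encode to identical bytes."""
    return a.encode("utf-8") == b.encode("utf-8")


if __name__ == "__main__":
    # Hypothetical data: two strings with a long common prefix.
    a = "/usr/lib/python3/dist-packages/foo.py"
    b = "/usr/lib/python3/dist-packages/bar.py"
    print("front:", compare_steps(a, b))
    print("back: ", compare_steps(a, b, from_back=True))
    print("utf-8:", utf8_equal("caf\u00e9", "caf\u00e9"))
```

With strings like these, which differ only near the end, the scan from the
back stops much sooner. Strings that share a suffix instead would favour the
scan from the front, so any real benchmark depends heavily on the data.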