Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!goblin2!goblin.stu.neva.ru!feeder1.cambriumusenet.nl!feed.tweaknews.nl!194.109.133.85.MISMATCH!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.011 X-Spam-Evidence: '*H*': 0.98; '*S*': 0.00; 'startup': 0.05; 'strings.': 0.07; 'received:80.91': 0.09; 'received:80.91.229': 0.09; 'received:gmane.org': 0.09; 'received:list': 0.09; 'thread,': 0.09; 'tuple.': 0.09; 'assume': 0.11; 'equality.': 0.16; 'received:80.91.229.3': 0.16; 'received:plane.gmane.org': 0.16; 'roy': 0.16; 'suggested,': 0.16; 'unequal': 0.16; 'string': 0.17; 'wrote:': 0.17; 'basically': 0.17; 'comparing': 0.17; 'string,': 0.17; 'tests': 0.18; 'versions': 0.20; 'testing': 0.24; 'header:In-Reply-To:1': 0.25; 'header:User-Agent:1': 0.26; 'looks': 0.26; 'wondering': 0.26; 'i.e.': 0.27; 'header:X -Complaints-To:1': 0.28; 'rest': 0.28; 'character.': 0.29; 'dan': 0.29; 'equality': 0.29; 'hash': 0.29; 'case,': 0.29; "i'm": 0.29; 'ends': 0.30; 'turns': 0.33; 'to:addr:python-list': 0.33; 'typically': 0.33; 'or,': 0.34; 'done': 0.34; 'along': 0.35; 'compared': 0.35; 'faster': 0.35; 'subject:?': 0.35; "won't": 0.35; 'received:org': 0.36; 'but': 0.36; 'characters': 0.36; 'compare': 0.36; 'two': 0.37; 'subject:: ': 0.38; 'store': 0.38; 'instead': 0.39; 'to:addr:python.org': 0.39; 'header:Received:5': 0.40; 'your': 0.60; 'most': 0.61; "you've": 0.61; "you'll": 0.62; 'time,': 0.62; 'situation': 0.62; 'more': 0.63; 'costs': 0.64; 'obvious': 0.71; 'smith': 0.71; 'discover': 0.72; 'received:109': 0.74; 'quicker': 0.84; 'real-life': 0.84; 'situations,': 0.84; 'strings)': 0.84 X-Injected-Via-Gmane: http://gmane.org/ To: python-list@python.org From: Dan Goodman Subject: Re: Comparing strings from the back? Date: Mon, 10 Sep 2012 18:07:41 +0200 References: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Gmane-NNTP-Posting-Host: 226.49.9.109.rev.sfr.net User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120824 Thunderbird/15.0 In-Reply-To: X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 27 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1347293273 news.xs4all.nl 6870 [2001:888:2000:d::a6]:54913 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:28828 On 04/09/2012 03:54, Roy Smith wrote: > Let's assume you're testing two strings for equality. You've already > done the obvious quick tests (i.e they're the same length), and you're > down to the O(n) part of comparing every character. > > I'm wondering if it might be faster to start at the ends of the strings > instead of at the beginning? If the strings are indeed equal, it's the > same amount of work starting from either end. But, if it turns out that > for real-life situations, the ends of strings have more entropy than the > beginnings, the odds are you'll discover that they're unequal quicker by > starting at the end. From the rest of the thread, it looks like in most situations it won't make much difference as typically very few characters need to be compared if they are unequal. However, if you were in a situation with many strings which were almost equal, the most general way to improve the situation might be store a hash of the string along with the string, i.e. store (hash(x), x) and then compare equality of this tuple. Almost all of the time, if the strings are unequal the hash will be unequal. Or, as someone else suggested, use interned versions of the strings. This is basically the same solution but even better. In this case, your startup costs will be higher (creating the strings) but your comparisons will always be instant. Dan