Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!ecngs!feeder2.ecngs.de!newsfeed.freenet.ag!news2.euro.net!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.016 X-Spam-Evidence: '*H*': 0.97; '*S*': 0.00; 'case.': 0.05; 'memory.': 0.05; 'python': 0.09; 'derived': 0.09; 'length.': 0.09; 'sep': 0.09; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'rationale': 0.16; 'wed,': 0.16; 'string': 0.17; 'wrote:': 0.17; 'comparing': 0.17; 'random': 0.24; 'second': 0.24; 'header:In- Reply-To:1': 0.25; 'common': 0.26; 'am,': 0.27; 'scale': 0.27; 'message-id:@mail.gmail.com': 0.27; 'options': 0.27; 'actual': 0.28; 'equality': 0.29; 'probability': 0.29; 'restricted': 0.29; 'strings,': 0.29; 'character': 0.29; 'received:209.85.210.174': 0.30; 'received:mail-iy0-f174.google.com': 0.30; 'cases,': 0.33; 'to:addr:python-list': 0.33; 'received:google.com': 0.34; 'third': 0.34; 'compared': 0.35; 'data,': 0.35; 'follows:': 0.35; 'so,': 0.35; 'subject:?': 0.35; 'received:209.85': 0.35; 'there': 0.35; 'really': 0.36; 'but': 0.36; 'compare': 0.36; 'two': 0.37; 'received:209': 0.37; 'subject:: ': 0.38; 'to:addr:python.org': 0.39; 'little': 0.39; 'where': 0.40; 'header:Received:5': 0.40; 'real': 0.61; 'first': 0.61; 'chance': 0.61; 'more': 0.63; 'random,': 0.84; 'dozen': 0.91; 'same,': 0.91; 'average': 0.93 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=y9sksEKoI0q8MPNq+wfij0qwtDszDh1SSjl85QVd4Y4=; b=O2vcNQoiGVl46kSxltmYWZVgZmMCmzvM0sruXXWDsozpkO1ghntv94eD8FbUe+CwsL xPtWbsj7Sr8JkeQCd/vb8T1vmz9a7iMM+FoNCAPuqwbZlJo1ohqxV8y6gfhiWcKW2BJQ 8nkZGM8g7I9lk8fXMlQjU0669anYiQzRZ3gUvsWWyWXOrJK8UshWcBnuzlGHW8NQGo4Y 0QqBBUCCmFD18ykbv5f/pFGdYEab3lX94NFDZEN4kzQKN+9SFIfVKUVFsHeCcBErcsRC iQoga4NoX6ueXs2LcNcx5Sxe+vK/v243+BUfhCe2/XpjMF1ehha7zkv87/1PF0NGeu67 ztZA== MIME-Version: 1.0 In-Reply-To: References: <504564ba$0$29978$c3e8da3$5496439d@news.astraweb.com> Date: Wed, 5 Sep 2012 07:59:43 +1000 Subject: Re: Comparing strings from the back? From: Chris Angelico To: python-list@python.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 25 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1346795987 news.xs4all.nl 6906 [2001:888:2000:d::a6]:46451 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:28428 On Wed, Sep 5, 2012 at 2:32 AM, Johannes Bauer wrote: > How do you arrive at that conclusion? When comparing two random strings, > I just derived > > n = (256 / 255) * (1 - 256 ^ (-c)) > > where n is the average number of character comparisons and c. The > rationale as follows: The first character has to be compared in any > case. The second with a probability of 1/256, the third with 1/(256^2) > and so on. That would be for comparing two random areas of memory. Python strings don't have 256 options per character; and in terms of actual strings, there's so many possibilities. The strings that a program is going to compare for equality are going to use a vastly restricted alphabet; for a lot of cases, there might be only a few dozen plausible characters. But even so, it's going to scale approximately linearly with the string length. If they're really random, then yes, there's little chance that either a 1MB string or a 2MB string will be the same, but with real data, they might very well have a long common prefix. So it's still going to be more or less O(n). ChrisA