In-Reply-To: <k25afd$7vj$1@news.albasani.net>
References: <roy-B37C28.21540103092012@news.panix.com> <504564ba$0$29978$c3e8da3$5496439d@news.astraweb.com> <k25afd$7vj$1@news.albasani.net>
Date: Wed, 5 Sep 2012 07:59:43 +1000
Subject: Re: Comparing strings from the back?
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.194.1346795987.27098.python-list@python.org>

On Wed, Sep 5, 2012 at 2:32 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
> How do you arrive at that conclusion? When comparing two random strings,
> I just derived
>
> n = (256 / 255) * (1 - 256 ^ (-c))
>
> where n is the average number of character comparisons and c is the
> string length. The rationale is as follows: The first character has
> to be compared in any case. The second with a probability of 1/256,
> the third with 1/(256^2) and so on.

That would be for comparing two random areas of memory. Python strings
don't have 256 options per character, and in terms of actual strings,
there are so many possibilities. The strings that a program is going to
compare for equality are going to use a vastly restricted alphabet;
for a lot of cases, there might be only a few dozen plausible
characters.
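
To put rough numbers on that, here's a minimal sketch that generalises
the quoted formula from 256 symbols to an alphabet of k equally likely
symbols, and checks it against a simulation. The function names, the
alphabet sizes and the uniform-random model are all my own assumptions
for illustration; real strings aren't uniformly random, which is the
whole point.

```python
import random

def expected_comparisons(alphabet_size, length):
    """Closed form of sum(k**-i for i in range(length)).

    Assumes both strings are uniformly random over k symbols, so the
    i-th character is only reached with probability k**-(i-1).
    """
    k = alphabet_size
    return (k / (k - 1)) * (1 - k ** (-length))

def count_comparisons(a, b):
    """Count the comparisons a naive left-to-right loop would make."""
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break  # first mismatch ends the comparison
    return count

def simulate(alphabet_size, length, trials=20000, seed=0):
    """Average comparison count over random pairs (hypothetical model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(alphabet_size) for _ in range(length)]
        b = [rng.randrange(alphabet_size) for _ in range(length)]
        total += count_comparisons(a, b)
    return total / trials

if __name__ == "__main__":
    for k in (2, 26, 256):
        print(k, expected_comparisons(k, 10), simulate(k, 10))
```

With k=256 the closed form is the one quoted above. A smaller alphabet
pushes the average up, but under the random model it stays close to a
small constant either way.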

But even so, it's going to scale approximately linearly with the
string length. If they're really random, then yes, there's little
chance that two 1MB strings or two 2MB strings will be the same, but
with real data, they might very well have a long common prefix. So
it's still going to be more or less O(n).
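
A quick sketch of the common-prefix case, assuming (purely for
illustration) that the two strings share the first half of their
length. This models a naive character-by-character loop, not whatever
CPython actually does in C, and make_pair is a made-up helper.

```python
def comparisons_until_mismatch(a, b):
    """Count the comparisons a naive left-to-right loop would make."""
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break  # stop at the first differing character
    return count

def make_pair(length, shared_fraction):
    """Build two strings of the given length with a common prefix.

    Hypothetical helper: the shared part is all "x", the tails differ
    from their very first character.
    """
    shared = int(length * shared_fraction)
    prefix = "x" * shared
    tail = length - shared
    return prefix + "a" * tail, prefix + "b" * tail

for length in (1000, 2000, 4000):
    a, b = make_pair(length, 0.5)
    print(length, comparisons_until_mismatch(a, b))
# prints:
# 1000 501
# 2000 1001
# 4000 2001
```

Double the length and the work doubles, as long as the shared prefix
grows along with the string.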

ChrisA