From: Nobody <nobody@nowhere.com>
Subject: Re: how to avoid leading white spaces
Date: Sat, 04 Jun 2011 20:44:56 +0100
Newsgroups: comp.lang.python

On Sat, 04 Jun 2011 13:41:33 +1200, Gregory Ewing wrote:

>> Python might be penalized by its use of Unicode here, since a
>> Boyer-Moore table for a full 16-bit Unicode string would need
>> 65536 entries
> 
> But is there any need for the Boyer-Moore algorithm to
> operate on characters?
> 
> Seems to me you could just as well chop the UTF-16 up
> into bytes and apply Boyer-Moore to them, and it would
> work about as well.

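No, because a byte-wise search doesn't respect code-unit alignment. E.g. on
a big-endian architecture, the string '\u0123\u4567' is stored as the bytes
01 23 45 67 and the pattern '\u2345' as the bytes 23 45, so searching for
'\u2345' in '\u0123\u4567' will find a false match (at an offset of 1 byte),
straddling the two characters.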
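
To make that concrete, here is a rough sketch in Python 3. It uses the
"utf-16-be" codec to get the big-endian byte layout regardless of the host
machine, and plain bytes.find() as a stand-in for a byte-wise Boyer-Moore
search. The helper find_aligned() is a made-up name for illustration, not
anything from the standard library; it shows one possible workaround, which
is to reject matches that don't start on a code-unit boundary.

```python
# Big-endian UTF-16 encodings of the haystack and the pattern.
haystack = "\u0123\u4567".encode("utf-16-be")   # bytes 01 23 45 67
needle = "\u2345".encode("utf-16-be")           # bytes 23 45

# A byte-wise search ignores alignment and reports a bogus hit.
naive_offset = haystack.find(needle)


def find_aligned(haystack: bytes, needle: bytes, unit: int = 2) -> int:
    """Byte-wise search that only accepts matches starting on a
    multiple of `unit` bytes (hypothetical helper, for illustration)."""
    start = 0
    while True:
        pos = haystack.find(needle, start)
        if pos == -1 or pos % unit == 0:
            return pos
        # Misaligned hit: skip it and keep looking.
        start = pos + 1


aligned_offset = find_aligned(haystack, needle)

print(naive_offset)    # offset of the misaligned match
print(aligned_offset)  # -1 if there is no aligned match
```

Filtering misaligned hits after the fact keeps the 256-entry table, at the
cost of some wasted work whenever the bytes happen to line up across a
character boundary.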