From: Gregory Ewing
Newsgroups: comp.lang.python
Subject: Re: how to avoid leading white spaces
Date: Sat, 04 Jun 2011 13:41:33 +1200
Message-ID: <94tgqfF4tiU1@mid.individual.net>

Chris Torek wrote:
> Python might be penalized by its use of Unicode here, since a
> Boyer-Moore table for a full 16-bit Unicode string would need
> 65536 entries

But is there any need for the Boyer-Moore algorithm to
operate on characters? Seems to me you could just as well
chop the UTF-16 up into bytes and apply Boyer-Moore to them,
and it would work about as well.

-- 
Greg
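
To make the byte-wise idea concrete, here is a minimal sketch. It is not from the thread and not how CPython implements searching. Several things here are assumptions: the function name `find_utf16` is hypothetical, the code uses the simpler Horspool variant (bad-character shifts only) rather than full Boyer-Moore, and it picks little-endian UTF-16. The alignment check is also an addition: a byte-level match at an odd offset straddles two code units and is not a real character match, so it is skipped.

```python
def find_utf16(haystack: str, needle: str) -> int:
    """Search the UTF-16 bytes of haystack for needle.

    Hypothetical helper for illustration. Returns the offset of the
    first match in UTF-16 code units, or -1 if there is none. For text
    without surrogate pairs this equals the str index.
    """
    # Assumption: explicit little-endian byte order, so no BOM is emitted.
    h = haystack.encode("utf-16-le")
    n = needle.encode("utf-16-le")
    m = len(n)
    if m == 0:
        return 0

    # Horspool bad-character table: one entry per byte value, so 256
    # entries instead of 65536. The default shift is the pattern length.
    shift = [m] * 256
    for i in range(m - 1):
        shift[n[i]] = m - 1 - i

    pos = 0
    while pos + m <= len(h):
        # Accept only matches that start on a code-unit boundary.
        if pos % 2 == 0 and h[pos:pos + m] == n:
            return pos // 2
        # Shift according to the last byte under the current window.
        pos += shift[h[pos + m - 1]]
    return -1
```

For example, `find_utf16("hello world", "world")` returns 6, while `find_utf16("\u0001\u0001", "\u0100")` returns -1 even though the needle's two bytes do occur in the haystack at an odd byte offset.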