From: Ian
Date: Mon, 06 Jun 2011 22:04:06 +0100
To: python-list@python.org
Subject: Re: how to avoid leading white spaces
Newsgroups: comp.lang.python

On 03/06/2011 03:58, Chris Torek wrote:
> This is a bit surprising, since both "s1 in s2" and re.search()
> could use a Boyer-Moore-based algorithm for a sufficiently-long
> fixed string, and the time required should be proportional to that
> needed to set up the skip table.  The re.compile() gets to re-use
> the table every time.

Is that true? My immediate thought is that Boyer-Moore would quickly give the number of characters to skip, but skipping them would be slow because UTF-8 encoded characters are variable-sized, and the string would have to be walked anyway.

Or am I misunderstanding something?

Ian
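
To make the question concrete, here is a minimal sketch of a Horspool-style search (a simplified Boyer-Moore variant, chosen for brevity; it is not claimed to be what "s1 in s2" or re.search() actually does). The function name horspool_find and the sample strings are made up for illustration. It runs directly over the UTF-8 bytes, so the skip table is indexed by byte value and every skip is a plain byte offset. Under that assumption nothing has to be walked during the search; only turning the final byte offset into a character index needs a scan. As far as I know, CPython also stores unicode strings internally in fixed-width units rather than UTF-8, in which case the concern would not arise there either.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of the first match of needle, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Skip table keyed by byte value: distance from the rightmost
    # occurrence of each byte (ignoring the needle's last byte) to the
    # end of the needle. Bytes not in the table skip the full length m.
    skip = {b: m - 1 - i for i, b in enumerate(needle[:-1])}

    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Jump by a byte count; no character decoding is involved.
        pos += skip.get(haystack[pos + m - 1], m)
    return -1


# Hypothetical sample data with multi-byte characters.
text = "naïve café, déjà vu".encode("utf-8")
pattern = "déjà".encode("utf-8")

byte_offset = horspool_find(text, pattern)

# Converting to a character index is the only step that scans the prefix.
char_offset = len(text[:byte_offset].decode("utf-8"))
```

A match found this way cannot start in the middle of a character, because a UTF-8 lead byte never looks like a continuation byte, so decoding the prefix up to byte_offset is safe.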