Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.albasani.net!rt.uk.eu.org!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.004 X-Spam-Evidence: '*H*': 0.99; '*S*': 0.00; 'string.': 0.05; 'tries': 0.07; 'string': 0.09; 'assuming': 0.09; 'difference,': 0.09; 'occurrences': 0.09; 'subject:Why': 0.09; 'useful,': 0.14; 'algorithm.': 0.16; 'eleven': 0.16; 'peek': 0.16; 'received:74.208.4.195': 0.16; 'subject:slow': 0.16; 'substring': 0.16; 'wrote:': 0.18; 'header:User-Agent:1': 0.23; 'post': 0.26; 'header:In-Reply-To:1': 0.27; 'rest': 0.29; 'character': 0.29; 'instruction': 0.29; 'room': 0.29; "doesn't": 0.30; 'characters': 0.30; 'matching': 0.30; 'program,': 0.31; "d'aprano": 0.31; 'steven': 0.31; 'probably': 0.32; 'maybe': 0.34; "i'd": 0.34; 'could': 0.34; 'test': 0.35; 'but': 0.35; 'doing': 0.36; 'subject:?': 0.36; 'to:addr:python-list': 0.38; 'pm,': 0.38; 'expect': 0.39; 'to:addr:python.org': 0.39; 'even': 0.60; 'hope': 0.61; 'simple': 0.61; 'show': 0.63; 'received:74.208': 0.68; 'valid,': 0.84 Date: Tue, 18 Jun 2013 22:11:01 -0400 From: Dave Angel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6 MIME-Version: 1.0 To: python-list@python.org Subject: Re: Why is regex so slow? References: <51c10e9e$0$29973$c3e8da3$5496439d@news.astraweb.com> In-Reply-To: <51c10e9e$0$29973$c3e8da3$5496439d@news.astraweb.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:udPOzUlHLNINM4f34Vz6krGATm53FpTxL2dAzpNOJla hzMEzmmHGcyholbFJqOjiC59eZ/VKHcWOZXBtX/dduzwHXWsxt KkzSinHV7rub8xnceNfaPizGTT7JAMOQ1nvua5bJVqDaNoFbl/ eFTvVHVc47YBX1f3ZR96pMM6rqVgrpOSMy7BQMB9/m2/BujNYn CMylXPf1UKnUgq/UdN9tOivXSzT45yEHzlsVEzWtJUG5PUhkHR Ho/n5ZWI9A5CBoTEvxH3CEcj9LhxlufQLkSIs4Q4GFx9KOaCsm 53pDQUgawFmTZ9b9OYRRieXwxqpATTW2yHFVfglEqoGy7jJXiY JJyr5t3NufqAZtmVaaqE= X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 33 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1371607883 news.xs4all.nl 15866 [2001:888:2000:d::a6]:39287 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:48676 On 06/18/2013 09:51 PM, Steven D'Aprano wrote: > > Even if the regex engine is just as efficient at doing simple character > matching as `in`, and it probably isn't, your regex tries to match all > eleven characters of "ENQUEUEING" while the `in` test only has to match > three, "ENQ". > The rest of your post was valid, and useful, but there's a misconception in this paragraph; I hope you don't mind me pointing it out. In general, for simple substring searches, you can search for a large string faster than you can search for a smaller one. I'd expect if "ENQUEUING" in bigbuffer to be faster than if "ENQ" in bigbuffer assuming that all occurrences of ENQ will actually match the whole thing. If CPython's implementation doesn't show the speed difference, maybe there's some room for optimization. See Boyer-Moore if you want a peek at the algorithm. When I was writiing a simple search program, I could typically search for a 4-character string faster than REP SCASB could match a one character string. And that's a single instruction (with prefix). -- DaveA