Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #48676

Re: Why is regex so slow?

Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.albasani.net!rt.uk.eu.org!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Return-Path <davea@davea.name>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.004
X-Spam-Evidence '*H*': 0.99; '*S*': 0.00; 'string.': 0.05; 'tries': 0.07; 'string': 0.09; 'assuming': 0.09; 'difference,': 0.09; 'occurrences': 0.09; 'subject:Why': 0.09; 'useful,': 0.14; 'algorithm.': 0.16; 'eleven': 0.16; 'peek': 0.16; 'received:74.208.4.195': 0.16; 'subject:slow': 0.16; 'substring': 0.16; 'wrote:': 0.18; 'header:User-Agent:1': 0.23; 'post': 0.26; 'header:In-Reply-To:1': 0.27; 'rest': 0.29; 'character': 0.29; 'instruction': 0.29; 'room': 0.29; "doesn't": 0.30; 'characters': 0.30; 'matching': 0.30; 'program,': 0.31; "d'aprano": 0.31; 'steven': 0.31; 'probably': 0.32; 'maybe': 0.34; "i'd": 0.34; 'could': 0.34; 'test': 0.35; 'but': 0.35; 'doing': 0.36; 'subject:?': 0.36; 'to:addr:python-list': 0.38; 'pm,': 0.38; 'expect': 0.39; 'to:addr:python.org': 0.39; 'even': 0.60; 'hope': 0.61; 'simple': 0.61; 'show': 0.63; 'received:74.208': 0.68; 'valid,': 0.84
Date Tue, 18 Jun 2013 22:11:01 -0400
From Dave Angel <davea@davea.name>
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6
MIME-Version 1.0
To python-list@python.org
Subject Re: Why is regex so slow?
References <kpq2r9$gg6$1@panix2.panix.com> <51c10e9e$0$29973$c3e8da3$5496439d@news.astraweb.com>
In-Reply-To <51c10e9e$0$29973$c3e8da3$5496439d@news.astraweb.com>
Content-Type text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding 7bit
X-Provags-ID V02:K0:udPOzUlHLNINM4f34Vz6krGATm53FpTxL2dAzpNOJla hzMEzmmHGcyholbFJqOjiC59eZ/VKHcWOZXBtX/dduzwHXWsxt KkzSinHV7rub8xnceNfaPizGTT7JAMOQ1nvua5bJVqDaNoFbl/ eFTvVHVc47YBX1f3ZR96pMM6rqVgrpOSMy7BQMB9/m2/BujNYn CMylXPf1UKnUgq/UdN9tOivXSzT45yEHzlsVEzWtJUG5PUhkHR Ho/n5ZWI9A5CBoTEvxH3CEcj9LhxlufQLkSIs4Q4GFx9KOaCsm 53pDQUgawFmTZ9b9OYRRieXwxqpATTW2yHFVfglEqoGy7jJXiY JJyr5t3NufqAZtmVaaqE=
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.3566.1371607883.3114.python-list@python.org> (permalink)
Lines 33
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1371607883 news.xs4all.nl 15866 [2001:888:2000:d::a6]:39287
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:48676

Show key headers only | View raw


On 06/18/2013 09:51 PM, Steven D'Aprano wrote:

    <SNIP>
>
> Even if the regex engine is just as efficient at doing simple character
> matching as `in`, and it probably isn't, your regex tries to match all
> eleven characters of "ENQUEUEING" while the `in` test only has to match
> three, "ENQ".
>

The rest of your post was valid, and useful, but there's a misconception 
in this paragraph;  I hope you don't mind me pointing it out.

In general, for simple substring searches, you can search for a large 
string faster than you can search for a smaller one.  I'd expect

if "ENQUEUING" in bigbuffer

to be faster than

if "ENQ"  in bigbuffer

assuming that all occurrences of ENQ will actually match the whole 
thing.  If CPython's implementation doesn't show the speed difference, 
maybe there's some room for optimization.

See Boyer-Moore if you want a peek at the algorithm.

When I was writiing a simple search program, I could typically search 
for a 4-character string faster than REP SCASB could match a one 
character string.  And that's a single instruction (with prefix).
-- 
DaveA

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Why is regex so slow? roy@panix.com (Roy Smith) - 2013-06-18 12:45 -0400
  Re: Why is regex so slow? Skip Montanaro <skip@pobox.com> - 2013-06-18 12:01 -0500
  Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 13:08 -0400
  Re: Why is regex so slow? Chris Angelico <rosuav@gmail.com> - 2013-06-19 03:20 +1000
    Re: Why is regex so slow? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2013-06-18 20:10 +0200
      Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 12:40 -0700
      Re: Why is regex so slow? André Malo <ndparker@gmail.com> - 2013-06-18 21:59 +0200
        Re: Why is regex so slow? André Malo <ndparker@gmail.com> - 2013-06-18 22:13 +0200
  Re: Why is regex so slow? MRAB <python@mrabarnett.plus.com> - 2013-06-18 18:31 +0100
  Re: Why is regex so slow? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-18 18:34 +0100
    Re: Why is regex so slow? roy@panix.com (Roy Smith) - 2013-06-18 15:21 -0400
      Re: Why is regex so slow? MRAB <python@mrabarnett.plus.com> - 2013-06-18 20:49 +0100
  Re: Why is regex so slow? Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-18 12:21 -0700
  Re: Why is regex so slow? Antoine Pitrou <solipsis@pitrou.net> - 2013-06-18 20:05 +0000
    Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 13:23 -0700
      Re: Why is regex so slow? Duncan Booth <duncan.booth@invalid.invalid> - 2013-06-19 13:21 +0000
        Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-19 12:55 -0700
    Re: Why is regex so slow? Grant Edwards <invalid@invalid.invalid> - 2013-06-18 20:30 +0000
      Re: Why is regex so slow? Terry Reedy <tjreedy@udel.edu> - 2013-06-18 17:29 -0400
      Re: Why is regex so slow? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2013-06-19 10:29 +0200
  Re: Why is regex so slow? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-19 01:51 +0000
    Re: Why is regex so slow? Dave Angel <davea@davea.name> - 2013-06-18 22:11 -0400
      Re: Why is regex so slow? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-19 03:16 +0000

csiph-web