


Re: Why is regex so slow?

Newsgroups comp.lang.python
Date Tue, 18 Jun 2013 13:23:25 -0700 (PDT)
Subject Re: Why is regex so slow?
From Roy Smith <roy@panix.com>



On Tuesday, June 18, 2013 4:05:25 PM UTC-4, Antoine Pitrou wrote:

> One invokes a fast special-purpose substring searching routine (the
> str.__contains__ operator), the other a generic matching engine able to
> process complex patterns. It's hardly a surprise for the specialized routine
> to be faster.

Except that the complexity in regexes is in compiling the pattern down to an FSM. Once you've got the FSM built, the inner loop should be pretty quick. In C, the inner loop for executing an FSM should be something like:

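for (const unsigned char *p = input; *p; ++p) {
    next_state = current_state[*p];   /* table lookup, indexed by input byte */
    if (next_state == MATCH) {
        break;
    }
    current_state = next_state;       /* advance the machine */
}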

which should compile down to a couple of machine instructions that run entirely out of the instruction cache. But I'm probably simplifying it more than I should :-)
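For anyone who wants to play with the idea, here is a rough Python sketch of a table-driven FSM for the simplest case, a literal substring. It is only an illustration: it uses the textbook KMP automaton construction, and `build_dfa` and `dfa_search` are names made up for this example. It says nothing about how the `re` module is actually implemented.

```python
def build_dfa(pattern):
    """Build a KMP-style DFA for a literal, non-empty pattern.

    dfa[state] maps a character to the next state; any character that is
    missing from the mapping sends the machine back to state 0.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    alphabet = set(pattern)
    dfa = [dict() for _ in range(m)]
    dfa[0][pattern[0]] = 1
    x = 0  # the state the machine falls back to after a mismatch
    for j in range(1, m):
        for c in alphabet:
            dfa[j][c] = dfa[x].get(c, 0)   # copy the mismatch transitions
        dfa[j][pattern[j]] = j + 1         # the match transition
        x = dfa[x].get(pattern[j], 0)      # update the fallback state
    return dfa


def dfa_search(dfa, text):
    """Return the index of the first match in text, or -1 if there is none."""
    match = len(dfa)                       # the accepting state
    state = 0
    for i, ch in enumerate(text):
        state = dfa[state].get(ch, 0)      # one table lookup per character
        if state == match:
            return i - match + 1
    return -1
```

All of the work is in building the table; the search loop does one lookup per input character, which is the point being made above.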

> (to be fair, on CPython there's also the fact that operators are faster
> than method calls, so some overhead is added by that too)

I've been doing some experimenting, and I'm inclined to believe this is indeed a significant part of it.  I also took some ideas from André Malo and factored out some name lookups from the inner loop.  That bummed me another 10% in speed.
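The sort of comparison I mean can be sketched roughly like this. The data, the pattern, and the function names are invented for illustration; this is not the benchmark I actually ran, and the timings will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical test data: many lines, only one of which contains the target.
lines = ["some log line number %d" % i for i in range(1000)] + ["needle here"]
pattern = re.compile("needle")


def count_with_lookup():
    n = 0
    for line in lines:
        if pattern.search(line):    # attribute lookup on every iteration
            n += 1
    return n


def count_hoisted():
    search = pattern.search         # look the bound method up once
    n = 0
    for line in lines:
        if search(line):
            n += 1
    return n


def count_in_operator():
    n = 0
    for line in lines:
        if "needle" in line:        # operator, no method call
            n += 1
    return n


if __name__ == "__main__":
    for fn in (count_with_lookup, count_hoisted, count_in_operator):
        print(fn.__name__, timeit.timeit(fn, number=200))
```

All three functions return the same count, so any difference in the timings comes from how the test is invoked and not from what it finds.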



