Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #48712

Re: Why is regex so slow?

From Duncan Booth <duncan.booth@invalid.invalid>
Newsgroups comp.lang.python
Subject Re: Why is regex so slow?
Date 2013-06-19 13:21 +0000
Message-ID <XnsA1E492148973Aduncanbooth@127.0.0.1> (permalink)
References <kpq2r9$gg6$1@panix2.panix.com> <mailman.3557.1371585950.3114.python-list@python.org> <67bebb63-5f5c-4824-bf5b-e61547425449@googlegroups.com>

Show all headers | View raw


Roy Smith <roy@panix.com> wrote:

> Except that the complexity in regexes is compiling the pattern down to
> a FSM.  Once you've got the FSM built, the inner loop should be pretty
> quick. In C, the inner loop for executing a FSM should be something
> like: 
> 
> for(char* p = input; p; ++p) {
>     next_state = current_state[*p];
>     if (next_state == MATCH) {
>         break;
>    }
> }
> 
> which should compile down to a couple of machine instructions which
> run entirely in the instruction pipeline cache.  But I'm probably
> simplifying it more than I should :-) 

I'd just like to point out that your simple loop is looking at every 
character of the input string. The simple "'ENQ' not in line" test can look 
at the third character of the string and if it's none of 'E', 'N' or 'Q' 
skip to checking the 6th and then the 9th. It doesn't have to touch the 
intervening characters at all.

Or as the source puts it: it's a mix between Boyer-Moore and Horspool, with 
a few more bells and whistles on the top.

Also the regex library has to do a whole lot more than just figuring out if 
it got a match, so you have massively over-simplified it.

-- 
Duncan Booth http://kupuguy.blogspot.com

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Why is regex so slow? roy@panix.com (Roy Smith) - 2013-06-18 12:45 -0400
  Re: Why is regex so slow? Skip Montanaro <skip@pobox.com> - 2013-06-18 12:01 -0500
  Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 13:08 -0400
  Re: Why is regex so slow? Chris Angelico <rosuav@gmail.com> - 2013-06-19 03:20 +1000
    Re: Why is regex so slow? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2013-06-18 20:10 +0200
      Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 12:40 -0700
      Re: Why is regex so slow? André Malo <ndparker@gmail.com> - 2013-06-18 21:59 +0200
        Re: Why is regex so slow? André Malo <ndparker@gmail.com> - 2013-06-18 22:13 +0200
  Re: Why is regex so slow? MRAB <python@mrabarnett.plus.com> - 2013-06-18 18:31 +0100
  Re: Why is regex so slow? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-18 18:34 +0100
    Re: Why is regex so slow? roy@panix.com (Roy Smith) - 2013-06-18 15:21 -0400
      Re: Why is regex so slow? MRAB <python@mrabarnett.plus.com> - 2013-06-18 20:49 +0100
  Re: Why is regex so slow? Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-18 12:21 -0700
  Re: Why is regex so slow? Antoine Pitrou <solipsis@pitrou.net> - 2013-06-18 20:05 +0000
    Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 13:23 -0700
      Re: Why is regex so slow? Duncan Booth <duncan.booth@invalid.invalid> - 2013-06-19 13:21 +0000
        Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-19 12:55 -0700
    Re: Why is regex so slow? Grant Edwards <invalid@invalid.invalid> - 2013-06-18 20:30 +0000
      Re: Why is regex so slow? Terry Reedy <tjreedy@udel.edu> - 2013-06-18 17:29 -0400
      Re: Why is regex so slow? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2013-06-19 10:29 +0200
  Re: Why is regex so slow? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-19 01:51 +0000
    Re: Why is regex so slow? Dave Angel <davea@davea.name> - 2013-06-18 22:11 -0400
      Re: Why is regex so slow? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-19 03:16 +0000

csiph-web