Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #48712
| From | Duncan Booth <duncan.booth@invalid.invalid> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Why is regex so slow? |
| Date | 2013-06-19 13:21 +0000 |
| Message-ID | <XnsA1E492148973Aduncanbooth@127.0.0.1> (permalink) |
| References | <kpq2r9$gg6$1@panix2.panix.com> <mailman.3557.1371585950.3114.python-list@python.org> <67bebb63-5f5c-4824-bf5b-e61547425449@googlegroups.com> |
Roy Smith <roy@panix.com> wrote:
> Except that the complexity in regexes is compiling the pattern down to
> a FSM. Once you've got the FSM built, the inner loop should be pretty
> quick. In C, the inner loop for executing a FSM should be something
> like:
>
> for(char* p = input; p; ++p) {
> next_state = current_state[*p];
> if (next_state == MATCH) {
> break;
> }
> }
>
> which should compile down to a couple of machine instructions which
> run entirely in the instruction pipeline cache. But I'm probably
> simplifying it more than I should :-)
I'd just like to point out that your simple loop is looking at every
character of the input string. The simple "'ENQ' not in line" test can look
at the third character of the string and if it's none of 'E', 'N' or 'Q'
skip to checking the 6th and then the 9th. It doesn't have to touch the
intervening characters at all.
Or as the source puts it: it's a mix between Boyer-Moore and Horspool, with
a few more bells and whistles on the top.
Also the regex library has to do a whole lot more than just figuring out if
it got a match, so you have massively over-simplified it.
--
Duncan Booth http://kupuguy.blogspot.com
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Why is regex so slow? roy@panix.com (Roy Smith) - 2013-06-18 12:45 -0400
Re: Why is regex so slow? Skip Montanaro <skip@pobox.com> - 2013-06-18 12:01 -0500
Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 13:08 -0400
Re: Why is regex so slow? Chris Angelico <rosuav@gmail.com> - 2013-06-19 03:20 +1000
Re: Why is regex so slow? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2013-06-18 20:10 +0200
Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 12:40 -0700
Re: Why is regex so slow? André Malo <ndparker@gmail.com> - 2013-06-18 21:59 +0200
Re: Why is regex so slow? André Malo <ndparker@gmail.com> - 2013-06-18 22:13 +0200
Re: Why is regex so slow? MRAB <python@mrabarnett.plus.com> - 2013-06-18 18:31 +0100
Re: Why is regex so slow? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-18 18:34 +0100
Re: Why is regex so slow? roy@panix.com (Roy Smith) - 2013-06-18 15:21 -0400
Re: Why is regex so slow? MRAB <python@mrabarnett.plus.com> - 2013-06-18 20:49 +0100
Re: Why is regex so slow? Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-18 12:21 -0700
Re: Why is regex so slow? Antoine Pitrou <solipsis@pitrou.net> - 2013-06-18 20:05 +0000
Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 13:23 -0700
Re: Why is regex so slow? Duncan Booth <duncan.booth@invalid.invalid> - 2013-06-19 13:21 +0000
Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-19 12:55 -0700
Re: Why is regex so slow? Grant Edwards <invalid@invalid.invalid> - 2013-06-18 20:30 +0000
Re: Why is regex so slow? Terry Reedy <tjreedy@udel.edu> - 2013-06-18 17:29 -0400
Re: Why is regex so slow? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2013-06-19 10:29 +0200
Re: Why is regex so slow? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-19 01:51 +0000
Re: Why is regex so slow? Dave Angel <davea@davea.name> - 2013-06-18 22:11 -0400
Re: Why is regex so slow? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-19 03:16 +0000
csiph-web