Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #48665
| From | Antoine Pitrou <solipsis@pitrou.net> |
|---|---|
| Subject | Re: Why is regex so slow? |
| Date | 2013-06-18 20:05 +0000 |
| References | <kpq2r9$gg6$1@panix2.panix.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.3557.1371585950.3114.python-list@python.org> (permalink) |
Roy Smith <roy <at> panix.com> writes: > > Every line which contains 'ENQ' also matches the full regex (61425 > lines match, out of 2.1 million total). I don't understand why the > first way is so much slower. One invokes a fast special-purpose substring searching routine (the str.__contains__ operator), the other a generic matching engine able to process complex patterns. It's hardly a surprise for the specialized routine to be faster. That's like saying "I don't understand why my CPU is slower than my GPU at calculating 3D structures". That said, there may be ways to improve the regex engine to deal with such special cases in a speedier way. But there will still be some overhead related to the fact that you are invoking a powerful generic engine rather than a lean and mean specialized routine. (to be fair, on CPython there's also the fact that operators are faster than method calls, so some overhead is added by that too) > Once the regex is compiled, you should have a state machine pattern > matcher. It should be O(n) in the length of the input to figure out > that it doesn't match as far as "ENQ". And that's exactly how long it > should take for "if 'ENQ' not in line" to run as well. You should read again on the O(...) notation. It's an asymptotic complexity, it tells you nothing about the exact function values at different data points. So you can have two O(n) routines, one of which always twice faster than the other. Regards Antoine.
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Why is regex so slow? roy@panix.com (Roy Smith) - 2013-06-18 12:45 -0400
Re: Why is regex so slow? Skip Montanaro <skip@pobox.com> - 2013-06-18 12:01 -0500
Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 13:08 -0400
Re: Why is regex so slow? Chris Angelico <rosuav@gmail.com> - 2013-06-19 03:20 +1000
Re: Why is regex so slow? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2013-06-18 20:10 +0200
Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 12:40 -0700
Re: Why is regex so slow? André Malo <ndparker@gmail.com> - 2013-06-18 21:59 +0200
Re: Why is regex so slow? André Malo <ndparker@gmail.com> - 2013-06-18 22:13 +0200
Re: Why is regex so slow? MRAB <python@mrabarnett.plus.com> - 2013-06-18 18:31 +0100
Re: Why is regex so slow? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-06-18 18:34 +0100
Re: Why is regex so slow? roy@panix.com (Roy Smith) - 2013-06-18 15:21 -0400
Re: Why is regex so slow? MRAB <python@mrabarnett.plus.com> - 2013-06-18 20:49 +0100
Re: Why is regex so slow? Rick Johnson <rantingrickjohnson@gmail.com> - 2013-06-18 12:21 -0700
Re: Why is regex so slow? Antoine Pitrou <solipsis@pitrou.net> - 2013-06-18 20:05 +0000
Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-18 13:23 -0700
Re: Why is regex so slow? Duncan Booth <duncan.booth@invalid.invalid> - 2013-06-19 13:21 +0000
Re: Why is regex so slow? Roy Smith <roy@panix.com> - 2013-06-19 12:55 -0700
Re: Why is regex so slow? Grant Edwards <invalid@invalid.invalid> - 2013-06-18 20:30 +0000
Re: Why is regex so slow? Terry Reedy <tjreedy@udel.edu> - 2013-06-18 17:29 -0400
Re: Why is regex so slow? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2013-06-19 10:29 +0200
Re: Why is regex so slow? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-19 01:51 +0000
Re: Why is regex so slow? Dave Angel <davea@davea.name> - 2013-06-18 22:11 -0400
Re: Why is regex so slow? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-06-19 03:16 +0000
csiph-web