To: python-list@python.org
From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Subject: Re: Why is regex so slow?
Date: Tue, 18 Jun 2013 18:34:01 +0100
Newsgroups: comp.lang.python

On 18/06/2013 18:08, Roy Smith wrote:
>
> On Jun 18, 2013, at 1:01 PM, Skip Montanaro wrote:
>
>>> I don't understand why the first way is so much slower.
>>
>> I have no obvious answers, but a couple suggestions:
>>
>> 1. Can you anchor the pattern at the beginning of the line?  (use
>> match() instead of search())
>
> That's one of the things we tried.  Didn't make any difference.
>
>> 2. Does it get faster if you eliminate the "(.*)" part of the pattern?
>
> Just tried that, it also didn't make any difference.
>
>> It seems that if you find a line matching the first part of the
>> pattern, you could just as easily split the line yourself instead of
>> creating a group.
>
>
> At this point, I'm not so much interested in making this faster as understanding why it's so slow.  I'm tempted to open this up as a performance bug against the regex module (which I assume will be rejected, at least for the 2.x series).
>
> ---
> Roy Smith
> roy@panix.com
>

Out of curiosity, have you tried the new regex module from PyPI rather
than the stdlib version?  A heck of a lot of work has gone into it; see
http://bugs.python.org/issue2636
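
Something along these lines might do for a quick comparison. It is only a rough sketch: the pattern and the sample lines are made up, since I don't have your data, so substitute your own. It assumes the third-party regex module is installed from PyPI and simply skips it otherwise.

```python
import re
import timeit

try:
    import regex  # third-party module from PyPI; optional here
except ImportError:
    regex = None

# Hypothetical pattern and input, standing in for the real ones.
PATTERN = r"needle: (.*)"
LINES = ["some unrelated log line %d" % i for i in range(1000)]
LINES.append("needle: payload")


def count_matches(module, pattern=PATTERN, lines=LINES):
    """Compile the pattern with the given module and count matching lines."""
    compiled = module.compile(pattern)
    return sum(1 for line in lines if compiled.search(line))


def time_module(module, repeat=3, number=10):
    """Return the best wall-clock time over several timeit runs."""
    return min(timeit.repeat(lambda: count_matches(module),
                             repeat=repeat, number=number))


def compare():
    """Time the stdlib re module and, if available, the PyPI regex module."""
    results = {"re": time_module(re)}
    if regex is not None:
        results["regex"] = time_module(regex)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print("%-5s %.4f s" % (name, seconds))
```

If the two timings differ a lot on your real pattern and data, that would be worth mentioning in any bug report.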

-- 
"Steve is going for the pink ball - and for those of you who are 
watching in black and white, the pink is next to the green." Snooker 
commentator 'Whispering' Ted Lowe.

Mark Lawrence