Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #18392 > unrolled thread

Regular expression : non capturing groups are faster ?

Started bycandide <candide@free.invalid>
First post2012-01-03 12:14 +0100
Last post2012-01-03 13:59 +0200
Articles 7 — 3 participants

Back to article view | Back to comp.lang.python


Contents

  Regular expression : non capturing groups are faster ? candide <candide@free.invalid> - 2012-01-03 12:14 +0100
    Re: Regular expression : non capturing groups are faster ? Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-01-03 06:56 -0500
      Re: Regular expression : non capturing groups are faster ? candide <candide@free.invalid> - 2012-01-03 15:50 +0100
        Re: Regular expression : non capturing groups are faster ? Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-01-03 14:31 -0500
        Re: Regular expression : non capturing groups are faster ? "Octavian Rasnita" <orasnita@gmail.com> - 2012-01-03 22:07 +0200
        Re: Regular expression : non capturing groups are faster ? Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-01-03 15:38 -0500
    Re: Regular expression : non capturing groups are faster ? "Octavian Rasnita" <orasnita@gmail.com> - 2012-01-03 13:59 +0200

#18392 — Regular expression : non capturing groups are faster ?

Fromcandide <candide@free.invalid>
Date2012-01-03 12:14 +0100
SubjectRegular expression : non capturing groups are faster ?
Message-ID<4f02e31c$0$15724$426a74cc@news.free.fr>
Excerpt from the Regular Expression HOWTO 
(http://docs.python.org/howto/regex.html#non-capturing-and-named-groups) :


-----------------------------------------------
It should be mentioned that there’s no performance difference in 
searching between capturing and non-capturing groups; neither form is 
any faster than the other.
-----------------------------------------------


Now from the Perl Regular Expression tutorial 
(http://perldoc.perl.org/perlretut.html#Non-capturing-groupings) :


-----------------------------------------------
Because there is no extraction, non-capturing groupings are faster than 
capturing groupings.
-----------------------------------------------


The second assertion sounds more likely. It seems very odd that Python 
and Perl implementations are divergent on this point. Any opinion ?

[toc] | [next] | [standalone]


#18396

FromDevin Jeanpierre <jeanpierreda@gmail.com>
Date2012-01-03 06:56 -0500
Message-ID<mailman.4346.1325591848.27778.python-list@python.org>
In reply to#18392
> The second assertion sounds more likely. It seems very odd that Python and
> Perl implementations are divergent on this point. Any opinion ?

The Python documentation oversimplifies. What it means to say is that
while one might expect capturing matches to do extra work proportional
to the capture, they do not. They don't do anything other than mark
down where to extract submatches, and the extra work done is pretty
much negligible. (That is, the work done for submatch extraction is a
polynomial (looks like quadratic) in the number of capturing groups
(which is very small almost always), with a small coefficient).

The Perl documentation is technically correct, but if the HOWTO said
it, it would give the wrong impression. You shouldn't care about
capturing vs noncapturing except with regards to how it interferes
with your group numbering scheme.

-- Devin

On Tue, Jan 3, 2012 at 6:14 AM, candide <candide@free.invalid> wrote:
> Excerpt from the Regular Expression HOWTO
> (http://docs.python.org/howto/regex.html#non-capturing-and-named-groups) :
>
>
> -----------------------------------------------
> It should be mentioned that there’s no performance difference in searching
> between capturing and non-capturing groups; neither form is any faster than
> the other.
> -----------------------------------------------
>
>
> Now from the Perl Regular Expression tutorial
> (http://perldoc.perl.org/perlretut.html#Non-capturing-groupings) :
>
>
> -----------------------------------------------
> Because there is no extraction, non-capturing groupings are faster than
> capturing groupings.
> -----------------------------------------------
>
>
> The second assertion sounds more likely. It seems very odd that Python and
> Perl implementations are divergent on this point. Any opinion ?
>
>
> --
> http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [next] | [standalone]


#18411

Fromcandide <candide@free.invalid>
Date2012-01-03 15:50 +0100
Message-ID<4f0315c6$0$10967$426a74cc@news.free.fr>
In reply to#18396
Le 03/01/2012 12:56, Devin Jeanpierre a écrit :
>> The second assertion sounds more likely. It seems very odd that Python and
>> Perl implementations are divergent on this point. Any opinion ?
>
> The Python documentation oversimplifies.

You meant Perl Documentation, didn't you ?


It's a commun opinion that non-capturing groups have a price (minor), 
for instance Jan Goyvaerts, a well known regular expression guru, 
refering to Python code, tells :


non-capturing groups (...)  offer (slightly) better performance as the 
regex engine doesn't have to keep track of the text matched by 
non-capturing groups.


[link is there : 
http://stackoverflow.com/questions/2703029/why-regular-expressions-non-capturing-group-is-not-working]



It seems Javascript performs better respect to non-capturing groups : 
http://jsperf.com/regex-capture-vs-non-capture

The same for java : http://eyalsch.wordpress.com/2009/05/21/regex/
(no benchmarks).

For my part, Python tests didn't show any kind of significative penality.

[toc] | [prev] | [next] | [standalone]


#18436

FromDevin Jeanpierre <jeanpierreda@gmail.com>
Date2012-01-03 14:31 -0500
Message-ID<mailman.4372.1325619159.27778.python-list@python.org>
In reply to#18411
> You meant Perl Documentation, didn't you ?

I guess that works too. I did mean Python, though -- its intent is to
say "you shouldn't worry about this", but in the process it says "this
does not exist" (a lie).

"slightly better performance" would be accurate, as said by Goyvaerts/

-- Devin

On Tue, Jan 3, 2012 at 9:50 AM, candide <candide@free.invalid> wrote:
> Le 03/01/2012 12:56, Devin Jeanpierre a écrit :
>>>
>>> The second assertion sounds more likely. It seems very odd that Python
>>> and
>>> Perl implementations are divergent on this point. Any opinion ?
>>
>>
>> The Python documentation oversimplifies.
>
>
> You meant Perl Documentation, didn't you ?
>
>
> It's a commun opinion that non-capturing groups have a price (minor), for
> instance Jan Goyvaerts, a well known regular expression guru, refering to
> Python code, tells :
>
>
> non-capturing groups (...)  offer (slightly) better performance as the regex
> engine doesn't have to keep track of the text matched by non-capturing
> groups.
>
>
> [link is there :
> http://stackoverflow.com/questions/2703029/why-regular-expressions-non-capturing-group-is-not-working]
>
>
>
> It seems Javascript performs better respect to non-capturing groups :
> http://jsperf.com/regex-capture-vs-non-capture
>
> The same for java : http://eyalsch.wordpress.com/2009/05/21/regex/
> (no benchmarks).
>
> For my part, Python tests didn't show any kind of significative penality.
> --
> http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [next] | [standalone]


#18446

From"Octavian Rasnita" <orasnita@gmail.com>
Date2012-01-03 22:07 +0200
Message-ID<mailman.4380.1325621415.27778.python-list@python.org>
In reply to#18411
From: "Devin Jeanpierre" <jeanpierreda@gmail.com>
Subject: Re: Regular expression : non capturing groups are faster ?


>> You meant Perl Documentation, didn't you ?
> 
> I guess that works too. I did mean Python, though -- its intent is to
> say "you shouldn't worry about this", but in the process it says "this
> does not exist" (a lie).


**
However, the Perl documentation doesn't lie.

I tested 10 million matches on my computer using capturing groups and it took ~ 6 seconds, but only ~ 2 seconds with non-capturing params.

So yes, it is very fast anyway, but ~ 3 times faster with non-capturing params, so there is a difference.

Octavian

[toc] | [prev] | [next] | [standalone]


#18451

FromDevin Jeanpierre <jeanpierreda@gmail.com>
Date2012-01-03 15:38 -0500
Message-ID<mailman.4384.1325623160.27778.python-list@python.org>
In reply to#18411
> I tested 10 million matches on my computer using capturing groups and it took ~ 6 seconds, but only ~ 2 seconds with non-capturing params.

Are you talking about Python or Perl? I can't reproduce this in
Python. Best I can do is a 3:4 ratio between running times. ('(a)*
versus '(?:a)*)

Also, wouldn't say "very fast". Compare those two groups with 'a*'.
I'm not sure what's going on there.

-- Devin

On Tue, Jan 3, 2012 at 3:07 PM, Octavian Rasnita <orasnita@gmail.com> wrote:
> From: "Devin Jeanpierre" <jeanpierreda@gmail.com>
> Subject: Re: Regular expression : non capturing groups are faster ?
>
>
>>> You meant Perl Documentation, didn't you ?
>>
>> I guess that works too. I did mean Python, though -- its intent is to
>> say "you shouldn't worry about this", but in the process it says "this
>> does not exist" (a lie).
>
>
> **
> However, the Perl documentation doesn't lie.
>
> I tested 10 million matches on my computer using capturing groups and it took ~ 6 seconds, but only ~ 2 seconds with non-capturing params.
>
> So yes, it is very fast anyway, but ~ 3 times faster with non-capturing params, so there is a difference.
>
> Octavian
>
> --
> http://mail.python.org/mailman/listinfo/python-list

[toc] | [prev] | [next] | [standalone]


#18397

From"Octavian Rasnita" <orasnita@gmail.com>
Date2012-01-03 13:59 +0200
Message-ID<mailman.4347.1325592105.27778.python-list@python.org>
In reply to#18392
From: "candide" <candide@free.invalid>
Subject: Regular expression : non capturing groups are faster ?


Excerpt from the Regular Expression HOWTO 
(http://docs.python.org/howto/regex.html#non-capturing-and-named-groups) :


-----------------------------------------------
It should be mentioned that there’s no performance difference in 
searching between capturing and non-capturing groups; neither form is 
any faster than the other.
-----------------------------------------------


Now from the Perl Regular Expression tutorial 
(http://perldoc.perl.org/perlretut.html#Non-capturing-groupings) :


-----------------------------------------------
Because there is no extraction, non-capturing groupings are faster than 
capturing groupings.
-----------------------------------------------


The second assertion sounds more likely. It seems very odd that Python 
and Perl implementations are divergent on this point. Any opinion ?


-- 
**
At least in Perl's case, it is true. I tested and using (?:...) is much faster than ().

Of course, it takes a few seconds for dozen million matches...

Octavian

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web