| References | <4f02c069$0$690$426a74cc@news.free.fr> <CABicbJKshBFOBgVNAmmGL6q7=3tz_QG2OYbGiS62HcaW2W=JmQ@mail.gmail.com> <4F034FA3.6020902@mrabarnett.plus.com> |
|---|---|
| From | Devin Jeanpierre <jeanpierreda@gmail.com> |
| Date | 2012-01-03 14:36 -0500 |
| Subject | Re: Repeating assertions in regular expression |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.4373.1325619405.27778.python-list@python.org> |
> Put simply, it doesn't occur often enough to be worth it. The cost
> outweighs the potential benefit.
I don't buy it. You could backtrack instead of failing for \b+ and
\b*, and it would be almost as fast as this optimization.
-- Devin
On Tue, Jan 3, 2012 at 1:57 PM, MRAB <python@mrabarnett.plus.com> wrote:
> On 03/01/2012 09:45, Devin Jeanpierre wrote:
>>>
>>> \\b\\b and \\b{2} aren't equivalent?
>>
>>
>> This sounds suspiciously like a bug!
>>
>>> Why is the wording "should never"? Repeating a zero-width assertion is
>>> not forbidden, for instance:
>>>
>>> >>> import re
>>> >>> re.compile("\\b\\b\w+\\b\\b")
>>>
>>> <_sre.SRE_Pattern object at 0xb7831140>
>>> >>>
>>> >>>
>>
>> I believe this is meant to refer to arbitrary-length repetitions, such
>> as r'\b*', not simple concatenations like that. r'\b*' will abort the
>> whole match if it is run on a boundary, because Python detects a
>> repetition of a zero-width match and decides this is an error.
>>
> r"\b+" can be optimised to r"\b", but r"\b*" can be optimised to r"".
> r"\b\b", r"\b\b\b", etc, can be optimised to r"\b".
>
> So why isn't it optimised?
>
> Because every potential optimisation has a cost, which is the time it
> would take to look for it.
>
> That cost needs to be balanced against the potential benefit.
>
> How often do you see repeated r"\b"?
>
> Put simply, it doesn't occur often enough to be worth it. The cost
> outweighs the potential benefit.
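For anyone following along, here is a small sketch of the equivalence discussed in this thread: plain concatenations such as r"\b\b" should match exactly where r"\b" does. The sample text and the probe() helper are made up for illustration. How a quantified r"\b+" is treated (rejected at compile time, or handled at match time) may differ between Python versions, so probe() only reports what it sees and no result is claimed for it here.

```python
import re

text = "repeat the word boundary"  # arbitrary sample text

# Concatenated zero-width assertions: each extra \b tests the same
# position again, so all three patterns should find the same words.
single = re.compile(r"\b\w+\b")
doubled = re.compile(r"\b\b\w+\b\b")
tripled = re.compile(r"\b\b\b\w+\b\b\b")

results = [p.findall(text) for p in (single, doubled, tripled)]
print(results[0])                              # ['repeat', 'the', 'word', 'boundary']
print(results[0] == results[1] == results[2])  # True


def probe(pattern, text):
    """Hypothetical helper: return the matches, or the compile error as a string."""
    try:
        return re.findall(pattern, text)
    except re.error as exc:
        return "re.error: %s" % exc


# Quantified assertion: the outcome depends on the Python version in use.
print(probe(r"\b+\w+", text))
```

The first part only shows that the patterns agree on this input; it says nothing about whether detecting the redundancy would be worth the compile-time cost.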