


Re: Repeating assertions in regular expression

References <4f02c069$0$690$426a74cc@news.free.fr> <CABicbJKshBFOBgVNAmmGL6q7=3tz_QG2OYbGiS62HcaW2W=JmQ@mail.gmail.com> <4F034FA3.6020902@mrabarnett.plus.com>
From Devin Jeanpierre <jeanpierreda@gmail.com>
Date 2012-01-03 14:36 -0500
Subject Re: Repeating assertions in regular expression
Newsgroups comp.lang.python
Message-ID <mailman.4373.1325619405.27778.python-list@python.org>



> Put simply, it doesn't occur often enough to be worth it. The cost
> outweighs the potential benefit.

I don't buy it. You could backtrack instead of failing for \b+ and
\b*, and it would be almost as fast as this optimization.
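
For anyone who wants to check what their own interpreter does, here is a rough sketch. The helper names (`words`, `compile_status`) and the sample text are made up for illustration, and whether the quantified forms are accepted or rejected depends on the version of the `re` module, so the script only reports the outcome rather than assuming one.

```python
import re

# Hypothetical sample text, chosen only for illustration.
TEXT = "repeat the boundary"


def words(pattern, text=TEXT):
    """Return every match of pattern in text."""
    return re.findall(pattern, text)


def compile_status(pattern):
    """Report whether this interpreter's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return "rejected: %s" % exc
    return "accepted"


# Concatenated assertions compile and match the same text as a single \b.
concatenated = words(r"\b\b\w+\b\b")
single = words(r"\b\w+\b")

# Quantified assertions: the outcome varies by implementation, so record it
# instead of asserting a particular result.
statuses = {p: compile_status(p) for p in (r"\b*", r"\b+", r"\b{2}")}

if __name__ == "__main__":
    print("concatenated == single:", concatenated == single)
    for pattern, status in statuses.items():
        print(pattern, "->", status)
```

The first comparison is the concatenation case from the original question; the loop covers the repetition cases under discussion.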

-- Devin

On Tue, Jan 3, 2012 at 1:57 PM, MRAB <python@mrabarnett.plus.com> wrote:
> On 03/01/2012 09:45, Devin Jeanpierre wrote:
>>>
>>>  \\b\\b and \\b{2} aren't equivalent ?
>>
>>
>> This sounds suspiciously like a bug!
>>
>>>  Why the wording is "should never" ? Repeating a zero-width assertion
>>>  is not forbidden, for instance :
>>>
>>> >>> import re
>>> >>> re.compile("\\b\\b\w+\\b\\b")
>>> <_sre.SRE_Pattern object at 0xb7831140>
>>
>> I believe this is meant to refer to arbitrary-length repetitions, such
>> as r'\b*', not simple concatenations like that. r'\b*' will abort the
>> whole match if it is run on a boundary, because Python detects a
>> repetition of a zero-width match and decides this is an error.
>>
> r"\b+" can be optimised to r"\b", but r"\b*" can be optimised to r"".
> r"\b\b", r"\b\b\b", etc, can be optimised to r"\b".
>
> So why isn't it optimised?
>
> Because every potential optimisation has a cost, which is the time it
> would take to look for it.
>
> That cost needs to be balanced against the potential benefit.
>
> How often do you see repeated r"\b"?
>
> Put simply, it doesn't occur often enough to be worth it. The cost
> outweighs the potential benefit.


