
Re: Repeating assertions in regular expression

Date Tue, 03 Jan 2012 18:57:39 +0000
From MRAB <python@mrabarnett.plus.com>
Newsgroups comp.lang.python



On 03/01/2012 09:45, Devin Jeanpierre wrote:
>>  \\b\\b and \\b{2} aren't equivalent ?
>
> This sounds suspiciously like a bug!
>
>>  Why the wording is "should never" ? Repeating a zero-width assertion is not
>>  forbidden, for instance :
>>
>>>>>  import re
>>>>>  re.compile("\\b\\b\w+\\b\\b")
>>  <_sre.SRE_Pattern object at 0xb7831140>
>>>>>
>
> I believe this is meant to refer to arbitrary-length repetitions, such
> as r'\b*', not simple concatenations like that. r'\b*' will abort the
> whole match if it is run on a boundary, because Python detects a
> repetition of a zero-width match and decides this is an error.
>
r"\b+" can be optimised to r"\b", and r"\b*" can be optimised to r"",
because a zero-width assertion consumes no characters. Likewise,
r"\b\b", r"\b\b\b", etc, can be optimised to r"\b".
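
A quick way to see the concatenation case for yourself (a minimal sketch; the sample text and variable names are my own, and it deliberately avoids r"\b*" and r"\b+", whose handling depends on the regex engine and version):

```python
import re

# Sample input chosen for illustration only.
text = "spam, eggs and ham"

# One word boundary on each side of the word.
single = re.compile(r"\b\w+\b")

# The same pattern with each boundary assertion written twice.
doubled = re.compile(r"\b\b\w+\b\b")

single_matches = single.findall(text)
doubled_matches = doubled.findall(text)

print(single_matches)                      # ['spam', 'eggs', 'and', 'ham']
print(single_matches == doubled_matches)   # True
```

Both patterns find the same words, since testing a boundary twice at the same position cannot give a different answer from testing it once.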

So why isn't it optimised?

Because every potential optimisation has a cost, which is the time it
would take to look for it.

That cost needs to be balanced against the potential benefit.

How often do you see repeated r"\b"?

Put simply, it doesn't occur often enough to be worth it. The cost
outweighs the potential benefit.
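
To make the trade-off concrete, here is a rough sketch of what such a pass might look like. It assumes a made-up representation in which a pattern is a flat list of token strings; that is not how the re module stores patterns, and the function name is hypothetical.

```python
# Hypothetical token form of a pattern, e.g. [r"\b", r"\b", r"\w+"].
# Assumption for illustration only; the real compiler works on a parse tree.
ZERO_WIDTH = {r"\b", r"\B"}


def collapse_repeated_assertions(tokens):
    """Drop a zero-width assertion that directly repeats the previous token."""
    out = []
    for tok in tokens:
        # Every token of every pattern goes through this check,
        # whether or not the pattern contains a repeated assertion.
        if out and tok in ZERO_WIDTH and out[-1] == tok:
            continue
        out.append(tok)
    return out


repeated = [r"\b", r"\b", r"\w+", r"\b", r"\b"]
ordinary = [r"\b", r"\w+", r"\b"]

print(collapse_repeated_assertions(repeated))  # ['\\b', '\\w+', '\\b']
print(collapse_repeated_assertions(ordinary))  # ['\\b', '\\w+', '\\b']
```

The second call changes nothing, yet it still scans the whole list. That scan is the cost paid on every compile, for a saving that only shows up on the rare pattern with a repeated r"\b".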



