From: MRAB
Date: Tue, 03 Jan 2012 18:57:39 +0000
Newsgroups: comp.lang.python
Subject: Re: Repeating assertions in regular expression
References: <4f02c069$0$690$426a74cc@news.free.fr>

On 03/01/2012 09:45, Devin Jeanpierre wrote:
>> \\b\\b and \\b{2} aren't equivalent ?
>
> This sounds suspiciously like a bug!
>
>> Why the wording is "should never" ? Repeating a zero-width assertion is not
>> forbidden, for instance :
>>
>> >>> import re
>> >>> re.compile("\\b\\b\w+\\b\\b")
>> <_sre.SRE_Pattern object at 0xb7831140>
>> >>>
>
> I believe this is meant to refer to arbitrary-length repetitions, such
> as r'\b*', not simple concatenations like that. r'\b*' will abort the
> whole match if is run on a boundary, because Python detects a
> repetition of a zero-width match and decides this is an error.
>
r"\b+" can be optimised to r"\b", but r"\b*" can be optimised to r"".

r"\b\b", r"\b\b\b", etc, can be optimised to r"\b".

So why doesn't it optimise them?

Because every potential optimisation has a cost, which is the time it
would take to look for it. That cost needs to be balanced against the
potential benefit.

How often do you see repeated r"\b"? Put simply, it doesn't occur often
enough to be worth it. The cost outweighs the potential benefit.
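
As a rough illustration of the equivalence claimed above (that concatenated r"\b" assertions match exactly where a single r"\b" does), here is a minimal sketch. The sample string and the helper name boundary_positions are made up for the example. It deliberately uses only concatenation, not r"\b+" or r"\b*", since how those quantified forms are treated depends on the Python version.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "spam, eggs and ham"

def boundary_positions(pattern):
    """Return the offsets at which the zero-width pattern matches in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# A repeated boundary assertion succeeds at exactly the same offsets
# as a single one, because it consumes no characters.
single = boundary_positions(r"\b")
doubled = boundary_positions(r"\b\b")
tripled = boundary_positions(r"\b\b\b")

# The pattern from the quoted session finds the same words as the
# version with single assertions.
words_single = re.findall(r"\b\w+\b", text)
words_doubled = re.findall(r"\b\b\w+\b\b", text)

print(single == doubled == tripled)
print(words_single == words_doubled)
```

The extra assertions cost a little work at match time and change nothing about the result, which is why the question is only whether it is worth the compiler's time to look for them.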