
Re: Re: + in regular expression

Date Fri, 05 Oct 2012 10:27:00 -0500
From Evan Driscoll <driscoll@cs.wisc.edu>
Newsgroups comp.lang.python



On 10/05/2012 04:23 AM, Duncan Booth wrote:
> A regular expression element may be followed by a quantifier.
> Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
> '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
> you can follow an element with two quantifiers.
In fact, *you* did -- the first sentence of that paragraph! :-)

\s is a regex, so you can follow it with a quantifier and get \s{6}. 
That's also a regex, so you should be able to follow it with a quantifier.
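
As a minimal sketch of the point above, assuming Python's re module: wrapping the already-quantified element in a non-capturing group makes it a single element again, so a second quantifier can follow it. The helper name is_multiple_of_six_spaces is hypothetical and only for illustration.

```python
import re

# Hypothetical illustration: (?:...) groups \s{6} into one element,
# so the outer + applies to the whole group.
SIX_WHITESPACE_REPEATED = re.compile(r"(?:\s{6})+")


def is_multiple_of_six_spaces(text):
    """Return True if text is one or more runs of exactly 6 whitespace chars."""
    return SIX_WHITESPACE_REPEATED.fullmatch(text) is not None


print(is_multiple_of_six_spaces(" " * 12))  # two runs of six
print(is_multiple_of_six_spaces(" " * 7))   # one run of six plus a leftover
```

The grouped form is unambiguous in any syntax that supports non-capturing groups, which sidesteps the question of what a bare second quantifier means.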

I can understand that you can create a grammar that excludes it. I'm 
actually really interested to know if anyone knows whether this was a 
deliberate decision and, if so, what the reason is. (And if not -- 
should it be considered a (low priority) bug?)

Was it because such patterns often reveal a mistake? Because "\s{6}+" 
has other meanings in different regex syntaxes and the designers didn't 
want confusion? Because it was simpler to parse that way? Because the 
"hey, you recognize regular expressions by converting them to a finite 
automaton" story is a lie in most real-world regex implementations (in 
part because they're not actually regular expressions) and repeated 
quantifiers cause problems with the parsing techniques that actually get 
used?
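
One way to check how a given interpreter treats the pattern is simply to try compiling it. The sketch below assumes Python's re module; try_compile is a hypothetical helper. The result depends on the Python version: releases from the time of this thread reject "\s{6}+" as a repeated quantifier, while newer releases (3.11 and later, as far as is known here) read a trailing + as a possessive quantifier and accept it. That difference is itself an example of the same syntax carrying different meanings.

```python
import re


def try_compile(pattern):
    """Report whether this interpreter's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # The message text varies by version, so it is only reported, not parsed.
        return "rejected: {}".format(exc)
    return "accepted"


# Bare double quantifier: outcome is version-dependent (see the note above).
print(try_compile(r"\s{6}+"))

# Explicitly grouped form.
print(try_compile(r"(?:\s{6})+"))
```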

Evan



