Re: + in regular expression

Date 2012-10-05 17:07 +0100
From MRAB <python@mrabarnett.plus.com>
Subject Re: + in regular expression
References <CALwzidnH2T5vsYT=nMvBmO4V6fmK+aMfHpxQDWrwArJ6aKtVew@mail.gmail.com> <mailman.1838.1349414969.27098.python-list@python.org> <XnsA0E3689B3693duncanbooth@127.0.0.1> <506EFC44.40508@cs.wisc.edu>
Newsgroups comp.lang.python
Message-ID <mailman.1860.1349453267.27098.python-list@python.org> (permalink)



On 2012-10-05 16:27, Evan Driscoll wrote:
> On 10/05/2012 04:23 AM, Duncan Booth wrote:
>> A regular expression element may be followed by a quantifier.
>> Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
>> '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
>> you can follow an element with two quantifiers.
> In fact, *you* did -- the first sentence of that paragraph! :-)
>
> \s is a regex, so you can follow it with a quantifier and get \s{6}.
> That's also a regex, so you should be able to follow it with a quantifier.
>
> I can understand that you can create a grammar that excludes it. I'm
> actually really interested to know if anyone knows whether this was a
> deliberate decision and, if so, what the reason is. (And if not --
> should it be considered a (low priority) bug?)
>
> Was it because such patterns often reveal a mistake? Because "\s{6}+"
> has other meanings in different regex syntaxes and the designers didn't
> want confusion? Because it was simpler to parse that way? Because the
> "hey you recognize regular expressions by converting it to a finite
> automaton" story is a lie in most real-world regex implementations (in
> part because they're not actually regular expressions) and repeated
> quantifiers cause problems with the parsing techniques that actually get
> used?
>
You rarely want to repeat a repeated element. It can also result in
catastrophic backtracking unless you're _very_ careful.
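
For the rare case where repeating a repeated element is what you want, the usual spelling wraps the quantified element in a group. A minimal sketch with Python's standard re module follows; the variable names are invented for illustration and do not come from the thread.

```python
import re

# Repeat "exactly six whitespace characters" one or more times by
# wrapping the already-quantified element in a non-capturing group.
blocks_of_six = re.compile(r"(?:\s{6})+")

twelve = " " * 12
thirteen = " " * 13

# fullmatch succeeds only when the length is a positive multiple of six.
matches_twelve = blocks_of_six.fullmatch(twelve) is not None
matches_thirteen = blocks_of_six.fullmatch(thirteen) is not None

print(matches_twelve, matches_thirteen)
```

The group makes the intent explicit, so the second quantifier cannot be mistaken for a modifier of the first.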

In many other regex implementations (including mine), "*+", "++" and
"?+" are possessive quantifiers, much as "*?", "+?" and "??" are lazy
quantifiers.
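
A rough sketch of what "possessive" means in practice: the quantifier keeps everything it matched and never backtracks. This assumes a standard re module that accepts possessive quantifiers (Python 3.11 and later, as far as I know); on older versions the pattern is rejected, which the sketch catches rather than hiding.

```python
import re

text = "aaa"

# Greedy: "a*" first takes all three letters, then gives one back so
# the final "a" in the pattern can still match.
greedy = re.match(r"a*a", text)

# Possessive: "a*+" takes all three letters and never gives any back,
# so the final "a" in the pattern has nothing left to match.
try:
    possessive = re.match(r"a*+a", text)
    possessive_supported = True
except re.error:
    # Assumption: older re versions reject "*+" as a repeated repeat.
    possessive = None
    possessive_supported = False

print(greedy.group())
print(possessive_supported, possessive)
```

Where possessive quantifiers are supported, the second match fails even though the greedy version succeeds on the same text.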

You could, of course, ask why adding "?" after a quantifier doesn't
make it optional, e.g. why r"\s{6}?" doesn't mean the same as
r"(?:\s{6})?", or why r"\s{0,6}?" doesn't mean the same as
r"(?:\s{0,6})?".
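
The difference between the two readings can be checked directly. The following is an illustrative sketch using only the standard re module; the test strings are arbitrary.

```python
import re

six = " " * 6
three = " " * 3

# "{6}?" is a lazy "exactly six": still mandatory, so three spaces fail.
lazy_exact = re.match(r"\s{6}?", three)
# An optional group is allowed to match nothing at all.
optional_exact = re.match(r"(?:\s{6})?", three)

# "{0,6}?" is lazy: it takes as few characters as it can (none here).
lazy_range = re.match(r"\s{0,6}?", six)
# The optional group still contains a greedy "{0,6}".
optional_range = re.match(r"(?:\s{0,6})?", six)

print(lazy_exact)
print(repr(optional_exact.group()))
print(len(lazy_range.group()), len(optional_range.group()))
```

So a trailing "?" changes how much the quantifier prefers to consume, not whether the element is required.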


