| Date | 2012-10-05 17:07 +0100 |
|---|---|
| From | MRAB <python@mrabarnett.plus.com> |
| Subject | Re: + in regular expression |
| References | <CALwzidnH2T5vsYT=nMvBmO4V6fmK+aMfHpxQDWrwArJ6aKtVew@mail.gmail.com> <mailman.1838.1349414969.27098.python-list@python.org> <XnsA0E3689B3693duncanbooth@127.0.0.1> <506EFC44.40508@cs.wisc.edu> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1860.1349453267.27098.python-list@python.org> |
On 2012-10-05 16:27, Evan Driscoll wrote:
> On 10/05/2012 04:23 AM, Duncan Booth wrote:
>> A regular expression element may be followed by a quantifier.
>> Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
>> '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
>> you can follow an element with two quantifiers.
> In fact, *you* did -- the first sentence of that paragraph! :-)
>
> \s is a regex, so you can follow it with a quantifier and get \s{6}.
> That's also a regex, so you should be able to follow it with a quantifier.
>
> I can understand that you can create a grammar that excludes it. I'm
> actually really interested to know if anyone knows whether this was a
> deliberate decision and, if so, what the reason is. (And if not --
> should it be considered a (low priority) bug?)
>
> Was it because such patterns often reveal a mistake? Because "\s{6}+"
> has other meanings in different regex syntaxes and the designers didn't
> want confusion? Because it was simpler to parse that way? Because the
> "hey you recognize regular expressions by converting it to a finite
> automaton" story is a lie in most real-world regex implementations (in
> part because they're not actually regular expressions) and repeated
> quantifiers cause problems with the parsing techniques that actually get
> used?
>
You rarely want to repeat a repeated element. It can also result in
catastrophic backtracking unless you're _very_ careful.
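If repeating a repeated element really is what you want, a non-capturing
group spells it out explicitly. A minimal sketch (the sample strings are
made up for illustration):

```python
import re

# Explicit grouping: one or more runs of exactly six whitespace characters.
indent = re.compile(r"(?:\s{6})+")

print(len(indent.match(" " * 12).group()))  # 12: two full runs are consumed
print(len(indent.match(" " * 8).group()))   # 6: only one full run fits
print(indent.match(" " * 5))                # None: fewer than six characters
```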
In many other regex implementations (including mine), "*+", "++" and
"?+" are possessive quantifiers, much as "*?", "+?" and "??" are lazy
quantifiers.
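Whether a "+" after another quantifier is accepted depends on the
implementation and version. As far as I know, the standard `re` module
rejected it with a "multiple repeat" error before Python 3.11 and treats
it as possessive from 3.11 on, so the sketch below only reports what the
running interpreter does. `compile_status` is a hypothetical helper, not
part of any library:

```python
import re

def compile_status(pattern):
    """Report whether this interpreter's `re` module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return "rejected: %s" % exc
    return "accepted"

# The first two patterns stack "+" on another quantifier; the third uses
# an explicit group and should be accepted everywhere.
for pattern in (r"\s{6}+", r"\s*+", r"(?:\s{6})+"):
    print(pattern, "->", compile_status(pattern))
```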
You could, of course, ask why adding "?" after a quantifier doesn't
make it optional, e.g. why r"\s{6}?" doesn't mean the same as
r"(?:\s{6})?", or why r"\s{0,6}?" doesn't mean the same as
r"(?:\s{0,6})?".
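The difference is easy to see with Python's `re` module. A small sketch,
using a made-up six-space test string:

```python
import re

six = " " * 6

# "?" after a quantifier makes it lazy, not optional.
lazy = re.match(r"\s{0,6}?", six).group()          # takes as few as possible
optional = re.match(r"(?:\s{0,6})?", six).group()  # optional group, still greedy
print(repr(lazy))      # ''
print(repr(optional))  # '      '

# \s{6}? still requires six whitespace characters; the grouped form does not.
print(re.match(r"\s{6}?x", "x"))              # None
print(re.match(r"(?:\s{6})?x", "x").group())  # x
```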