Groups > comp.lang.python > #30854 > unrolled thread
| Started by | Cameron Simpson <cs@zip.com.au> |
|---|---|
| First post | 2012-10-06 09:37 +1000 |
| Last post | 2012-10-09 11:29 +0000 |
| Articles | 2 — 2 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: + in regular expression Cameron Simpson <cs@zip.com.au> - 2012-10-06 09:37 +1000
Re: + in regular expression Duncan Booth <duncan.booth@invalid.invalid> - 2012-10-09 11:29 +0000
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2012-10-06 09:37 +1000 |
| Subject | Re: + in regular expression |
| Message-ID | <mailman.1884.1349480266.27098.python-list@python.org> |
On 05Oct2012 10:27, Evan Driscoll <driscoll@cs.wisc.edu> wrote:
| I can understand that you can create a grammar that excludes it. [...]
| Was it because such patterns often reveal a mistake?
For myself, I would consider that sufficient reason.
I've seen plenty of languages (C and shell, for example, though they
are not alone or egregious) where a compiler can emit a syntax complaint
many lines from the actual coding mistake (in shell, an unclosed quote
or control construct is a common example; Python has the same issue
but mitigated by the indentation requirements which cut the occurrence
down a lot).
Forbidding a common error by requiring a wordier workaround isn't
unreasonable.
| Because "\s{6}+"
| has other meanings in different regex syntaxes and the designers didn't
| want confusion?
I think Python REs are supposed to be Perl compatible; ISTR an opening
sentence to that effect...
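To make the quoted pattern concrete, here is a minimal sketch comparing the stacked spelling "\s{6}+" with the wordier grouped spelling. The variable names are made up for illustration. As far as I know, the stacked form is version dependent: older Pythons reject it with re.error ("multiple repeat"), while Python 3.11 and later read a trailing + as a Perl-style possessive quantifier, so the sketch does not assume either outcome.

```python
import re

# The grouped form is accepted by every Python version: one or more
# runs of exactly six whitespace characters.
grouped = re.compile(r"(?:\s{6})+")

m = grouped.match(" " * 13 + "x")
matched_len = len(m.group(0))  # two full runs of six; the 13th space is left over

# The stacked form may or may not compile, depending on the Python version,
# so record the outcome instead of assuming it.
try:
    re.compile(r"\s{6}+")
    stacked_ok = True
except re.error:
    stacked_ok = False

print(matched_len, stacked_ok)
```

The grouped form says unambiguously what is being repeated, which is the "wordier workaround" trade-off discussed above.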
| Because it was simpler to parse that way? Because the
| "hey you recognize regular expressions by converting it to a finite
| automaton" story is a lie in most real-world regex implementations (in
| part because they're not actually regular expressions) and repeated
| quantifiers cause problems with the parsing techniques that actually get
| used?
There are certainly constructs that can cause an exponential amount
of backtracking if misused. One could make a case for discouragement
(though not a case for forbidding them).
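A small sketch of the kind of construct meant here, assuming the classic nested-quantifier pattern (a+)+$ as the example (it is my choice of illustration, not one named in the thread). The input sizes are kept small so it finishes quickly; timings vary by machine and Python version, so none are claimed.

```python
import re
import time

nested = re.compile(r"(a+)+$")  # nested quantifiers: many ways to split a run of 'a's
flat = re.compile(r"a+$")       # accepts the same strings with a single quantifier


def time_match(pattern, text):
    """Return (match result, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


results = []
for n in (10, 14, 18):
    text = "a" * n + "b"  # the trailing 'b' forces the overall match to fail
    r_nested, t_nested = time_match(nested, text)
    r_flat, t_flat = time_match(flat, text)
    results.append((n, r_nested, r_flat))
    print(f"n={n:2d}  nested={t_nested:.6f}s  flat={t_flat:.6f}s")
```

Both patterns fail on every input; the difference is only in how much work the nested one does before giving up, which is why discouraging it seems more apt than forbidding it.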
Just my 2c,
--
Cameron Simpson <cs@zip.com.au>
The most annoying thing about being without my files after our disc crash was
discovering once again how widespread BLINK was on the web.
| From | Duncan Booth <duncan.booth@invalid.invalid> |
|---|---|
| Date | 2012-10-09 11:29 +0000 |
| Message-ID | <XnsA0E698D1EA28duncanbooth@127.0.0.1> |
| In reply to | #30854 |
Cameron Simpson <cs@zip.com.au> wrote:
>| Because "\s{6}+"
>| has other meanings in different regex syntaxes and the designers didn't
>| want confusion?
>
> I think Python REs are supposed to be Perl compatible; ISTR an opening
> sentence to that effect...
>
I don't know the full history of how regex engines evolved, but I suspect
at least part of the answer is that the decisions the Perl developers made
influenced the other implementations.
Perl's quantifiers allow both '?' and '+' as modifiers on the standard
quantifiers, so clearly you cannot stack those particular quantifiers in
Perl; therefore quantifiers in general are unstackable.
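As a rough illustration of a quantifier character acting as a modifier rather than a second quantifier, here is how the '?' modifier behaves in Python's re module. This only shows the lazy modifier, which Python has long supported; the '+' (possessive) modifier is, as far as I know, only available from Python 3.11 onward, so it is left out.

```python
import re

text = "aaaa"

greedy = re.match(r"a+", text).group(0)           # takes as much as it can
lazy = re.match(r"a+?", text).group(0)            # '?' after '+' means "as little as possible"
lazy_range = re.match(r"a{2,3}?", text).group(0)  # same modifier on a counted quantifier

print(greedy, lazy, lazy_range)  # aaaa a aa
```

Since "a+?" already means "lazy a+", it cannot also mean "optionally, one or more a", which is the point about those particular quantifiers not stacking.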
The only grammars I can find online for regular expressions split out the
elements and quantifiers the way I did in my previous post. Python's regex
parser (and I would guess also most of the others in existence) tends more
toward spaghetti code than toward following a grammar (_parse is a 238 line
function). So I think it really is just trying to match existing regular
expression parsers and any possible grammar is an excuse for why it should
be the way it is rather than an explanation.
--
Duncan Booth http://kupuguy.blogspot.com