Groups > comp.lang.python > #19409 > unrolled thread

PyWart: Python regular expression syntax is not intuitive.

Started by	Rick Johnson <rantingrickjohnson@gmail.com>
First post	2012-01-25 09:16 -0800
Last post	2012-01-25 19:05 -0700
Articles	20 on this page of 21 — 8 participants


  PyWart: Python regular expression syntax is not intuitive. Rick Johnson <rantingrickjohnson@gmail.com> - 2012-01-25 09:16 -0800
    Re: PyWart: Python regular expression syntax is not intuitive. Rick Johnson <rantingrickjohnson@gmail.com> - 2012-01-25 09:33 -0800
    Re: PyWart: Python regular expression syntax is not intuitive. Duncan Booth <duncan.booth@invalid.invalid> - 2012-01-25 19:32 +0000
      Re: PyWart: Python regular expression syntax is not intuitive. Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-01-25 18:19 -0500
    Re: PyWart: Python regular expression syntax is not intuitive. Ian Kelly <ian.g.kelly@gmail.com> - 2012-01-25 13:17 -0700
      Re: PyWart: Python regular expression syntax is not intuitive. Rick Johnson <rantingrickjohnson@gmail.com> - 2012-01-25 13:19 -0800
        Re: PyWart: Python regular expression syntax is not intuitive. Duncan Booth <duncan.booth@invalid.invalid> - 2012-01-25 21:41 +0000
          Re: PyWart: Python regular expression syntax is not intuitive. Rick Johnson <rantingrickjohnson@gmail.com> - 2012-01-25 14:26 -0800
        Re: PyWart: Python regular expression syntax is not intuitive. Ian Kelly <ian.g.kelly@gmail.com> - 2012-01-25 16:36 -0700
          Re: PyWart: Python regular expression syntax is not intuitive. Rick Johnson <rantingrickjohnson@gmail.com> - 2012-01-25 16:14 -0800
            Re: PyWart: Python regular expression syntax is not intuitive. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-01-26 01:58 +0000
            Re: PyWart: Python regular expression syntax is not intuitive. Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-01-25 21:24 -0500
              Re: PyWart: Python regular expression syntax is not intuitive. Rick Johnson <rantingrickjohnson@gmail.com> - 2012-01-25 20:21 -0800
            Re: PyWart: Python regular expression syntax is not intuitive. Evan Driscoll <edriscoll@wisc.edu> - 2012-01-25 23:38 -0600
      Re: PyWart: Python regular expression syntax is not intuitive. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-01-26 05:28 +0000
    Re: PyWart: Python regular expression syntax is not intuitive. Terry Reedy <tjreedy@udel.edu> - 2012-01-25 15:33 -0500
    Re: PyWart: Python regular expression syntax is not intuitive. Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-01-25 18:28 -0500
      Re: PyWart: Python regular expression syntax is not intuitive. Rick Johnson <rantingrickjohnson@gmail.com> - 2012-01-25 15:44 -0800
        Re: PyWart: Python regular expression syntax is not intuitive. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-01-26 05:28 +0000
    Re: PyWart: Python regular expression syntax is not intuitive. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-01-26 01:54 +0000
      Re: PyWart: Python regular expression syntax is not intuitive. Michael Torrie <torriem@gmail.com> - 2012-01-25 19:05 -0700


#19409 — PyWart: Python regular expression syntax is not intuitive.

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2012-01-25 09:16 -0800
Subject	PyWart: Python regular expression syntax is not intuitive.
Message-ID	<30949a2a-bfd4-4d22-a56b-22b1c6cdf1e8@m11g2000yqe.googlegroups.com>

In particular i find the "extension notation" syntax to be woefully
inadequate. You should be able to infer the action of the extension
syntax intuitively, simply from looking at its signature. I find
myself continually needing to consult the docs because of the lacking
or misleading style of the current syntax. Consider:

(...) # Group Capture
Okay here. Parentheses feel very natural for delimiting a group.

(?...)  # Base Extension Syntax
All extensions are wrapped in parentheses and start with a question
mark, but i believe the question mark was a very bad choice, since the
question mark is already specific to "zero or one repetitions of
preceding RE". This simple error is why i believe Python re's are so
damn difficult to eyeball parse. You'll constantly be forced to spend
too much time deciding if this question mark is referring to
repeats, or is the start of an extension syntax. We should have
chosen another char, and the char should NOT be known to RE in any
other place. Maybe the tilde would work? Wait, i have a MUCH better
idea!!!

Actually the best choice would have been using BRACES instead of
PARENTHESES to delimit the extension syntax, since parentheses are
used (wisely i might add!) for group captures.  Also, anything
contained in braces is more likely to be understood (by almost all
programmers) as a "command block" -- unfortunately some idiot decided
to use braces for specifying ranges! WHAT A F'ING WASTE of intuitive
chars!

(?iLmsux) # Passing Flags Internally
This is ridiculous. re's are cryptic enough without inviting TIMTOWTDI
over to play. Passing flags this way does nothing BUT harm
readability. Please people, pass your flags as an argument to the
appropriate re.method() and NOT as another cryptic syntax.

(?:...) # Non-Capturing Group
When i look at this pattern "non-capturing" DOES NOT scream out at me,
and again, the question mark is used incorrectly. When i think of a
char that screams NEGATIVE, i think of the exclamation mark, NOT the
question mark. And how the HELL is the colon helping me to interpret
this syntax?
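
For reference, a minimal sketch of what the two kinds of group do in the standard re module. The sample strings and variable names are only illustrative assumptions; the last line shows the other, older job of the question mark as a plain "zero or one" quantifier.

```python
import re

# Capturing group: the parenthesised part is saved (here as group 1).
capturing = re.match(r"(ab)+c", "ababc")

# Non-capturing group: same grouping for the + quantifier, nothing is saved.
non_capturing = re.match(r"(?:ab)+c", "ababc")

capturing_groups = capturing.groups()
non_capturing_groups = non_capturing.groups()

# Here ? is the ordinary quantifier, not the start of an extension.
optional_u = re.fullmatch(r"colou?r", "color") is not None
```

Both patterns match the same text; they differ only in whether the group is recorded on the match object.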

(?P<name>...) # Named Group Capture
(?P=name) # Named Group Reference
(?#...)  # Comment
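
A small sketch of these three forms in use, assuming the standard re module; the doubled-word pattern and the sample text are made up for illustration.

```python
import re

# (?P<word>...) captures under a name; (?P=word) must match the same text again.
doubled_word = re.compile(r"\b(?P<word>\w+) (?P=word)\b")

m = doubled_word.search("this is is a test")
found = m.group("word")
span = m.span()

# (?#...) is an inline comment and matches nothing.
with_comment = re.fullmatch(r"\d+(?#digits only)", "2012") is not None
```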

################################################
## The following assertions are highly flawed ##
################################################

(?=...)  # positive look ahead
(?!...)  # negative look ahead
(?<=...) # positive look behind
(?<!...) # negative look behind
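
As a reference point for the discussion that follows, here is a minimal sketch of how the four assertions behave in the standard re module. The sample strings are assumptions chosen to keep each case unambiguous.

```python
import re

text = "foobar foobaz"
# (?=...)  positive lookahead: "foo" only where "bar" follows.
followed_by_bar = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
# (?!...)  negative lookahead: "foo" only where "bar" does not follow.
not_followed_by_bar = [m.start() for m in re.finditer(r"foo(?!bar)", text)]

prices = "$100 200"
# (?<=...) positive lookbehind: digits directly preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", prices)
# (?<!...) negative lookbehind: whole numbers not preceded by "$".
# The \b keeps the engine from matching "00" inside "$100".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)
```

In every case the assertion consumes no text: the match itself is still just "foo" or the digits.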

I cannot decipher these patterns in their current syntactical forms.
Too much information is missing or misleading. I have no idea which
pattern is looking forward, which pattern is looking backward, which
pattern is negative, and which pattern is positive. I need syntactical
clues! Consider these:

(?>=...) #Read as "forward equals pattern?"
(?>!=...) #Read as "forward NOT equals pattern?"
(?<=...) #Read as "backwards equals pattern?"
(?<!=...) #Read as "backwards NOT equals pattern?"

However, i really don't like the fact that negative assertions need
one more char than positive assertions. Here is an alternative:

(?>+...) #Read as "forward equals pattern?"
(?>-...) #Read as "forward NOT equals pattern?"
(?<+...) #Read as "backwards equals pattern?"
(?<-...) #Read as "backwards NOT equals pattern?"

Looks much better HOWEVER we still have too much useless noise.
Replace the parenthesis delimiters with braces, and drop the "where's
waldo" question mark,  and we have a simplistically intuitive
syntactical bliss!

{...}  # Base Extension Syntax
{iLmsux}  # Passing Flags Internally
{!()...} or (!...) # Non Capturing.
{NG=identifier...}  # Named Group Capture
{NG.name}  # Named Group Reference
{#...}  # Comment
{>+...}  # Positive Look Ahead Assertion
{>-...}  # Negative Look Ahead Assertion
{<+...}  # Positive Look Behind Assertion
{<-...}  # Negative Look Behind Assertion
{(id/name)yes-pat|no-pat}

*school-bell-rings*

PS: In my eyes, Python 3000 is already a dinosaur.


#19413

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2012-01-25 09:33 -0800
Message-ID	<5797eb6f-4129-4377-b597-fbde4ee1afd6@c21g2000yqi.googlegroups.com>
In reply to	#19409

On Jan 25, 11:16 am, Rick Johnson <rantingrickjohn...@gmail.com>
wrote:

> {!()...} or (!...) # Non Capturing.

Yuck: on second thought, i don't like {!()...}, mainly because non-
capturing groups should use the parenthesis delimiters to keep the API
consistent. Try this instead --> (!:...)

> {NG=identifier...}  # Named Group Capture
> {NG.name}  # Named Group Reference

...should be {NG.identifier}. I am also feeling like named group
syntax could be simpler without sacrificing readability.

{=identifier...}  # Named Group Capture
{.identifier}  # Named Group Reference


#19420

From	Duncan Booth <duncan.booth@invalid.invalid>
Date	2012-01-25 19:32 +0000
Message-ID	<Xns9FE5C6C30337duncanbooth@127.0.0.1>
In reply to	#19409

Rick Johnson <rantingrickjohnson@gmail.com> wrote:

> (?...)  # Base Extension Syntax
> All extensions are wrapped in parentheses and start with a question
> mark, but i believe the question mark was a very bad choice, since the
> question mark is already specific to "zero or one repetitions of
> preceding RE". This simple error is why i believe Python re's are so
> damn difficult to eyeball parse. You'll constantly be forced to spend
> too much time deciding if this question mark is referring to
> repeats, or is the start of an extension syntax. We should have
> chosen another char, and the char should NOT be known to RE in any
> other place. Maybe the tilde would work? Wait, i have a MUCH better
> idea!!!

The problem with your idea is that it breaks compatibility with other
non-Python regular expression engines. Python didn't invent the (?...)
syntax; it originated with Perl.

Try complaining to a Perl group instead.

-- 
Duncan Booth http://kupuguy.blogspot.com


#19445

From	Devin Jeanpierre <jeanpierreda@gmail.com>
Date	2012-01-25 18:19 -0500
Message-ID	<mailman.5099.1327533584.27778.python-list@python.org>
In reply to	#19420

On Wed, Jan 25, 2012 at 2:32 PM, Duncan Booth
<duncan.booth@invalid.invalid> wrote:
> The problem with your idea is that it breaks compatibility with other
> non-Python regular expression engines. Python didn't invent the (?...)
> syntax; it originated with Perl.
>
> Try complaining to a Perl group instead.

The Perl folks didn't like it either:

http://en.wikipedia.org/wiki/Perl_6_rules

-- Devin


#19426

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-01-25 13:17 -0700
Message-ID	<mailman.5090.1327522663.27778.python-list@python.org>
In reply to	#19409

On Wed, Jan 25, 2012 at 10:16 AM, Rick Johnson
<rantingrickjohnson@gmail.com> wrote:
> (?...)  # Base Extension Syntax
> All extensions are wrapped in parentheses and start with a question
> mark, but i believe the question mark was a very bad choice, since the
> question mark is already specific to "zero or one repetitions of
> preceding RE". This simple error is why i believe Python re's are so
> damn difficult to eyeball parse. You'll constantly be forced to spend
> too much time deciding if this question mark is referring to
> repeats, or is the start of an extension syntax. We should have
> chosen another char, and the char should NOT be known to RE in any
> other place. Maybe the tilde would work? Wait, i have a MUCH better
> idea!!!

Did you read the very first sentence of the re module documentation?
"This module provides regular expression matching operations *similar
to those found in Perl*" (my emphasis).  The goal here is
compatibility with existing RE syntaxes, not readability.  Perl uses
the (?...) syntax, so the re module does too.

> (?iLmsux) # Passing Flags Internally
> This is ridiculous. re's are cryptic enough without inviting TIMTOWTDI
> over to play. Passing flags this way does nothing BUT harm
> readability. Please people, pass your flags as an argument to the
> appropriate re.method() and NOT as another cryptic syntax.

1) Not all regular expressions are hard-coded.  Some applications even
allow users to supply regular expressions as data.  Permitting flags
in the regular expression allows the user to specify or override the
defaults set by the application.

2) Permitting flags in the regular expression allows different
combinations of flags to be in effect for different parts of complex
regular expressions.  You can't do that just by passing in the flags
as an argument.
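
Point 1 can be sketched as follows, assuming an application that compiles patterns it receives as data. The helper `find_all` and the sample text are hypothetical; only `re.compile`, `re.findall` and `re.IGNORECASE` come from the standard library.

```python
import re

def find_all(user_pattern, text, default_flags=0):
    """Hypothetical application helper: compile a pattern supplied as data."""
    return re.compile(user_pattern, default_flags).findall(text)

text = "Spam spam SPAM"

# The application passes no flags, so the search is case-sensitive.
case_sensitive = find_all(r"spam", text)

# The user turns on case-insensitivity from inside the pattern itself.
case_insensitive = find_all(r"(?i)spam", text)

# The equivalent when the programmer controls the call site.
flag_argument = re.findall(r"spam", text, re.IGNORECASE)
```

The inline form is the only one available to someone who can edit the pattern but not the code that compiles it.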

> (?:...) # Non-Capturing Group
> When i look at this pattern "non-capturing" DOES NOT scream out at me,
> and again, the question mark is used incorrectly. When i think of a
> char that screams NEGATIVE, i think of the exclamation mark, NOT the
> question mark. And how the HELL is the colon helping me to interpret
> this syntax?

Don't ask us.  Ask Larry Wall.

> (?=...)  # positive look ahead
> (?!...)  # negative look ahead
> (?<=...) # positive look behind
> (?<!...) # negative look behind
>
> I cannot decipher these patterns in their current syntactical forms.
> Too much information is missing or misleading. I have no idea which
> pattern is looking forward, which pattern is looking backward, which
> pattern is negative, and which pattern is positive. I need syntactical
> clues! Consider these:
>
> (?>=...) #Read as "forward equals pattern?"
> (?>!=...) #Read as "forward NOT equals pattern?"
> (?<=...) #Read as "backwards equals pattern?"
> (?<!=...) #Read as "backwards NOT equals pattern?"
>
> However, i really don't like the fact that negative assertions need
> one more char than positive assertions. Here is an alternative:
>
> (?>+...) #Read as "forward equals pattern?"
> (?>-...) #Read as "forward NOT equals pattern?"
> (?<+...) #Read as "backwards equals pattern?"
> (?<-...) #Read as "backwards NOT equals pattern?"
>
> Looks much better HOWEVER we still have too much useless noise.
> Replace the parenthesis delimiters with braces, and drop the "where's
> waldo" question mark,  and we have a simplistically intuitive
> syntactical bliss!

Once again, these come from Perl.  Note also that Perl already has
(?>...) which means something entirely different.

> {...}  # Base Extension Syntax
> {iLmsux}  # Passing Flags Internally
> {!()...} or (!...) # Non Capturing.
> {NG=identifier...}  # Named Group Capture
> {NG.name}  # Named Group Reference
> {#...}  # Comment
> {>+...}  # Positive Look Ahead Assertion
> {>-...}  # Negative Look Ahead Assertion
> {<+...}  # Positive Look Behind Assertion
> {<-...}  # Negative Look Behind Assertion
> {(id/name)yes-pat|no-pat}
>
> *school-bell-rings*

Regular expression reform is not necessarily a bad thing, but this is
just forcing everybody to learn Yet Another Regex Syntax for no real
purpose.  All that you've changed here is window dressing.  For an
overview of many of the *real* problems with regular expression
syntax, see

http://www.perl.com/pub/2002/06/04/apo5.html

Ian


#19435

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2012-01-25 13:19 -0800
Message-ID	<e8a96135-3444-4484-97f0-3ae196f82f5e@p21g2000yqm.googlegroups.com>
In reply to	#19426

On Jan 25, 2:17 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 10:16 AM, Rick Johnson

> Did you read the very first sentence of the re module documentation?
> "This module provides regular expression matching operations *similar
> to those found in Perl*" (my emphasis).  The goal here is
> compatibility with existing RE syntaxes, not readability.  Perl uses
> the (?...) syntax, so the re module does too.

@Duncan and Ian:
Did you not read the title of my post? :o) "Python regular expression
syntax is not intuitive." While i understand WHERE the syntax
originates from, that fact does not solve the problem. The syntax is
not intuitive, and Python should ALWAYS be intuitive! We should always
borrow ideas from anyone (even our enemies) when those ideas support
our ideology. Perl style regexes are not Pythonic. They violate our
philosophy in too many places.

> > (?iLmsux) # Passing Flags Internally
> > This is ridiculous. re's are cryptic enough without inviting TIMTOWTDI
> > over to play. Passing flags this way does nothing BUT harm
> > readability. Please people, pass your flags as an argument to the
> > appropriate re.method() and NOT as another cryptic syntax.
>
> 1) Not all regular expressions are hard-coded.  Some applications even
> allow users to supply regular expressions as data.  Permitting flags
> in the regular expression allows the user to specify or override the
> defaults set by the application.
>
> 2) Permitting flags in the regular expression allows different
> combinations of flags to be in effect for different parts of complex
> regular expressions.  You can't do that just by passing in the flags
> as an argument.

This is a valid argument, and i totally agree with you that we should
not remove the ability to pass flags internally. However, my main
point still stands strong (with a slight tweak). """Please people,
pass your flags as an argument to the appropriate re.method() and NOT
as another cryptic syntax, UNLESS YOU HAVE NO OTHER CHOICE!""" Thanks
for pointing this out.

> Regular expression reform is not necessarily a bad thing, but this is
> just forcing everybody to learn Yet Another Regex Syntax for no real
> purpose.

I disagree here.
Whilst some people may be "die-hard" fans of the un-intuitive Perl
regex syntax, i believe many, if not exponentially MORE people would
like to have a better alternative. Do i want to remove the current
"well established" re module? No. But i would like to create a new
regex module that is more pythonic. A regex module that we can be
proud of. And just maybe, a regex module that "sets the bar" for all
other regular expressions.

Listen. Backwards compatibility and cross pollination are wonderful
WHEN you can make them work. However, in the case of Perl regex syntax,
this is not a "cross pollination", this is a "cross pollution".

> All that you've changed here is window dressing.  For an
> overview of many of the *real* problems with regular expression
> syntax, see

Window dressing is important, Ian. If it were not, shop owners would not
continue to show displays in their shop windows. What does window
dressing do exactly? It attracts the masses, and without the masses
all merchants will eventually go out of business. Note: my argument
HAS NOTHING to do with the number of folks programming python (or any
language). The argument is focused on module sustainability in a
community. Modules that are morbidly DIFFICULT to learn do not last.

I know about PyParsing but i believe we have room for PyParsing and a
more Pythonic take on Perl style regular expressions. I don't see why
we could not keep all three. Let the people decide what is best for
them.

The greatest aspect of regexes is their compactness, and we should
keep them compact. And in that respect regexes will always be cryptic
to the neophyte.  However, regexes do not have to be a scourge to the
initiated. We must balance the compact and the intuitive nature of
regexes. But most importantly, we must understand that these aspects
of regexes are NOT mutually exclusive.


#19438

From	Duncan Booth <duncan.booth@invalid.invalid>
Date	2012-01-25 21:41 +0000
Message-ID	<Xns9FE5DCA017AFFduncanbooth@127.0.0.1>
In reply to	#19435

Rick Johnson <rantingrickjohnson@gmail.com> wrote:

> On Jan 25, 2:17 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>> On Wed, Jan 25, 2012 at 10:16 AM, Rick Johnson
> 
>> Did you read the very first sentence of the re module documentation?
>> "This module provides regular expression matching operations *similar
>> to those found in Perl*" (my emphasis).  The goal here is
>> compatibility with existing RE syntaxes, not readability.  Perl uses
>> the (?...) syntax, so the re module does too.
> 
> @Duncan and Ian:
> Did you not read the title of my post? :o) "Python regular expression
> syntax is not intuitive." While i understand WHERE the syntax
> originates from, that fact does not solve the problem. The syntax is
> not intuitive, and Python should ALWAYS be intuitive! We should always
> borrow ideas from anyone (even our enemies) when those ideas support
> our ideology. Perl style regexes are not Pythonic. They violate our
> philosophy in too many places.
> 
Or we could implement de-facto standards where they exist.

*plonk*


-- 
Duncan Booth http://kupuguy.blogspot.com


#19440

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2012-01-25 14:26 -0800
Message-ID	<d5265e8a-3008-4998-9616-22d395a2fa84@p21g2000yqm.googlegroups.com>
In reply to	#19438

On Jan 25, 3:41 pm, Duncan Booth <duncan.bo...@invalid.invalid> wrote:
> Rick Johnson <rantingrickjohn...@gmail.com> wrote:
> > On Jan 25, 2:17 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> >> On Wed, Jan 25, 2012 at 10:16 AM, Rick Johnson
>
> >> Did you read the very first sentence of the re module documentation?
> >> "This module provides regular expression matching operations *similar
> >> to those found in Perl*" (my emphasis).  The goal here is
> >> compatibility with existing RE syntaxes, not readability.  Perl uses
> >> the (?...) syntax, so the re module does too.
>
> > @Duncan and Ian:
> > Did you not read the title of my post? :o) "Python regular expression
> > syntax is not intuitive." While i understand WHERE the syntax
> > originates from, that fact does not solve the problem. The syntax is
> > not intuitive, and Python should ALWAYS be intuitive! We should always
> > borrow ideas from anyone (even our enemies) when those ideas support
> > our ideology. Perl style regexes are not Pythonic. They violate our
> > philosophy in too many places.
>
> Or we could implement de-facto standards where they exist.

Are you so naive as to think that the Perl folks are even *slightly*
interested in intuitive regexps? Have you written, or even read, any
Perl code my friend? The *standards* are broken. Obviously they don't
care, or they prefer the esoteric nature of their cryptic creation.

> *plonk*

And good day to you.


#19448

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-01-25 16:36 -0700
Message-ID	<mailman.5101.1327534637.27778.python-list@python.org>
In reply to	#19435

On Wed, Jan 25, 2012 at 2:19 PM, Rick Johnson
<rantingrickjohnson@gmail.com> wrote:
> I disagree here.
> Whilst some people may be "die-hard" fans of the un-intuitive Perl
> regex syntax, i believe many, if not exponentially MORE people would
> like to have a better alternative. Do i want to remove the current
> "well established" re module? No. But i would like to create a new
> regex module that is more pythonic. A regex module that we can be
> proud of. And just maybe, a regex module that "sets the bar" for all
> other regular expressions.

Compact regex notations are inherently unpythonic.  While your
reimplementation may be more intuitive to you, I don't think that it's
more pythonic at all.

> Window dressing is important, Ian. If it were not, shop owners would not
> continue to show displays in their shop windows. What does window
> dressing do exactly? It attracts the masses, and without the masses
> all merchants will eventually go out of business. Note: my argument
> HAS NOTHING to do with the number of folks programming python (or any
> language). The argument is focused on module sustainability in a
> community. Modules that are morbidly DIFFICULT to learn do not last.

Well, FWIW, I think that the current re module was easier for me to
learn than your version would have been, mainly because the re module
matches the syntax that I was already familiar with well before I
started using Python.  If you think you can do better, though, I
encourage you to actually write your regex module and put it up on
PyPI.

> I know about PyParsing but i believe we have room for PyParsing and a
> more Pythonic take on Perl style regular expressions. I don't see why
> we could not keep all three. Let the people decide what is best for
> them.

PyParsing produces recursive descent parsers.  It's an alternative to
regular expressions for a different class of parsing problems, not a
replacement, and so it's not particularly germane to this discussion.


#19450

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2012-01-25 16:14 -0800
Message-ID	<de407e6d-e508-4650-8531-8c3f515ab812@f12g2000yqo.googlegroups.com>
In reply to	#19448

On Jan 25, 5:36 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 2:19 PM, Rick Johnson
>
> <rantingrickjohn...@gmail.com> wrote:
> > I disagree here.
> > Whilst some people may be "die-hard" fans of the un-intuitive Perl
> > regex syntax, i believe many, if not exponentially MORE people would
> > like to have a better alternative. Do i want to remove the current
> > "well established" re module? No. But i would like to create a new
> > regex module that is more pythonic. A regex module that we can be
> > proud of. And just maybe, a regex module that "sets the bar" for all
> > other regular expressions.
>
> Compact regex notations are inherently unpythonic.  While your
> reimplementation may be more intuitive to you, I don't think that it's
> more pythonic at all.

Regexps will never be "truly Pythonic". By their very nature they must
be implicit, complicated, most times nested and dense, not as readable
as we'd like, special cases everywhere, not very practical, hard(sic)
to explain, and just plain cryptic. They violate almost every aspect
of the zen. The point is NOT to make regexes "Pythonic", the point is
to make them as "Pythonic" as we can and not a bit more. I discussed
this very topic earlier, did you miss my speech? I thought it was quite
elegant...

Rick Johnson's stump speech 2.0: """ The greatest aspect of regexes is
their compactness, and not only should we keep them compact, we should
celebrate their compactness. It is in that respect that regexes will
always be cryptic to the neophyte, however, we must NEVER allow
regexes to be a scourge on the initiated, no. We must balance the
compact and the intuitive natures of regexes until we reach a natural
harmony. But most importantly, we must understand that these aspects
of regexes are NOT mutually exclusive -- for it is our understanding
that is flawed."""

*applause*

> > I know about PyParsing but i believe we have room for PyParsing and a
> > more Pythonic take on Perl style regular expressions. I don't see why
> > we could not keep all three. Let the people decide what is best for
> > them.
>
> PyParsing produces recursive descent parsers.  It's an alternative to
> regular expressions for a different class of parsing problems, not a
> replacement, and so it's not particularly germane to this discussion.

It is germane in the fact that i believe PyParsing, re, and my new
regex module can co-exist in harmony.


#19460

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-01-26 01:58 +0000
Message-ID	<4f20b329$0$29968$c3e8da3$5496439d@news.astraweb.com>
In reply to	#19450

On Wed, 25 Jan 2012 16:14:09 -0800, Rick Johnson wrote:

> It is germane in the fact that i believe PyParsing, re, and my new regex
> module can co-exist in harmony.

You don't have a new regex module.

When you have written it, then you will have a new regex module. Until 
then, you're all talk.

-- 
Steven


#19463

From	Devin Jeanpierre <jeanpierreda@gmail.com>
Date	2012-01-25 21:24 -0500
Message-ID	<mailman.5108.1327544703.27778.python-list@python.org>
In reply to	#19450

On Wed, Jan 25, 2012 at 7:14 PM, Rick Johnson
<rantingrickjohnson@gmail.com> wrote:
> It is germane in the fact that i believe PyParsing, re, and my new
> regex module can co-exist in harmony.

If all you're going to change is the parser, maybe it'd be easier to
get things to coexist if parsers were pluggable in the re module.

It's more generally useful, too. Would let re gain a PyParsing/SNOBOL
like expression "syntax", for example. Or a regular grammar syntax.
Neat for experimentation.

-- Devin


#19471

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2012-01-25 20:21 -0800
Message-ID	<1d5032da-1c8e-4fb1-bee6-a43b37f2f90e@b23g2000yqn.googlegroups.com>
In reply to	#19463

On Jan 25, 8:24 pm, Devin Jeanpierre <jeanpierr...@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 7:14 PM, Rick Johnson
>
> <rantingrickjohn...@gmail.com> wrote:
> > It is germane in the fact that i believe PyParsing, re, and my new
> > regex module can co-exist in harmony.
>
> If all you're going to change is the parser, maybe it'd be easier to
> get things to coexist if parsers were pluggable in the re module.
>
> It's more generally useful, too. Would let re gain a PyParsing/SNOBOL
> like expression "syntax", for example. Or a regular grammar syntax.
> Neat for experimentation.

I like your idea. Not sure about feasibility though. Unfortunately the
Python module "re" is under proprietary copyright. Hmm, seems not
everything is completely open source in the python world.

# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license.  For any other use, please contact Secret Labs
# AB (info@pythonware.com).

I need to dive into the "re" base code and see what is possible. My
original idea was to just start from scratch, but that may be foolish
considering all the scaffolding that will need to be erected.


#19477

From	Evan Driscoll <edriscoll@wisc.edu>
Date	2012-01-25 23:38 -0600
Message-ID	<mailman.5113.1327556384.27778.python-list@python.org>
In reply to	#19450


On 1/25/2012 20:24, Devin Jeanpierre wrote:
> If all you're going to change is the parser, maybe it'd be easier to
> get things to coexist if parsers were pluggable in the re module.
>
> It's more generally useful, too. Would let re gain a PyParsing/SNOBOL
> like expression "syntax", for example. Or a regular grammar syntax.
> Neat for experimentation.

I don't know what would be involved in that, but if it could be made to
work, that sounds to me like a remarkably good idea to have come out of
this thread.

(Now it's time for my own troll: "About as good of an idea as no longer
calling PCRE-alikes 'regular expressions', because they aren't." Ahhh,
got that out of my system. :-))

Evan


#19476

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-01-26 05:28 +0000
Message-ID	<4f20e474$0$29968$c3e8da3$5496439d@news.astraweb.com>
In reply to	#19426

On Wed, 25 Jan 2012 13:17:11 -0700, Ian Kelly wrote:

> 2) Permitting flags in the regular expression allows different
> combinations of flags to be in effect for different parts of complex
> regular expressions.  You can't do that just by passing in the flags as
> an argument.

I don't believe Python's regex engine supports scoped flags; I think all
flags are global to the entire regex.

MRAB's regex engine does support scoped flags.

http://pypi.python.org/pypi/regex
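
For readers on a newer interpreter: as far as I know, the standard re module gained a scoped form, (?i:...), in Python 3.6, well after this thread. A minimal sketch under that assumption (it will raise an error on older versions):

```python
import re

# Assumes Python 3.6 or later: (?i:...) applies IGNORECASE inside the group only.
pattern = re.compile(r"(?i:abc)def")

# The first three letters may be any case, the rest must be lower case.
mixed_ok = pattern.fullmatch("ABCdef") is not None
tail_is_case_sensitive = pattern.fullmatch("ABCDEF") is None
```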

-- 
Steven


#19429

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-01-25 15:33 -0500
Message-ID	<mailman.5091.1327523659.27778.python-list@python.org>
In reply to	#19409

On 1/25/2012 12:16 PM, Rick Johnson wrote:
>
> (?...)  # Base Extension Syntax
> All extensions are wrapped in parentheses and start with a question
> mark, but i believe the question mark was a very bad choice, since the

I think that syntax came either from Perl or the pcre library used by 
several open source programs, including several Python versions.
https://en.wikipedia.org/wiki/Pcre
has some info on this.

-- 
Terry Jan Reedy


#19447

From	Devin Jeanpierre <jeanpierreda@gmail.com>
Date	2012-01-25 18:28 -0500
Message-ID	<mailman.5100.1327534149.27778.python-list@python.org>
In reply to	#19409

On Wed, Jan 25, 2012 at 12:16 PM, Rick Johnson
<rantingrickjohnson@gmail.com> wrote:
> In particular i find the "extension notation" syntax to be woefully
> inadequate. You should be able to infer the action of the extension
> syntax intuitively, simply from looking at its signature.

This is nice in theory. I see no reason to believe this is possible,
or that your syntax is closer to this ideal than the existing syntax.

Perhaps you should perform some experiments to prove intuitiveness?
Science is more convincing than insults.

Also, the "!" in negative assertions doesn't stand for "not equal" --
matches aren't equality. It stands for "not". It's the "=" that's a
misnomer.
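
To make the "=" versus "!" point concrete, here is a small sketch using the standard re module. The names and sample strings are only illustrative. Both forms are zero-width assertions: "=" asserts that what follows matches, and "!" asserts that it does not. Neither one tests equality.

```python
import re

# Positive lookahead: match "Isaac " only when "Asimov" follows.
followed_by = re.compile(r"Isaac (?=Asimov)")

# Negative lookahead: match "Isaac " only when "Asimov" does NOT follow.
not_followed_by = re.compile(r"Isaac (?!Asimov)")

for text in ("Isaac Asimov", "Isaac Newton"):
    print(text,
          bool(followed_by.search(text)),
          bool(not_followed_by.search(text)))

# The assertion consumes nothing: the match covers only "Isaac ".
print(repr(followed_by.search("Isaac Asimov").group()))
```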

-- Devin


#19449

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2012-01-25 15:44 -0800
Message-ID	<0869d1e8-225b-45f8-a57f-f21c2831b283@f14g2000yqe.googlegroups.com>
In reply to	#19447

On Jan 25, 5:28 pm, Devin Jeanpierre <jeanpierr...@gmail.com> wrote:
> Perhaps you should perform some experiments to prove intuitiveness [of your syntax]?

I've posted my thoughts and my initial syntax. You (and everyone else)
are free to critique or offer suggestions of your own. Listen, none of
these issues that plague Python are going to be resolved until people
around here set aside the grudges and haughty arrogance. We need to
get to work. But step one is NOT writing code. Step one is to gather
the community into lively discussion on these crucial topics. And the
folks who really want to get involved are not going to speak up unless
the rhetoric is toned down a bit.

> Science is more convincing than insults.

I can assure you my intentions are not to insult. My blanket
observation is that the current Python re syntax is not intuitive
enough for Python, and that we can make it better.


#19475

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-01-26 05:28 +0000
Message-ID	<4f20e45f$0$29968$c3e8da3$5496439d@news.astraweb.com>
In reply to	#19449

On Wed, 25 Jan 2012 15:44:35 -0800, Rick Johnson wrote:

> I've posted my thoughts and my initial syntax. You (and everyone else)
> are free to critique or offer suggestions of your own. Listen, none of
> these issues that plague Python are going to be resolved until people
> around here set aside the grudges and haughty arrogance. We need to get
> to work. But step one is NOT writing code.

Well, that suits you well then, since you're an expert on not writing 
code.

How is that fork of Python coming along? I really look forward to the day 
that you make good on your promise to fork the language so all the right-
thinking people can follow you to the Promised Land.

-- 
Steven


#19459

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-01-26 01:54 +0000
Message-ID	<4f20b26d$0$29968$c3e8da3$5496439d@news.astraweb.com>
In reply to	#19409

On Wed, 25 Jan 2012 09:16:01 -0800, Rick Johnson wrote:

> In particular i find the "extension notation" syntax to be woefully
> inadequate. You should be able to infer the action of the extension
> syntax intuitively, simply from looking at its signature. I find myself
> continually needing to consult the docs because of the lacking or
> misleading style of the current syntax. Consider:

The only intuitive interface is the nipple. Everything else is learned.

Nevertheless, there are legitimate problems with Python's regex syntax. 
It is based on Perl's syntax, and even Larry Wall agrees that it has some 
serious problems. 

Read Apocalypse 5: Wall gives a fantastic explanation of what's wrong 
with current regex syntax (without such trivial platitudes as "it is not 
intuitive", as if we can all agree on what is intuitive), why it has 
become that way, and what Perl 6 will do about it.

http://www.perl.com/pub/2002/06/04/apo5.html

Regexes are essentially a programming language. They may or may not be 
Turing complete, depending on the implementation (true regexes are not, 
but Perl regexes are more powerful than true regexes), but they are still 
a programming language. And users want regexes to be concise, otherwise 
they would ask for richer string search methods and avoid regexes 
altogether.

The problem is that conciseness and readability are usually (but not 
always) in opposition. So regexes will never be as readable as Python 
code, because the requirements of regexes -- that they be short, concise, 
and usually written as one-liners (or at least one-liners must be 
possible) -- do not meet Python standards of readability. How can they? 
Regexes are shorthand. If you want longhand, write your search in 
straight Python.
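
As a rough illustration of the shorthand/longhand trade-off, here is the same small search written both ways. The task (find the first run of ASCII digits) and the function names are made up for the example. It is a sketch, not a claim about how anyone should write it.

```python
import re
import string

def first_int_regex(text):
    """Shorthand: one concise pattern does all the work."""
    match = re.search(r"[0-9]+", text)
    return match.group() if match else None

def first_int_longhand(text):
    """Longhand: the same search spelled out in straight Python."""
    digits = []
    for ch in text:
        if ch in string.digits:
            digits.append(ch)   # still inside a run of digits
        elif digits:
            break               # the first run has ended
    return "".join(digits) or None

for sample in ("abc 123 def 45", "no digits here"):
    print(first_int_regex(sample), first_int_longhand(sample))
```

The regex version is one line; the longhand version is easier to step through but several times longer, which is the trade-off described above.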

> PS: In my eyes, Python 3000 is already a dinosaur.

We look forward to seeing your re-write. I'm sure all right-thinking 
programmers will flock to your Python fork as soon as you start writing 
it.

-- 
Steven
