
Re: grimace: a fluent regular expression generator in Python

Started by: Ben Last <ben@benlast.com>
First post: 2013-07-17 10:33 +0800
Last post: 2013-07-18 11:51 +1200
Articles: 3 (3 participants)




#50786 — Re: grimace: a fluent regular expression generator in Python

From: Ben Last <ben@benlast.com>
Date: 2013-07-17 10:33 +0800
Subject: Re: grimace: a fluent regular expression generator in Python
Message-ID: <mailman.4800.1374059632.3114.python-list@python.org>


On 16 July 2013 20:48, <python-list-request@python.org> wrote:

> From: "Anders J. Munch" <2013@jmunch.dk>
> Date: Tue, 16 Jul 2013 13:38:35 +0200
> Ben Last wrote:
>
>> north_american_number_re = (RE().start
>>     .literal('(').followed_by.exactly(3).digits.then.literal(')')
>>     .then.one.literal("-").then.exactly(3).digits
>>     .then.one.dash.followed_by.exactly(4).digits.then.end
>>     .as_string())
>>
>
> Very cool.  It's a bit verbose for my taste, and I'm not sure how well it
> will cope with nested structure.
>

I guess verbosity is the aim, in that *explicit is better than implicit* :)
And I suppose that's one of the attributes of fluent systems: they tend
to need more typing.  It's not Perl...



> The problem with Perl-style regexp notation isn't so much that it's terse
> - it's that the syntax is irregular (sic) and doesn't follow modern
> principles for lexical structure in computer languages.  You can get a long
> way just by ignoring whitespace, putting literals in quotes and allowing
> embedded comments.
>
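
For comparison, Python's standard re module already covers two of the three points above: re.VERBOSE ignores whitespace and allows embedded comments, though it does not put literals in quotes. A minimal sketch, assuming the phone-number pattern quoted earlier is meant to match strings such as "(555)-123-4567" (the names PHONE_RE and is_phone are made up for illustration):

```python
import re

# Verbose-mode regex: whitespace in the pattern is ignored and '#' starts
# a comment, so each piece can be explained inline.
PHONE_RE = re.compile(r"""
    ^ \( \d{3} \)   # area code: three digits in literal parentheses
    -               # a literal dash
    \d{3}           # exactly three digits
    -               # another literal dash
    \d{4} $         # exactly four digits, then end of string
""", re.VERBOSE)


def is_phone(text):
    """Return True if text looks like (NNN)-NNN-NNNN."""
    return PHONE_RE.match(text) is not None
```

Literals such as the parentheses still need backslash escapes here, which is the part a quoted-literal syntax would improve on.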

Good points.  I wanted to find a syntax that allows comments as well as
being fluent:

(RE()
 .any_number_of.digits  # Recall that any_number_of includes zero
 .followed_by.an_optional.dot.then.at_least_one.digit  # The dot is specifically optional
 # but we must have one digit as a minimum
 .as_string())

... and yes, I also specifically wanted to have literals quoted.

Nested groups work, but I haven't tackled lookahead and backreferences,
essentially because if you're writing an RE that complex, you should
probably be working directly in RE strings.

Depending on what you mean by "nested", re-use of RE objects is easy
(example from the unit tests):

identifier_start_chars = RE().regex("[a-zA-Z_]")
identifier_chars = RE().regex("[a-zA-Z0-9_]")

self.assertEqual(RE().one_or_more.of(identifier_start_chars)
                     .followed_by.zero_or_more(identifier_chars)
                     .as_string(),
                     r"[a-zA-Z_]+[a-zA-Z0-9_]*")
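
As a point of comparison, the same kind of re-use can be sketched with plain pattern strings and the standard re module. This is not grimace code; the constant names below are invented for illustration, and the expected pattern is the one asserted in the unit test above.

```python
import re

# Reusable fragments, kept as raw strings.
IDENT_START = r"[a-zA-Z_]"
IDENT_CHAR = r"[a-zA-Z0-9_]"

# One or more start characters followed by zero or more identifier characters.
IDENT_PATTERN = IDENT_START + "+" + IDENT_CHAR + "*"
IDENT_RE = re.compile(IDENT_PATTERN)
```

String concatenation works here because each fragment is a single character class; fragments containing alternation would need grouping before a quantifier is appended.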


Thanks for the comments!
ben



#50788

From: Johann Hibschman <jhibschman@gmail.com>
Date: 2013-07-17 07:55 -0500
Message-ID: <xjrkgppuh73cm.fsf@gmail.com>
In reply to: #50786

Ben Last <ben@benlast.com> writes:

> Good points. I wanted to find a syntax that allows comments as well as
> being fluent:
> RE()
> .any_number_of.digits # Recall that any_number_of includes zero
> .followed_by.an_optional.dot.then.at_least_one.digit # The dot is specifically optional
> # but we must have one digit as a minimum
> .as_string()

Speaking of syntax, have you looked at pyparsing?  I like their
pattern-matching syntax, and I can see it being applied to regexes.

They use an operator-heavy syntax, like:

    '(' + digits * 3 + ')-' + digits * 3 + '-' + digits * 4

That seems easier for me to read than the foo.then.follow syntax.
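
To make the operator idea concrete for regexes, here is a rough sketch using operator overloading. It is not pyparsing's API: the Pat class and the digits name are hypothetical, and plain strings are assumed to be literals that get escaped.

```python
import re


class Pat:
    """Hypothetical wrapper around a regex fragment (not pyparsing)."""

    def __init__(self, source):
        self.source = source

    @staticmethod
    def _coerce(other):
        # Plain strings are treated as literals and escaped.
        return other if isinstance(other, Pat) else Pat(re.escape(other))

    def __add__(self, other):
        # Pat + Pat, or Pat + "literal"
        return Pat(self.source + self._coerce(other).source)

    def __radd__(self, other):
        # "literal" + Pat
        return Pat(self._coerce(other).source + self.source)

    def __mul__(self, count):
        # Non-capturing group so the repeat applies to the whole fragment.
        return Pat("(?:%s){%d}" % (self.source, count))

    def compile(self):
        return re.compile(self.source)


digits = Pat(r"\d")
phone = '(' + digits * 3 + ')-' + digits * 3 + '-' + digits * 4
```

Because * binds more tightly than + in Python, the repeats apply to digits before the pieces are concatenated, so the expression reads left to right without extra parentheses.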

That then makes me think of ometa, which is a fun read, but probably not
completely relevant.

Regards,
Johann



#50810

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2013-07-18 11:51 +1200
Message-ID: <b4op08FehpU1@mid.individual.net>
In reply to: #50786

Ben Last wrote:
>         north_american_number_re = (RE().start
>         .literal('(').followed_by.__exactly(3).digits.then.__literal(')')                        
>          .then.one.literal("-").then.__exactly(3).digits
>         .then.one.dash.followed_by.__exactly(4).digits.then.end
>                                              .as_string())

Is 'dash' the same as 'literal("-")'?

Is there any difference between 'then' and 'followed_by'?

Why do some things have __ in front of them? Is there a
difference between 'literal' and '__literal'?

-- 
Greg
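
The thread does not answer these questions. As an illustration only, here is one way a fluent builder could make 'then' and 'followed_by' interchangeable and 'dash' a shorthand for literal("-"). The Fluent class below is a hypothetical sketch, not grimace's implementation, and it says nothing about how grimace actually behaves.

```python
import re


class Fluent:
    """Hypothetical fluent regex builder; a sketch, not grimace's code."""

    def __init__(self, parts=()):
        self._parts = tuple(parts)

    def _add(self, fragment):
        # Builders are immutable: each step returns a new object.
        return Fluent(self._parts + (fragment,))

    @property
    def then(self):
        # Connector: returns the builder unchanged, purely for readability.
        return self

    # Second name for the same no-op connector.
    followed_by = then

    @property
    def digit(self):
        return self._add(r"\d")

    @property
    def dash(self):
        # Shorthand for literal("-").
        return self.literal("-")

    def literal(self, text):
        return self._add(re.escape(text))

    def as_string(self):
        return "".join(self._parts)
```

In this sketch, Fluent().digit.then.dash.followed_by.digit and Fluent().digit.followed_by.literal("-").then.digit produce the same pattern string.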


