


Re: grimace: a fluent regular expression generator in Python

Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <benlast@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.001
X-Received by 10.194.178.138 with SMTP id cy10mr3222318wjc.61.1374028417450; Tue, 16 Jul 2013 19:33:37 -0700 (PDT)
MIME-Version 1.0
Sender benlast@gmail.com
From Ben Last <ben@benlast.com>
Date Wed, 17 Jul 2013 10:33:17 +0800
X-Google-Sender-Auth xfDckj6l0NkVX0w91aaDLhxAoHs
Subject Re: grimace: a fluent regular expression generator in Python
To python-list@python.org
Content-Type multipart/alternative; boundary=089e013d1dc6bbc8af04e1abeb3d
X-Mailman-Approved-At Wed, 17 Jul 2013 13:13:51 +0200
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.4800.1374059632.3114.python-list@python.org>
Lines 152
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1374059632 news.xs4all.nl 15921 [2001:888:2000:d::a6]:46746
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:50786




On 16 July 2013 20:48, <python-list-request@python.org> wrote:

> From: "Anders J. Munch" <2013@jmunch.dk>
> Date: Tue, 16 Jul 2013 13:38:35 +0200
> Ben Last wrote:
>
>> north_american_number_re = (RE().start
>>     .literal('(').followed_by.exactly(3).digits.then.literal(')')
>>     .then.one.literal("-").then.exactly(3).digits
>>     .then.one.dash.followed_by.exactly(4).digits.then.end
>>     .as_string())
>>
>
> Very cool.  It's a bit verbose for my taste, and I'm not sure how well it
> will cope with nested structure.
>

I guess verbosity is the aim, in that *explicit is better than implicit* :)
And I suppose that's one of the attributes of a fluent system; they tend
to need more typing.  It's not Perl...
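For comparison, here is a hand-written Perl-style pattern that should match the same strings as the `north_american_number_re` chain quoted above. It assumes `start`/`end` map to `^`/`$`, `exactly(n).digits` to `\d{n}` and `dash` to a literal "-"; grimace's actual output string may differ (for example in how it escapes characters).

```python
import re

# Hand-written equivalent of the fluent chain quoted above (assumed mapping:
# start/end -> ^/$, exactly(n).digits -> \d{n}, dash -> literal "-").
north_american_number_re = r"^\(\d{3}\)-\d{3}-\d{4}$"

pattern = re.compile(north_american_number_re)

print(bool(pattern.match("(555)-123-4567")))  # True
print(bool(pattern.match("555-123-4567")))    # False: no parentheses
```

The terse version fits on one line, but every character has to be decoded by eye, which is the trade-off being discussed here.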



> The problem with Perl-style regexp notation isn't so much that it's terse
> - it's that the syntax is irregular (sic) and doesn't follow modern
> principles for lexical structure in computer languages.  You can get a long
> way just by ignoring whitespace, putting literals in quotes and allowing
> embedded comments.
>

Good points.  I wanted to find a syntax that allows comments as well as
being fluent:

(RE()
 .any_number_of.digits  # Recall that any_number_of includes zero
 .followed_by.an_optional.dot  # The dot is specifically optional,
 .then.at_least_one.digit  # but we must have one digit as a minimum
 .as_string())

... and yes, I also specifically wanted to have literals quoted.
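Two of Anders's suggestions (ignoring whitespace and allowing embedded comments) are already available in the standard library through `re.VERBOSE`, though literals still have to be escaped rather than quoted. A sketch of the commented example above written that way, assuming `any_number_of`, `an_optional` and `at_least_one` correspond to `*`, `?` and `+`:

```python
import re

# Same structure as the fluent example, written with re.VERBOSE so that
# unescaped whitespace is ignored and '#' starts a comment. The mapping
# any_number_of -> *, an_optional -> ?, at_least_one -> + is an assumption.
number = re.compile(r"""
    \d*     # any number of digits, including zero
    \.?     # the dot is specifically optional
    \d+     # but we must have at least one digit
""", re.VERBOSE)

for text in ["3.14", ".5", "42", "7."]:
    print(text, bool(number.fullmatch(text)))
# "7." is the only one rejected: a trailing dot needs a digit after it.
```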

Nested groups work, but I haven't tackled lookahead and backreferences,
essentially because if you're writing an RE that complex, you should
probably be working directly in RE strings.
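As an illustration of the constructs left to raw RE strings, here are a backreference and a lookahead using only the standard `re` module. The patterns themselves are illustrative examples, not taken from grimace.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured, e.g. a doubled word.
doubled_word = re.compile(r"\b(\w+)\s+\1\b")

# Lookahead: (?=.*\d) checks that a digit appears somewhere ahead without
# consuming characters; \w{6,}$ then has to match the whole string.
has_digit = re.compile(r"^(?=.*\d)\w{6,}$")

m = doubled_word.search("this is is a typo")
print(m.group(1))                          # is
print(bool(has_digit.match("secret1")))    # True
print(bool(has_digit.match("secrets")))    # False
```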

Depending on what you mean by "nested", re-use of RE objects is easy
(example from the unit tests):

identifier_start_chars = RE().regex("[a-zA-Z_]")
identifier_chars = RE().regex("[a-zA-Z0-9_]")

self.assertEqual(RE().one_or_more.of(identifier_start_chars)
                     .followed_by.zero_or_more(identifier_chars)
                     .as_string(),
                     r"[a-zA-Z_]+[a-zA-Z0-9_]*")
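To show how this kind of re-use can work in a fluent builder, here is a much-reduced, hypothetical `MiniRE` class (not grimace's implementation). Each call returns a new builder, so fragments can be stored and passed to `of()`. Unlike the unit test above, it spells the second quantifier as `zero_or_more.of(...)`, and it assumes a single-part fragment is one atom (such as a character class) that can take a quantifier without grouping.

```python
import re


class MiniRE:
    """Hypothetical, much-reduced fluent regex builder (not grimace's code)."""

    def __init__(self, parts=(), pending=""):
        self._parts = tuple(parts)  # pattern fragments emitted so far
        self._pending = pending     # quantifier waiting for its operand

    def _add(self, fragment):
        # Append the fragment plus any pending quantifier, then clear it.
        return MiniRE(self._parts + (fragment + self._pending,))

    def regex(self, raw):
        # Insert a raw regex fragment verbatim.
        return self._add(raw)

    @property
    def one_or_more(self):
        return MiniRE(self._parts, "+")

    @property
    def zero_or_more(self):
        return MiniRE(self._parts, "*")

    @property
    def followed_by(self):
        # Purely decorative connective, for readability.
        return self

    def of(self, other):
        # Reuse another builder as one operand. Assumption: a single part is
        # already an atom; anything longer is wrapped in a non-capturing group.
        body = other.as_string()
        if len(other._parts) != 1:
            body = "(?:" + body + ")"
        return self._add(body)

    def as_string(self):
        return "".join(self._parts)


identifier_start_chars = MiniRE().regex("[a-zA-Z_]")
identifier_chars = MiniRE().regex("[a-zA-Z0-9_]")

identifier = (MiniRE().one_or_more.of(identifier_start_chars)
              .followed_by.zero_or_more.of(identifier_chars)
              .as_string())
print(identifier)                                 # [a-zA-Z_]+[a-zA-Z0-9_]*
print(bool(re.fullmatch(identifier, "_tmp42")))   # True
print(bool(re.fullmatch(identifier, "9lives")))   # False
```

Returning a fresh builder from every call, rather than mutating `self`, is what makes it safe to reuse `identifier_chars` in several larger patterns.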


Thanks for the comments!
ben



Thread

Re: grimace: a fluent regular expression generator in Python Ben Last <ben@benlast.com> - 2013-07-17 10:33 +0800
  Re: grimace: a fluent regular expression generator in Python Johann Hibschman <jhibschman@gmail.com> - 2013-07-17 07:55 -0500
  Re: grimace: a fluent regular expression generator in Python Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-07-18 11:51 +1200
