
Utility to locate errors in regular expressions

Started by Malte Forkel <malte.forkel@berlin.de>
First post 2013-05-24 14:58 +0200
Last post 2013-05-24 20:21 +0200
Articles 5 — 5 participants



Contents

  Utility to locate errors in regular expressions Malte Forkel <malte.forkel@berlin.de> - 2013-05-24 14:58 +0200
    Re: Utility to locate errors in regular expressions Roy Smith <roy@panix.com> - 2013-05-24 09:12 -0400
      Re: Utility to locate errors in regular expressions Neil Cerutti <neilc@norwich.edu> - 2013-05-24 13:58 +0000
    Re: Utility to locate errors in regular expressions rusi <rustompmody@gmail.com> - 2013-05-24 07:09 -0700
    Re: Utility to locate errors in regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2013-05-24 20:21 +0200

#45881 — Utility to locate errors in regular expressions

From Malte Forkel <malte.forkel@berlin.de>
Date 2013-05-24 14:58 +0200
Subject Utility to locate errors in regular expressions
Message-ID <mailman.2062.1369400329.3114.python-list@python.org>

Finding out why a regular expression does not match a given string can
be very tedious. I would like to write a utility that identifies the
sub-expression causing the non-match. My idea is to use a parser to
create a tree representing the complete regular expression. Then I could
simplify the expression by dropping sub-expressions one by one from
right to left and from bottom to top until the remaining regex matches.
The last sub-expression dropped should be (part of) the problem.

As a first step, I am looking for a parser for Python regular
expressions, or a Python regex grammar to create a parser from.

But maybe my idea is flawed? Or does a similar (or better) tool already
exist? Any advice will be highly appreciated!

Malte
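
[Editor's sketch] On the question of a parser for Python regular expressions: CPython ships the parser that `re` itself uses, as the internal module `sre_parse` (importable as `re._parser` from Python 3.11 on). It is undocumented and may change between releases, so the following is only a rough illustration of how a tree of sub-expressions could be obtained, not a supported API. The helper name `top_level_nodes` is made up for this example.

```python
# Sketch only: re._parser / sre_parse are undocumented CPython internals
# and their behaviour may differ between Python versions.
try:
    from re import _parser as sre_parser  # Python 3.11 and later
except ImportError:
    import sre_parse as sre_parser  # earlier Python 3 releases


def top_level_nodes(pattern):
    """Return the opcode names of the top-level nodes of a parsed pattern.

    (Hypothetical helper, named for this example only.)
    """
    tree = sre_parser.parse(pattern)
    # Each item of the parsed tree is an (opcode, argument) pair; nested
    # sub-expressions live inside the argument.
    return [op.name for op, _arg in tree]


print(top_level_nodes(r"\d+-(foo|bar)x"))
```

A tool along the lines described above would have to walk these nodes recursively and turn each reduced tree back into a pattern string, which the internal parser does not do for you.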



#45884

From Roy Smith <roy@panix.com>
Date 2013-05-24 09:12 -0400
Message-ID <roy-14A84E.09121624052013@news.panix.com>
In reply to #45881

In article <mailman.2062.1369400329.3114.python-list@python.org>,
 Malte Forkel <malte.forkel@berlin.de> wrote:

> Finding out why a regular expression does not match a given string can
> be very tedious. I would like to write a utility that identifies the
> sub-expression causing the non-match. My idea is to use a parser to
> create a tree representing the complete regular expression. Then I could
> simplify the expression by dropping sub-expressions one by one from
> right to left and from bottom to top until the remaining regex matches.
> The last sub-expression dropped should be (part of) the problem.
> 
> As a first step, I am looking for a parser for Python regular
> expressions, or a Python regex grammar to create a parser from.
> 
> But maybe my idea is flawed? Or does a similar (or better) tool already
> exist? Any advice will be highly appreciated!

I think this would be a really cool tool.  The debugging process I've
always used is essentially what you describe.  I start by trying
progressively shorter sub-patterns until I get a match, then incrementally
add back little bits of the original pattern until it no longer matches.
With luck, the problem will become obvious at that point.

Having a tool which automated this would be really useful.

Of course, most of the Python user community are wimps and shy away from
big hairy regexes [ducking and running].
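
[Editor's sketch] The manual process described in this post (try ever shorter sub-patterns until one matches) can be roughly automated without a parser by cutting the pattern string from the right. This is a crude stand-in for the tree-based idea, not the tool proposed in the thread: the function name `longest_matching_prefix` and the sample pattern and text are invented for illustration, and cutting a pattern at an arbitrary character can change its meaning (for example, a truncated `{2` is read as literal text).

```python
import re


def longest_matching_prefix(pattern, text):
    """Return the longest prefix of `pattern` that compiles and matches `text`.

    Prefixes that are not valid regexes on their own (for example, ones
    that end inside a group or on a lone backslash) are skipped.
    Returns an empty string if no prefix matches.
    """
    for end in range(len(pattern), 0, -1):
        candidate = pattern[:end]
        try:
            compiled = re.compile(candidate)
        except re.error:
            continue  # not a valid regex by itself, try a shorter prefix
        if compiled.match(text):
            return candidate
    return ""


# Illustrative data: the pattern expects a "T" where the text has a space.
pattern = r"\d{4}-\d{2}-\d{2}T\d{2}"
text = "2013-05-24 14:58"
prefix = longest_matching_prefix(pattern, text)
print("matches up to:", prefix)
print("fails from:   ", pattern[len(prefix):])
```

The remainder of the pattern after the matching prefix is where to start looking; adding pieces back one at a time, as described above, then narrows it down further.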



#45892

From Neil Cerutti <neilc@norwich.edu>
Date 2013-05-24 13:58 +0000
Message-ID <b09a0dFj505U1@mid.individual.net>
In reply to #45884

On 2013-05-24, Roy Smith <roy@panix.com> wrote:
> Of course, most of the Python user community are wimps and shy away
> from big hairy regexes [ducking and running].

I prefer the simple, lumbering regular expressions like those in
the original Night of the Regular Expressions. The fast, powerful
ones from programs like the remake of Dawn of the GREP, just
aren't as scary.

-- 
Neil Cerutti



#45895

From rusi <rustompmody@gmail.com>
Date 2013-05-24 07:09 -0700
Message-ID <a9ff7e3a-4e98-4a9f-9c84-db9f8bca3130@li6g2000pbb.googlegroups.com>
In reply to #45881

On May 24, 5:58 pm, Malte Forkel <malte.for...@berlin.de> wrote:
> Finding out why a regular expression does not match a given string can
> be very tedious. I would like to write a utility that identifies the
> sub-expression causing the non-match. My idea is to use a parser to
> create a tree representing the complete regular expression. Then I could
> simplify the expression by dropping sub-expressions one by one from
> right to left and from bottom to top until the remaining regex matches.
> The last sub-expression dropped should be (part of) the problem.
>
> As a first step, I am looking for a parser for Python regular
> expressions, or a Python regex grammar to create a parser from.
>
> But maybe my idea is flawed? Or does a similar (or better) tool already
> exist? Any advice will be highly appreciated!
>
> Malte



Python-specific: http://kodos.sourceforge.net/
Online: http://gskinner.com/RegExr/
Emacs-specific: re-builder and regex-tool http://bc.tech.coop/blog/071103.html



#45907

From Christian Gollwitzer <auriocus@gmx.de>
Date 2013-05-24 20:21 +0200
Message-ID <knoaru$4lr$1@dont-email.me>
In reply to #45881

On 24.05.13 14:58, Malte Forkel wrote:
> Finding out why a regular expression does not match a given string can
> be very tedious. I would like to write a utility that identifies the
> sub-expression causing the non-match.

Try

	http://laurent.riesterer.free.fr/regexp/

It shows which subexpressions produce the match by coloring the
corresponding parts. Not exactly what you want, but very intuitive and
powerful. Beware: this is Tcl, and there might be subtle differences in
RE syntax, but it's largely the same.

	Christian
