Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!xlned.com!feeder3.xlned.com!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
To: python-list@python.org
From: Malte Forkel <malte.forkel@berlin.de>
Subject: Utility to locate errors in regular expressions
Date: Fri, 24 May 2013 14:58:24 +0200
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130509 Thunderbird/17.0.6
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.2062.1369400329.3114.python-list@python.org>
Lines: 16
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:45881

Finding out why a regular expression does not match a given string can
very tedious. I would like to write a utility that identifies the
sub-expression causing the non-match. My idea is to use a parser to
create a tree representing the complete regular expression. Then I could
simplify the expression by dropping sub-expressions one by one from
right to left and from bottom to top until the remaining regex matches.
The last sub-expression dropped should be (part of) the problem.

As a first step, I am looking for a parser for Python regular
expressions, or a Python regex grammar to create a parser from.

But may be my idea is flawed? Or a similar (or better) tools already
exists? Any advice will be highly appreciated!

Malte