Re: Regex tools

From Robert Klemme <shortcutter@googlemail.com>
Newsgroups comp.lang.java.softwaretools
Subject Re: Regex tools
Date 2011-05-27 21:38 +0200
Message-ID <94acthF4dgU1@mid.individual.net>
References <9516p69657r0vi887067gnu632f0hvmrbv@4ax.com>


On 30.03.2011 12:27, Roedy Green wrote:
> has anyone ever thought of writing a regex debugging frame.

There are tools out there which let you debug a regular expression.
This also helps greatly in understanding how matching proceeds.  Turns
out I recommended Regexp Coach to you already:

http://www.velocityreviews.com/forums/t648698-debugging-regex.html

> You would give it lists of lines that should match and lists of lines
> that should not.
>
> It then shows you the furthest offset in the regex it managed to match
> to for each line, and verifies lines that should match matched (and
> the group(i)), and that lines that did not match did not.

Well, basically you can easily write a short program which throws a 
number of texts against a regular expression and spits out matches, 
their positions etc.
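
A minimal sketch of such a program, assuming plain java.util.regex.  The
class name RegexProbe, the method probe and the sample regex and texts
are made up for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexProbe {
    /** One report line per match of regex in text, or a single "no match" line. */
    static List<String> probe(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> report = new ArrayList<>();
        while (m.find()) {
            StringBuilder sb = new StringBuilder();
            // Half-open range [start, end) of the whole match plus its text.
            sb.append(String.format("[%d,%d) '%s'", m.start(), m.end(), m.group()));
            // Capturing groups are numbered from 1; group(i) is null if unused.
            for (int i = 1; i <= m.groupCount(); i++) {
                sb.append(String.format(" group(%d)='%s'", i, m.group(i)));
            }
            report.add(sb.toString());
        }
        if (report.isEmpty()) {
            report.add("no match");
        }
        return report;
    }

    public static void main(String[] args) {
        String regex = "(\\d+)-(\\d+)";                      // hypothetical pattern
        String[] texts = { "10-20 and 3-4", "no digits here" }; // hypothetical inputs
        for (String text : texts) {
            System.out.println(text + " -> " + probe(regex, text));
        }
    }
}
```

This only reports what matched and where.  It does not show how far into
the pattern the engine got on a failed match; java.util.regex does not
expose that directly.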

> Once you have mastered that, consider writing an automatic regex
> composer.  Just from the examples it composes a regex.  You then add
> more examples to test the regex, or modify it by hand. Perhaps it even
> generates more examples to clarify your intentions.  You iterate till
> you have a working regex.

This is basically impossible to do automatically.  The formal reason is
that unbounded repetition operators such as * and + match infinitely
many strings, so no finite set of examples pins down a single pattern.
The more practical reason is that for any given set of inputs there are
likely multiple patterns which match.  How would you decide whether
"abc" and "aab" were intended to be matched by /a+b+c*/ or /a+\w+/ or
even /\w+/?  The story becomes more complicated when adding groups to
the mix.  You would at least have to identify corresponding regions in
all the strings you provide, which will soon get messy.  Frankly, I'd
rather write expressions by hand - much quicker.
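
To make the ambiguity concrete, here is a small sketch which checks the
three candidate patterns against "abc" and "aab".  The class name
AmbiguityDemo, the helper allMatch and the two extra inputs "abd" and
"xyz" are my own additions for illustration:

```java
import java.util.regex.Pattern;

class AmbiguityDemo {
    static final String[] CANDIDATES = { "a+b+c*", "a+\\w+", "\\w+" };

    /** True if the regex matches every input in full (Matcher.matches semantics). */
    static boolean allMatch(String regex, String... inputs) {
        Pattern p = Pattern.compile(regex);
        for (String input : inputs) {
            if (!p.matcher(input).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String regex : CANDIDATES) {
            // The two examples from the text cannot tell the candidates apart.
            System.out.println("/" + regex + "/ matches abc and aab: "
                    + allMatch(regex, "abc", "aab"));
            // Extra, hypothetical inputs are needed to separate them.
            System.out.println("/" + regex + "/ matches abd: " + allMatch(regex, "abd")
                    + ", xyz: " + allMatch(regex, "xyz"));
        }
    }
}
```

All three candidates accept both examples.  Only the additional inputs
separate them: "abd" is rejected by /a+b+c*/ alone, and "xyz" is accepted
by /\w+/ alone.  A composer would have to guess which of those you meant.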

> The thing that bothers me is I am never sure a regex is fully
> debugged.

Typically you would write proper tests for matches and mismatches, which
in particular ensure that "near matches" do not accidentally match.
Other than that, probably no piece of software is ever fully debugged.
You need to test until you have gained enough confidence.  Well, with
regular expressions it may actually be possible to formally prove that
they are correct - if you can equally formally specify the possible
input.  I wouldn't bother spending the effort; for me testing produces
enough confidence, because I try to pick test cases carefully.
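
Such tests could look roughly like the sketch below, written without a
test framework to keep it self-contained.  The class name RegexCheck, the
method failures and the date-like pattern with its near misses are
assumptions chosen for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexCheck {
    /** Returns a description of every expectation the regex violates. */
    static List<String> failures(String regex,
                                 List<String> shouldMatch,
                                 List<String> shouldNotMatch) {
        Pattern p = Pattern.compile(regex);
        List<String> failures = new ArrayList<>();
        for (String s : shouldMatch) {
            if (!p.matcher(s).matches()) {
                failures.add("expected match: '" + s + "'");
            }
        }
        for (String s : shouldNotMatch) {
            if (p.matcher(s).matches()) {
                failures.add("unexpected match: '" + s + "'");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        String regex = "\\d{4}-\\d{2}-\\d{2}";   // hypothetical pattern under test
        List<String> good = Arrays.asList("2011-05-27", "1999-12-31");
        // Near misses: each differs from a valid input in one small way.
        List<String> nearMisses = Arrays.asList(
                "2011-5-27", "2011-05-27 ", "x2011-05-27", "2011/05/27");
        // An empty list means all expectations held.
        System.out.println(failures(regex, good, nearMisses));
    }
}
```

Note that this uses matches(), which requires the whole input to match.
With find() some of the near misses above would be accepted, so the
tests need to use the same matching method as the production code.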

Kind regards

	robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
