| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Newsgroups | comp.lang.java.softwaretools |
| Subject | Re: Regex tools |
| Date | 2011-05-27 21:38 +0200 |
| Message-ID | <94acthF4dgU1@mid.individual.net> |
| References | <9516p69657r0vi887067gnu632f0hvmrbv@4ax.com> |
On 30.03.2011 12:27, Roedy Green wrote:

> has anyone ever thought of writing a regex debugging frame.

There are tools out there which let you debug a regular expression. This also greatly helps in understanding how matching proceeds. Turns out I recommended Regexp Coach to you already:

http://www.velocityreviews.com/forums/t648698-debugging-regex.html

> You would give it lists of lines that should match and lists of lines
> that should not.
>
> It then shows you the furthest offset in the regex it managed to match
> to for each line, and verifies lines that should match matched (and
> the group(i), and that lines that did not match did not.

Well, basically you can easily write a short program which throws a number of texts against a regular expression and spits out matches, their positions etc.

> Once you have mastered that, consider writing a automatic regex
> composer. Just from the examples it composes a regex. You then add
> more examples to test the regex, or modify it by hand. Perhaps it even
> generates more examples to clarify your intentions. You iterate till
> you have a working regex.

This is basically impossible to do automatically. The formal reason is that the repetition operators * and + match infinitely many strings, so a finite list of examples cannot pin a pattern down. The more practical reason is that for any set of inputs there are likely multiple patterns which match. How do you want to decide whether "abc" and "aab" were intended to be matched by /a+b+c*/ or /a+\w+/ or even /\w+/?

The story becomes more complicated when adding groups to the mix. You would at least have to identify corresponding regions in all the strings you provide, which will soon get messy. Frankly, I'd rather write expressions by hand - much quicker.

> The thing that bothers me is I am never sure a regex is fully
> debugged.

Typically you would write proper tests for matches and mismatches which especially ensure that "near matches" do not accidentally match. Other than that, probably no piece of software is fully debugged - ever.
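The short program and the match/mismatch tests described above could look roughly like the following Java sketch. The class and method names (`RegexCheck`, `failures`, `report`) are made up for illustration, and the near-miss inputs "ac" and "ba" are my own assumption, not taken from the thread. It runs the three candidate patterns from above against the same examples; the positive examples alone cannot tell them apart, while the near misses can.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {

    /** Returns every input whose full-match result differs from the expectation. */
    public static List<String> failures(Pattern pattern,
                                        List<String> shouldMatch,
                                        List<String> shouldNotMatch) {
        List<String> failed = new ArrayList<>();
        for (String s : shouldMatch) {
            if (!pattern.matcher(s).matches()) {
                failed.add(s); // expected a match, got none
            }
        }
        for (String s : shouldNotMatch) {
            if (pattern.matcher(s).matches()) {
                failed.add(s); // near miss matched by accident
            }
        }
        return failed;
    }

    /** Prints each match found in the text with its position and capture groups. */
    public static void report(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        boolean found = false;
        while (m.find()) {
            found = true;
            System.out.printf("/%s/ on \"%s\": [%d,%d) \"%s\"%n",
                    pattern.pattern(), text, m.start(), m.end(), m.group());
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.printf("  group(%d) = %s%n", i, m.group(i));
            }
        }
        if (!found) {
            System.out.printf("/%s/ on \"%s\": no match%n", pattern.pattern(), text);
        }
    }

    public static void main(String[] args) {
        // Positive examples from the post; the near misses are assumed for illustration.
        List<String> shouldMatch = Arrays.asList("abc", "aab");
        List<String> shouldNotMatch = Arrays.asList("ac", "ba");

        for (String regex : Arrays.asList("a+b+c*", "a+\\w+", "\\w+")) {
            Pattern p = Pattern.compile(regex);
            for (String text : shouldMatch) {
                report(p, text);
            }
            System.out.println("  unexpected results: "
                    + failures(p, shouldMatch, shouldNotMatch));
        }
    }
}
```

With these inputs only /a+b+c*/ should come back with an empty failure list, which is the point: the negative examples carry the information about what was intended.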
You need to test until you have gained enough confidence.

Well, with regular expressions it may actually be possible to formally prove that they are correct - if you can specify the possible input equally formally. I wouldn't bother to spend the effort; for me testing produces enough confidence, because I try to pick test cases carefully.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/