Newsgroup: comp.lang.java.softwaretools (thread #16)
| Started by | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| First post | 2011-03-30 03:27 -0700 |
| Last post | 2011-05-27 21:38 +0200 |
| Articles | 4 — 4 participants |
Regex tools Roedy Green <see_website@mindprod.com.invalid> - 2011-03-30 03:27 -0700
Re: Regex tools David Lamb <dalamb@cs.queensu.ca> - 2011-03-30 16:31 -0400
Re: Regex tools terry0k <terryok00@gmail.com> - 2011-05-12 21:01 +0800
Re: Regex tools Robert Klemme <shortcutter@googlemail.com> - 2011-05-27 21:38 +0200
| From | Roedy Green <see_website@mindprod.com.invalid> |
|---|---|
| Date | 2011-03-30 03:27 -0700 |
| Subject | Regex tools |
| Message-ID | <9516p69657r0vi887067gnu632f0hvmrbv@4ax.com> |
Has anyone ever thought of writing a regex debugging frame? You would give it lists of lines that should match and lists of lines that should not.

It then shows you, for each line, the furthest offset in the regex it managed to match to, and verifies that lines that should match did match (including each group(i)), and that lines that should not match did not.

Once you have mastered that, consider writing an automatic regex composer. Just from the examples it composes a regex. You then add more examples to test the regex, or modify it by hand. Perhaps it even generates more examples to clarify your intentions. You iterate till you have a working regex.

The thing that bothers me is that I am never sure a regex is fully debugged.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
There are only two industries that refer to their customers as "users".
 ~ Edward Tufte
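A rough sketch of the frame described above, using only java.util.regex. The class and method names (RegexHarness, check, furthestRegexOffset) are hypothetical. As far as I know, Matcher does not report how far into the regex a failed match got, so furthestRegexOffset only approximates it: it tries successively shorter prefixes of the regex source, skips prefixes that do not compile, and returns the length of the longest one that matches at the start of the line (lookingAt()). Treat the number as a hint, not an exact position; a prefix such as a* matches the empty string anywhere.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical regex test frame: should-match / should-not-match lists plus a rough failure hint. */
class RegexHarness {

    /**
     * Length of the longest prefix of the regex source that compiles and matches at
     * the start of the line (lookingAt), or -1 if no non-empty prefix does.
     * Only a heuristic stand-in for "how far into the regex did we get".
     */
    static int furthestRegexOffset(String regex, String line) {
        for (int end = regex.length(); end > 0; end--) {
            try {
                if (Pattern.compile(regex.substring(0, end)).matcher(line).lookingAt()) {
                    return end;
                }
            } catch (PatternSyntaxException e) {
                // This prefix is not a regex by itself, e.g. "(ab" or "[a-"; try a shorter one.
            }
        }
        return -1;
    }

    /** Checks both lists with matches() (whole line), prints a report, returns the failure count. */
    static int check(String regex, List<String> shouldMatch, List<String> shouldNotMatch) {
        Pattern p = Pattern.compile(regex);
        int failures = 0;
        for (String line : shouldMatch) {
            Matcher m = p.matcher(line);
            if (m.matches()) {
                // Print every capturing group so the groups can be eyeballed.
                StringBuilder groups = new StringBuilder();
                for (int i = 1; i <= m.groupCount(); i++) {
                    groups.append(" group(").append(i).append(")=").append(m.group(i));
                }
                System.out.println("ok    \"" + line + "\"" + groups);
            } else {
                failures++;
                System.out.println("FAIL  \"" + line + "\" should match; regex prefix matched up to offset "
                        + furthestRegexOffset(regex, line) + " of " + regex.length());
            }
        }
        for (String line : shouldNotMatch) {
            if (p.matcher(line).matches()) {
                failures++;
                System.out.println("FAIL  \"" + line + "\" should not match");
            } else {
                System.out.println("ok    \"" + line + "\" rejected");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Hypothetical example: key=number lines; "width=eighty" is a deliberate failure.
        int failures = check("(\\w+)=(\\d+)",
                Arrays.asList("width=80", "x=1", "width=eighty"),
                Arrays.asList("width=", "=80"));
        System.out.println(failures + " failure(s)");
    }
}
```

Running main reports one failure, for "width=eighty", with the hint that the first 6 of 11 regex characters, (\w+)=, matched. The sketch prints groups rather than comparing them with expected values; adding an expected-groups column to the should-match list would be the natural next step.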
| From | David Lamb <dalamb@cs.queensu.ca> |
|---|---|
| Date | 2011-03-30 16:31 -0400 |
| Message-ID | <GqMkp.1583$g56.573@newsfe04.iad> |
| In reply to | #16 |
On 30/03/2011 6:27 AM, Roedy Green wrote:
> has anyone ever thought of writing a regex debugging frame.
> The thing that bothers me is I am never sure a regex is fully
> debugged.

It has been far too long since I've looked at finite automata theory, but I suspect somebody has already developed an algorithm that generates a minimal set of inputs driving a deterministic finite state machine through a suitable set of paths through its states. (I'm thinking of needing only a few test cases for (a)*: zero a's, one, and either two or some larger number.) But regexes in programming languages are much more complex than the ones usually used in theory, where all you get are primitives like a single symbol, sequencing like ab, repetition like a*, and alternation like a|b.
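The hand-picked cases for (a)* (zero, one, several repetitions) are easy to generate mechanically. A minimal sketch with a hypothetical starCases helper: it produces the unit repeated 0, 1, 2 and "many" times and checks them, plus one near miss, with Matcher.matches(). It does not implement the minimal path-covering test generation suspected to exist; it only automates the boundary cases listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class StarCases {

    /** The unit repeated the given number of times (plain loop, so no Java 11 String.repeat needed). */
    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    /** Hypothetical helper: inputs for unit* with 0, 1, 2 and `many` repetitions. */
    static List<String> starCases(String unit, int many) {
        List<String> cases = new ArrayList<>();
        for (int n : new int[] {0, 1, 2, many}) {
            cases.add(repeat(unit, n));
        }
        return cases;
    }

    public static void main(String[] args) {
        Pattern star = Pattern.compile("(a)*");
        for (String s : starCases("a", 10)) {
            // Every generated case should be accepted as a whole-string match.
            System.out.println("\"" + s + "\" -> " + star.matcher(s).matches());
        }
        // Near miss: a valid prefix followed by one foreign symbol must be rejected.
        System.out.println("\"aab\" -> " + star.matcher("aab").matches());
    }
}
```

Running main prints true for the four generated inputs and false for "aab".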
| From | terry0k <terryok00@gmail.com> |
|---|---|
| Date | 2011-05-12 21:01 +0800 |
| Message-ID | <4dcbda12$0$13394$afc38c87@news.optusnet.com.au> |
| In reply to | #16 |
On 03/30/2011 06:27 PM, Roedy Green wrote:
> has anyone ever thought of writing a regex debugging frame.
>
> You would give it lists of lines that should match and lists of lines
> that should not.
>
> It then shows you the furthest offset in the regex it managed to match
> to for each line, and verifies lines that should match matched (and
> the group(i), and that lines that did not match did not.
>
> Once you have mastered that, consider writing a automatic regex
> composer. Just from the examples it composes a regex. You then add
> more examples to test the regex, or modify it by hand. Perhaps it even
> generates more examples to clarify your intentions. You iterate till
> you have a working regex.
>
> The thing that bothers me is I am never sure a regex is fully
> debugged.
>
Roedy/guys,
I am sort of working in this area. My project is based on a Swing GUI,
'TelFormFactory', which generates an XML Schema that defines the
allowable content in a data form. It then generates 'TelForms', which
allow input, data checking and sending of data to 'TelFormHost'.
Currently the TelForm clients are an Application and an Applet (J2SE 1.2
for market breadth) and a MIDlet (CLDC 1.1 and MIDP 1). All three take
about 10 seconds.
The W3C XML Schema standard allows a subset of Perl (and Java) RegEx
syntax to restrict, or otherwise redefine, the base datatype of a
TelForm field. TelFormFactory has a facility to input a RegEx and test
data against it. I have about 4 XML Schema validators, and this is
definitely an area that needs more work. I have spent some time on a
'RegEx Builder', which allows trialling interim expressions. I haven't
looked at it for a year and have largely switched it off, but if people
are interested I will put it out. All feedback welcome.
See www.terry-comms.com
Cheers,
Terry O'K.
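For testing data against an XML Schema pattern facet like the ones described above, a minimal sketch using the JDK's standard javax.xml.validation API. The element name code and the pattern [0-9]{4} are hypothetical. An XML Schema pattern must match the whole value (it is implicitly anchored), which is why "12345" is rejected; the XML Schema regex dialect also differs in places from java.util.regex, so checking against a real schema validator is safer than approximating with Pattern.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PatternFacetCheck {

    // Hypothetical field type: exactly four digits. No ^ or $ anchors are needed,
    // because the pattern facet has to match the entire value.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='code'>"
          + "<xs:simpleType>"
          + "<xs:restriction base='xs:string'>"
          + "<xs:pattern value='[0-9]{4}'/>"
          + "</xs:restriction>"
          + "</xs:simpleType>"
          + "</xs:element>"
          + "</xs:schema>";

    static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if value is accepted as the content of the hypothetical code element. */
    static boolean isValid(Schema schema, String value) {
        // Sketch only: value is pasted in unescaped, so '<' or '&' makes the XML
        // malformed, which also surfaces as a SAXException and counts as invalid.
        String xml = "<code>" + value + "</code>";
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // with no ErrorHandler set, validation errors are thrown
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compileSchema();
        for (String value : new String[] {"1234", "123", "12345", "12a4"}) {
            System.out.println(value + " -> " + isValid(schema, value));
        }
    }
}
```

Running main prints true for "1234" and false for the other three values.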
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2011-05-27 21:38 +0200 |
| Message-ID | <94acthF4dgU1@mid.individual.net> |
| In reply to | #16 |
On 30.03.2011 12:27, Roedy Green wrote:
> has anyone ever thought of writing a regex debugging frame.

There are tools out there which let you debug a regular expression. This also greatly helps in understanding how matching proceeds. It turns out I already recommended Regexp Coach to you:
http://www.velocityreviews.com/forums/t648698-debugging-regex.html

> You would give it lists of lines that should match and lists of lines
> that should not.
>
> It then shows you the furthest offset in the regex it managed to match
> to for each line, and verifies lines that should match matched (and
> the group(i), and that lines that did not match did not.

Well, basically you can easily write a short program which throws a number of texts against a regular expression and spits out the matches, their positions, etc.

> Once you have mastered that, consider writing a automatic regex
> composer. Just from the examples it composes a regex. You then add
> more examples to test the regex, or modify it by hand. Perhaps it even
> generates more examples to clarify your intentions. You iterate till
> you have a working regex.

This is basically impossible to do automatically. The formal reason is that the unbounded repetition operators (* and +) match infinitely many strings, so a finite set of examples can never pin them down. The more practical reason is that for a given set of inputs there are likely multiple patterns which match. How would you decide whether "abc" and "aab" were intended to be matched by /a+b+c*/ or /a+\w+/ or even /\w+/? The story becomes more complicated when you add groups to the mix. You would at least have to identify corresponding regions in all the strings you provide, which soon gets messy. Frankly, I'd rather write expressions by hand - much quicker.

> The thing that bothers me is I am never sure a regex is fully
> debugged.

Typically you would write proper tests for matches and mismatches which especially ensure that "near matches" do not accidentally match. Other than that, probably no piece of software is ever fully debugged. You need to test until you have gained enough confidence. Well, with regular expressions it may actually be possible to formally prove that they are correct - if you can equally formally specify the possible input. I wouldn't bother spending the effort; for me testing produces enough confidence, because I try to pick test cases carefully.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
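The "short program" mentioned in the reply can be sketched in a few lines of java.util.regex. The class MatchLister and its describe method are hypothetical: the program runs each pattern against each text, reports whether the whole text matches, and lists every find() hit with its start/end offsets and groups. Fed the examples from the reply, it shows that all three patterns accept both "abc" and "aab", which is exactly why the examples alone cannot pick the intended regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchLister {

    /** One line per find() hit: "[start,end) \"text\"" followed by any groups as g1=..., g2=.... */
    static String describe(Pattern p, String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.append('[').append(m.start()).append(',').append(m.end()).append(") \"")
               .append(m.group()).append('"');
            for (int i = 1; i <= m.groupCount(); i++) {
                out.append(" g").append(i).append('=').append(m.group(i));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] patterns = {"a+b+c*", "a+\\w+", "\\w+"};
        String[] texts = {"abc", "aab"};
        for (String regex : patterns) {
            Pattern p = Pattern.compile(regex);
            for (String text : texts) {
                // matches() asks whether the whole text fits; describe() shows where find() hits.
                System.out.print("/" + regex + "/ on \"" + text + "\": matches()="
                        + p.matcher(text).matches() + "\n" + describe(p, text));
            }
        }
    }
}
```

Running main prints matches()=true for all six pattern/text pairs, each with a single hit at [0,3).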