| Newsgroups | comp.lang.postscript |
|---|---|
| Date | 2019-02-14 17:51 -0800 |
| References | <6bf6a0c5-89ca-4fb0-b235-eee5ddd7f3ed@googlegroups.com> |
| Message-ID | <a2e14679-41e4-4f1d-b913-98ca3c9e08af@googlegroups.com> |
| Subject | Re: PC Regex |
| From | luser droog <luser.droog@gmail.com> |
On Tuesday, February 12, 2019 at 5:48:45 PM UTC-6, luser droog wrote:
> I've used the Parser Combinators to implement a Regex parser
> and matcher. See the thread "Parser Combinators revisited" for
> the file struct2.ps.
>
> $ cat pc7.ps
[snip]
>
> And then I started the regex code in a new file. The first hurdle was
> finding a simple (and correct) grammar online to look at. First few
> hits in the search were duds. The next problem was recursion in the grammar:
> the top-level production, /expression, is also a component of /atom
> when surrounded by parens.
>
> A little investigation showed that all of the parsers produced
> by the simple combinators were procedures of length 7. So to build
> the recursion in the grammar, I *forward declare* the /expression
> parser as an empty array of length 7 with the executable flag set.
> That much lets me embed it into the parser tree in the appropriate
> place. Then, when it comes time to fill it in, I create the parser for it
> and use 'copy' to overwrite the placeholder's contents. Parenthesized
> expressions don't actually work yet, but at least we've learned how to
> define such things.
>
> And the next task was arranging to pass the results of the callbacks
> back out of the whole gizmo. I posted the other day in comp.lang.c
> after I got stumped doing it in C. So turning back to PostScript
> has shown a way forward.
>
> I've made lists for each "level" of the grammar, and the callback
> actions operate on those. Then, after the regex parser runs, a ready-to-use
> regex matcher is waiting in /expr-list, the list that holds the output
> from the top level.
>
> So, here it is. Pretty short and succinct IMO, despite 6 extra
> functions for managing the lists.
>
> $ cat pc7regex.ps
[snip]
>
> And finally, the output showing some stack dumps, resulting lengths
> of the lists, and 5 test runs on a few matching and non-matching
> strings.
>
>
[snip]
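The forward-declaration trick in the quoted post works because the placeholder is a real object that can be embedded in the parser tree first and filled in later. Here is a minimal Python sketch of the same idea, not the pc7.ps code: parsers are assumed to be functions from (text, position) to a list of possible end positions, and `lit`, `seq`, `alt` and `Forward` are hypothetical names. Because the placeholder forwards calls to a body assigned later, it does not need to be the same size as the real parser, which is what the 7-element array plus 'copy' relies on.

```python
from typing import Callable, List, Optional

# A parser takes (text, pos) and returns every position where a match can end.
Parser = Callable[[str, int], List[int]]


def lit(ch: str) -> Parser:
    """Match one literal character."""
    return lambda s, i: [i + 1] if i < len(s) and s[i] == ch else []


def seq(*parts: Parser) -> Parser:
    """Run the parts in order, threading every possible end position through."""
    def p(s: str, i: int) -> List[int]:
        ends = [i]
        for part in parts:
            ends = [j for e in ends for j in part(s, e)]
        return ends
    return p


def alt(*choices: Parser) -> Parser:
    """Collect the end positions of every alternative."""
    return lambda s, i: [j for c in choices for j in c(s, i)]


class Forward:
    """Placeholder embedded before its body exists (cf. `7 array cvx`)."""

    def __init__(self) -> None:
        self.body: Optional[Parser] = None

    def define(self, body: Parser) -> None:
        self.body = body  # cf. `//Expression copy pop`

    def __call__(self, s: str, i: int) -> List[int]:
        if self.body is None:
            raise RuntimeError("forward parser used before it was defined")
        return self.body(s, i)


# expr := 'a' | '(' expr ')'   -- the recursion goes through the placeholder
expr = Forward()
expr.define(alt(lit("a"), seq(lit("("), expr, lit(")"))))
print(expr("((a))", 0))  # [5]
```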
After more revisions, and after getting the C version to work, it all looks
and works much better now. The code is at
https://github.com/luser-dr00g/pcomb/
So I'll skip the parser combinators themselves and just show off the regex
builder. It constructs a parser called Expression whose embedded
side effects build a new parser, the regex matcher, as its result.
Meta-circularity.
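To make the meta-circular idea concrete, here is a rough Python analogue that assumes nothing about pcomb's internals: a hand-written recursive-descent parser for the same grammar as pc7re2.ps, whose actions push and combine matchers on a node stack the way stash, fetch, do-factors, do-terms and do-meta do. The names `RegexBuilder`, `many`, `plus`, `opt` and `any_char` are made up for this sketch, and each matcher returns every position where a match can end.

```python
from typing import Callable, List

# A matcher takes (text, pos) and returns every position where a match can end.
Matcher = Callable[[str, int], List[int]]
META, SPECIAL = "*+?", "*+?.|()"


def term(ch: str) -> Matcher:  # one literal character
    return lambda s, i: [i + 1] if i < len(s) and s[i] == ch else []


def any_char() -> Matcher:  # '.' matches any single character
    return lambda s, i: [i + 1] if i < len(s) else []


def seq2(a: Matcher, b: Matcher) -> Matcher:
    return lambda s, i: [k for j in a(s, i) for k in b(s, j)]


def alt2(a: Matcher, b: Matcher) -> Matcher:
    return lambda s, i: a(s, i) + b(s, i)


def many(m: Matcher) -> Matcher:  # zero or more; never revisits a position
    def p(s: str, i: int) -> List[int]:
        seen, frontier = [i], [i]
        while frontier:
            frontier = list(dict.fromkeys(
                k for j in frontier for k in m(s, j) if k not in seen))
            seen += frontier
        return seen
    return p


def plus(m: Matcher) -> Matcher:  # one or more
    return seq2(m, many(m))


def opt(m: Matcher) -> Matcher:  # zero or one
    return lambda s, i: [i] + [j for j in m(s, i) if j != i]


class RegexBuilder:
    """Expression := Term ('|' Term)*    Term := Factor Factor*
    Factor := Atom Meta?    Atom := '.' | '(' Expression ')' | Character
    Parsing pushes/combines matchers on self.nodes (cf. stash/fetch)."""

    def __init__(self, pattern: str) -> None:
        self.pat, self.pos, self.nodes = pattern, 0, []

    def peek(self) -> str:
        return self.pat[self.pos] if self.pos < len(self.pat) else ""

    def expression(self) -> None:
        self.term()
        while self.peek() == "|":
            self.pos += 1
            self.term()
            b, a = self.nodes.pop(), self.nodes.pop()  # cf. do-terms
            self.nodes.append(alt2(a, b))

    def term(self) -> None:
        self.factor()
        while self.peek() not in ("", "|", ")"):
            self.factor()
            b, a = self.nodes.pop(), self.nodes.pop()  # cf. do-factors
            self.nodes.append(seq2(a, b))

    def factor(self) -> None:
        self.atom()
        op = self.peek()
        if op != "" and op in META:  # cf. do-meta
            self.pos += 1
            self.nodes.append({"*": many, "+": plus, "?": opt}[op](self.nodes.pop()))

    def atom(self) -> None:
        ch = self.peek()
        if ch == "" or (ch in SPECIAL and ch not in ".("):
            raise SyntaxError(f"unexpected {ch!r} at {self.pos} in {self.pat!r}")
        self.pos += 1
        if ch == ".":
            self.nodes.append(any_char())  # cf. do-dot
        elif ch == "(":
            self.expression()
            if self.peek() != ")":
                raise SyntaxError(f"missing ')' in {self.pat!r}")
            self.pos += 1
        else:
            self.nodes.append(term(ch))  # cf. do-char


def regex(pattern: str) -> Callable[[str], List[int]]:
    b = RegexBuilder(pattern)
    b.expression()
    if b.pos != len(pattern) or len(b.nodes) != 1:  # whole pattern must parse
        raise SyntaxError(f"bad regex {pattern!r}")
    m = b.nodes.pop()  # cf. the final fetch
    return lambda s: sorted(set(m(s, 0)), reverse=True)  # longest first


r = regex("a(b.)+")
for text in ["a", "ab", "abx", "abxb", "abxx", "abxbx", "abxabx", "axbxx"]:
    print(text, r(text))
```

Run on the eight test strings from the post, this sketch reports the same end positions as the Ghostscript transcript, for example `[5, 3]` for `abxbx`; the descending sort is only there to make that comparison easy. Rejecting a pattern that is not fully consumed, such as `a)`, is a choice of the sketch.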
$ cat pcomb/pc7re2.ps
(pc7.ps) run
<<
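/listpush { exch 1 index load 2 array astore def }  % val /name -> name = [val old]
/listpop { dup load aload pop exch 3 1 roll def }   % /name -> val, name = rest
/nodes []                                           % empty list
/fetch { /nodes listpop }                           % pop a node
/stash { /nodes listpush }                          % push a node
/do-dot {{ pop any stash } action}                  % '.' -> push any
/do-char {{ term stash } action}                    % char -> push term for it
/do-meta {{ fetch exch cvx exec stash } action}     % run * + or ? on the last node
/do-factors {{ pop fetch fetch exch seq2 stash } action}  % last two nodes -> seq2
/do-terms {{ pop fetch fetch exch alt2 stash } action}    % last two nodes -> alt2
/dpx { dup print cvx exec }                         % print a test line, then run it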
>> begin
/Dot (.) def
/Meta (*+?) char-class def
/Character (*+?.|()) char-class inverse def
/Expression 7 array cvx def
/Atom <<
//Dot do-dot
[ (\() //Expression (\)) ]
//Character do-char
>> def
/Factor [ //Atom //Meta do-meta ? ] def
/Term [ //Factor //Factor do-factors * ] def
[ //Term [ (|) //Term do-terms ] * ] parser
//Expression copy pop
/regex {
//Expression exec length 0 eq { pop /regex cvx /syntaxerror signalerror } if
fetch
} def
/r (a(b.)+) regex def
//r 0 get =
((a) r pc ) dpx
((ab) r pc ) dpx
((abx) r pc ) dpx
((abxb) r pc ) dpx
((abxx) r pc ) dpx
((abxbx) r pc ) dpx
((abxabx) r pc ) dpx
((axbxx) r pc ) dpx
quit
$ cd pcomb && make pc7_test
gsnd -dNOSAFER pc7re2.ps
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
z
(a) r pc stack:
[]
(ab) r pc stack:
[]
(abx) r pc stack:
[3]
(abxb) r pc stack:
[3]
(abxx) r pc stack:
[3]
(abxbx) r pc stack:
[5 3]
(abxabx) r pc stack:
[3]
(axbxx) r pc stack:
[]
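The listpush/listpop helpers at the top of pc7re2.ps keep each list as nested two-element arrays, [value rest], which is what lets fetch and stash behave as a stack. A Python sketch of the same cons-cell layout, assuming a plain dict stands in for the PostScript dictionary; the function names mirror the PostScript ones, but the code is only an illustration.

```python
from typing import Any, Dict, Tuple

# A list is either () (empty) or a 2-tuple (head, rest), mirroring the
# 2-element arrays that `2 array astore` builds in listpush.
ConsList = Tuple[Any, ...]


def listpush(env: Dict[str, ConsList], name: str, value: Any) -> None:
    """Rebind env[name] to a new cell whose rest is the old list."""
    env[name] = (value, env[name])


def listpop(env: Dict[str, ConsList], name: str) -> Any:
    """Unpack the head cell (like `aload pop`) and rebind the name to the rest.
    Popping an empty list raises ValueError here."""
    value, rest = env[name]
    env[name] = rest
    return value


env: Dict[str, ConsList] = {"nodes": ()}  # like `/nodes []`
listpush(env, "nodes", "x")               # stash
listpush(env, "nodes", "y")
print(env["nodes"])                       # ('y', ('x', ()))
print(listpop(env, "nodes"))              # y   (fetch)
print(env["nodes"])                       # ('x', ())
```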