Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.postscript > #3337
| Newsgroups | comp.lang.postscript |
|---|---|
| Date | 2019-02-12 15:48 -0800 |
| Message-ID | <6bf6a0c5-89ca-4fb0-b235-eee5ddd7f3ed@googlegroups.com> (permalink) |
| Subject | PC Regex |
| From | luser droog <luser.droog@gmail.com> |
I've used the Parser Combinators to implement a Regex parser
and matcher. See the thread "Parser Combinators revisited" for
the file struct2.ps.
$ cat pc7.ps
(struct2.ps) run {
action{pa ac}{ % parser-desc {proc} -> parser
/pa /pa load parser def
{ % String
z /pa exec passed { % S [#]
dup {max} reduce zy zxy head % [#] S[0..^#]
/ac 3 2 roll % S[0..^#] {proc} [#]
1 aa cvx compose exec % PASS => [#]
}{ % S []
zy pop % FAIL => []
} ifelse
} ll
} @func
parser { /stringtype is { term }{
/arraytype is { sequence }{
/dicttype is { alternate }{ } ifelse } ifelse } ifelse }
* { parser many }
+ { parser some }
? { parser maybe }
char-class { << zy 1 {} fortuple >> }
range { z 0 get y 1 get 1 add 1 exch { 1 string dup 0 4 3 roll put } for }
inverse { << 0 1 127 { 1 string z 0 4 -1 roll put null } for >> exch
{} each-map {fix-up} map << exch {null} forall >> minus-keys }
pass { # 1 aa zy pop }
fail { pop pop [] }
failed { z # 0 eq }
passed { failed not }
next { y tail }
sum-up { zy {add} curry forall }
test { y # y # lt { fail }{
y y # head y eq { pass }{ fail } ifelse } ifelse }
term {str}{ {/str test} ll } @func
any { {# 1 ge {[1]}{[]} ifelse } }
%alt2 {p q}{ {z /p exec zy /q exec compose} ll } @func
alt2 {p q}{ {z /p exec failed { zy /q exec compose }{ zy pop } ifelse } ll } @func
seq2 {p q}{ {z /p exec zy {next /q exec sum-up} curry map} ll } @func
sequence { z xcheck { }{ {parser} map {seq2} reduce } ifelse }
alternate { {} each-map {fix-up} map {parser} map {alt2} reduce }
some { z many seq2 }
maybe { {()pass} alt2 }
many { {{{}exec}exec}{}deep-map z 0 get zxy seq2 maybe y y 0 zy put zy pop }
fix-up { /nulltype is { pop }{
/nametype is { z # string cvx }{ } ifelse } ifelse }
(>>) { [ {counttomark 2 mod 1 eq {null} if
counttomark dup 1 add copy array astore exch pop
mark exch 2 { aload pop pop } fortuple ] /.name-order exch} {} forall
(>>) load dup type /arraytype eq {/exec cvx} if ] cvx } @exec
minus-keys { y zxy { pop dup /.name-order ne {undef}{pop pop} ifelse z } forall pop
dup /.name-order 2 copy get [ exch {
counttomark 2 add index 1 index known not { pop } if
} forall ] put }
# {length}
aa {array astore}
ll { {load-if-literal-name} deep-map }
x {2 index}
y {1 index}
z {dup}
zy {exch}
zxy {3 1 roll}
yzx {3 2 roll}
head {0 zy getinterval}
tail {y # y sub getinterval}
max {y y lt{zy}if pop}
is {y type eq}
ps {(stack:)= pstack}
pc {ps clear}
pq {ps quit}
deep-map { y type /arraytype ne { exec }{
y xcheck 3 1 roll [ 3 1 roll /deep-map cvx 2 array astore cvx forall ] exch {cvx} if } ifelse }
each-map { 1 index xcheck 3 1 roll [ 3 1 roll foreach ] exch {cvx} if }
foreach {d p}{/d load type /dicttype eq {
/d load /.name-order known {
/d load /.name-order get { /d 1 index get /p exec } ll
}{/d load /p load} ifelse
}{/d load /p load} ifelse
end forall } @func-begin
accumulate { below-mark }
to-bottom { count 1 roll }
below-mark { counttomark 1 add 1 roll }
} pairs-begin
And then I started the regex code in a new file. The first hurdle was
finding a simple (and correct) grammar online to look at. First few
hits in the search were duds. Next problem was recursion in the grammar.
The top level production, /expression, is also a component of /atom
if surrounded by parens.
A little investigating discovered that all of the parsers produced
by the simple combinators were procedures of length 7. So to build
the recursion in the grammar, I *forward declare* the /expression
parser as an empty array of length 7 with the executable flag set.
That much lets me embed it into the parser tree in the appropriate
place. Then when it comes to fill it in, I create the parser for it
and then use 'copy'. Parenthesized expressions don't actually work
yet, but at least we've learned how to define such things.
And the next task was arranging to pass the results of the callbacks
back out of the whole gizmo. I posted the other day in comp.lang.c
after I got stumped doing it in C. So turning back to PostScript
has shown a way forward.
I've made lists for each "level" of the grammar and the callback
actions operate on those. Then after the regex parser runs, a regex
matcher is ready to use in the /expr-list which is used for the
output from the top level.
So, here it is. Pretty short and succinct IMO, despite 6 extra
functions for managing the lists.
$ cat pc7regex.ps
(pc7.ps)run
/atom-list [] def
/term-list [] def
/expr-list [] def
/append { [ 3 1 roll exch {} forall counttomark -1 roll ] } def
/last { dup length 1 sub get } def
/butlast { [ exch {} forall pop ] } def
/depend { dup last exch butlast } def
/stash { dup load 3 2 roll append def } def
/fetch { dup load depend exch 3 1 roll def } def
/build-meta { exch cvx exec } def
/meta (*+?) char-class def
/character meta 1 dict copy dup (|) null put inverse parser def
/meta //meta { dup = /atom-list fetch build-meta /atom-list stash } action def
/expression 7 array cvx def
/atom <<
(.) { = any /atom-list stash } action
[ (\() //expression (\)) ]
//character { dup = term /atom-list stash } action
%[ ([) characterclass (]) ]
>> parser def
/factor [ //atom //meta ? ] def
/term_ //factor + { = atom-list sequence /term-list stash /atom-list [] def } action def
[ //term_ [ (|) //term_ ] * ] { = term-list alternate /expr-list stash /term-list [] def } action
//expression copy pop
clear
mark (ab.*) expression pc
/dpx {dup print cvx exec} def
(atom-list length =) dpx
(term-list length =) dpx
(expr-list length =) dpx
/r expr-list 0 get def
//r 0 get =
(a) r pc
(ab) r pc
(abx) r pc
(abxx) r pc
(axbxx) r pc
quit
And finally, the output showing some stack dumps, resulting lengths
of the lists, and 5 test runs on a few matching and non-matching
strings.
$ gsnd pc7regex.ps
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
a
b
.
*
ab.*
ab.*
stack:
[4]
-mark-
atom-list length =0
term-list length =0
expr-list length =1
z
stack:
[]
stack:
[2]
stack:
[3]
stack:
[4]
stack:
[]
Back to comp.lang.postscript | Previous | Next — Next in thread | Find similar
PC Regex luser droog <luser.droog@gmail.com> - 2019-02-12 15:48 -0800 Re: PC Regex luser droog <luser.droog@gmail.com> - 2019-02-14 17:51 -0800
csiph-web