Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.postscript > #3345
| From | M Joshua Ryan <luser.droog@gmail.com> |
|---|---|
| Newsgroups | comp.lang.postscript |
| Subject | Parsing yet again all over again |
| Date | 2019-03-16 17:00 -0500 |
| Organization | A noiseless patient Spider |
| Message-ID | <vriutvg2qy23.fsf@gmail.com> (permalink) |
I was having difficulty actually building larger tools with my
combinators, so I dug into some more papers, several by Sierstraa and
learned about the importance of lazy evaluation as a tool. And
particularly, the importance of lazy lists as a data structure became
apparent.
I tried digging through Sierstra paper on implementing it in Java-- and
going nowhere fast-- but I finally got a better handle on it from Graham
Hutton's paper "Higher Order Functions for Parsing" which uses the
Miranda language. Miranda seems to have most of the good stuff that
Haskell has but a little easier to grasp, for me anyway.
So, starting over yet again the other profound realization is that my
weird problem of propagating the results from the callbacks is
completely resolved by imitating the example code more closely. In the
Miranda and Haskell parsers, the returned value is an array of
tuples. Each element of the array is a successful parse state which is a
tuple of a *witness* value and the remainder of the input, as a lazy
list. This witness value is the very "result of the callbacks" that I
needed. But they don't need to be orchestrated and managed through
callbacks and whatnot, but built in.
So I modelled the lazy lists with lots of length 2 arrays where element
0 is the *car* of the list and element 1 is *cdr*. My code calls the two
parts 'first' and 'second'. The cdr can be an executable procedure which
must be executed to yield or manifest more of the list.
Right away this makes it trivial to abstract over taking input from a
string vs from a file: they both yield a lazy list of characters. So
they're handled by 1 short function each. :-D
And I expect it will be relatively easy to abstract over lists of
characters or lists of tokens, so I can build 2 strata of lexer/parser
... and this may help when getting it to build a usable AST.
But for the current state of the code, there's no fancy input syntax
stuff. No dictionary syntax hijacking. Thanks to the 'satisfy' parser
constructor which takes a predicate function, stuff like 'range' and
'cclass' can be build *sub-atomically* instead of as molecules of
alternates.
Apologetically, there's probably a lot of bewildering jumps from low
abstraction levels to high and back down again. That's the nature of my
current coding style which teeters tenuously upon that very edge.
In the output from the 'word' parser (the two big blobs), you can
distinguish between the numbers that have been *consumed* by the parser
by the negative signs. The result of the 'many' combinator (the Kleene
* star) is every possible match from the longest to shortest.
My 'll' function which does '{load-if-literal-name}deep-map' is a
lot like a closure in the sense that it is a lambda with all the
free variables filled in.
$ cat pcomb/pc8.ps
(struct2.ps) run {
string-input {s} { [ s {first}stopped{ pop pop 0 0 rep ] }
{ [ s rest /string-input cvx ] cvx ] } ifelse } @func
file-input {f} { [ f read { [ f /file-input cvx ] cvx ] }
{ 0 0 rep ] } ifelse } @func
filename-input {fn} { (r) file file-input } @func
succeed {v} { { [ exch [ exch /v exch ] ] } ll } @func
fail { { pop [] } }
satisfy {pred}{ { dup first /pred exec { xs-x succeed }
{ fail } ifelse exec } ll } @func
literal {x} { { /x eq } ll satisfy } @func
range {a b} { { dup /a ge exch /b le and } ll satisfy } @func
cclass {cc} { { 1 string dup 0 4 3 roll put /cc exch search { pop pop pop true }
{ pop false } ifelse } ll satisfy } @func
using {p f} { { /p exec [ exch { dup 0 2 copy get /f exec put } forall ] } ll } @func
alt2 {p q} { { dup /p exec [ 3 1 roll spill counttomark -1 roll /q exec spill ] } ll } @func
seq2 {p q} { { /p exec dup length 0 gt { [ exch {
spill /q exec exch {
y 0 2 copy get % [] x [] 0 []_0
4 -1 roll exch cons put % [ [ x []_0 ] .. ]
} curry forall
} forall ] } if } ll } @func
? maybe maybe { [] succeed alt2 }
* many many { % p* = p p* | eps
{{{}exec}exec}{}deep-map dup 0 get 3 1 roll seq2 maybe % {{}e} {p then {{{}e}e}}
2 copy 0 exch put exch pop }
+ some some { dup many seq2 }
y { 1 index }
aa { array astore }
ps { (stack:)= pstack }
pc { ps clear }
second { 1 get }
spill { {} forall }
cons { 2 aa }
/@ { { dup xcheck { exec }{ exit } ifelse } loop }
next { dup second xcheck { dup 1 2 copy get @ put second }{ second } ifelse }
x-xs { dup first exch next }
xs-x { dup next exch first }
ll { {load-if-literal-name} deep-map }
deep-map { y type /arraytype ne { exec }{
y xcheck 3 1 roll [ 3 1 roll /deep-map cvx 2 array astore cvx forall ] exch {cvx} if } ifelse }
rep {x} { { [ /x { /x rep } ] } ll } @func
add1 {x} { x @ dup length 0 eq { [] }
{ [ exch dup first 1 add exch next /add1 cvx 2 aa cvx ] } ifelse } @func
take {x n} { n 0 eq { [] }{ [ x @ x-xs n 1 sub take ] } ifelse } @func
} pairs-begin <<
>> begin
/is_a (a) spill literal def
/is_b (b) spill literal def
/alpha (az) spill range def
/digit (09) spill range def
/alnum //alpha //digit alt2 def
/inv //alnum { neg } using def
/word //inv //inv seq2 //inv seq2 //inv seq2 //digit maybe seq2 def
/word //inv many def
{
1 rep pc
1 rep @ pc
1 rep 5 take pc
1 rep add1 5 take pc
(abcdefghjiklm) string-input dup 5 take pc
}exec%pop
(ab34efghjiklm) string-input
is_a
first second inv
first second inv
first second inv
first second inv
pc
(ab345efghjiklm) string-input word pc
{ currentfile file-input word pc } exec
abcdefghijklm
nopq
quit
$ cd pcomb; gsnd -DNOSAFER pc8.ps
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
stack:
{[ 1 {1 rep} ]}
stack:
[1 {1 rep}]
stack:
[1 [1 [1 [1 [1 []]]]]]
stack:
[2 [2 [2 [2 [2 []]]]]]
stack:
[97 [98 [99 [100 [101 []]]]]]
[97 [98 [99 [100 [101 [102 {(ghjiklm) string-input}]]]]]]
stack:
[[-101 [102 {(ghjiklm) string-input}]]]
stack:
[[[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 [-105 [-107 [-108 [-109 []]]]]]]]]]]]]]] [0 {[ 0 {0 rep} ]}]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 [-105 [-107 [-108 []]]]]]]]]]]]]] [109 [0 {[ 0 {0 rep} ]}]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 [-105 [-107 []]]]]]]]]]]]] [108 [109 [0 {[ 0 {0 rep} ]}]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 [-105 []]]]]]]]]]]] [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 []]]]]]]]]]] [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 []]]]]]]]]] [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 []]]]]]]]] [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 []]]]]]]] [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 []]]]]]] [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]] [[-97 [-98 [-51 [-52 [-53 []]]]]] [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]] [[-97 [-98 [-51 [-52 []]]]] [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]] [[-97 [-98 [-51 []]]] [52 [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]]] [[-97 [-98 []]] [51 [52 [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]]]] [[-97 []] [98 [51 [52 [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]]]]] [[] [97 [98 [51 [52 [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]]]]]]]
stack:
[[[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 [-106 [-107 [-108 [-109 []]]]]]]]]]]]]] [10 {-file- file-input}]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 [-106 [-107 [-108 []]]]]]]]]]]]] [109 [10 {-file- file-input}]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 [-106 [-107 []]]]]]]]]]]] [108 [109 [10 {-file- file-input}]]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 [-106 []]]]]]]]]]] [107 [108 [109 [10 {-file- file-input}]]]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 []]]]]]]]]] [106 [107 [108 [109 [10 {-file- file-input}]]]]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 []]]]]]]]] [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 []]]]]]]] [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]] [[-97 [-98 [-99 [-100 [-101 [-102 []]]]]]] [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]] [[-97 [-98 [-99 [-100 [-101 []]]]]] [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]] [[-97 [-98 [-99 [-100 []]]]] [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]] [[-97 [-98 [-99 []]]] [100 [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]]] [[-97 [-98 []]] [99 [100 [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]]]] [[-97 []] [98 [99 [100 [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]]]]] [[] [97 [98 [99 [100 [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]]]]]]]
Error: /undefined in nopq
Operand stack:
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 2015 1 3 %oparray_pop 2014 1 3 %oparray_pop 1998 1 3 %oparray_pop 1884 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
--dict:985/1684(ro)(G)-- --dict:0/20(G)-- --dict:78/200(L)-- --dict:34/58(L)-- --dict:34/34(L)-- --dict:7/10(L)--
Current allocation mode is local
Current file position is 2988
GPL Ghostscript 9.22: Unrecoverable error, exit code 1
Back to comp.lang.postscript | Previous | Next — Next in thread | Find similar
Parsing yet again all over again M Joshua Ryan <luser.droog@gmail.com> - 2019-03-16 17:00 -0500 Re: Parsing yet again all over again luser droog <luser.droog@gmail.com> - 2019-03-19 16:29 -0700 Re: Parsing yet again all over again luser droog <luser.droog@gmail.com> - 2019-03-20 13:20 -0700
csiph-web