Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.postscript > #3345

Parsing yet again all over again

From M Joshua Ryan <luser.droog@gmail.com>
Newsgroups comp.lang.postscript
Subject Parsing yet again all over again
Date 2019-03-16 17:00 -0500
Organization A noiseless patient Spider
Message-ID <vriutvg2qy23.fsf@gmail.com> (permalink)

Show all headers | View raw


I was having difficulty actually building larger tools with my
combinators, so I dug into some more papers, several by Sierstraa and
learned about the importance of lazy evaluation as a tool.  And
particularly, the importance of lazy lists as a data structure became
apparent.

I tried digging through Sierstra paper on implementing it in Java-- and
going nowhere fast-- but I finally got a better handle on it from Graham
Hutton's paper "Higher Order Functions for Parsing" which uses the
Miranda language. Miranda seems to have most of the good stuff that
Haskell has but a little easier to grasp, for me anyway.

So, starting over yet again the other profound realization is that my
weird problem of propagating the results from the callbacks is
completely resolved by imitating the example code more closely.  In the
Miranda and Haskell parsers, the returned value is an array of
tuples. Each element of the array is a successful parse state which is a
tuple of a *witness* value and the remainder of the input, as a lazy
list. This witness value is the very "result of the callbacks" that I
needed. But they don't need to be orchestrated and managed through
callbacks and whatnot, but built in.

So I modelled the lazy lists with lots of length 2 arrays where element
0 is the *car* of the list and element 1 is *cdr*. My code calls the two
parts 'first' and 'second'. The cdr can be an executable procedure which
must be executed to yield or manifest more of the list.

Right away this makes it trivial to abstract over taking input from a
string vs from a file: they both yield a lazy list of characters. So
they're handled by 1 short function each. :-D

And I expect it will be relatively easy to abstract over lists of
characters or lists of tokens, so I can build 2 strata of lexer/parser
... and this may help when getting it to build a usable AST.

But for the current state of the code, there's no fancy input syntax
stuff. No dictionary syntax hijacking. Thanks to the 'satisfy' parser
constructor which takes a predicate function, stuff like 'range' and
'cclass' can be build *sub-atomically* instead of as molecules of
alternates.

Apologetically, there's probably a lot of bewildering jumps from low
abstraction levels to high and back down again. That's the nature of my
current coding style which teeters tenuously upon that very edge.

In the output from the 'word' parser (the two big blobs), you can
distinguish between the numbers that have been *consumed* by the parser
by the negative signs.  The result of the 'many' combinator (the Kleene
* star) is every possible match from the longest to shortest.

My 'll' function which does '{load-if-literal-name}deep-map' is a
lot like a closure in the sense that it is a lambda with all the
free variables filled in.


$ cat pcomb/pc8.ps
(struct2.ps) run {
    string-input   {s} { [ s {first}stopped{ pop pop 0 0 rep ] }
                                           { [ s rest /string-input cvx ] cvx ] } ifelse } @func
    file-input     {f} { [ f read { [ f /file-input cvx ] cvx ] }
                                  { 0 0 rep ]                   } ifelse } @func
    filename-input {fn} { (r) file file-input } @func

    succeed {v}   { { [ exch [ exch /v exch ] ] } ll } @func
    fail          { { pop [] } }
    satisfy {pred}{ { dup first /pred exec { xs-x succeed }
                                           {      fail    } ifelse exec } ll } @func
    literal {x}   { { /x eq } ll satisfy } @func
    range   {a b} { { dup /a ge exch /b le and } ll satisfy } @func
    cclass  {cc}  { { 1 string dup 0 4 3 roll put /cc exch search { pop pop pop true }
                                                                  { pop false } ifelse  } ll satisfy } @func
    using   {p f} { { /p exec [ exch { dup 0 2 copy get /f exec put } forall ] } ll } @func
    alt2    {p q} { { dup /p exec [ 3 1 roll spill counttomark -1 roll /q exec spill ] } ll } @func
    seq2    {p q} { { /p exec dup length 0 gt { [ exch {
                                                  spill /q exec exch {
                                                      y 0 2 copy get % [] x [] 0 []_0
                                                      4 -1 roll exch cons put % [ [ x []_0 ] .. ]
                                                  } curry forall
                                              } forall ] } if } ll } @func
    ? maybe  maybe { [] succeed  alt2 }
    * many   many  { % p* = p p* | eps
            {{{}exec}exec}{}deep-map dup 0 get 3 1 roll  seq2 maybe % {{}e} {p then {{{}e}e}}
            2 copy 0 exch put  exch pop }
    + some   some  { dup many seq2 }

    y      { 1 index }
    aa     { array astore }
    ps     { (stack:)= pstack }
    pc     { ps clear }
    second { 1 get }
    spill  { {} forall }
    cons   { 2 aa }
    
    /@ { { dup xcheck { exec }{ exit } ifelse } loop }
    next { dup second xcheck { dup 1 2 copy get @ put second }{ second } ifelse }
    x-xs { dup first exch next }
    xs-x { dup next exch first }

    ll       { {load-if-literal-name} deep-map }
    deep-map { y type /arraytype ne { exec }{
        y xcheck 3 1 roll  [ 3 1 roll /deep-map cvx 2 array astore cvx forall ] exch {cvx} if } ifelse }
    rep  {x} { { [ /x { /x rep } ] } ll } @func
    add1 {x} { x @  dup length 0 eq { [] }
                { [ exch dup first 1 add  exch next /add1 cvx 2 aa cvx ] } ifelse } @func
    take {x n} { n 0 eq { [] }{ [ x @ x-xs n 1 sub take ] } ifelse } @func
} pairs-begin <<
>> begin

/is_a  (a)  spill literal def
/is_b  (b)  spill literal def
/alpha (az) spill range def
/digit (09) spill range def
/alnum //alpha //digit alt2  def
/inv   //alnum { neg } using def
/word  //inv //inv seq2 //inv seq2 //inv seq2 //digit maybe seq2 def
/word  //inv many def

{
1 rep pc
1 rep @ pc
1 rep 5 take pc
1 rep add1 5 take pc
(abcdefghjiklm) string-input dup 5 take pc
}exec%pop

(ab34efghjiklm) string-input
                is_a
                first second inv
                first second inv
                first second inv
                first second inv
                pc
(ab345efghjiklm) string-input word pc

{ currentfile file-input word pc } exec
abcdefghijklm
nopq

quit



$ cd pcomb; gsnd -DNOSAFER pc8.ps
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
stack:
{[ 1 {1 rep} ]}
stack:
[1 {1 rep}]
stack:
[1 [1 [1 [1 [1 []]]]]]
stack:
[2 [2 [2 [2 [2 []]]]]]
stack:
[97 [98 [99 [100 [101 []]]]]]
[97 [98 [99 [100 [101 [102 {(ghjiklm) string-input}]]]]]]
stack:
[[-101 [102 {(ghjiklm) string-input}]]]
stack:
[[[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 [-105 [-107 [-108 [-109 []]]]]]]]]]]]]]] [0 {[ 0 {0 rep} ]}]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 [-105 [-107 [-108 []]]]]]]]]]]]]] [109 [0 {[ 0 {0 rep} ]}]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 [-105 [-107 []]]]]]]]]]]]] [108 [109 [0 {[ 0 {0 rep} ]}]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 [-105 []]]]]]]]]]]] [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 [-106 []]]]]]]]]]] [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 [-104 []]]]]]]]]] [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 [-103 []]]]]]]]] [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 [-102 []]]]]]]] [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]] [[-97 [-98 [-51 [-52 [-53 [-101 []]]]]]] [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]] [[-97 [-98 [-51 [-52 [-53 []]]]]] [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]] [[-97 [-98 [-51 [-52 []]]]] [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]] [[-97 [-98 [-51 []]]] [52 [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]]] [[-97 [-98 []]] [51 [52 [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]]]] [[-97 []] [98 [51 [52 [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]]]]] [[] [97 [98 [51 [52 [53 [101 [102 [103 [104 [106 [105 [107 [108 [109 [0 {[ 0 {0 rep} ]}]]]]]]]]]]]]]]]]]
stack:
[[[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 [-106 [-107 [-108 [-109 []]]]]]]]]]]]]] [10 {-file- file-input}]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 [-106 [-107 [-108 []]]]]]]]]]]]] [109 [10 {-file- file-input}]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 [-106 [-107 []]]]]]]]]]]] [108 [109 [10 {-file- file-input}]]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 [-106 []]]]]]]]]]] [107 [108 [109 [10 {-file- file-input}]]]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 [-105 []]]]]]]]]] [106 [107 [108 [109 [10 {-file- file-input}]]]]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 [-104 []]]]]]]]] [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]] [[-97 [-98 [-99 [-100 [-101 [-102 [-103 []]]]]]]] [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]] [[-97 [-98 [-99 [-100 [-101 [-102 []]]]]]] [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]] [[-97 [-98 [-99 [-100 [-101 []]]]]] [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]] [[-97 [-98 [-99 [-100 []]]]] [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]] [[-97 [-98 [-99 []]]] [100 [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]]] [[-97 [-98 []]] [99 [100 [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]]]] [[-97 []] [98 [99 [100 [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]]]]] [[] [97 [98 [99 [100 [101 [102 [103 [104 [105 [106 [107 [108 [109 [10 {-file- file-input}]]]]]]]]]]]]]]]]
Error: /undefined in nopq
Operand stack:

Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   2015   1   3   %oparray_pop   2014   1   3   %oparray_pop   1998   1   3   %oparray_pop   1884   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--
Dictionary stack:
   --dict:985/1684(ro)(G)--   --dict:0/20(G)--   --dict:78/200(L)--   --dict:34/58(L)--   --dict:34/34(L)--   --dict:7/10(L)--
Current allocation mode is local
Current file position is 2988
GPL Ghostscript 9.22: Unrecoverable error, exit code 1

Back to comp.lang.postscript | Previous | NextNext in thread | Find similar


Thread

Parsing yet again all over again M Joshua Ryan <luser.droog@gmail.com> - 2019-03-16 17:00 -0500
  Re: Parsing yet again all over again luser droog <luser.droog@gmail.com> - 2019-03-19 16:29 -0700
  Re: Parsing yet again all over again luser droog <luser.droog@gmail.com> - 2019-03-20 13:20 -0700

csiph-web