Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.postscript > #3335

Re: Parser Combinators revisited

Newsgroups comp.lang.postscript
Date 2019-02-05 11:56 -0800
References <6f879c02-caec-49a5-bc6c-a62931a36378@googlegroups.com>
Message-ID <ecdf8380-c4cc-4efc-9b7f-a3504b210931@googlegroups.com> (permalink)
Subject Re: Parser Combinators revisited
From luser droog <luser.droog@gmail.com>

Show all headers | View raw


On Saturday, February 2, 2019 at 9:51:57 PM UTC-6, luser droog wrote:
> I rediscovered some abandoned code for parser combinators
> and I realized that I had rewritten the underlying library
> more recently. So the obvious thing was to rewrite the
> parser combinators to use the new library and then maybe
> everything would work out more nicely.
> 
> And with a few stumbles, it has.
> 
> So here are two files and a very short output. First is 
> the latest version of struct2.ps for functional programming
> features. The most detailed write up on this code is in
> my codereview post, but some tweaks and fixes have been
> applied with the benefit of a little actual use.
> 
> $ cat struct2.ps
> %!
> % struct2.ps   An enhanced PostScript syntax for defining functions with named,
> % type-checked arguments. Using @func within a block or other construct that uses
> % 'pairs' accomplishes a sort of compile-time macro expansion of the shorthand function description.
> <<
>     /pairs-begin { pairs begin        }
>     /pairs-def   { pairs {def} forall } pop { pairs currentdict copy pop }
>     /pairs       { << exch explode >> }
>     /explode     { { @exec } forall   }
>     /@exec       { dup type /nametype eq { exec-if-@ } if }
>     /exec-if-@   { dup dup length string cvs dup first (@) first eq { exec@ }{ pop } ifelse }
>     /first       { 0 get }            /exec@ {  exch pop rest cvn cvx exec  }
>     /rest        { 1 1 index length 1 sub getinterval }
> >> begin {
>     block { pairs-begin main end }
>     func  { 1 index type /stringtype eq { typed-func }{ simple-func } ifelse }
>     simple-func { func-begin { end } compose }
>     typed-func  { exch args-and-types reverse { make-type-name } map check-stack 3 1 roll
>                   exch simple-func   compose }
>     func-begin  { exch reverse /args-begin load curry exch compose }
>     args-begin  { dup length dict begin { exch def } forall }
>     args-and-types { /was_x false def [ exch { each-specifier } fortokens fix-last ] dup args exch types  }
>     each-specifier { dup xcheck /is_x exch def  is_x was_x and { null exch } if  /was_x is_x def }
>     fix-last       { counttomark 2 mod 1 eq { null } if }
>     check-stack    { {pop} 4 index cvlit { cvx /stackunderflow signalerror } curry compose
>                      /if cvx 2 array astore cvx {check-count} exch compose curry
>                      3 index cvlit { cvx /typecheck signalerror } curry
>                      /if cvx 2 array astore cvx {check-types} exch compose compose }
>     check-count { dup length count 2 sub gt }
>     check-types { dup length 1 add copy true exch { check-type and } forall exch pop not }
>     check-type  { dup null eq { 3 -1 roll pop pop true }{ 3 -1 roll type eq } ifelse }
>     make-type-name { dup type /nametype eq { dup length 4 add string dup dup 4 2 roll cvs
>                      2 copy 0 exch putinterval length (type) putinterval cvn } if }
>     args    { [ exch 2 { 0 get } fortuple ] cvx }
>     types   { [ exch 2 { 1 get } fortuple ] }
>     map     { 1 index xcheck 3 1 roll  [ 3 1 roll forall ]  exch {cvx} if }
>     reduce  { exch dup first exch rest 3 -1 roll forall }
>     rreduce { exch aload length 1 sub dup 3 add -1 roll repeat }
>     curry   { [ 3 1 roll {} forall ] cvx } @pop
>     {   dup xcheck 3 1 roll
>         dup length 1 add array dup 0 5 -1 roll put dup 1 4 -1 roll putinterval
>         exch {cvx}if }
>     compose { 2 array astore cvx { {} forall } map } @pop
>     {   1 index xcheck 3 1 roll
>         1 index length 1 index length add array  dup 0 4 index putinterval
>         dup 4 -1 roll length 4 -1 roll putinterval
>         exch {cvx} if }
>     reverse {
>         dup xcheck exch
>         [ exch dup length 1 sub -1 0 { 2 copy get 3 1 roll pop } for pop ]
>         exch {cvx} if }
> } pairs-def {
>     fortokens {src proc}{ { src token {exch /src exch store}{exit}ifelse  proc } loop } @func
>     fortuple {a n p}{ 0 n /a load length 1 sub
>         { /a exch /n getinterval /p exec } {load-if-literal-name} map end for
>     } @func-begin
>     load-if-literal-name { dup type /nametype eq 1 index xcheck not and { load } if }
> } pairs-def
> 
> 
> So using pairs-begin or block lets the code look really nice IMO without
> the slants on all the names being defined.
> 
> Next is the PC code. A few short mnemonics at the start. term, seq2, and 
> alt are the basic primitives, but almost too primitive to use. The actual
> API is: parser + * ? action. 
> 
> The full deal is defining an action which takes a parser and a procedure
> body and returns a parser with the side effect of executing the procedure.
> The proc gets passed the string contents matched by the parser.
> A parser description can also be passed to action without the user needing
> to call parser on it first.
> 
> 

Some improvements. Alternations now have a short-circuiting semantic so
it stops trying branches after one succeeds. Reorganized to be more top-down.
Expanded testing section.

In order to test that the user proc associated with an action wasn't 
causing any stack interference, this version uses a function called
'below-mark' to get the result (a little bit) out of the way.

$ cat pc7.ps
(struct2.ps) run {

    action{pa ac}{ % parser-desc {proc}  ->  parser
        /pa /pa load parser def
        {                                       % String
            z /pa exec passed {                 % S [#]
              dup {max} reduce zy zxy head      % [#] S[0..^#]
              /ac 3 2 roll                      % S[0..^#] {proc} [#]
              1 aa cvx compose exec             % PASS => [#]
            }{                                  % S []
              zy pop                            % FAIL => []
            } ifelse
        } ll
    } @func

    parser { /stringtype is { term      }{
             /arraytype  is { sequence  }{
             /dicttype   is { alternate }{ } ifelse } ifelse } ifelse }

    *  { parser many }
    +  { parser some }
    ?  { parser maybe }
    char-class { << zy 1 {} fortuple >> }
    inverse { << 0 1 127 { 1 string z 0 4 -1 roll put null } for >> exch
              {} each-map {fix-up} map << exch {null} forall >> minus-keys }

    pass   { # 1 aa zy pop }
    fail   { pop pop [] }
    failed { z # 0 eq }
    passed { failed not }

    next   { y tail }
    sum-up { zy {add} curry forall }

    test { y #        y # lt { fail }{
           y y # head y   eq { pass }{ fail }  ifelse }  ifelse }

    term {str}{ {/str test} ll } @func
    %alt2 {p q}{ {z /p exec zy /q exec compose} ll } @func
    alt2 {p q}{ {z /p exec failed { zy /q exec compose }{ zy pop } ifelse } ll } @func
    seq2 {p q}{ {z /p exec zy {next /q exec sum-up} curry map} ll } @func
    sequence { z xcheck { }{ {parser} map {seq2} reduce } ifelse }
    alternate { {} each-map {fix-up} map {parser} map {alt2} reduce }

    some  { z many seq2 }
    maybe { {()pass} alt2 }
    many  { {{{}exec}exec}{}deep-map   z 0 get zxy seq2  maybe   y y 0 zy put zy pop }

    fix-up { /nulltype is { pop            }{
             /nametype is { z # string cvx }{ } ifelse } ifelse }
    (>>) { [ {counttomark 2 mod 1 eq {null} if
              counttomark dup 1 add copy array astore exch pop
              mark exch 2 { aload pop pop } fortuple ] /.name-order exch} {} forall
             (>>) load dup type /arraytype eq {/exec cvx} if ] cvx } @exec
    minus-keys { y zxy {  pop dup /.name-order ne {undef}{pop pop} ifelse z  } forall pop
                 dup /.name-order 2 copy get [ exch {
                     counttomark 2 add index  1 index known not { pop } if
                 } forall ] put }

    #    {length}
    aa   {array astore}
    ll   { {load-if-literal-name} deep-map }
    x    {2 index}
     y   {1 index}
      z  {dup}
     zy  {exch}
    zxy  {3 1 roll}
    yzx  {3 2 roll}
    head {0 zy getinterval}
    tail {y # y sub getinterval}
    max  {y y lt{zy}if pop}
    is   {y type eq}
    ps   {(stack:)= pstack}
    pc   {ps clear}
    pq   {ps quit}

    deep-map { y type /arraytype ne { exec }{
        y xcheck 3 1 roll  [ 3 1 roll /deep-map cvx 2 array astore cvx forall ] exch {cvx} if } ifelse }
    each-map { 1 index xcheck 3 1 roll  [ 3 1 roll foreach ]  exch {cvx} if }
    foreach {d p}{/d load type /dicttype eq {
                      /d load /.name-order known {
                          /d load /.name-order get  { /d 1 index get /p exec } ll
                      }{/d load /p load} ifelse
                  }{/d load /p load} ifelse
                  end forall } @func-begin

    accumulate { below-mark }
    to-bottom { count 1 roll }
    below-mark { counttomark 1 add 1 roll }
} pairs-begin

/delimiter    ( \n()/%[]<>{}) char-class def
/regular-char delimiter inverse def
/digit        (0123456789) char-class def
/number       //digit + def
/opt-number   //digit * def

/rad-integer  [ //digit //digit ? (#) //number ] parser def
/integer      [ (-) ? //number ] parser def
/real         [ (-)? << [ //number (.) //opt-number ] [ (.) //number ] >> ] parser def
/name         //regular-char + def

/on-integer   << //rad-integer //integer >> { [ exch /INT exch dup cvi ] accumulate } action def
/on-real      //real { [ exch /REAL exch dup cvr ] accumulate } action def
/on-name      //name { [ exch /NAME exch dup cvn ] accumulate } action def
/on-delimiter //delimiter { [ exch /DEL exch ] accumulate } action def

/ps-token     << //on-real //on-integer //on-name //on-delimiter >> parser def

mark (16#25) on-integer pc
mark (123) on-integer pc
mark (-123) on-integer pc

mark (123) on-real pc  %% sb. FAIL
mark (-123) on-real pc  %% sb. FAIL
mark (2.685) on-real pc
mark (-.5) on-real pc

mark (name) on-name pc
mark (word) on-name pc

mark (123) ps-token pc
mark (67.5) ps-token pc
mark (aname) ps-token pc
mark (aname) on-name pc
mark ({) ps-token pc
mark (<) ps-token pc
mark ({) on-delimiter pc


So each test lays down a 'mark' object and then if the user proc fires,
its result is tucked behind the mark and the parser's result is returned
normally on top of the stack. The result of a parser is an array, length=0
means failure, any other contents will be matched lengths. If the short-
circuit behavior is not used, then alternations may yield multiple matches.


Output:

$ gsnd pc7.ps
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
stack:
[5]
-mark-
[/INT (16#25) 37]
stack:
[3]
-mark-
[/INT (123) 123]
stack:
[4]
-mark-
[/INT (-123) -123]
stack:
[]
-mark-
stack:
[]
-mark-
stack:
[5]
-mark-
[/REAL (2.685) 2.685]
stack:
[3]
-mark-
[/REAL (-.5) -0.5]
stack:
[4]
-mark-
[/NAME (name) /name]
stack:
[4]
-mark-
[/NAME (word) /word]
stack:
[3]
-mark-
[/INT (123) 123]
stack:
[4]
-mark-
[/REAL (67.5) 67.5]
stack:
[5]
-mark-
[/NAME (aname) /aname]
stack:
[5]
-mark-
[/NAME (aname) /aname]
stack:
[1]
-mark-
[/DEL ({)]
stack:
[1]
-mark-
[/DEL (<)]
stack:
[1]
-mark-
[/DEL ({)]
GS>

Back to comp.lang.postscript | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Parser Combinators revisited luser droog <luser.droog@gmail.com> - 2019-02-02 19:51 -0800
  Re: Parser Combinators revisited luser droog <luser.droog@gmail.com> - 2019-02-05 11:56 -0800
    Re: Parser Combinators revisited luser droog <luser.droog@gmail.com> - 2019-02-05 12:39 -0800

csiph-web