Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.postscript > #3336

Re: Parser Combinators revisited

Newsgroups comp.lang.postscript
Date 2019-02-05 12:39 -0800
References <6f879c02-caec-49a5-bc6c-a62931a36378@googlegroups.com> <ecdf8380-c4cc-4efc-9b7f-a3504b210931@googlegroups.com>
Message-ID <fe05b3ee-adde-4384-a841-51203907706f@googlegroups.com> (permalink)
Subject Re: Parser Combinators revisited
From luser droog <luser.droog@gmail.com>

Show all headers | View raw


On Tuesday, February 5, 2019 at 1:56:47 PM UTC-6, luser droog wrote:
> On Saturday, February 2, 2019 at 9:51:57 PM UTC-6, luser droog wrote:
> > I rediscovered some abandoned code for parser combinators
> > and I realized that I had rewritten the underlying library
> > more recently. So the obvious thing was to rewrite the
> > parser combinators to use the new library and then maybe
> > everything would work out more nicely.
> > 
> > And with a few stumbles, it has.
> > 
> > So here are two files and a very short output. First is 
> > the latest version of struct2.ps for functional programming
> > features. The most detailed write up on this code is in
> > my codereview post, but some tweaks and fixes have been
> > applied with the benefit of a little actual use.
> > 
> > $ cat struct2.ps
> > %!
> > % struct2.ps   An enhanced PostScript syntax for defining functions with named,
> > % type-checked arguments. Using @func within a block or other construct that uses
> > % 'pairs' accomplishes a sort of compile-time macro expansion of the shorthand function description.
> > <<
> >     /pairs-begin { pairs begin        }
> >     /pairs-def   { pairs {def} forall } pop { pairs currentdict copy pop }
> >     /pairs       { << exch explode >> }
> >     /explode     { { @exec } forall   }
> >     /@exec       { dup type /nametype eq { exec-if-@ } if }
> >     /exec-if-@   { dup dup length string cvs dup first (@) first eq { exec@ }{ pop } ifelse }
> >     /first       { 0 get }            /exec@ {  exch pop rest cvn cvx exec  }
> >     /rest        { 1 1 index length 1 sub getinterval }
> > >> begin {
> >     block { pairs-begin main end }
> >     func  { 1 index type /stringtype eq { typed-func }{ simple-func } ifelse }
> >     simple-func { func-begin { end } compose }
> >     typed-func  { exch args-and-types reverse { make-type-name } map check-stack 3 1 roll
> >                   exch simple-func   compose }
> >     func-begin  { exch reverse /args-begin load curry exch compose }
> >     args-begin  { dup length dict begin { exch def } forall }
> >     args-and-types { /was_x false def [ exch { each-specifier } fortokens fix-last ] dup args exch types  }
> >     each-specifier { dup xcheck /is_x exch def  is_x was_x and { null exch } if  /was_x is_x def }
> >     fix-last       { counttomark 2 mod 1 eq { null } if }
> >     check-stack    { {pop} 4 index cvlit { cvx /stackunderflow signalerror } curry compose
> >                      /if cvx 2 array astore cvx {check-count} exch compose curry
> >                      3 index cvlit { cvx /typecheck signalerror } curry
> >                      /if cvx 2 array astore cvx {check-types} exch compose compose }
> >     check-count { dup length count 2 sub gt }
> >     check-types { dup length 1 add copy true exch { check-type and } forall exch pop not }
> >     check-type  { dup null eq { 3 -1 roll pop pop true }{ 3 -1 roll type eq } ifelse }
> >     make-type-name { dup type /nametype eq { dup length 4 add string dup dup 4 2 roll cvs
> >                      2 copy 0 exch putinterval length (type) putinterval cvn } if }
> >     args    { [ exch 2 { 0 get } fortuple ] cvx }
> >     types   { [ exch 2 { 1 get } fortuple ] }
> >     map     { 1 index xcheck 3 1 roll  [ 3 1 roll forall ]  exch {cvx} if }
> >     reduce  { exch dup first exch rest 3 -1 roll forall }
> >     rreduce { exch aload length 1 sub dup 3 add -1 roll repeat }
> >     curry   { [ 3 1 roll {} forall ] cvx } @pop
> >     {   dup xcheck 3 1 roll
> >         dup length 1 add array dup 0 5 -1 roll put dup 1 4 -1 roll putinterval
> >         exch {cvx}if }
> >     compose { 2 array astore cvx { {} forall } map } @pop
> >     {   1 index xcheck 3 1 roll
> >         1 index length 1 index length add array  dup 0 4 index putinterval
> >         dup 4 -1 roll length 4 -1 roll putinterval
> >         exch {cvx} if }
> >     reverse {
> >         dup xcheck exch
> >         [ exch dup length 1 sub -1 0 { 2 copy get 3 1 roll pop } for pop ]
> >         exch {cvx} if }
> > } pairs-def {
> >     fortokens {src proc}{ { src token {exch /src exch store}{exit}ifelse  proc } loop } @func
> >     fortuple {a n p}{ 0 n /a load length 1 sub
> >         { /a exch /n getinterval /p exec } {load-if-literal-name} map end for
> >     } @func-begin
> >     load-if-literal-name { dup type /nametype eq 1 index xcheck not and { load } if }
> > } pairs-def
> > 
> > 
> > So using pairs-begin or block lets the code look really nice IMO without
> > the slants on all the names being defined.
> > 
> > Next is the PC code. A few short mnemonics at the start. term, seq2, and 
> > alt are the basic primitives, but almost too primitive to use. The actual
> > API is: parser + * ? action. 
> > 
> > The full deal is defining an action which takes a parser and a procedure
> > body and returns a parser with the side effect of executing the procedure.
> > The proc gets passed the string contents matched by the parser.
> > A parser description can also be passed to action without the user needing
> > to call parser on it first.
> > 
> > 
> 
> Some improvements. Alternations now have a short-circuiting semantic so
> it stops trying branches after one succeeds. Reorganized to be more top-down.
> Expanded testing section.
> 
> In order to test that the user proc associated with an action wasn't 
> causing any stack interference, this version uses a function called
> 'below-mark' to get the result (a little bit) out of the way.
> 


Now it's starting to look like something. I've implemented a replacement
for the 'token' operator for a small subset of the PostScript syntax.
Here's the result of parsing a string using both 'token' and 'mytoken'.

$ gsnd pc7.ps
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
stack:
[
2.65
123
name
37
stack:
[
2.65
123
name
37
GS>

And the code what done it:

$ cat pc7.ps
(struct2.ps) run {

    action{pa ac}{ % parser-desc {proc}  ->  parser
        /pa /pa load parser def
        {                                       % String
            z /pa exec passed {                 % S [#]
              dup {max} reduce zy zxy head      % [#] S[0..^#]
              /ac 3 2 roll                      % S[0..^#] {proc} [#]
              1 aa cvx compose exec             % PASS => [#]
            }{                                  % S []
              zy pop                            % FAIL => []
            } ifelse
        } ll
    } @func

    parser { /stringtype is { term      }{
             /arraytype  is { sequence  }{
             /dicttype   is { alternate }{ } ifelse } ifelse } ifelse }

    *  { parser many }
    +  { parser some }
    ?  { parser maybe }
    char-class { << zy 1 {} fortuple >> }
    inverse { << 0 1 127 { 1 string z 0 4 -1 roll put null } for >> exch
              {} each-map {fix-up} map << exch {null} forall >> minus-keys }

    pass   { # 1 aa zy pop }
    fail   { pop pop [] }
    failed { z # 0 eq }
    passed { failed not }

    next   { y tail }
    sum-up { zy {add} curry forall }

    test { y #        y # lt { fail }{
           y y # head y   eq { pass }{ fail }  ifelse }  ifelse }

    term {str}{ {/str test} ll } @func
    %alt2 {p q}{ {z /p exec zy /q exec compose} ll } @func
    alt2 {p q}{ {z /p exec failed { zy /q exec compose }{ zy pop } ifelse } ll } @func
    seq2 {p q}{ {z /p exec zy {next /q exec sum-up} curry map} ll } @func
    sequence { z xcheck { }{ {parser} map {seq2} reduce } ifelse }
    alternate { {} each-map {fix-up} map {parser} map {alt2} reduce }

    some  { z many seq2 }
    maybe { {()pass} alt2 }
    many  { {{{}exec}exec}{}deep-map   z 0 get zxy seq2  maybe   y y 0 zy put zy pop }

    fix-up { /nulltype is { pop            }{
             /nametype is { z # string cvx }{ } ifelse } ifelse }
    (>>) { [ {counttomark 2 mod 1 eq {null} if
              counttomark dup 1 add copy array astore exch pop
              mark exch 2 { aload pop pop } fortuple ] /.name-order exch} {} forall
             (>>) load dup type /arraytype eq {/exec cvx} if ] cvx } @exec
    minus-keys { y zxy {  pop dup /.name-order ne {undef}{pop pop} ifelse z  } forall pop
                 dup /.name-order 2 copy get [ exch {
                     counttomark 2 add index  1 index known not { pop } if
                 } forall ] put }

    #    {length}
    aa   {array astore}
    ll   { {load-if-literal-name} deep-map }
    x    {2 index}
     y   {1 index}
      z  {dup}
     zy  {exch}
    zxy  {3 1 roll}
    yzx  {3 2 roll}
    head {0 zy getinterval}
    tail {y # y sub getinterval}
    max  {y y lt{zy}if pop}
    is   {y type eq}
    ps   {(stack:)= pstack}
    pc   {ps clear}
    pq   {ps quit}

    deep-map { y type /arraytype ne { exec }{
        y xcheck 3 1 roll  [ 3 1 roll /deep-map cvx 2 array astore cvx forall ] exch {cvx} if } ifelse }
    each-map { 1 index xcheck 3 1 roll  [ 3 1 roll foreach ]  exch {cvx} if }
    foreach {d p}{/d load type /dicttype eq {
                      /d load /.name-order known {
                          /d load /.name-order get  { /d 1 index get /p exec } ll
                      }{/d load /p load} ifelse
                  }{/d load /p load} ifelse
                  end forall } @func-begin

    accumulate { below-mark }
    to-bottom { count 1 roll }
    below-mark { counttomark 1 add 1 roll }
} pairs-begin

/delimiter    ( \t\n()/%[]<>{}) char-class def
/regular-char delimiter inverse def
/digit        (0123456789) char-class def
/number       //digit + def
/opt-number   //digit * def
/opt-spaces   ( \t\n) char-class * def

/rad-integer  [ //digit //digit ? (#) //number ] parser def
/integer      [ (-) ? //number ] parser def
/real         [ (-)? << [ //number (.) //opt-number ] [ (.) //number ] >> ] parser def
/name         //regular-char + def

/on-integer   << //rad-integer //integer >>
                             { [ exch /INT  exch dup cvi       ] accumulate } action def
/on-real      //real         { [ exch /REAL exch dup cvr       ] accumulate } action def
/on-lit-name  [ (/) //name ] { [ exch /NAME exch dup cvn cvlit ] accumulate } action def
/on-name      //name         { [ exch /NAME exch dup cvn cvx   ] accumulate } action def
/on-delimiter //delimiter    { [ exch /DEL  exch dup cvn cvx   ] accumulate } action def

/ps-token     << //on-real //on-integer //on-name //on-delimiter >> parser def

/mytoken {
    dup //opt-spaces exec passed { 0 get tail }{ pop }ifelse
    mark exch dup //ps-token exec passed {
        0 get tail below-mark cleartomark
        exch 2 get true
    }{ cleartomark false } ifelse
} def

(16#25 name 123 2.65 [ ) {
    mytoken {exch}{exit} ifelse
} loop
pc
(16#25 name 123 2.65 [ ) {
    token {exch}{exit} ifelse
} loop
pc

Back to comp.lang.postscript | Previous | NextPrevious in thread | Find similar


Thread

Parser Combinators revisited luser droog <luser.droog@gmail.com> - 2019-02-02 19:51 -0800
  Re: Parser Combinators revisited luser droog <luser.droog@gmail.com> - 2019-02-05 11:56 -0800
    Re: Parser Combinators revisited luser droog <luser.droog@gmail.com> - 2019-02-05 12:39 -0800

csiph-web