C scanner

Newsgroups comp.lang.postscript
Date 2021-12-13 15:20 -0800
Message-ID <bc506bd2-6516-4a4e-8e78-caa37fee9c0fn@googlegroups.com> (permalink)
Subject C scanner
From luser droog <luser.droog@gmail.com>


Sticking with PS version 12 of the parser combinators, I finished the
usual 3 examples (regex, PS scanner, JSON parser) and they seemed
pretty good and concise. So I translated my C scanner over from
the C version 9. It looks pretty good to me, especially the helper
function `tokendef`, which defines a parser that adds a tag to its
return value.

Wrapping a lazy-input function around another lazy-input function is
just weird. It seems to work when I run it stepwise in my head, but it
still looks weird the way it's written. It makes more sense when you
look at how `lazy-input` builds the function. But that part isn't new so
I won't include it here.
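
For anyone who hasn't seen the lazy-stream idea before, here is a rough
standalone sketch of it in plain PostScript. It is not the pc12.ps code:
`ints-from` and `doubled` are made-up names for illustration, and `first`
and `next` are simplified stand-ins for the library's versions. A stream
cell is a 2-element array holding a value and an executable array that
builds the next cell on demand. `doubled` wraps one such stream in
another, which is roughly the shape of wrapping one lazy-input function
around another.

```postscript
% Simplified stand-ins (assumptions, not the pc12.ps definitions).
/first { 0 get } def               % cell -> head value
/next  { 1 get exec } def          % cell -> next cell (forces the thunk)

% Hypothetical source stream: n, n+1, n+2, ...
/ints-from {                       % n -> [n {n+1 ints-from}]
  dup 1 add                        % n n+1
  [ exch /ints-from cvx ] cvx      % n {n+1 ints-from}
  2 array astore                   % [n {n+1 ints-from}]
} def

% Hypothetical wrapper stream: doubles each head, defers the rest.
/doubled {                         % cell -> [2*head {{tail} exec doubled}]
  dup first 2 mul                  % cell 2h
  exch 1 get                       % 2h {tail}
  [ exch /exec cvx /doubled cvx ] cvx   % 2h {{tail} exec doubled}
  2 array astore
} def

5 ints-from doubled
dup first ==                       % 10
next dup first ==                  % 12
next first ==                      % 14
```

Nothing past the current head is computed until `next` executes the
stored procedure, and the wrapper's stored procedure carries the inner
stream's stored procedure inside it, unevaluated.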

The big idea is at the bottom. Calling `token-input` with a string-input
and 2 zeros gives you a lazy stream of tagged token structures. 
Calling `string-input` needs its own 2 zeros. So there's a lot of zeros
to put 'em together.

%errordict/typecheck{ps pe quit}put
(pc12.ps)run {
  tokendef{ 1 index cvlit { exch cons one } curry using  def }
  cvsstr{ dup length string cvs }
  strcat{ 2 copy length exch length add string % a b s
          3 2 roll 2 copy 0 exch putinterval   % b s a
          length 3 2 roll 3 copy putinterval pop pop }
  prefix{ exch strcat cvn }
} pairs-begin

/keywords {
  int      char
  float    double   struct
  auto     extern
  register static
  goto     return   sizeof
  break    continue
  if       else
  for      do       while
  switch   case     default
} cvlit  def
keywords { cvsstr dup (k_) prefix exch str tokendef } forall
/keyword-names  keywords { cvsstr (k_) prefix } map  def

/symbols {
  star   (*)     plusplus   (++)    plus    (+)    dot    (.)
  arrow  (->)    minusminus (--)    minus   (-)
  bangeq (!=)    bang       (!)     tilde   (~)
  ampamp (&&)    amp        (&)     eqeq    (==)   equal  (=)
  caret  (^)     pipepipe   (||)    pipe    (|)
  slant  (/)     percent    (%)
  ltlt   (<<)    lteq       (<=)    less    (<)
  gtgt   (>>)    gteq       (>=)    greater (>)
  lparen (\()    rparen     (\))
  comma  (,)     semi       (;)     colon   (:)    quest  (?)
  lbrace ({)     rbrace     (})     lbrack  ([)    rbrack (])  
} cvlit  def
symbols 2 { aload pop str tokendef } fortuple
/symbol-names  [ symbols 2 { first } fortuple ]  def

/assignops {
  pluseq (+=)    minuseq    (-=)
  stareq (*=)    slanteq    (/=)    percenteq (%=)
  gtgteq (>>=)   ltlteq     (<<=)
  ampeq  (&=)    careteq    (^=)    pipeeq    (|=)
} cvlit  def
assignops 2 { aload pop str tokendef } fortuple

/comment    (/*) str  (*) noneof many (*) char then some then  (/) then  def
/space      ( \t\n) anyof  //comment  alt  many  def

/alpha_     (a)(z)range (A)(Z)range alt (_)char alt  def
/digit      (0)(9)range  def
/identifier //alpha_  //alpha_ //digit alt many then  tokendef

/integer    //digit some  tokendef
/floating   //digit some (.) char then //digit many then
            (.) char //digit some then  alt
            (eE) anyof (+-) anyof maybe then //digit some then maybe then  tokendef

/escape     (\\) char
              //digit //digit maybe then //digit maybe then
              ('"bnrt\\) anyof  alt  then  def
/char_      //escape ('\n) noneof alt  def
/schar_     //escape ("\n) noneof alt  def
/character  (') char //char_ then (') char then  tokendef
/astring    (") char //schar_ many then (") char then  tokendef

/constant   //floating //integer alt //character alt //astring alt  tokendef

/symbolic   [ keyword-names {load} forall
              symbol-names {load} forall
              assignops 2{first load} fortuple
            counttomark 1 sub {alt} repeat exch pop  def

/ctoken //space  //constant //symbolic alt //identifier alt  xthen  def
/token-input{r c in}
  { in dup //ctoken exec +not-ok { true }{ exch pop second xs-x false } ifelse }
  { 4 3 roll } % xs [x[r c]] r' c' -> [x[r c]] r' c' xs
  { token-input } lazy-input def

0 0 ( aname another) string-input //ctoken exec report
0 0 ( ++ / * ) string-input //ctoken exec report
0 0 ( 37,x,y ) string-input //ctoken exec report
0 0  0 0 ( 37,x,y{12+q;} ) string-input  token-input
 dup first ==
 next dup first ==
 next dup first ==
 next dup first ==
 next dup first ==
 next dup first ==
 pc

quit


$ gsnd -q -dNOSAFER pc12ctok.ps
OK
[[/identifier [(a) (n) (a) (m) (e)]]]
remainder:[[( ) [0 6]] {0 7 (another) string-input}]
OK
[[/plusplus [(+) (+)]]]
remainder:{0 3 ( / * ) string-input}
OK
[[/constant [[/integer [(3) (7)]]]]]
remainder:[[(,) [0 3]] {0 4 (x,y ) string-input}]
[[[/constant [[/integer [(3) (7)]]]]] [0 0]]
[[[/comma (,)]] [0 1]]
[[[/identifier (x)]] [0 2]]
[[[/comma (,)]] [0 3]]
[[[/identifier (y)]] [0 4]]
[[[/lbrace ({)]] [0 5]]
stack:
[[[[/lbrace ({)]] [0 5]] {0 6 {0 8 (12+q;} ) string-input} token-input}]
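
The string helpers at the top of the listing don't depend on the parser
library, so they can be tried on their own. A minimal sketch, assuming
plain `def` definitions in place of the `pairs-begin` block (which comes
from pc12.ps); the bodies are copied from the listing and the stack
comments are mine:

```postscript
/cvsstr { dup length string cvs } def          % name -> string
/strcat { 2 copy length exch length add string % a b s
          3 2 roll 2 copy 0 exch putinterval   % b s a
          length 3 2 roll 3 copy putinterval   % s n b
          pop pop } def                        % s
/prefix { exch strcat cvn } def                % string prefix -> name

(foo) (bar) strcat ==          % (foobar)
/int cvsstr (k_) prefix ==     % /k_int
```

This is how each keyword ends up with a `k_` name: the keyword is turned
into a string, the prefix string is glued on the front, and `cvn`
converts the result back to a name.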
