Re: PC Regex

Newsgroups comp.lang.postscript
Date 2019-02-14 17:51 -0800
References <6bf6a0c5-89ca-4fb0-b235-eee5ddd7f3ed@googlegroups.com>
Message-ID <a2e14679-41e4-4f1d-b913-98ca3c9e08af@googlegroups.com> (permalink)
Subject Re: PC Regex
From luser droog <luser.droog@gmail.com>

On Tuesday, February 12, 2019 at 5:48:45 PM UTC-6, luser droog wrote:
> I've used the Parser Combinators to implement a Regex parser 
> and matcher. See the thread "Parser Combinators revisited" for
> the file struct2.ps.
> 
> $ cat pc7.ps
[snip]
> 
> Then I started the regex code in a new file. The first hurdle was
> finding a simple (and correct) grammar online to look at; the first
> few search hits were duds. The next problem was recursion in the
> grammar: the top-level production, /expression, is also a component
> of /atom when it is surrounded by parens.
> 
> A little investigation showed that all of the parsers produced
> by the simple combinators are procedures of length 7. So to build
> the recursion in the grammar, I *forward declare* the /expression
> parser as an empty array of length 7 with the executable flag set.
> That is enough to embed it into the parser tree in the appropriate
> place. Then, when it comes time to fill it in, I create the real
> parser and 'copy' it into the placeholder. Parenthesized expressions
> don't actually work yet, but at least we've learned how to define
> such things.
> 
> And the next task was arranging to pass the results of the callbacks
> back out of the whole gizmo. I posted the other day in comp.lang.c
> after I got stumped doing it in C. So turning back to PostScript 
> has shown a way forward. 
> 
> I've made a list for each "level" of the grammar, and the callback
> actions operate on those lists. After the regex parser runs, the
> finished regex matcher is waiting in /expr-list, the list that
> collects the output of the top level.
> 
> So, here it is. Pretty short and succinct IMO, despite 6 extra 
> functions for managing the lists.
> 
> $ cat pc7regex.ps
[snip]
> 
> And finally, the output showing some stack dumps, resulting lengths
> of the lists, and 5 test runs on a few matching and non-matching 
> strings.
> 
> 
[snip]
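
The forward-declaration trick described in the quoted post (embed an empty executable array first, fill it in later with 'copy') can be tried in isolation. What follows is only a minimal sketch, not code from pcomb: the names body and caller are hypothetical, and the length 3 is chosen to fit this toy procedure rather than the length 7 mentioned above.

```postscript
%!PS
% Forward-declare a procedure as an empty executable array,
% embed it somewhere, and fill it in afterwards.
/body 3 array cvx def          % placeholder: three nulls, executable flag set
/caller { //body exec } def    % //body embeds the placeholder array object itself
{ 1 2 add } //body copy pop    % fill the placeholder in place; it must be long enough
caller =                       % prints: 3
```

Because //body substitutes the array object rather than the name, caller keeps pointing at the same storage that 'copy' later overwrites, which appears to be the property the recursive grammar relies on.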

After more revisions and getting the C version to work, it all looks
and works much better now. The code is at

  https://github.com/luser-dr00g/pcomb/

So I'll skip the parser combinators and just show off the regex
builder. It constructs a parser called Expression whose embedded
actions, as a side effect of parsing a regex, build a new parser
that matches that regex. Meta-circularity.

$ cat pcomb/pc7re2.ps
(pc7.ps) run

<<
/listpush { exch 1 index load 2 array astore def }  % item /name listpush -
/listpop  { dup load aload pop exch 3 1 roll def }  % /name listpop item

/nodes []
/fetch    { /nodes listpop }
/stash    { /nodes listpush }

/do-dot     {{ pop                        any   stash } action}
/do-char    {{                            term  stash } action}
/do-meta    {{            fetch exch  cvx exec  stash } action}
/do-factors {{ pop  fetch fetch exch      seq2  stash } action}
/do-terms   {{ pop  fetch fetch exch      alt2  stash } action}

/dpx { dup print cvx exec }
>> begin

/Dot        (.)                                                  def
/Meta       (*+?)     char-class                                 def
/Character  (*+?.|()) char-class inverse                         def
/Expression 7 array cvx                                          def
/Atom       <<
                //Dot  do-dot
                [ (\() //Expression (\)) ]
                //Character  do-char
            >>                                                   def
/Factor     [  //Atom   //Meta  do-meta  ?  ]                    def
/Term       [  //Factor  //Factor  do-factors  *  ]              def

            [  //Term  [  (|)  //Term  do-terms  ] *  ] parser
//Expression                                                copy pop

/regex {
    //Expression exec  length 0 eq { pop /regex cvx /syntaxerror signalerror } if
    fetch
} def

/r (a(b.)+) regex def
//r 0 get =

((a) r pc ) dpx
((ab) r pc ) dpx
((abx) r pc ) dpx
((abxb) r pc ) dpx
((abxx) r pc ) dpx
((abxbx) r pc ) dpx
((abxabx) r pc ) dpx
((axbxx) r pc ) dpx



quit

$ cd pcomb && make pc7_test
gsnd -dNOSAFER pc7re2.ps
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
z
(a) r pc stack:
[]
(ab) r pc stack:
[]
(abx) r pc stack:
[3]
(abxb) r pc stack:
[3]
(abxx) r pc stack:
[3]
(abxbx) r pc stack:
[5 3]
(abxabx) r pc stack:
[3]
(axbxx) r pc stack:
[]
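
The list helpers and the operand order inside do-factors can be tried without pc7.ps. This is only a sketch under stated assumptions: the four helper bodies are copied from the listing above, but pair-seq is a hypothetical stand-in for seq2 that merely pairs its two operands, so it says nothing about how the real combinator behaves.

```postscript
%!PS
% Helpers copied from the listing, defined with plain def instead of << >> begin.
/listpush { exch 1 index load 2 array astore def } def  % item /name listpush -
/listpop  { dup load aload pop exch 3 1 roll def } def  % /name listpop item
/nodes [] def
/fetch { /nodes listpop } def
/stash { /nodes listpush } def

% ASSUMPTION: pair-seq is a hypothetical stand-in for seq2; it only pairs its operands.
/pair-seq { 2 array astore } def

(a) stash            % nodes = [(a) []]
(b) stash            % nodes = [(b) [(a) []]]
fetch fetch exch     % pops (b), then (a); exch restores source order: (a) (b)
pair-seq stash       % nodes = [[(a) (b)] []]
fetch ==             % prints: [(a) (b)]
nodes length =       % prints: 0
```

The list is a chain of two-element arrays, so each push allocates one small array and each pop just rebinds the name to the tail. The exch after the two fetches is what keeps the combined result in left-to-right source order.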
