Re: Supporting multiple input syntaxes

From luser droog <mijoryx@yahoo.com.dmarc.email>
Newsgroups comp.compilers
Subject Re: Supporting multiple input syntaxes
Date 2020-08-23 19:35 -0700
Organization Compilers Central
Message-ID <20-08-014@comp.compilers>
References <20-08-002@comp.compilers> <20-08-009@comp.compilers> <20-08-010@comp.compilers> <20-08-011@comp.compilers> <20-08-012@comp.compilers>


On Sunday, August 23, 2020 at 1:39:30 PM UTC-5, luser droog wrote:
> On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote:
> > My friend, reporting the furthest position examined by the parser I have [found]
> > useful in error cases as a simple stop gap when using a combinator approach.
> >
> > Thinking about it you kind of want to see the furthest failed position and the
> > stack of rules above it. Such requires meta information when the code is
> > written in the most natural way. For this reason and others I believe it is
> > good to represent your grammar in data structures which is further in the
> > direction of a compiler compiler tool (or compiler interpreter tool).
>
> Thanks. I've done some further investigating. I built my parsers following
> two papers. Hutton and Meijer, Monadic Parser Combinators
>   https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf
> and Hutton, Higher-Order Functions for Parsing
>   https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf
>
> The first adds error reporting using Monad Transformers. [...]
>
> But the second paper does it differently, and maybe something I can do
> more easily. It redefines the parsers to no longer produce a list of results,
> so there's no longer support for ambiguity. Then it defines them to
> return a Maybe,
>
>   maybe * ::= Fail [char] | Error [char] | OK *
> .
> where the OK branch has the parse tree, and Fail or Error both contain an error
> message. It describes how a Fail can be transformed into an Error. But it isn't
> entirely clear where the messages get injected.
>
> Still need to do some thinking on it, but I think I can rewrite the parsers
> to follow this model, and then decorate my grammar with possible errors
> at each node.
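
To make the three-way result concrete, here is a minimal sketch in Python rather than Miranda. The class names mirror the maybe type quoted above; `satisfy`, `alt` and the field names are assumptions made for the illustration, and `nofail` is shown only as one plausible reading of "a Fail can be transformed into an Error", not as the paper's exact definition.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass
class OK:
    value: Any   # the parse tree
    rest: str    # remaining input


@dataclass
class Fail:
    message: str  # soft failure: an alternative may still be tried


@dataclass
class Error:
    message: str  # hard failure: stop and report


Result = Union[OK, Fail, Error]
Parser = Callable[[str], Result]


def satisfy(pred: Callable[[str], bool], what: str) -> Parser:
    """Accept one character for which pred holds, else Fail."""
    def parse(inp: str) -> Result:
        if inp and pred(inp[0]):
            return OK(inp[0], inp[1:])
        return Fail(f"{what} not satisfied")
    return parse


def alt(p: Parser, q: Parser) -> Parser:
    """Try q only if p fails softly; an Error from p passes through."""
    def parse(inp: str) -> Result:
        r = p(inp)
        return q(inp) if isinstance(r, Fail) else r
    return parse


def nofail(p: Parser) -> Parser:
    """Promote a Fail from p to an Error, keeping the message."""
    def parse(inp: str) -> Result:
        r = p(inp)
        return Error(r.message) if isinstance(r, Fail) else r
    return parse
```

With this shape, messages enter at the leaves (here in `satisfy`), and `nofail` marks the points in a grammar where backtracking should stop.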

I've made some progress. I wrote a new prototype following Hutton and
modified it to add position information to the character stream.
Then I rewrote the parsers to produce the maybe structure and
to collect rudimentary error messages.

For these two cases in PostScript, one positive and one negative,

0 0 (abcd\ne) input-pos
(abc) str exec
pc
0 0 (abcd\ne) input-pos
(abd) str nofail exec
pq

I get this output:

$ gsnd -q -dNOSAFER pc11.ps
stack:
[/OK [[(a) [(b) [(c) []]]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]
stack:
[/Error [[(after) (a)] [[(after) (b)] [[{(d) eq} (not satisfied)] [[(c) [0 2]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]]]]

So, this indicates that I *can* modify my C parsers to produce error
messages. The remaining input list has the position information for
where the error occurred, [(c) [0 2]].
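
The same two cases can be imitated in a few lines of Python. This is only a sketch of the idea, not a port of pc11.ps: `input_pos` and `string` are hypothetical names, the tuple-based result is an assumption, and `string` folds the work of `str` and `nofail` into one function for brevity.

```python
def input_pos(text, line=0, col=0):
    """Pair each character with its (line, col); a newline resets the column."""
    out = []
    for ch in text:
        out.append((ch, (line, col)))
        line, col = (line + 1, 0) if ch == "\n" else (line, col + 1)
    return out


def string(expected):
    """Match `expected` against a position-tagged stream.

    Returns ("OK", matched, rest) on success, or ("Error", messages, rest)
    where rest begins at the offending character, so rest[0][1] is the
    position of the error.
    """
    def parse(inp):
        rest, msgs = inp, []
        for want in expected:
            if not rest or rest[0][0] != want:
                msgs.append(f"{want!r} not satisfied")
                return ("Error", msgs, rest)
            msgs.append(f"after {want!r}")
            rest = rest[1:]
        return ("OK", expected, rest)
    return parse


stream = input_pos("abcd\ne")
print(string("abc")(stream)[2][0])  # first unconsumed element after success
print(string("abd")(stream)[2][0])  # element where matching stopped
```

Because the position travels with each character, no separate counter has to be threaded through the parsers; the head of the remaining input is enough to locate the error.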

Following the prototype, I modified the input functions to add positions
for each character and modified the base parser item() to detect and
remove the position stuff before it passes into the rest of the machinery.
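
As a rough illustration of that last step (in Python rather than the C of the real parsers, with the stream layout and the return shape assumed for the example), a base parser can unwrap the (character, position) pair so that everything built on top of it sees a plain character:

```python
def item(inp):
    """Base parser: consume one tagged element, hand back only the character.

    `inp` is assumed to be a list of (char, (line, col)) pairs.
    """
    if not inp:
        return ("Fail", "unexpected end of input", inp)
    ch, _pos = inp[0]           # the position is dropped here
    return ("OK", ch, inp[1:])  # the remaining input keeps its positions


def satisfy(pred):
    """Built on item(); `pred` only ever sees a bare character."""
    def parse(inp):
        result = item(inp)
        if result[0] == "OK" and not pred(result[1]):
            return ("Fail", "not satisfied", inp)
        return result
    return parse
```

Only `item` knows about the tagging, so the higher-level combinators need no changes.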

The next hurdle was making the extra position information work with
the Unicode filters ucs4_from_utf8() and utf8_from_ucs4(). And that
all appears to be working now.
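
One way positions can survive a UTF-8 decoding filter is to tag the bytes and give each decoded code point the position of its first byte. The Python sketch below shows that idea only; `decode_tagged` is a hypothetical name, it assumes well-formed input, and it is not necessarily how ucs4_from_utf8() handles positions in the C code.

```python
def decode_tagged(tagged_bytes):
    """Turn (byte, pos) pairs into (code point, pos) pairs.

    Each code point keeps the position of the first byte of its sequence.
    Assumes well-formed UTF-8; no validation is attempted.
    """
    out, i = [], 0
    while i < len(tagged_bytes):
        lead, pos = tagged_bytes[i]
        if lead < 0x80:               # 0xxxxxxx: one byte
            n, cp = 1, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: two bytes
            n, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: three bytes
            n, cp = 3, lead & 0x0F
        else:                         # 11110xxx: four bytes
            n, cp = 4, lead & 0x07
        for b, _ in tagged_bytes[i + 1 : i + n]:
            cp = (cp << 6) | (b & 0x3F)  # fold in continuation bits
        out.append((cp, pos))
        i += n
    return out
```

Here the positions count bytes; counting code points instead would only require tagging after decoding rather than before.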

But that's probably the end of the story for now. Got to gear up for
Operating Systems and Advanced Web Dev with Java.

Thanks to everyone for the help, esp. Kaz with the brilliant suggestion
to pass a language id token between tokenizer and parser.


Ps. the prototype is written in PostScript extended with function syntax.
https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps
https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript

--
l droog
[Why PostScript?  I realize it's Turing complete, but it seems odd to run one's parser on a printer. -John]
