Path: csiph.com!xmission!news.snarked.org!border2.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: luser droog
Newsgroups: comp.compilers
Subject: Re: Supporting multiple input syntaxes
Date: Sun, 23 Aug 2020 19:35:00 -0700 (PDT)
Organization: Compilers Central
Lines: 81
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <20-08-014@comp.compilers>
References: <20-08-002@comp.compilers> <20-08-009@comp.compilers> <20-08-010@comp.compilers> <20-08-011@comp.compilers> <20-08-012@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="1290"; mail-complaints-to="abuse@iecc.com"
Keywords: C, parse, comment
Posted-Date: 23 Aug 2020 23:15:04 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <20-08-012@comp.compilers>
Xref: csiph.com comp.compilers:2566

On Sunday, August 23, 2020 at 1:39:30 PM UTC-5, luser droog wrote:
> On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote:
> > My friend, reporting the furthest position examined by the parser I have [found]
> > useful in error cases as a simple stop gap when using a combinator approach.
> >
> > Thinking about it you kind of want to see the furthest failed position and the
> > stack of rules above it. Such requires meta information when the code is
> > written in the most natural way. For this reason and others I believe it is
> > good to represent your grammar in data structures which is further in the
> > direction of a compiler compiler tool (or compiler interpreter tool).
>
> Thanks. I've done some further investigating. I built my parsers following
> two papers.
>
> Hutton and Meijer, Monadic Parser Combinators
> https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf
> and Hutton, Higher-Order Functions for Parsing
> https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf
>
> The first adds error reporting using Monad Transformers. [...]
>
> But the second paper does it differently, and maybe something I can do
> more easily. It redefines the parsers to no longer produce a list of results,
> so there's no longer support for ambiguity. Then it defines them to
> return a Maybe,
>
>     maybe * ::= Fail [char] | Error [char] | OK *
>
> where the OK branch has the parse tree, and Fail or Error both contain an error
> message. It describes how a Fail can be transformed into an Error. But it isn't
> entirely clear where the messages get injected.
>
> Still need to do some thinking on it, but I think I can rewrite the parsers
> to follow this model, and then decorate my grammar with possible errors
> at each node.

I've made some progress. I wrote a new prototype following Hutton and
modified it to add position information to the character stream. Then I
rewrote the parsers to produce the maybe structure and to collect
rudimentary error messages.

For these two test cases in PostScript, one positive and one negative,

    0 0 (abcd\ne) input-pos (abc) str exec pc
    0 0 (abcd\ne) input-pos (abd) str nofail exec pq

I get this output:

    $ gsnd -q -dNOSAFER pc11.ps
    stack:
    [/OK [[(a) [(b) [(c) []]]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]
    stack:
    [/Error [[(after) (a)] [[(after) (b)] [[{(d) eq} (not satisfied)] [[(c) [0 2]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]]]]

So this indicates that I *can* modify my C parsers to produce error messages.
The remaining input list carries the position where the error occurred:
[(c) [0 2]], i.e. line 0, column 2.
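For readers wondering how the Fail/Error/OK structure might carry over to C, here is a minimal sketch. It assumes parsers are small structs holding a function pointer, and all names (Input, Result, run_alt, run_nofail, the P_* grammar nodes) are hypothetical, not taken from the pcomb sources. It also assumes the usual reading of the scheme: an alternative is tried only after a Fail, while an Error aborts the parse, and a nofail-style wrapper is what promotes a Fail into an Error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of three-way parser results:
   FAIL  = recoverable, so an alternative may be tried;
   ERROR = unrecoverable, so alternatives are skipped;
   OK    = success. None of these names come from pcomb. */
typedef enum { FAIL, ERROR, OK } Tag;

/* Input carries its own position, like the (char [line col]) pairs. */
typedef struct { const char *s; int line, col; } Input;

typedef struct {
    Tag tag;
    const char *msg; /* message for FAIL/ERROR; NULL for OK */
    Input rest;      /* remaining input; for FAIL/ERROR, the failure point */
} Result;

typedef struct Parser Parser;
struct Parser {
    Result (*run)(const Parser *, Input);
    char c;              /* literal character, used by run_lit */
    const Parser *a, *b; /* sub-parsers for seq, alt, nofail */
    const char *msg;     /* message attached by run_nofail */
};

static Result ok(Input in) { Result r = { OK, NULL, in }; return r; }
static Result fail(const char *m, Input in) { Result r = { FAIL, m, in }; return r; }

/* Consume one character, updating line/column. Precondition: not at end. */
static Input advance(Input in) {
    assert(*in.s != '\0');
    if (*in.s == '\n') { in.line++; in.col = 0; } else { in.col++; }
    in.s++;
    return in;
}

static Result run_lit(const Parser *p, Input in) {
    if (*in.s != '\0' && *in.s == p->c) return ok(advance(in));
    return fail("literal not matched", in);
}

/* a then b; any non-OK result from a is returned unchanged. */
static Result run_seq(const Parser *p, Input in) {
    Result r = p->a->run(p->a, in);
    return r.tag == OK ? p->b->run(p->b, r.rest) : r;
}

/* Try b only if a FAILs; an ERROR from a is propagated, not backtracked. */
static Result run_alt(const Parser *p, Input in) {
    Result r = p->a->run(p->a, in);
    return r.tag == FAIL ? p->b->run(p->b, in) : r;
}

/* Promote a FAIL of a into an ERROR carrying this node's message. */
static Result run_nofail(const Parser *p, Input in) {
    Result r = p->a->run(p->a, in);
    if (r.tag == FAIL) { r.tag = ERROR; r.msg = p->msg; }
    return r;
}

/* Tiny hypothetical grammar mirroring the two test cases on "abcd\ne". */
static const Parser P_A = { run_lit, 'a', NULL, NULL, NULL };
static const Parser P_B = { run_lit, 'b', NULL, NULL, NULL };
static const Parser P_C = { run_lit, 'c', NULL, NULL, NULL };
static const Parser P_D = { run_lit, 'd', NULL, NULL, NULL };
static const Parser P_BC = { run_seq, 0, &P_B, &P_C, NULL };
static const Parser P_ABC = { run_seq, 0, &P_A, &P_BC, NULL };          /* "abc" */
static const Parser P_D_NF = { run_nofail, 0, &P_D, NULL, "expected d" };
static const Parser P_BD_NF = { run_seq, 0, &P_B, &P_D_NF, NULL };
static const Parser P_ABD_NF = { run_seq, 0, &P_A, &P_BD_NF, NULL };    /* "ab", then "d" or ERROR */
static const Parser P_BD = { run_seq, 0, &P_B, &P_D, NULL };
static const Parser P_ABD = { run_seq, 0, &P_A, &P_BD, NULL };          /* "abd", may FAIL */
static const Parser P_TRY = { run_alt, 0, &P_ABD, &P_ABC, NULL };       /* FAIL: retries "abc" */
static const Parser P_STUCK = { run_alt, 0, &P_ABD_NF, &P_ABC, NULL };  /* ERROR: no retry */
```

Run on the input "abcd\ne" (starting at line 0, column 0), P_ABC returns OK with the rest at line 0, column 3 (the d), and P_ABD_NF returns ERROR at line 0, column 2 (the c), matching the positions in the PostScript output. The contrast between P_TRY (recovers and returns OK) and P_STUCK (stays in ERROR) is the practical difference between a Fail and an Error in this sketch.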
Following the prototype, I modified the input functions to attach a position
to each character, and modified the base parser item() to detect and strip the
position information before the character passes into the rest of the
machinery. The next hurdle was making the extra position information work with
the Unicode filters ucs4_from_utf8() and utf8_from_ucs4(). That all appears to
be working now.

But that's probably the end of the story for now. I've got to gear up for
Operating Systems and Advanced Web Dev with Java.

Thanks to everyone for the help, especially Kaz for the brilliant suggestion to
pass a language id token between the tokenizer and the parser.

P.S. The prototype is written in PostScript extended with function syntax.
https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps
https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript

--
l droog

[Why PostScript? I realize it's Turing complete, but it seems odd to run one's parser on a printer. -John]
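Returning to the step where position information has to survive the UTF-8 filter and item() hands only the bare character onward: a minimal sketch of that idea follows. decode_with_pos(), pos_item(), PosChar and PosInput are hypothetical stand-ins, not the actual ucs4_from_utf8() or item(). The sketch assumes positions are zero-based (line, column) pairs counted in code points, as in the [0 3] and [1 0] pairs of the test output, and its decoder deliberately skips overlong and surrogate checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded code point tagged with where it started (hypothetical type). */
typedef struct { uint32_t cp; int line, col; } PosChar;

/* Decode one UTF-8 sequence at s; return its byte length, or 0 if malformed.
   Simplified: no overlong or surrogate rejection. */
static size_t utf8_decode1(const unsigned char *s, size_t n, uint32_t *cp) {
    size_t len;
    uint32_t v;
    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else return 0;
    if (n < len) return 0;
    for (size_t k = 1; k < len; k++) {
        if ((s[k] & 0xC0) != 0x80) return 0; /* not a continuation byte */
        v = (v << 6) | (s[k] & 0x3F);
    }
    *cp = v;
    return len;
}

/* Hypothetical position-preserving decoder: writes up to max PosChars,
   counting columns in code points. Returns the count, or -1 if malformed. */
static long decode_with_pos(const char *text, size_t n, PosChar *out, size_t max) {
    const unsigned char *s = (const unsigned char *)text;
    int line = 0, col = 0;
    size_t i = 0, k = 0;
    while (i < n && k < max) {
        uint32_t cp;
        size_t len = utf8_decode1(s + i, n - i, &cp);
        if (len == 0) return -1;
        out[k].cp = cp; out[k].line = line; out[k].col = col;
        k++;
        i += len;
        if (cp == '\n') { line++; col = 0; } else { col++; }
    }
    return (long)k;
}

typedef struct { const PosChar *p; size_t n; } PosInput;

/* item()-style accessor: yields the bare code point, and reports the position
   separately (line/col may be NULL), so downstream parsers never see it.
   Returns 0 at end of input, where a parser would FAIL. */
static int pos_item(PosInput *in, uint32_t *cp, int *line, int *col) {
    assert(in != NULL && cp != NULL);
    if (in->n == 0) return 0;
    *cp = in->p->cp;
    if (line) *line = in->p->line;
    if (col) *col = in->p->col;
    in->p++;
    in->n--;
    return 1;
}
```

For the bytes "a\xC3\xA9\nb" (a, é, newline, b) the decoder produces four entries, with é at line 0, column 1 and b at line 1, column 0. Keeping the position beside the code point, rather than inside it, is one way to let the rest of the machinery stay unaware of positions while an error report can still recover them from the remaining input.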