From: Kaz Kylheku <793-849-0957@kylheku.com>
Newsgroups: comp.compilers
Subject: Re: Supporting multiple input syntaxes
Date: Thu, 13 Aug 2020 00:43:47 +0000 (UTC)
Organization: Aioe.org NNTP Server
Message-ID: <20-08-003@comp.compilers>
References: <20-08-002@comp.compilers>
Keywords: C, parse

On 2020-08-12, luser droog wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?
>
> My feeling is that separating them will be less headache, but maybe
> there's some advantage to changing out smaller pieces of the grammar
> in that it might be easier to make sure that they produce the same
> structure compatible with the backend.
>
> Any guidance in this area?
>
> https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

I'd say that since you're not using a parser generator, but are
constructing the grammar objects with code at initialization time,
you have the flexibility to merge the implementations: you can check
the value of some dialect-selecting variable and construct the parser
accordingly, and check that same variable elsewhere to do whatever
else needs to be done conditionally.

The trick is to find a way to embed the *semantics* of the older
dialects into the new, so that everything after the parsing can be
shared.

Similar remarks would apply to recursive descent.

If you were using something clunky like Yacc, there are still ways to
combine everything into a single grammar. The input stream can be
primed with one of several "secret internal token" objects that have
no lexeme. (Primed, meaning that the first call to the lexer yields
this secret token instead of processing actual input.) Each token
indicates a dialect to parse. The top-level grammar production can
then pick one of several subordinate production rules corresponding
to the entry points for the respective dialects. Those can then share
common rules as much as possible.

  translation_unit : C75_TOKEN c75_translation_unit  /* orig flavor */
                   | C79_TOKEN c79_translation_unit  /* "K&R" */
                   | C90_TOKEN c90_translation_unit  /* ANSI/ISO */
                   ;

-- 
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
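
For illustration, the lexer-priming step described above might be
sketched roughly as follows in C. This is only a sketch under stated
assumptions: it uses a tiny hand-written lexer rather than a real
yylex(), and the names struct lexer, struct token, lexer_init,
lexer_next, TOK_EOF, TOK_WORD and TOK_PUNCT are hypothetical. Only
C75_TOKEN, C79_TOKEN and C90_TOKEN come from the grammar fragment; in
a Yacc setup their codes would normally come from the generated
header instead of a hand-written enum.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds. The three *_TOKEN entries are the
 * "secret internal tokens" that select a dialect. */
enum token_kind {
    TOK_EOF, TOK_WORD, TOK_PUNCT,
    C75_TOKEN, C79_TOKEN, C90_TOKEN
};

struct token {
    enum token_kind kind;
    const char *start;  /* NULL for EOF and for the secret token */
    size_t len;         /* 0 for the secret token: it has no lexeme */
};

struct lexer {
    const char *pos;         /* next unread character */
    enum token_kind primed;  /* dialect token to hand out first */
    int primed_pending;      /* nonzero until that token is delivered */
};

/* Prime the lexer: remember which dialect token to yield first. */
void lexer_init(struct lexer *lx, enum token_kind dialect, const char *input)
{
    assert(lx != NULL && input != NULL);
    assert(dialect == C75_TOKEN || dialect == C79_TOKEN ||
           dialect == C90_TOKEN);
    lx->pos = input;
    lx->primed = dialect;
    lx->primed_pending = 1;
}

/* The first call returns the primed token without touching the input;
 * later calls scan words and single punctuation characters. */
struct token lexer_next(struct lexer *lx)
{
    struct token t = { TOK_EOF, NULL, 0 };

    if (lx->primed_pending) {
        lx->primed_pending = 0;
        t.kind = lx->primed;
        return t;
    }

    while (isspace((unsigned char)*lx->pos))
        lx->pos++;

    if (*lx->pos == '\0')
        return t;

    t.start = lx->pos;
    if (isalnum((unsigned char)*lx->pos) || *lx->pos == '_') {
        t.kind = TOK_WORD;
        while (isalnum((unsigned char)*lx->pos) || *lx->pos == '_')
            lx->pos++;
    } else {
        t.kind = TOK_PUNCT;
        lx->pos++;
    }
    t.len = (size_t)(lx->pos - t.start);
    return t;
}
```

With something like lexer_init(&lx, C90_TOKEN, "int x;"), the first
call to lexer_next() yields C90_TOKEN with no lexeme, and only the
calls after that start consuming "int", "x" and ";". The parser sees
the dialect token as the first symbol and can take the matching
branch of translation_unit.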