Path: csiph.com!aioe.org!peer02.ams4!peer.am4.highwinds-media.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Kaz Kylheku <793-849-0957@kylheku.com>
Newsgroups: comp.compilers
Subject: Re: Supporting multiple input syntaxes
Date: Thu, 13 Aug 2020 00:43:47 +0000 (UTC)
Organization: Aioe.org NNTP Server
Lines: 48
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <20-08-003@comp.compilers>
References: <20-08-002@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="68682"; mail-complaints-to="abuse@iecc.com"
Keywords: C, parse
Posted-Date: 13 Aug 2020 18:22:32 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
X-Received-Bytes: 3213
X-Received-Body-CRC: 2589618188
Xref: csiph.com comp.compilers:2555

On 2020-08-12, luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email> wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?
>
> My feeling is that separating them will be less headache, but maybe
> there's some advantage to changing out smaller pieces of the grammar
> in that it might be easier to make sure that they produce the same
> structure compatible with the backend.
>
> Any guidance in this area?
>
> https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

I'd say that since you're not using a parser generator, but using code
statements to construct the grammar objects at initialization time, you
have the flexibility to merge the implementation, because you can check
the value of some dialect-selecting variable, and construct the parser
accordingly, and elsewhere check that same variable to do whatever else
needs to be done conditionally.
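
To make that concrete, here is a minimal sketch in C of building the
grammar once at initialization time while branching on a dialect
variable. Everything in it is hypothetical (the enum, struct grammar,
build_grammar, accepts_assign_op), and it is not taken from pcomb. The
assignment of features to dialects is an illustrative assumption, not a
checked account of what each manual specifies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dialect selector; all names here are illustrative. */
enum dialect { DIALECT_C75, DIALECT_KR1, DIALECT_C90 };

#define MAX_OPS 8

struct grammar {
    enum dialect dialect;
    const char *assign_ops[MAX_OPS]; /* accepted compound-assignment spellings */
    size_t n_assign_ops;
    int has_prototypes;              /* parameter types inside declarator parens */
};

static void add_op(struct grammar *g, const char *op)
{
    if (g->n_assign_ops < MAX_OPS)
        g->assign_ops[g->n_assign_ops++] = op;
}

/* Construct the grammar object once, checking the dialect as we go.
   Which dialect gets which feature is an assumption for illustration. */
struct grammar build_grammar(enum dialect d)
{
    struct grammar g = { d, { 0 }, 0, 0 };

    if (d == DIALECT_C75 || d == DIALECT_KR1) {
        add_op(&g, "=+");            /* old-style spellings */
        add_op(&g, "=-");
    }
    if (d != DIALECT_C75) {
        add_op(&g, "+=");            /* newer spellings */
        add_op(&g, "-=");
    }
    g.has_prototypes = (d == DIALECT_C90);
    return g;
}

/* Elsewhere, the same grammar object answers dialect-dependent questions. */
int accepts_assign_op(const struct grammar *g, const char *op)
{
    for (size_t i = 0; i < g->n_assign_ops; i++)
        if (strcmp(g->assign_ops[i], op) == 0)
            return 1;
    return 0;
}
```

The point is only that the conditionals live in the construction code,
so the rest of the program deals with one kind of grammar object.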

The trick is to find a way to embed the *semantics* of the older dialects
into the newest one, so that everything after the parsing can be shared.

Similar remarks would apply to recursive descent.

If you were using something clunky like Yacc, there are still ways
to combine everything into a single grammar. The input stream can be
primed with one of several "secret internal token" objects that has no
lexeme. (Primed, meaning that the first call to the lexer yields this
secret token instead of processing actual input.) Each token indicates
a dialect to parse. The top-level grammar production can then pick
one of several subordinate production rules corresponding to the entry
points for the respective dialects.  Those can then share common rules
as much as possible.

  translation_unit : C75_TOKEN c75_translation_unit /* orig flavor */
                   | C79_TOKEN c79_translation_unit /* "K&R" */
                   | C90_TOKEN c90_translation_unit /* ANSI/ISO */
                   ;
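
The priming on the lexer side could look roughly like the sketch below.
The token values, select_dialect, and real_yylex are all made up for
illustration; in a real build the token codes come from Yacc's %token
declarations, and real_yylex stands in for whatever scanner you
actually have.

```c
/* Hypothetical token codes; Yacc would normally generate these
   from %token declarations in the grammar file. */
enum { C75_TOKEN = 258, C79_TOKEN, C90_TOKEN };

/* The secret token to inject; 0 means nothing is pending. */
static int pending_start_token;

/* Call once before yyparse() to choose the dialect. */
void select_dialect(int start_token)
{
    pending_start_token = start_token;
}

/* Stand-in for the actual scanner (an assumption for this sketch).
   It just reports end of input, which Yacc expects as 0. */
static int real_yylex(void)
{
    return 0;
}

/* The lexer the parser calls: the first call yields the secret token,
   which has no lexeme; every later call reads actual input. */
int yylex(void)
{
    if (pending_start_token != 0) {
        int tok = pending_start_token;
        pending_start_token = 0;
        return tok;
    }
    return real_yylex();
}
```

A driver would call something like select_dialect(C90_TOKEN) and then
yyparse(); the top-level rule then commits to the c90 entry point on
the very first token.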

--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List:  http://www.kylheku.com/diy
ADA MP-1 Mailing List:   http://www.kylheku.com/mp1