From: Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Mon, 7 Mar 2022 05:08:33 +0100
Organization: Compilers Central
Lines: 28
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-015@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers>
Keywords: parse
Posted-Date: 06 Mar 2022 23:43:40 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-03-009@comp.compilers>

On 3/6/22 12:23 PM, Hans-Peter Diettrich wrote:
> I don't think that was what he was asking about.

The question is too unspecific for me. A (traditional) grammar covers
the parser part, while the lexer is specified separately. Languages
that share the same lexer can be merged into one grammar, no problem so
far. But if the second language applies only inside a single token
(a string...) of the primary language, then the two languages are
independent and cannot share a single grammar. It is like a document
that contains parts in several natural languages, where each language
applies only to specific parts of the document and is subject to its own
lexer rules (RTL/LTR reading...).
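
To illustrate the "language inside a single token" case, here is a
minimal sketch in Python. Both toy languages, the patterns PRIMARY and
EMBEDDED, and the helper lex() are made up for this example: the primary
lexer sees the string only as one opaque token, and a second,
independent lexer is then applied to that token's contents.

```python
import re

# Hypothetical primary language: identifiers, '=', and double-quoted strings.
PRIMARY = re.compile(
    r'\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<eq>=)|"(?P<string>[^"]*)")'
)
# Hypothetical embedded language: integers and arithmetic operators.
EMBEDDED = re.compile(r'\s*(?:(?P<num>\d+)|(?P<op>[-+*/()]))')


def lex(pattern, text):
    """Return a list of (kind, value) tokens; raise if nothing matches."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


# The primary lexer treats the string as a single token ...
outer = lex(PRIMARY, 'x = "1 + 2*3"')
# ... and only the embedded lexer knows how to split its contents.
inner = lex(EMBEDDED, outer[2][1])
```

Feeding the string contents to the primary lexer instead fails, which is
the point: the two token sets have nothing in common, so the two
languages need separate lexers and separate grammars.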

C/C++ with its preprocessor is a special case: the preprocessor tokens
are refined/extended by the C/C++ lexer. There were discussions about
whether the preprocessor should look into string literals (Siemens?), or
whether the preprocessor can generate C comments (Microsoft). A separate
preprocessor run at least allows for the latter, because the
preprocessed source code is lexed again and can form tokens that differ
from the tokens in the original source code.
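
The re-lexing effect can be shown with a toy model in Python. This is
only a sketch under my own assumptions, not the behaviour of any
particular compiler: the token set, the macro SL and the helpers
expand_tokens() and expand_text() are invented for the example. Expanding
on the token level keeps two separate '/' tokens, while expanding the
text and lexing it again produces a comment.

```python
import re

# Toy C-like lexer: '//' comments, identifiers, integers, a few punctuators.
TOKEN = re.compile(
    r'\s*(?:(?P<comment>//.*)|(?P<ident>[A-Za-z_]\w*)'
    r'|(?P<num>\d+)|(?P<punct>[-+*/=;()]))'
)


def lex(text):
    """Return a list of (kind, value) tokens for one line of toy source."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def expand_tokens(tokens, macros):
    """Token-level expansion: each macro name is replaced by its own tokens."""
    out = []
    for kind, value in tokens:
        if kind == "ident" and value in macros:
            out.extend(lex(macros[value]))
        else:
            out.append((kind, value))
    return out


def expand_text(text, macros):
    """Text-level expansion: the result is plain text and must be lexed again."""
    for name, body in macros.items():
        text = re.sub(r"\b" + re.escape(name) + r"\b", lambda _m, b=body: b, text)
    return text


macros = {"SL": "/"}   # hypothetical macro, as if "#define SL /"
source = "a SL/ b"

token_level = expand_tokens(lex(source), macros)
text_level = lex(expand_text(source, macros))
```

Here token_level still contains the identifier b after two '/' tokens,
while text_level ends in a single comment token that swallows b.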

My conclusion:
A single (formal) grammar cannot contain multiple languages, unless you
declare that e.g. statements and expressions in a programming language
shall be considered as belonging to different languages. Such nitpicking
is not worth further thought :-(

DoDi