From: Hans-Peter Diettrich
Newsgroups: comp.compilers
Subject: Re: How do you create a grammar for a multi-language language?
Date: Mon, 7 Mar 2022 05:08:33 +0100
Organization: Compilers Central
Message-ID: <22-03-015@comp.compilers>
References: <22-03-004@comp.compilers> <22-03-009@comp.compilers>
Keywords: parse

On 3/6/22 12:23 PM, Hans-Peter Diettrich wrote:
> I don't think that was what he was asking about.

The question is too unspecific for me. A (traditional) grammar covers
the parser part, while the lexer is specified differently. Languages
based on the same lexer can be merged into one grammar, no problem so
far. But if the second language applies only inside a single token
(a string...) of the primary language, then the two languages are
independent and cannot share a single grammar.

It is like a document that contains parts in several natural languages,
where each language applies only to specific parts of the document and
is subject to its own lexer rules (RTL/LTR reading...).

In C/C++ and its preprocessor we have a special construct where the
preprocessor tokens are refined/extended by the C/C++ lexer. And there
were discussions whether the preprocessor should look into string
literals (Siemens?), or whether the preprocessor can generate C
comments (Microsoft).
A separate preprocessor run at least allows for the latter, as the
preprocessed source code is lexed again and can form tokens differently
from the tokens in the original source code.

My conclusion: A single (formal) grammar cannot contain multiple
languages, unless you specify that e.g. statements and expressions in a
programming language shall be considered subject to different
languages. Such nitpicking is not worth further thought :-(

DoDi
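
To illustrate the point about a separate preprocessor run, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the toy lexer, the SLASH macro and both function names
are hypothetical, and it does not model the behaviour of any real
compiler. It only shows that substituting on tokens keeps two "/"
tokens apart, while substituting in the text and lexing the result
again lets them fuse into a "//" comment.

```python
import re

# Toy token classes (assumed for this sketch): a '//' line comment,
# an identifier, a number, or any single non-space character.
TOKEN_RE = re.compile(r"\s*(//[^\n]*|[A-Za-z_]\w*|\d+|\S)")

# Hypothetical macro table: SLASH expands to a single '/'.
MACROS = {"SLASH": "/"}


def lex(text):
    """Split text into tokens; '//' starts a comment up to end of line."""
    return TOKEN_RE.findall(text)


def expand_tokens(text):
    """Token-based preprocessing: lex once, then replace macro tokens.

    Token boundaries from the original source are preserved, so two
    adjacent '/' tokens never merge into a comment.
    """
    out = []
    for tok in lex(text):
        if tok in MACROS:
            out.extend(lex(MACROS[tok]))
        else:
            out.append(tok)
    return out


def expand_text_then_relex(text):
    """Separate preprocessor run: substitute in the text, then lex again.

    The second lexer pass sees only the expanded characters, so it may
    build tokens that did not exist in the original source.
    """
    expanded = re.sub(
        r"[A-Za-z_]\w*",
        lambda m: MACROS.get(m.group(0), m.group(0)),
        text,
    )
    return lex(expanded)


source = "x = 1 SLASH/ 2"
print(expand_tokens(source))           # ['x', '=', '1', '/', '/', '2']
print(expand_text_then_relex(source))  # ['x', '=', '1', '// 2']
```

In the second result the expanded text "x = 1 // 2" is lexed as if it
had been written that way, so the tail of the line becomes a comment
token. That is the sense in which the two passes behave as two
independent lexers rather than as one shared grammar.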