From: antispam@math.uni.wroc.pl
Newsgroups: comp.compilers
Subject: Re: Supporting multiple input syntaxes
Date: Thu, 11 Feb 2021 23:27:43 +0000 (UTC)
Organization: Politechnika Wroclawska
Message-ID: <21-02-004@comp.compilers>
References: <20-08-002@comp.compilers>
Keywords: parse, design

luser droog wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?
>
> My feeling is that separating them will be less headache, but maybe
> there's some advantage to changing out smaller pieces of the grammar
> in that it might be easier to make sure that they produce the same
> structure compatible with the backend.
>
> Any guidance in this area?
>
> https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c
>
> [Really, it's up to you. My inclination would be to make them
> separate but use some sort of macro setup so you can insert
> common pieces into each of the grammars. -John]

Gnu Pascal supports several Pascal dialects and uses a unified
parser for all of them. Some ideas used:

- flags in the scanner decide whether dialect-specific tokens are
  recognized
- superset parsing: several constructs are generalized so that a single
  construct represents things that otherwise would lead to conflicts.
  A later semantic stage looks at the dialect flags and prunes things
  not allowed in the given dialect. An example of a superset
  construction is the rule 'call_or_cast': it handles several
  syntactically similar constructs that are usually given by separate
  syntax rules. Besides the dialect flags, the semantic rules use types
  to decide the meaning.
- even after using the two tricks above, the grammar still has LALR
  conflicts; they are resolved using the GLR option of Bison. All
  conflicts are resolvable using lookahead, and AFAICS some are
  only resolvable with lookahead. Parser lookahead means that the
  traditional trick of passing semantic info back to the scanner does
  not work (parser actions are delayed, so the scanner may be forced to
  produce a token before the semantic info is available). Still, it
  seems that GLR leads to a cleaner parser.

My impression is that variation in Pascal dialects is larger than
in C dialects, so the case for a unified parser in C is IMHO much
stronger. OTOH Gnu Pascal is a full compiler with semantic actions
invoked from grammar rules. The semantic code embedded in the parser
changed much more than the grammar rules, so maintaining separate
parsers would probably be a nightmare.

--
Waldek Hebisch
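
The first two ideas in the list above (dialect flags in the scanner,
and superset parsing followed by semantic pruning) might be sketched
roughly as follows. This is only an illustration, not Gnu Pascal's
code: the dialect names "classic" and "extended", the keyword tables,
the ALLOWS_CAST table and all function names are assumptions made up
for the example. Only the rule name 'call_or_cast' comes from the post.

```python
import re

# Hypothetical dialect tables (illustrative only, not from Gnu Pascal).
BASE_KEYWORDS = {"begin", "end", "if", "then", "var"}
DIALECT_KEYWORDS = {"classic": set(), "extended": {"module", "otherwise"}}
ALLOWS_CAST = {"classic": False, "extended": True}


def scan(text, dialect):
    """Tokenize text; the dialect flag decides which words are keywords."""
    keywords = BASE_KEYWORDS | DIALECT_KEYWORDS[dialect]
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|\d+|\S", text):
        if word.lower() in keywords:
            tokens.append(("KW", word.lower()))
        elif word[0].isalpha() or word[0] == "_":
            tokens.append(("ID", word))
        elif word.isdigit():
            tokens.append(("NUM", word))
        else:
            tokens.append(("PUNCT", word))
    return tokens


def parse_call_or_cast(tokens):
    """Superset rule: accept NAME ( ARG ) without deciding what it means."""
    if len(tokens) != 4:
        raise SyntaxError("expected NAME ( ARG )")
    (k0, name), lp, (k2, arg), rp = tokens
    if (k0 != "ID" or lp != ("PUNCT", "(")
            or k2 not in ("ID", "NUM") or rp != ("PUNCT", ")")):
        raise SyntaxError("expected NAME ( ARG )")
    return ("call_or_cast", name, arg)


def resolve(node, type_names, dialect):
    """Semantic stage: use types to pick a meaning, then prune by dialect."""
    _, name, arg = node
    if name in type_names:
        if not ALLOWS_CAST[dialect]:
            raise SyntaxError(f"type cast not allowed in dialect {dialect!r}")
        return ("cast", name, arg)
    return ("call", name, arg)
```

In this sketch the parser never needs to know whether a name denotes a
type, so nothing has to be passed back to the scanner; the decision is
deferred to resolve(), which sees both the symbol information and the
dialect flag. For example, parse_call_or_cast(scan("integer(x)",
"extended")) yields a generic node that resolve() later turns into a
cast or a call.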