Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid>
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Mon, 28 Feb 2022 21:52:54 +0100
Organization: =?utf-8?Q?Universit=C3=A9?= de Strasbourg
Lines: 60
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-02-026@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="15798"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1), comment
Posted-Date: 28 Feb 2022 16:19:54 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2904

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> writes:
>>My first question is: how are production recursive descent parsers con-
>>structed?  From these two books, and other things I have read in the
>>past, recursive descent is so simple it's mostly just brushed off with
>>an example or two, but there seems to be a whole lot more to it.  For
>>instance, [Appel] mentions a parse table but mostly does not explain how
>>to construct one.
>
> I don't know what such a table would be good for.  My parser generator
> Gray <https://www.complang.tuwien.ac.at/forth/gray.zip>, which
> generates recursive-descent parsers, certainly does not need one.

A parse table for recursive descent/LL(1) simply tells which rule to
expand given the non-terminal currently parsed and the incoming token
type. The building of this table is described in, e.g., the dragon book
(for sure in the 2nd edition).

> The way Gray works is: It computes first and follow sets for all nodes
> in the grammar; follow sets are only used for computing first

You probably mean the reverse (first is used to compute follow).

> sets and for reporting conflicts, first sets are used for deciding how
> to parse. It generates code as follows:
>
> grammar  code
> terminal if sym in first(terminal) then sym=scan(); else syntax_error(); end
> a b      code(a) code(b)
> a|b      if sym in first(a) then code(a); else code(b); end
> a?       if sym in first(a) do code(a); end
> a*       while sym in first(a) do code(a); end
> a+       do code(a) while sym in first(a)
> nonterm  call(function(nonterm))
> action   action

The parsing table embodies all that logic. Parsing works by maintaining
a stack of symbols, and the table indicates what actions need to be
taken (either replace one non-terminal with a rhs on the stack, or match
one input symbol). The algorithm works with any table, and the whole
thing (algorithm+table) is certainly much shorter than generating code
for each and every rule. EBNF constructions like ?/*/+/... have to be
translated first into "regular" rules, but that's trivial pre-processing
(along with left factoring/recursivity elimination).

Building a parse tree in this context is less obvious, because
non-terminal symbols seem to "disappear" during the parsing. The
solution (at least the one I came up with) consists in keeping special
entries on the stack signaling the end of a rhs. Then, special actions
(i.e., "reduce") can be taken when finding these special entries during
parsing, much in the same way as it is done in bottom-up (e.g., LR)
parsing: the "semantic values" of the various symbols can be pushed onto
a separate stack, and you end up with something really similar to
LR-parsing.

-- Alain.
[I think the point is that if you have a table, you have a table driven
LL(1) parser rather than recursive descent in which the grammar is embedded
in the code. -John]