Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Alain Ketterlin <alain@universite-de-strasbourg.fr>
Newsgroups: comp.compilers
Subject: Re: Some questions about recursive descent?
Date: Tue, 01 Mar 2022 23:13:25 +0100
Organization: =?utf-8?Q?Universit=C3=A9?= de Strasbourg
Lines: 113
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-002@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers> <22-02-026@comp.compilers> <22-02-027@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="42163"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 01 Mar 2022 17:40:05 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2907

Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> writes:

> Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.com> writes:
>>>> My first question is: how are production recursive descent parsers con-
>>>> structed?  From these two books, and other things I have read in the
>>>> past, recursive descent is so simple it's mostly just brushed off with
>>>> an example or two, but there seems to be a whole lot more to it.  For
>>>> instance, [Appel] mentions a parse table but mostly does not explain how
>>>> to construct one. ...

[...]
> The demonstration starts with grammar 3.12, page 47.
>
>   Z -> d         Y ->        X -> Y
>   Z -> X Y Z     Y -> c      X -> a
>
> where Y goes to epsilon in the centre of the first line.
>     Then [Appel] shows how FIRST and FOLLOW sets are computed
> given this grammar, ending with
>
>      nullable    FIRST   FOLLOW
>   X    yes       a c     a c d
>   Y    yes         c     a c d
>   Z    no        a c d
>
> where previous steps are also shown[1].  Then figure 3.14 shows a
> predictive parsing table, presumably constructed using the FIRST
> and FOLLOW sets.
>
>           a            c             d
>     +-----------------------------------------+
>   X |  X -> a        X -> Y        X -> Y     |
>     |  Y -> Y                                 |
>     |                                         |
>   Y |  Y ->          Y ->          Y ->       |
>     |                Y -> c                   |
>     |                                         |
>   Z |  Z -> X Y Z    Z -> X Y Z    Z -> d     |
>     |                              Z -> X Y Z |
>     +-----------------------------------------+
>
> Where {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot be a
> recursive descent parser; or LL(1) grammar per the book.
>
> I am not clear on how to construct such a table from the description
> in the book, and would not presume to be able to do so by hand, using
> a more complex grammar.

That's a 2-dimensional table M[N,t] where N is a non-terminal, and t
is a terminal symbol. It is build by the following algorithm:

for each rule L -> R1...Rk
    for each terminal f in FIRST (R1...Rk)
        add the rule to M[L,f]
    if R1...Rk is nullable
        for each terminal f in FOLLOW (L)
            add the rule to M[L,f]

In essence: FIRST lets you know when to expand a non-terminal, and
FOLLOW lets you know when to apply a rule that matches nothing in
the input.

The grammar is LL(1) is all entries of the table contain at most one
rule. That is not the case in your example.

Parsing goes as follows: maintain a stack on which you initially push
the start symbol. Then repeat:

if top(stack) is a terminal symbol
    if it matches the next input
        pop it from the stack and advance input
    else
        signal error
else
    if M[top(stack),next_input] is empty
        signal error
    else // M[...] is a rule L -> R1...Rk
        pop an element from the stack and push Rk, ..., R1

(note that you push the rhs in reverse order, such as the leftmost
symbol appears on the top of the stack).

The parsing succeeds if you end up with an empty stack and an empty
input.

I don't have my copy of the dragon book with me, but from pearson's web
site I find this is the object of chapter 4 section 4.

> So I was wondering if people who create /recursive descent/ parsers for
> production compilers use such a table as an intermediary step, or not.
[...]

As John says, if your language fits the requirements (that is, it has a
LL(1) grammar, or LR(1)), you are better off using a generator. It turns
out that "big" compiler parsers are often written by hand, because 1)
grammars are often ambiguous (sometimes at the lexical level), 2)
ambiguities are easier to resolve "by hand" because you can take more
context into account, and 3) hand-written parsers give you more control
on error recovery. (Also because memory is now big enough to hold a
parse tree.)

For C/C++ parsers, add the fact that there are several grammatical
layers (e.g., OpenMP pragmas), and/or you need details that automated
parsing usually omits (e.g., for diagnostics and tools). That's why both
gcc and clang parsers are hand-written.

The official Python implementation used an LL(1) (auto-generated) parser
until recently, where they switched to a PEG-parser (I have to admit I
don't understand their motivation for doing this -- as exposed in a
document called PEP 617). It is still auto-generated as far as I
understand.

-- Alain.