Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Alain Ketterlin Newsgroups: comp.compilers Subject: Re: Some questions about recursive descent? Date: Tue, 01 Mar 2022 23:13:25 +0100 Organization: =?utf-8?Q?Universit=C3=A9?= de Strasbourg Lines: 113 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-03-002@comp.compilers> References: <22-02-021@comp.compilers> <22-02-024@comp.compilers> <22-02-026@comp.compilers> <22-02-027@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="42163"; mail-complaints-to="abuse@iecc.com" Keywords: parse, LL(1) Posted-Date: 01 Mar 2022 17:40:05 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2907 Johann 'Myrkraverk' Oskarsson writes: > Johann 'Myrkraverk' Oskarsson writes: >>>> My first question is: how are production recursive descent parsers con- >>>> structed? From these two books, and other things I have read in the >>>> past, recursive descent is so simple it's mostly just brushed off with >>>> an example or two, but there seems to be a whole lot more to it. For >>>> instance, [Appel] mentions a parse table but mostly does not explain how >>>> to construct one. ... [...] > The demonstration starts with grammar 3.12, page 47. > > Z -> d Y -> X -> Y > Z -> X Y Z Y -> c X -> a > > where Y goes to epsilon in the centre of the first line. > Then [Appel] shows how FIRST and FOLLOW sets are computed > given this grammar, ending with > > nullable FIRST FOLLOW > X yes a c a c d > Y yes c a c d > Z no a c d > > where previous steps are also shown[1]. Then figure 3.14 shows a > predictive parsing table, presumably constructed using the FIRST > and FOLLOW sets. > > a c d > +-----------------------------------------+ > X | X -> a X -> Y X -> Y | > | Y -> Y | > | | > Y | Y -> Y -> Y -> | > | Y -> c | > | | > Z | Z -> X Y Z Z -> X Y Z Z -> d | > | Z -> X Y Z | > +-----------------------------------------+ > > Where {X,a}, {Y,c}, and {Z,d} show conflicts, so this cannot be a > recursive descent parser; or LL(1) grammar per the book. > > I am not clear on how to construct such a table from the description > in the book, and would not presume to be able to do so by hand, using > a more complex grammar. That's a 2-dimensional table M[N,t] where N is a non-terminal, and t is a terminal symbol. It is build by the following algorithm: for each rule L -> R1...Rk for each terminal f in FIRST (R1...Rk) add the rule to M[L,f] if R1...Rk is nullable for each terminal f in FOLLOW (L) add the rule to M[L,f] In essence: FIRST lets you know when to expand a non-terminal, and FOLLOW lets you know when to apply a rule that matches nothing in the input. The grammar is LL(1) is all entries of the table contain at most one rule. That is not the case in your example. Parsing goes as follows: maintain a stack on which you initially push the start symbol. Then repeat: if top(stack) is a terminal symbol if it matches the next input pop it from the stack and advance input else signal error else if M[top(stack),next_input] is empty signal error else // M[...] is a rule L -> R1...Rk pop an element from the stack and push Rk, ..., R1 (note that you push the rhs in reverse order, such as the leftmost symbol appears on the top of the stack). The parsing succeeds if you end up with an empty stack and an empty input. I don't have my copy of the dragon book with me, but from pearson's web site I find this is the object of chapter 4 section 4. > So I was wondering if people who create /recursive descent/ parsers for > production compilers use such a table as an intermediary step, or not. [...] As John says, if your language fits the requirements (that is, it has a LL(1) grammar, or LR(1)), you are better off using a generator. It turns out that "big" compiler parsers are often written by hand, because 1) grammars are often ambiguous (sometimes at the lexical level), 2) ambiguities are easier to resolve "by hand" because you can take more context into account, and 3) hand-written parsers give you more control on error recovery. (Also because memory is now big enough to hold a parse tree.) For C/C++ parsers, add the fact that there are several grammatical layers (e.g., OpenMP pragmas), and/or you need details that automated parsing usually omits (e.g., for diagnostics and tools). That's why both gcc and clang parsers are hand-written. The official Python implementation used an LL(1) (auto-generated) parser until recently, where they switched to a PEG-parser (I have to admit I don't understand their motivation for doing this -- as exposed in a document called PEP 617). It is still auto-generated as far as I understand. -- Alain.