Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: "Paul Robinson" Newsgroups: comp.compilers Subject: Re:Some questions about recursive descent? Date: 5 Mar 2022 14:46:08 -0600 Organization: Compilers Central Lines: 108 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-03-005@comp.compilers> References: <22-02-021@comp.compilers> <22-02-024@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="17429"; mail-complaints-to="abuse@iecc.com" Keywords: parse, LL(1) Posted-Date: 05 Mar 2022 16:53:47 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2910 On 27 Feb 2022 16:37:05 EST, Johann 'Myrkraverk' Oskarsson asked in Newsgroup comp.compilers: | I have some questions about the construction of | recursive descent parsers. The reason for my questions | is that I know production compilers /use/ recursive | descent. My first question is: how are production | recursive descent parsers constructed? If you want to see another example of this, since I added features to it, is the XDPascal compiler, available from https://github.com/electric-socket/xdpw or https://XDpascal.com. Now, what I'm going to do is explain why and how this is done. Pascal syntax requires that, after a procedure or function signature, and any constants, types or variables are declared, it must have a block. This is a BEGIN, zero or more statements, each separated by a semicolon, then an optional semicolon, then END and a semicolon if it's a procedure or function, a period if it's the end of a program. A syntax diagram may help here: Block: ^______________>| | V --->BEGIN---+-->Statement---+-->V--------->^----> END ---> ^_______ ; <____V |__> ;_>__| In the code example below, any identifier ending in TOK is the token for that symbol or keyword, e.g. DOTOK for DO, DOTTOK for period, etc. So at the end of the compile, the compiler would have a piece of code similar to this: block; nexttoken; if thistoken<>DOTTOK then error('Period expected'); finishcompiling; Procedure "block" would have something similar to the following: repeat statement; until thistoken = ENDTOK or END_OF_FILE ; Procedure "Statement" would look like this: nextToken; Case thisToken of Identifier : Identstmt; FORTOK: ForStmt REPEATTOK: RepeatStmt; BEGINTOK: Block; // [1] ... WHILETOK: WhileStmt ELSE Error('Syntax error'); FOR statement: ----> FOR --> := --> ident ------> expression -->| |<-----------------------------------------------V V--V------- TO ----->----> expression --> DO --| V____> DOWNTO -->| | | |<---------------------------------------------V V--------- statement -----------------> Now, procedure ForStmt might be coded like this: nexttoken; if thistok <> BECOMESTOK then error(' := expected'); nexttoken; if thistok <> identTok then error(' Identifier expected '); ... and so on until: nexttoken; if thistok <> DOTOK then error(' DO expected'); nexttoken; statement; Now, if you look, at // [1], the case selector for the token BEGIN calls procedure block again, while it itself was called by block earlier. And, after the DO token in FOR, statement calls itself. If you notice, recursive descent works because each statement in the program is processed until a "trigger" token exits that state back to where it was called. On return, syntax checking continues with the next token after the one that caused a block or statement procedure to be called. I hope this example helps you to understand recursive descent parsing. XDPascal.com: The (new) home of the XDPascal Compiler. ---- "The lessons of history teach us - if they teach us anything - that no one learns the lessons that history teaches us." Paul Robinson /