Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: "Paul Robinson" <xdpascal@xdpascal.com>
Newsgroups: comp.compilers
Subject: Re:Some questions about recursive descent?
Date: 5 Mar 2022 14:46:08 -0600
Organization: Compilers Central
Lines: 108
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-03-005@comp.compilers>
References: <22-02-021@comp.compilers> <22-02-024@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="17429"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, LL(1)
Posted-Date: 05 Mar 2022 16:53:47 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2910

On 27 Feb 2022 16:37:05 EST, Johann 'Myrkraverk' Oskarsson
<johann@myrkraverk.com> asked in Newsgroup comp.compilers:

| I have some questions about the construction of
| recursive descent parsers.  The reason for my questions
| is that I know production compilers /use/ recursive
| descent. My first question is: how are production
| recursive descent parsers constructed?

If you want to see another example of this, since I added features to
it, is the XDPascal compiler, available from
https://github.com/electric-socket/xdpw or https://XDpascal.com. Now,
what I'm going to do is explain why and how this is done.

Pascal syntax requires that, after a procedure or function signature,
and any constants, types or variables are declared, it must have a
block. This is a BEGIN, zero or more statements, each separated by a
semicolon, then an optional semicolon, then END and a semicolon if it's
a procedure or function, a period if it's the end of a program. A syntax
diagram may help here:

Block:      ^______________>|
            |               V
--->BEGIN---+-->Statement---+-->V--------->^----> END --->
            ^_______ ; <____V   |__> ;_>__|

In the code example below, any identifier ending
in TOK is the token for that symbol or keyword,
e.g. DOTOK for DO, DOTTOK for period, etc.

So at the end of the compile, the compiler would
have a piece of code similar to this:

  block;
  nexttoken;
  if thistoken<>DOTTOK then error('Period expected');
  finishcompiling;

  Procedure "block" would have something similar to
  the following:

  repeat
       statement;
  until thistoken = ENDTOK or END_OF_FILE ;


Procedure "Statement" would look like this:

  nextToken;
   Case thisToken of
      Identifier : Identstmt;
      FORTOK: ForStmt
      REPEATTOK: RepeatStmt;
      BEGINTOK: Block;           // [1]
      ...
      WHILETOK: WhileStmt
    ELSE
       Error('Syntax error');


FOR statement:
   ----> FOR --> := --> ident ------> expression -->|
   |<-----------------------------------------------V
   V--V------- TO ----->----> expression --> DO --|
      V____> DOWNTO -->|                          |
                                                  |
   |<---------------------------------------------V
   V--------- statement ----------------->

Now, procedure ForStmt might be coded like this:

    nexttoken;
    if thistok <> BECOMESTOK then
      error(' := expected');
    nexttoken;
    if thistok <> identTok then
       error(' Identifier expected ');
    ...

and so on until:

    nexttoken;
    if thistok <> DOTOK then
      error(' DO expected');
    nexttoken;
    statement;

Now, if you look, at // [1], the case selector for
the token BEGIN calls procedure block again, while
it itself was called by block earlier. And, after the
DO token in FOR, statement calls itself.

If you notice, recursive descent works because each
statement in the program is processed until a "trigger"
token exits that state back to where it was called.
On return, syntax checking continues with the next
token after the one that caused a block or statement
procedure to be called.

I hope this example helps you to understand recursive
descent parsing.


XDPascal.com: The (new) home of the XDPascal Compiler.
----
"The lessons of history teach us - if they teach us anything - that no
one learns the lessons that history teaches us."
Paul Robinson <XDpascal@XDPascal.com> / <paul@paul-robinson.us>