Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Subject: Re: Are compiler developers light-years ahead of other software development?
Date: Sat, 22 Jan 2022 10:43:39 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 91
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-01-088@comp.compilers>
References: <22-01-059@comp.compilers> <22-01-083@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="59534"; mail-complaints-to="abuse@iecc.com"
Keywords: theory, parse, comment
Posted-Date: 22 Jan 2022 10:59:01 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2855

Kaz Kylheku <480-992-1380@kylheku.com> writes:
>None of the buzzwords you mentioned related to parsing are needed in
>building a compiler: grammars, LL, LR, first() function, following()
>function, parsing tables, etc.
>
>All that goes away if you write recursive descent parser. The grammar is
>then in the documentation and code comments only, and somewhat expressed
>in its code structure. There is no LR, first or follow, no tables.

As our moderator notes, recursive descent is an implementation
technique for LL(1) grammars.  Moreover, you need first-sets to
implement them.  If your grammar is

A -> B|C

your recursive-descent parser becomes:

if next_symbol in first(B) then parse_B else parse_C fi

If you want to learn about conflicts in your grammar early on (rather
than finding it out later by having some input be parsed other than
intended), you also need follow-sets: if your grammar is

A -> B|epsilon

there is a conflict if "intersection(first(b),follow(A))" (where "U"
signifies the union) is not empty: if one of the symbols in the
intersection comes up at the start of an A, do you parse B or not?

Of course, if you are freshly designing a programming language, you
may be fine with declaring the actual behaviour of the
redursive-descent parser to be the intended behaviour.  But if you
have to implement a given grammar, things become more complex.

You can also design the syntax to make recursive-descent parsing easy,
like Wirth did.  E.g., almost all statements start with a keyword, and
there is AFAIK only one statement that may be started by several
symbols (none of them conflicting with one of the other statement
keywords), so you can easily implement statements like this:

if next_symbol=IF_SYM then ifstatement()
else if next_symbol=WHILE_SYM then whilestatement()
else if ...
...
else assignmentstatement()

All statements are followed by ";" or END, making it easy to know that
there are no conflicts with following statements.

These properties also make it easier for humans to understand
grammars, so they are a good idea in syntax design.  If you go further
in that direction, you get to Lisp (which does not need a BNF-based
parser) or Forth (where the parser just looks for the next white
space).  Of course, if you look at textbooks for these languages, you
find that you have patterns at a different ("static semantics") level,
but at least it's always clear in these language from the syntax which
if an else-branch belongs to.

>The GNU C++ compiler, undeniably a production compiler and a
>major/known/widely-used one, has a big recursive-descent parser
>maintained by hand: over a megabyte of code.
>
>In other words, a major compiler for probably the programming language
>with the most complicated syntax ever, eschews pretty much all that we
>have learned and accumulated about parsing between around 1968 and now.

C++ is the language that inspired Terrence Parr to add synactic and
semantic predicates to PCCTS/ANTLR (not sure what has been added since
I heard him talk about it in the early 1990s).  So basically parser
generators like yacc were not sufficient for the problems posed by the
C++ syntax; when G++ was started in 1987, no free parser generator had
such features, so the decision to go with a hand-written parser is
understandable, and of course since then there were two good reasons
to stay there: 1) Transition costs.  2) Backwards compatibility.

Looking further back, AFAIK Cfront also used a hand-written parser;
this (together with the requirement of C compatibility) resulted in a
syntax design that was not constrained by the parser generators of the
time, but of course also a syntax that was beyond their capabilities.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[Nope, cfront used a yacc parser with a whole lot of %left and %right
declarations to make shift/reduce warnings go away. Legend says that
much odd C++ syntax is from poorly understood consequences of those
declarations. See it at
http://www.softwarepreservation.org/projects/c_plus_plus/index.html#cfront
-John]