Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: anton@mips.complang.tuwien.ac.at (Anton Ertl) Newsgroups: comp.compilers Subject: Re: Are compiler developers light-years ahead of other software development? Date: Sat, 22 Jan 2022 10:43:39 GMT Organization: Institut fuer Computersprachen, Technische Universitaet Wien Lines: 91 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-01-088@comp.compilers> References: <22-01-059@comp.compilers> <22-01-083@comp.compilers> Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="59534"; mail-complaints-to="abuse@iecc.com" Keywords: theory, parse, comment Posted-Date: 22 Jan 2022 10:59:01 EST X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:2855 Kaz Kylheku <480-992-1380@kylheku.com> writes: >None of the buzzwords you mentioned related to parsing are needed in >building a compiler: grammars, LL, LR, first() function, following() >function, parsing tables, etc. > >All that goes away if you write recursive descent parser. The grammar is >then in the documentation and code comments only, and somewhat expressed >in its code structure. There is no LR, first or follow, no tables. As our moderator notes, recursive descent is an implementation technique for LL(1) grammars. Moreover, you need first-sets to implement them. If your grammar is A -> B|C your recursive-descent parser becomes: if next_symbol in first(B) then parse_B else parse_C fi If you want to learn about conflicts in your grammar early on (rather than finding it out later by having some input be parsed other than intended), you also need follow-sets: if your grammar is A -> B|epsilon there is a conflict if "intersection(first(b),follow(A))" (where "U" signifies the union) is not empty: if one of the symbols in the intersection comes up at the start of an A, do you parse B or not? Of course, if you are freshly designing a programming language, you may be fine with declaring the actual behaviour of the redursive-descent parser to be the intended behaviour. But if you have to implement a given grammar, things become more complex. You can also design the syntax to make recursive-descent parsing easy, like Wirth did. E.g., almost all statements start with a keyword, and there is AFAIK only one statement that may be started by several symbols (none of them conflicting with one of the other statement keywords), so you can easily implement statements like this: if next_symbol=IF_SYM then ifstatement() else if next_symbol=WHILE_SYM then whilestatement() else if ... ... else assignmentstatement() All statements are followed by ";" or END, making it easy to know that there are no conflicts with following statements. These properties also make it easier for humans to understand grammars, so they are a good idea in syntax design. If you go further in that direction, you get to Lisp (which does not need a BNF-based parser) or Forth (where the parser just looks for the next white space). Of course, if you look at textbooks for these languages, you find that you have patterns at a different ("static semantics") level, but at least it's always clear in these language from the syntax which if an else-branch belongs to. >The GNU C++ compiler, undeniably a production compiler and a >major/known/widely-used one, has a big recursive-descent parser >maintained by hand: over a megabyte of code. > >In other words, a major compiler for probably the programming language >with the most complicated syntax ever, eschews pretty much all that we >have learned and accumulated about parsing between around 1968 and now. C++ is the language that inspired Terrence Parr to add synactic and semantic predicates to PCCTS/ANTLR (not sure what has been added since I heard him talk about it in the early 1990s). So basically parser generators like yacc were not sufficient for the problems posed by the C++ syntax; when G++ was started in 1987, no free parser generator had such features, so the decision to go with a hand-written parser is understandable, and of course since then there were two good reasons to stay there: 1) Transition costs. 2) Backwards compatibility. Looking further back, AFAIK Cfront also used a hand-written parser; this (together with the requirement of C compatibility) resulted in a syntax design that was not constrained by the parser generators of the time, but of course also a syntax that was beyond their capabilities. - anton -- M. Anton Ertl anton@mips.complang.tuwien.ac.at http://www.complang.tuwien.ac.at/anton/ [Nope, cfront used a yacc parser with a whole lot of %left and %right declarations to make shift/reduce warnings go away. Legend says that much odd C++ syntax is from poorly understood consequences of those declarations. See it at http://www.softwarepreservation.org/projects/c_plus_plus/index.html#cfront -John]