Groups > comp.compilers > #2849 > unrolled thread

Re: Are compiler developers light-years ahead of other software development?

Started by	Christopher F Clark <christopher.f.clark@compiler-resources.com>
First post	2022-01-20 21:11 +0200
Last post	2022-01-22 09:34 +0000
Articles	4 — 3 participants

Back to article view | Back to comp.compilers

  Re: Are compiler developers light-years ahead of other software development? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-01-20 21:11 +0200
    Re: Are compiler developers light-years ahead of other software development? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-01-21 17:38 +0000
      Re: Are compiler developers light-years ahead of other software development? gah4 <gah4@u.washington.edu> - 2022-01-21 17:40 -0800
        Re: Are compiler developers light-years ahead of other software development? anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2022-01-22 09:34 +0000

#2849 — Re: Are compiler developers light-years ahead of other software development?

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Date	2022-01-20 21:11 +0200
Subject	Re: Are compiler developers light-years ahead of other software development?
Message-ID	<22-01-077@comp.compilers>

Sadly, I would like to offer an alternate perspective on how far
compilers are than other areas of software development.

Yes, LL and LR parser generators have been around for decades. My
compiler class in the late 1970s had me implement both of them.
However, shortly after that parser technology was considered basically
a settled problem, while in actuality many parser generators are hard
to use and people don't understand it when they output state machines
and conflict messages freak compiler writers out, etc.

As a result, despite all the technology we could be bringing to bare
on compilers, as far as I can tell, more than half the compilers in
actual use are done via hand-written recursive descent with little
hacks here and there to get them to work. The prevalence of "ad hack"
compilers should be embarrassing to anyone who wants to talk about
software "engineering".

There is good theory we could put to use building better compilers and
more reliable languages. However, most of it is languishing and going
to waste.

At least that's what this compiler developer sees when he looks at the
landscape. But maybe I am biased in that perspective....

******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------
[There's a definite "too busy chopping down trees to sharpen our axes" vibe
here. -John]

[toc] | [next] | [standalone]

#2850

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2022-01-21 17:38 +0000
Message-ID	<22-01-079@comp.compilers>
In reply to	#2849

Christopher F Clark <christopher.f.clark@compiler-resources.com> writes:
>Yes, LL and LR parser generators have been around for decades. My
>compiler class in the late 1970s had me implement both of them.
>However, shortly after that parser technology was considered basically
>a settled problem, while in actuality many parser generators are hard
>to use and people don't understand it when they output state machines
>and conflict messages freak compiler writers out, etc.

This points to one reason why some people don't use these generators.
You need to know quite a bit about the implementation technique behind
it to understand what (for LR-based parser generators) a shift-reduce
or reduce-reduce conflict means and how to fix that problem.

>As a result, despite all the technology we could be bringing to bare
>on compilers, as far as I can tell, more than half the compilers in
>actual use are done via hand-written recursive descent with little
>hacks here and there to get them to work.

The "little hacks" point to another reason: How to deal with cases
where LL or LR technology, or (probably more often) context-free
grammars don't quite fit, e.g., when you want to do a compatible
extension of an existing language, like C++ (which was intended to be
compatible with C).  My impression is that Terrence Parr attacked this
problem head-on by formalizing these hacks (to some extent) with
syntactic and semantic predicates in PCCTS/ANTLR.

There is also the problem of interfacing between the parser and the
semantic analysis and back end of the compiler.  The yacc interface is
not that great, but for some reason improvements such as ox have not
taken over the world (maybe because ox is based on attribute grammars
which is quite a leap for imperative programmers).

Finally, a reason is that some people don't want to depend on software
written by others.  E.g., Wirth apparently considered it simpler to
write a recursive-descent parser by hand than to write a parser
generator and use that for implementing a parser.  And he even wrote a
compiler text book that teaches this approach.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

[toc] | [prev] | [next] | [standalone]

#2852

From	gah4 <gah4@u.washington.edu>
Date	2022-01-21 17:40 -0800
Message-ID	<22-01-082@comp.compilers>
In reply to	#2850

On Friday, January 21, 2022 at 2:30:42 PM UTC-8, Anton Ertl wrote:

(snip)

> This points to one reason why some people don't use these generators.
> You need to know quite a bit about the implementation technique behind
> it to understand what (for LR-based parser generators) a shift-reduce
> or reduce-reduce conflict means and how to fix that problem.

Interesting reason, as I know exactly how I learned about that.
(And not from compiler class, where I might have.)

In compiling a program, not actually changing it, the instructions warn
that it will get a shift-reduce conflict warning, and to ignore it.

There is a whole separate thread on ambiguous languages, which is
where these come from. In a language with an if-then-optional else
construct, and which can be nested, there is an ambiguity in which
if an else belongs to.  In most such languages it goes to the nearest
(deepest nesting) if, and this is the default from the warning.

reduce-reduce results from an actual mistake in the language,
and does need to be fixed.

I believe the program was Kermit, but I am not so sure now the
details.  I believe that it has an implementation language that
translates to C, using a lex/yacc parser.  The Makefile does
it all for you, but you have to know to ignore the message.

[toc] | [prev] | [next] | [standalone]

#2854

From	anton@mips.complang.tuwien.ac.at (Anton Ertl)
Date	2022-01-22 09:34 +0000
Message-ID	<22-01-086@comp.compilers>
In reply to	#2852

gah4 <gah4@u.washington.edu> writes:
>On Friday, January 21, 2022 at 2:30:42 PM UTC-8, Anton Ertl wrote:
>> You need to know quite a bit about the implementation technique behind
>> it to understand what (for LR-based parser generators) a shift-reduce
>> or reduce-reduce conflict means and how to fix that problem.
>
>Interesting reason, as I know exactly how I learned about that.
>(And not from compiler class, where I might have.)
>
>In compiling a program, not actually changing it, the instructions warn
>that it will get a shift-reduce conflict warning, and to ignore it.

That means that the program's authors have (hopefully) determined that
the default resolution (shift) for the shift/reduce conflicts in that
program was what they intended.  It does not tell you that shift
resulution is the intended behaviour for all cases (otherwise there
would be no point in reporting the conflict).

>There is a whole separate thread on ambiguous languages, which is
>where these come from. In a language with an if-then-optional else
>construct, and which can be nested, there is an ambiguity in which
>if an else belongs to.  In most such languages it goes to the nearest
>(deepest nesting) if, and this is the default from the warning.

That's not the only occurence of shift/reduce conflicts, but it's a
relatively prominent one, because many compiler writers leave it to
the parser generator to resolve it.  However, when developing a
grammar, I have seen cases (even for relatively small languages) with
many conflicts (both shift/reduce and reduce/reduce).  I always change
the grammar to eliminate all the conflicts.  Often the syntax is
changed by these changes, which is outside the scope of pure compiler
writers, but inside the scope of language designers.

Concerning the Algol 60 case, the syntax is unambiguous (from
<https://www.masswerk.at/algol60/report.htm>):

|<if clause> ::= if <Boolean expression> then
|<unconditional statement> ::= <basic statement> |
|        <compound statement> | <block>
|<if statement> ::= <if clause> <unconditional statement>
|<conditional statement> ::= <if statement> |
|        <if statement> else <statement> |
|        <if clause> <for statement> |
|        <label>: <conditional statement>

Statements such as

if B1 then if B2 then S1 else S1

and even

if B1 then if B2 then S1

are syntax errors.  You cannot put an if-statement or a conditional
statement in a then-branch without wrapping it in a begin...end.

Languages like Pascal and C (and its descendants) changed this to
allow either, making the grammar ambiguous, and using prose to resolve
the ambiguity.  E.g., for Pascal:

|IfStatement = "if" BooleanExpression "then" Statement [ "else" Statement ].
|
|Note: The syntactic ambiguity arising from the construct
|
|if e1 then if e2 then sl else s2
|
|is resolved by interpreting the construct as equivalent to
|
|if e1 then begin if e2 then sl else s2 end

I consider either grammar to be flawed; Algol fixed the flaw in Algol
68, Wirth fixed it in Modula, but the C-syntax descendants have
unfortunately never fixed it.

<https://en.wikipedia.org/wiki/Dangling_else#Avoiding_the_conflict_in_LR_parsers>
describes how to express the intent in the grammar rather than in
prose, but looking at the example grammars makes it clear why compiler
writers prefer to rely on the parser-generator resolution of the
shift/reduce conflict rather than expressing the intent in the
grammar.

Yacc has precedence and associativity declarations to provide
additional information to the parser generator on how to resolve
ambiguity in the BNF (and for suppressing the warnings about conflicts
that are resolved by these declarations).  Something like this might
be a good solution for dealing with the dangling else if you cannot
avoid it by fixing the syntax.

>reduce-reduce results from an actual mistake in the language,
>and does need to be fixed.

There can be cases where the default behaviour is the intended
behaviour, too, but IIRC the default depends on the order of rules, so
could be destroyed by a seemingly innocuous change to the parser
generator input.  And there are (probably many) cases where either
reduction is intended in some cases, so the generator default is wrong
for some cases for all rule orders.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[The original sin of dangling else may have been COBOL.  A 1964 IBM
manual for 7080 COBOL says ELSE matches the closest IF. PL/I inherited
it shortly aftrwards.  -John]

[toc] | [prev] | [standalone]

csiph-web

Re: Are compiler developers light-years ahead of other software development?

Contents

#2849 — Re: Are compiler developers light-years ahead of other software development?

#2850

#2852

#2854