Groups > comp.compilers > #2769 > unrolled thread

Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?

Started by	gah4 <gah4@u.washington.edu>
First post	2021-12-27 04:18 -0800
Last post	2021-12-28 16:18 +0200
Articles	3 — 2 participants

Back to article view | Back to comp.compilers

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? gah4 <gah4@u.washington.edu> - 2021-12-27 04:18 -0800
    Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? gah4 <gah4@u.washington.edu> - 2021-12-27 19:45 -0800
    Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2021-12-28 16:18 +0200

#2769 — Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?

From	gah4 <gah4@u.washington.edu>
Date	2021-12-27 04:18 -0800
Subject	Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?
Message-ID	<21-12-008@comp.compilers>

On Sunday, December 26, 2021 at 7:19:08 PM UTC-8, Roger L Costello wrote:

> I am reading the (excellent) book "flex & bison" by John Levine. Chapter 7
> talks about conflicts in grammars: shift/reduce and reduce/reduce conflicts.

> At the end of the chapter are a few exercises/questions. I'd like to check
> with you on whether my answers to the questions are accurate and complete.

As far as I know, the main cause of ambiguous grammar in programming languages
is the nested if-then-optional-else structure.  If you require else, then it isn't ambiguous,
but people like the optional else.  That usually comes out as a shift-reduce conflict,
and parser generators know how to handle that.

Otherwise, the usual regular expression has an ambiguity which is often cured
by taking the longest of the possible matches.  It seems to me that more often
I want the shorter match, though.

But you already have the reply from John Levine ...
[See previous message, where we fixed that with "fi" in 1968. -John]

[toc] | [next] | [standalone]

#2770

From	gah4 <gah4@u.washington.edu>
Date	2021-12-27 19:45 -0800
Message-ID	<21-12-011@comp.compilers>
In reply to	#2769

(snip about conflicts in grammars)
(then I wrote)
> As far as I know, the main cause of ambiguous grammar in programming languages
> is the nested if-then-optional-else structure. If you require else, then it isn't ambiguous,
> but people like the optional else. That usually comes out as a shift-reduce conflict,
> and parser generators know how to handle that.

(snip)

> But you already have the reply from John Levine ...
> [See previous message, where we fixed that with "fi" in 1968. -John]

I first learned if-then-else from PL/I, where it was designed in
about 1963.    I now suspect that people get used to the one they
learned first, and find it more obvious.

There is also some language, and I forget now which, with an ELIF
construct, such that you can make a nested if-then-else sequence
without the increase of nesting level, and so need only one ENDIF.

IF e THEN s; ELIF e THEN s; ELIF e THEN s; ENDIF;

(No comment on my preference for ENDIF vs. FI.)

[That would also be Algol 68, using "fi"
It didn't need the semicolons because they are separators,
not terminators as in PL/I and C.
If you've written scripts in the Bourne shell or its descendants
such as bash and zsh, it deliberately looks a lot like Algol 68.
A request from your moderator: if anyone is planning another round
in the semicolon terminator vs. separator war, please don't.
-John]

[toc] | [prev] | [next] | [standalone]

#2771

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Date	2021-12-28 16:18 +0200
Message-ID	<21-12-013@comp.compilers>
In reply to	#2769

You already have lots of relevant commentary on your answer.

However, I want to offer some counterpoint.

The 1973 paper by Aho, Johnson, and Ullman: Deterministic Parsing of
Ambiguous Grammars,
gives the background for how this is handled in yacc.

https://www.researchgate.net/publication/234791287_Deterministic_parsing_of_ambiguous_grammars

Thus, the use of the various directives is a long-known solution to
the issue. However, LR conflicts frighten people. Their resolution
hides the fact that the grammar is ambiguous and at least two parses
are possible. But, with practice, one can get good at using them
judiciously and knowing when they are simply expressing the desired
intent or hiding a potential source of confusion.

More recently, GLR parsing has deferred the resolution and builds a
parse forest when the grammar is truly ambiguous and not the result of
the limitations of the LR parsing methodology. When that happens, one
can use semantics to determine which parse was correct or declare a
relevant error.

By the way, an LR conflict doesn't guarantee that the grammar is
actually ambiguous, just that resolving whether it is or not is beyond
the capacity of the LR algorithm.

So, while it is good to know when your grammar is ambiguous (or even
just potentially ambiguous) is useful in knowing whether your grammar
describes the language you think it describes, it is not always the
worst thing.

Ambiguous grammars can have valuable properties. As you noted, writing
an unambiguous grammar can be quite complicated (and in some cases not
even possible), and the comparable ambiguous grammar can be quite
concise, simple, and easy to read. And, putting the disambiguation
outside the grammar can be a more maintainable solution.

This is particularly, true if you want a language with user defined
operators or ones where the user might want to adjust the precedence
or associativity of the existing operators. Using numbers to indicate
precedence is a well-known technique. It maps well onto things people
understand. And the ability to compare the precedence of two operators
based upon an associated number is easy to implement (while strictly
outside the capacity of most parser generators).

Kind regards,
Chris

--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

[toc] | [prev] | [standalone]

csiph-web

Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?

Contents

#2769 — Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?

#2770

#2771