Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?

From Kaz Kylheku <480-992-1380@kylheku.com>
Newsgroups comp.compilers
Subject Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?
Date 2021-12-29 18:48 +0000
Organization A noiseless patient Spider
Message-ID <21-12-017@comp.compilers> (permalink)
References <21-12-003@comp.compilers>


On 2021-12-16, Roger L Costello <costello@mitre.org> wrote:
> Question: Opine about why languages are usually defined and implemented with
> ambiguous grammars.

But they aren't.

Languages are processed as a stream of characters or tokens, with hidden
rules about how those relate together and the meaning that emerges.
All of the rules are hidden, including the entire grammar.

If you're only aware of some of the hidden rules, but not others, then
you see ambiguity.

But if you're only aware of some of the hidden rules, but not others,
then you are not working with the correct language.

For instance, I don't know of any mainstream language in which if/else
is actually ambiguous. They have a hidden rule, such as that the else goes
with the closest preceding if statement.  This is no more or less hidden
than the rule which says that the token "if" heads a phrase structure
called an if statement.
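
As a rough illustration of that rule, here is a minimal sketch of a
recursive-descent parser in Python. The toy grammar, the token names and
the function parse_stmt are all assumptions made up for this example;
they are not taken from any real language implementation. The point is
only that the "closest if" rule falls out of the parser consuming an
else as soon as it can.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).

    Toy grammar (an assumption for this sketch):
        stmt := 'if' COND stmt ['else' stmt] | NAME
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 2)
        else_branch = None
        # The "hidden rule": an else is consumed greedily here, by the
        # innermost if that is still open, i.e. the closest preceding one.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    # Anything else is treated as a simple one-token statement.
    return tokens[pos], pos + 1


tree, _ = parse_stmt("if a if b s1 else s2".split())
print(tree)
```

The else ends up inside the inner if node, and the outer if has no else
branch at all. A grammar alone admits both readings; the parser's greedy
choice is the extra rule that picks one.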

I think what the question is really asking is why computer languages are
designed in layers, such as an ambiguous grammar, with rules added to
it.

That simply has to do with the convenience of specification in relation
to the available tooling.

If you have a parser generator which lets you write an ambiguous grammar
like:

  E := E + E | E - E | E * E | E / E | E ** E | (E) | id | num

and then add precedence/associativity specifications, then it behooves
you to take advantage of it, rather than breaking out separate rules
like "additive expression",  "multiplicative expression", ...

When you add those rules, though, you no longer have an ambiguous
grammar.
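
To make that concrete, here is a hedged sketch in Python of the flat E
grammar above combined with a precedence/associativity table, using
precedence climbing. The table values (the usual arithmetic levels, with
** binding tightest and associating to the right) are an assumption in
the spirit of yacc's %left and %right; the grammar itself does not state
them. The names OPS, tokenize, parse and tree are invented for this
example.

```python
import re

# Assumed precedence/associativity declarations; higher binds tighter.
OPS = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "**": (3, "right"),
}


def tokenize(text):
    # "**" is listed before the single-character operators so that it
    # is matched as one token rather than as two "*" tokens.
    return re.findall(r"\*\*|[-+*/()]|\w+", text)


def parse(tokens, pos=0, min_prec=1):
    """Precedence climbing over the flat E grammar; return (tree, next_pos)."""
    if tokens[pos] == "(":
        left, pos = parse(tokens, pos + 1, 1)
        pos += 1  # skip the closing ")"; no error handling in this sketch
    else:
        left, pos = tokens[pos], pos + 1  # id or num
    while pos < len(tokens) and tokens[pos] in OPS:
        op = tokens[pos]
        prec, assoc = OPS[op]
        if prec < min_prec:
            break
        # Left-associative: the right operand may only contain tighter
        # operators. Right-associative: it may contain the same level.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos


def tree(text):
    return parse(tokenize(text))[0]


print(tree("a - b - c"))    # groups to the left
print(tree("a + b * c"))    # * binds tighter than +
print(tree("a ** b ** c"))  # groups to the right
```

Each input now has exactly one tree, even though the grammar rule on its
own would allow several. Change an entry in OPS and the same rule yields
a different, but still unique, tree.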

There is another effect at play which is that designers are infatuated
with complicated grammars that have lots of hidden rules. Thus we have
languages whose programs can look ambiguous to someone who isn't an
expert in all their rules. Keeping up the full expertise can require
regular practice: constantly working with the language. (Use it or lose
it).

Thus, even though two languages we may be looking at are formally
unambiguous, one may be informally more ambiguous than the other, due to
being more loaded with hidden rules of syntax that one must internalize
to read the code.

So we can interpret the question as, why do we have all these languages
with baroque syntax which give rise to ambiguity the moment you forget
any of it?

Languages are designed this way because of the belief that there is a
notational advantage in it. If you have some hidden rule which causes
some symbols to be related in such and such a way, it means that you
have omitted the need for additional symbols which would otherwise
indicate that structure. For instance in C, we can deduce from the
hidden rules that A << B | C means (A << B) | C which is obvious to
someone who has memorized the precedence rules and works with this
stuff daily. Yet, we tend to more or less reject the philosophy in our
coding standards; we call for disambiguating parentheses. The GNU C
compiler complains if you write  A && B || C  with -Wall warnings
enabled: you get the "suggest parentheses" warning.

(It's a kind of ironic situation: why do we have hidden rules that allow
parentheses to be omitted, only to turn around and write tooling and
coding standards which ask for them to be put in?)
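
The grouping that such hidden rules produce can be made visible by
asking a parser for its tree. The sketch below uses Python's standard
ast module. Python is only a stand-in here, which is an assumption of
this example since the discussion above is about C; it happens to rank
<< above | and "and" above "or", matching C's << over | and && over ||.
The helper name shape is invented for the example.

```python
import ast


def shape(source):
    """Return the tree Python builds for an expression, as nested tuples."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            return (type(node.op).__name__, walk(node.left), walk(node.right))
        if isinstance(node, ast.BoolOp):
            return (type(node.op).__name__, *map(walk, node.values))
        return node.id  # only bare names are expected in this sketch

    return walk(ast.parse(source, mode="eval").body)


print(shape("A << B | C"))      # the shift is nested inside the bitwise or
print(shape("A and B or C"))    # the "and" is nested inside the "or"
print(shape("A << B | C") == shape("(A << B) | C"))
```

The parenthesized and unparenthesized forms give the same tree, so the
parentheses carry no information for the parser. They are there purely
for the reader who has not memorized the table.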

Novice programmers have historically been attracted to cryptic-looking
languages. It is one of the main reasons for the success of languages
like C and Perl.

For novice programmers, syntax is a barrier before semantics, and
if you make the barrier sufficiently, though not impossibly, high, that
creates motivation.  Novices feel they are really learning something and
getting ahead when all they are doing is absorbing the rules of syntax.
Simply being able to work out the syntax of some code example, or write
one that has no errors, is an accomplishment.

If you give most people a language in which the syntax is easy with few
opportunities for informal ambiguity, they will just rush through the
syntax and hit the brick wall of semantics: confronting the fact that
programming is semantically hard. Of course, because people most often
blame external factors for their failings, they will blame the language.
Since they are not heavily invested in it, they can easily move on to
something else. Maybe they will return to programming later, using
a different language, and then pin their better success on that language
rather than their own improved maturity.

Informally ambiguous languages are needed to create a kind of tar pit
to slow down newbies and keep them motivated. Then by the time they
hit the real difficulties, they are too invested in it to quit.

"But I know all this syntax after months of learning! How can it be that
my program doesn't work? I'm too far along not to stick with it and get
it working. Doggone it, I now have a self-image as a programmer to
defend!"

I also believe there is one more element at play: mathematics. People
study mathematics in school, and those who go on to do programming tend
to be ones who were more exposed to it or paid more attention.

People who are programmers actually had a first contact with formal
syntax in mathematics.

The conflation between syntax and semantics may ultimately come from
that place.  Mathematicians design their notations deliberately, in such
ways that when they manipulate symbols, while observing certain rules,
they are actually preserving semantics. The notation directly enables
semantically meaningful manipulation, as a tool of thought.

There is a psychological effect at play: the hope that a programming language
designed with lots of syntactic rules will somehow also serve as a tool
of thought, similarly to math notation.  It cannot be denied that, to
some extent, that plan pans out. Programmers play with the symbols and
discover idioms similar to algebraic rules.  You look at C code and
recognize "Duff's device" similarly to how you might recognize some
Lagrangian thing in a math formula.


--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
