
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?

Started by	Kaz Kylheku <480-992-1380@kylheku.com>
First post	2021-12-29 18:48 +0000
Last post	2021-12-30 20:19 -0800
Articles	14 — 5 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

#2773 — Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?

From	Kaz Kylheku <480-992-1380@kylheku.com>
Date	2021-12-29 18:48 +0000
Subject	Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars?
Message-ID	<21-12-017@comp.compilers>

On 2021-12-16, Roger L Costello <costello@mitre.org> wrote:
> Question: Opine about why languages are usually defined and implemented with
> ambiguous grammars.

But they aren't.

Languages are processed as a stream of characters or tokens, with hidden
rules about how those relate together and the meaning that emerges.
All of the rules are hidden, including the entire grammar.

If you're only aware of some of the hidden rules, but not others, then
you see ambiguity.

But if you're only aware of some of the hidden rules, but not others,
then you are not working with the correct language.

For instance, I don't know of any mainstream language in which if/else
is actually ambiguous. They have a hidden rule like that the else goes
with the closest preceding if statement.  This is no more or less hidden
than the rule which says that the token "if" heads a phrase structure
called an if statement.
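
As a minimal sketch of that "closest preceding if" rule, assume a toy statement grammar (stmt := 'if' cond stmt ['else' stmt] | name) and a hypothetical helper named parse_stmt; neither comes from any real language implementation. A recursive descent parser that takes an else as soon as it sees one ends up attaching it to the innermost if:

```python
def parse_stmt(tokens, pos=0):
    """Parse stmt := 'if' cond stmt ['else' stmt] | name.

    Returns (tree, next_pos). Conditions and plain statements are single
    tokens here, purely to keep the sketch short.
    """
    tok = tokens[pos]
    if tok != "if":
        return tok, pos + 1
    cond = tokens[pos + 1]
    then_branch, pos = parse_stmt(tokens, pos + 2)
    else_branch = None
    # The innermost call gets here first, so it claims the 'else'.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos


tree, _ = parse_stmt("if a if b x else y".split())
print(tree)
```

The rule is nowhere written down as a declaration; it falls out of the order in which the recursive calls look for the else token.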

I think what the question is really asking is why computer languages are
designed in layers, such as an ambiguous grammar, with rules added to
it.

That simply has to do with the convenience of specification in relation
to the available tooling.

If you have a parser generator which lets you write an ambiguous grammar
like:

  E := E + E | E - E | E * E | E / E | E ** E | (E) | id | num

and then add precedence/associativity specifications, then it behooves
you to take advantage of it, rather than breaking out separate rules
like "additive expression",  "multiplicative expression", ...

When you add those rules, though, you no longer have an ambiguous
grammar.
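
To illustrate how precedence and associativity declarations pin down a single parse for the E grammar above, here is a sketch in Python. It uses precedence climbing in a hand-written parser rather than a parser generator, and the PREC table with its numeric levels is an assumption of this example (loosely in the spirit of yacc-style %left/%right declarations), not something given in the post:

```python
import re

# Assumed precedence levels and associativity; higher binds tighter.
PREC = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "**": (3, "right"),
}

TOKEN = re.compile(r"\s*(\*\*|[-+*/()]|[A-Za-z_]\w*|\d+)")


def tokenize(text):
    text = text.rstrip()
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad input at offset {pos}")
        out.append(m.group(1))
        pos = m.end()
    return out


def parse(text):
    toks = tokenize(text)
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of input")
        tok = toks[pos]
        pos += 1
        if tok == "(":
            node = expr(1)
            if pos >= len(toks) or toks[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        if tok in PREC or tok == ")":
            raise SyntaxError(f"unexpected {tok!r}")
        return tok  # id or num

    def expr(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(toks) and toks[pos] in PREC:
            op = toks[pos]
            prec, assoc = PREC[op]
            if prec < min_prec:
                break
            pos += 1
            # Left-associative operators require a tighter right operand.
            right = expr(prec + 1 if assoc == "left" else prec)
            left = (op, left, right)
        return left

    tree = expr(1)
    if pos != len(toks):
        raise SyntaxError("trailing input")
    return tree


print(parse("a - b - c"))    # groups to the left
print(parse("a ** b ** c"))  # groups to the right
```

With the table in place, every input has exactly one tree, which is the point made above: grammar plus declarations is no longer ambiguous.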

There is another effect at play which is that designers are infatuated
with complicated grammars that have lots of hidden rules. Thus we have
languages whose programs can look ambiguous to someone who isn't an
expert in all their rules. Keeping up the full expertise can require
regular practice: constantly working with the language. (Use it or lose
it).

Thus, even though two languages we may be looking at are formally
unambiguous, one may be informally more ambiguous than the other, due to
being more loaded with hidden rules of syntax that one must internalize
to read the code.

So we can interpret the question as, why do we have all these languages
with baroque syntax which give rise to ambiguity the moment you forget
any of it?

Languages are designed this way because of the belief that there is a
notational advantage in it. If you have some hidden rule which causes
some symbols to be related in such and such a way, it means that you
have omitted the need for additional symbols which would otherwise
indicate that structure. For instance in C, we can deduce from the
hidden rules that A << B | C means (A << B) | C which is obvious to
someone who has memorized the precedence rules and works with this
stuff daily. Yet, we tend to more or less reject that philosophy in our
coding standards; we call for disambiguating parentheses. The GNU C
compiler complains if you write  A && B || C with -Wall warnings
enabled: you get the "suggest parentheses" warning.

(It's a kind of ironic situation: why do we have hidden rules that allow
parentheses to be omitted, only to turn around and write tooling and
coding standards which ask for them to be put in?)

Novice programmers have historically been attracted to cryptic-looking
languages. It is one of the main reasons for the success of languages
like C and Perl.

For novice programmers, syntax is a barrier before semantics, and
if you make the barrier sufficiently, though not impossibly, high, that
creates motivation.  Novices feel they are really learning something and
getting ahead when all they are doing is absorbing the rules of syntax.
Simply being able to work out the syntax of some code example, or write
one that has no errors, is an accomplishment.

If you give most people a language in which the syntax is easy with few
opportunities for informal ambiguity, they will just rush through the
syntax and hit the brick wall of semantics: confronting the fact that
programming is semantically hard. Of course, because people most often
blame external factors for their failings, they will blame the language.
Since they are not heavily invested in it, they can easily move on to
something else. Maybe they will return to programming later, using
a different language, and then pin their better success on that language
rather than their own improved maturity.

Informally ambiguous languages are needed to create a kind of tar pit
to slow down newbies and keep them motivated. Then by the time they
hit the real difficulties, they are too invested in it to quit.

"But I know all this syntax after months of learning! How can it be that
my program doesn't work? I'm too far along not to stick with it and get
it working. Doggone it, I now have a self-image as a programmer to
defend!"

I also believe there is one more element at play: mathematics. People
study mathematics in school, and those who go on to do programming tend
to be ones who were more exposed to it or paid more attention.

People who are programmers actually had a first contact with formal
syntax in mathematics.

The conflation between syntax and semantics may ultimately come from
that place.  Mathematicians design their notations deliberately, in such
ways that when they manipulate symbols, while observing certain rules,
they are actually preserving semantics. The notation directly enables
semantically meaningful manipulation, as a tool of thought.

There is a psychological effect at play that a programming language
designed with lots of syntactic rules will somehow also serve as a tool
of thought, similarly to math notation.  It cannot be denied that, to
some extent, that plan pans out. Programmers play with the symbols and
discover idioms similar to algebraic rules.  You look at C code and
recognize "Duff's device" similarly to how you might recognize some
Lagrangian thing in a math formula.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal


#2775

From	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Date	2021-12-29 16:05 -0800
Message-ID	<21-12-020@comp.compilers>
In reply to	#2773

On Wednesday, December 29, 2021 at 11:28:34 PM UTC+1, Kaz Kylheku wrote:
> On 2021-12-16, Roger L Costello wrote:
> > Question: Opine about why languages are usually defined and implemented with
> > ambiguous grammars.
> But they aren't.
>
> Languages are processed as a stream of characters or tokens, with hidden
> rules about how those relate together and the meaning that emerges.
> All of the rules are hidden, including the entire grammar.
>
> If you're only aware of some of the hidden rules, but not others, then
> you see ambiguity.
>
> But if you're only aware of some of the hidden rules, but not others,
> then you are not working with the correct language.
>
> For instance, I don't know of any mainstream language in which if/else
> is actually ambiguous. They have a hidden rule like that the else goes
> with the closest preceding if statement.

When designing a grammar and implementing a parser: the grammar can
either be unambiguous by design or unambiguous by accident. The
viewpoint that "there isn't any mainstream language in which if/else
is actually ambiguous" is actually the latter option: unambiguous by
accident.

A primary reason why grammars in many mainstream languages (that don't
have a parser generated straight from a verified grammar) are
unambiguous isn't intentional design, but rather it is a consequence
of the fact that those parsers are directly implemented in a language
that is executing statements/expressions/instructions without
verifying consequences of the executions. Some examples of languages
with such execution properties: assembly language (such as: i386,
ARM), C, Haskell. Contrary to the accidental approach, a parser
generator by design cares about consequences and it is verifying that
the specification actually is unambiguous despite the fact that in the
end the parser gets compiled down into machine instructions.
Verification means to search the whole search space (or at least a
reasonably large subspace of it) - but asm/C/Haskell will run a search
only if it is explicitly (step by step) forced by the programmer to
perform a search.

-atom


#2778

From	Kaz Kylheku <480-992-1380@kylheku.com>
Date	2021-12-30 18:00 +0000
Message-ID	<21-12-025@comp.compilers>
In reply to	#2775

On 2021-12-30, Jan Ziak <0xe2.0x9a.0x9b@gmail.com> wrote:
> On Wednesday, December 29, 2021 at 11:28:34 PM UTC+1, Kaz Kylheku wrote:
>> On 2021-12-16, Roger L Costello wrote:
>> > Question: Opine about why languages are usually defined and implemented with
>> > ambiguous grammars.
>> But they aren't.
>>
>> Languages are processed as a stream of characters or tokens, with hidden
>> rules about how those relate together and the meaning that emerges.
>> All of the rules are hidden, including the entire grammar.
>>
>> If you're only aware of some of the hidden rules, but not others, then
>> you see ambiguity.
>>
>> But if you're only aware of some of the hidden rules, but not others,
>> then you are not working with the correct language.
>>
>> For instance, I don't know of any mainstream language in which if/else
>> is actually ambiguous. They have a hidden rule like that the else goes
>> with the closest preceding if statement.
>
> When designing a grammar and implementing a parser: the grammar can
> either be unambiguous by design or unambiguous by accident. The
> viewpoint that "there isn't any mainstream language in which if/else
> is actually ambiguous" is actually the latter option: unambiguous by
> accident.

Languages can be ambiguous at the specification level, even if a
given implementation behaves unambiguously:

E.g.:

1. The implementation is completely unambiguous and says that else goes
   with closest preceding if.
2. The documentation says something else. (So the users figure this out
   and get their code working based on the implementation.)
3. (But) a new implementation appears, based on the documentation.

Or:

1. The language spec doesn't say anything about which way something
   is parsed.
2. Multiple implementations do it willy-nilly.

Clearly, for instance in C, we have semantic ambiguities like
a[i] = i++. (A parser depending on the behavior of something like that
could have a grammar ambiguity: something is parsed in two or more
different ways based on which way the undefined construct behaves. That
would be a defect, of course.)

> A primary reason why grammars in many mainstream languages (that don't
> have a parser generated straight from a verified grammar) are
> unambiguous isn't intentional design, but rather it is a consequence
> of the fact that those parsers are directly implemented in a language
> that is executing statements/expressions/instructions without
> verifying consequences of the executions. Some examples of languages
> with such execution properties: assembly language (such as: i386,
> ARM), C, Haskell.

I don't quite follow this; you seem to be saying that Turing machines
are deterministic, and so if we implement a parser as a Turing process,
it will be "accidentally" unambiguous because of determinism.

However, it may so happen that the lack of ambiguity depends on a whole
lot of context. For instance, a Turing process parsing a language,
proceeding left to right, could decide the rule for "if/then/else" on
a case-by-case basis, influenced by everything it has parsed before.

Suppose you have a language which parses a sequence of top-level
expressions from a terminal or file, and executes each one before moving
to the next. Those expressions could be used to invoke an API in the
language implementation to change the treatment of subsequent syntax.

Sure, everything is still unambiguous, if we take into account the
entire stream of expressions from the beginning and understand its
effect on the parsing machine.

If you don't do anything of this sort, and just write, say, a recursive
descent parser which isn't influenced by any weird state flags (whether
purely internal or external too) that change the parsing, and in a safe,
very high level language in which there are few risks of nonportable or
undefined behaviors, then you end up with a "simply unambiguous"
grammar. You should be able to investigate the behavior of if/else
with several test cases and be confident that the observations hold
in all relevant contexts where that construct can appear.
You might not exactly know what the grammar is for the entire language,
but if you figure it out from the code and accurately document it,
then you're good. (Unless the thing is required to conform to some
external specification and fails.)

> Contrary to the accidental approach, a parser
> generator by design cares about consequences and it is verifying that
> the specification actually is unambiguous despite the fact that in the
> end the parser gets compiled down into machine instructions.
> Verification means to search the whole search space (or at least a
> reasonably large subspace of it) - but asm/C/Haskell will run a search
> only if it is explicitly (step by step) forced by the programmer to
> perform a search.

Yes, e.g. a LALR(1)  shift-reduce parser generator generates the entire
space of LR(0) items that drive the stack machine, and then when it
populates tables, it discovers conflicts there.

With someone's hand-written parser, we cannot be sure without
searching with test cases, which are language instances, which is
intractable to do exhaustively.

Just because if/else is behaving in certain ways in certain test cases
doesn't mean that a new, more complex test case with more/different
context cannot be found in which the if/else behaves differently.

The procedural/recursive parser code doesn't declare to us what the
grammar is. It could be hiding multiple rules for if/else active in
different contexts which differ from each other.

That's not an actual ambiguity, but for the purposes of working with the
language, it's an informal ambiguity (related to my earlier observation
of the user not knowing what all hidden rules are).

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
[If you write a hand written recursive descent parser, which was typical
before yacc made it easy to use a LALR parser, it was common
for the language the parser accepted to be somewhat different from the
one the programmer intended to accept due to less than complete checking
for unexpected inputs.
On the other hand, that C example isn't ambiguous, it's deliberately
indeterminate.  Early Fortran allowed the compiler to compile any
mathematically equivalent, not just numerically equivalent, version
of an expression so A*B+A*C could turn into A*(B+C) which was great
for optimizing, not so much for predictable results. -John]


#2780

From	Kaz Kylheku <480-992-1380@kylheku.com>
Date	2021-12-30 20:08 +0000
Message-ID	<21-12-029@comp.compilers>
In reply to	#2778

On 2021-12-30, Kaz Kylheku <480-992-1380@kylheku.com> wrote:
John> On the other hand, that C example isn't ambiguous, it's deliberately
John> indeterminate.

The C example isn't ambiguous in its parse, but the semantics is
ambiguous. Expressions can be evaluated in multiple orders in C,
and if we follow different orders for that expression, we get different
results (in such a way that it's deemed undefined).

Here is something merely unspecified:

  a() + b() + c()

Suppose a prints "a" to stdout, b prints "b" and c prints "c":

  int a() { putchar('a'); return 0; }

The behavior is not undefined, but unspecified: we don't know
which of six permutations is printed, but we know it's one of them:
abc, acb, bac, bca, cab, cba.

That's a kind of ambiguity in the language (syntax + semantics).

We know that the parse is

  (a() + b()) + c()

But the meaning requires that a, b and c are executed in order
to produce the operands to the + operator, and the order in which
those calls take place is not specified.

That's a clear ambiguity.
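
A small simulation of this, written in Python only because its own evaluation order is fixed, which lets the order be varied by hand. The helper run and the logging functions are hypothetical stand-ins for the C functions a, b and c; the sketch evaluates the three operands in each possible order while keeping the parse (a + b) + c unchanged:

```python
from itertools import permutations


def run(order):
    """Evaluate the operands in the given order; return (printed text, sum)."""
    log = []

    def make(name):
        def f():
            log.append(name)  # stands in for putchar(name)
            return 0
        return f

    funcs = {name: make(name) for name in "abc"}
    # Operand evaluation happens in 'order' ...
    values = {name: funcs[name]() for name in order}
    # ... but the grouping of the additions never changes.
    total = (values["a"] + values["b"]) + values["c"]
    return "".join(log), total


results = sorted(run(order) for order in permutations("abc"))
for printed, total in results:
    print(printed, total)
```

Every run produces the same sum, and only the side effects differ, which is what distinguishes this from an ambiguity in the parse.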

Just like if I say "pick up the dry cleaning and fill up the gas
tank", there is no grammar ambiguity; yet we don't know whether
the sequencing is required, or whether the gas tank can be
filled first. The request describes multiple possible scenarios,
including going to a gas station that does dry-cleaning, and
picking up while an attendant fills the tank.

> indeterminate.  Early Fortran allowed the compiler to compile any
> mathematically equivalent, not just numerically equivalent, version
> of an expression so A*B+A*C could turn into A*(B+C) which was great
> for optimizing, not so much for predictable results. -John]

Why wait for early Fortran to arrive? If you write in C, you can
use gcc -ffast-math (an umbrella option for turning on a whole lot
of stuff, among it -fassociative-math and -freciprocal-math).
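
Why reassociation matters can be seen without any compiler flags. The sketch below uses Python floats, assumed here to be IEEE 754 doubles as they are on the usual platforms, and simply computes the same sum under two groupings; the variable names are illustrative:

```python
a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c  # grouping as written in the source
reassociated = a + (b + c)   # grouping a reassociating optimizer may choose

print(left_to_right, reassociated)

# The same effect shows up with everyday values.
small_left = (0.1 + 0.2) + 0.3
small_right = 0.1 + (0.2 + 0.3)
print(small_left == small_right)
```

The two groupings are mathematically equivalent but not numerically equivalent, which is the distinction the moderator's Fortran remark turns on.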

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal


#2776

From	gah4 <gah4@u.washington.edu>
Date	2021-12-29 18:41 -0800
Message-ID	<21-12-022@comp.compilers>
In reply to	#2773

On Wednesday, December 29, 2021 at 2:28:34 PM UTC-8, Kaz Kylheku wrote:

(snip)

> I also believe there is one more element at play: mathematics. People
> study mathematics in school, and those who go on to do programming tend
> to be ones who were more exposed to it or paid more attention.

This reminds me of learning associativity of exponentiation (**)
in Fortran IV (I believe it isn't in the Fortran 66 standard) before I
learned it in algebra class.  I suspect that there are others I learned
from programming before learning them in math class.

> People who are programmers actually had a first contact with formal
> syntax in mathematics.

> The conflation between syntax and semantics may ultimately come from
> that place. Mathematicians design their notations deliberately, in such
> ways that when they manipulate symbols, while observing certain rules,
> they are actually preserving semantics. The notation directly enables
> semantically meaningful manipulation, as a tool of thought.

I suspect that people learn some things in the first programming language
that they learn, and then expect it to be the same in others. When it isn't,
people get surprised or confused.

When I started with unix, I learned csh programming, and mostly
avoided sh (and successors).  One reason for that is, as well as I
knew at the time, differences in the syntax and semantics of them.
[Fortran has always had ** exponentiation, starting with the original
version in 1956. It always bound tighter than +-*/ but wasn't
associative, A**B**C not allowed, -John]


#2779

From	Kaz Kylheku <480-992-1380@kylheku.com>
Date	2021-12-30 18:14 +0000
Message-ID	<21-12-026@comp.compilers>
In reply to	#2776

On 2021-12-30, gah4 <gah4@u.washington.edu> wrote:
> On Wednesday, December 29, 2021 at 2:28:34 PM UTC-8, Kaz Kylheku wrote:
>
> (snip)
>
>> I also believe there is one more element at play: mathematics. People
>> study mathematics in school, and those who go on to do programming tend
>> to be ones who were more exposed to it or paid more attention.
>
> This reminds me of learning associativity of exponentiation (**)
> in Fortran IV (I believe it isn't in the Fortran 66 standard) before I
> learned it in algebra class.  I suspect that there are others I learned
> from programming before learning them in math class

In Common Lisp, the expt function is strictly binary, so it eliminates
the question of associativity. Some basic arithmetic functions are
n-ary, like (+ a b c d e f), where it is documented (and readily
understood) that, in cases where it matters, the reduction is
left-to-right.

In TXR Lisp, I made expt n-ary, so you can write

  (expt x y z w)

But! The associativity is right-to-left, making that equivalent to:

  (expt x (expt y (expt z w)))

This is for two reasons. One is math: (expt x y z w) defined this
way follows:

     w
    z
   y
  x

Secondly, it is more useful, because the left-to-right interpretation
is:

 (((  y) z) w)
 (((x  )  )  )

and that is just

 (expt x (* y z w))

which is easy enough to write if that's what you want! It's not much
more verbiage than the (expt x y z w) you may have wanted. Regardless of
the number of operands, it's just an extra set of parentheses and a *
operator.

Whereas if you want (expt x (expt y (expt z w))) and (expt x y z w)
doesn't give it to you, that *is* a lot of verbiage, whose nesting grows
with each additional argument.

The associativity rule that saves the most verbiage is the better one,
even if it is opposite to many other arithmetic functions.
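
The two readings can be compared directly with a pair of folds. This is a sketch in Python, not TXR Lisp's implementation; expt_right and expt_left are hypothetical names introduced for the comparison:

```python
from functools import reduce
from math import prod


def expt_right(*args):
    """Right-to-left reading: x ** (y ** (z ** w))."""
    return reduce(lambda acc, base: base ** acc, reversed(args))


def expt_left(*args):
    """Left-to-right reading: ((x ** y) ** z) ** w."""
    return reduce(lambda acc, exponent: acc ** exponent, args)


print(expt_right(2, 3, 2))  # 2 ** (3 ** 2)
print(expt_left(2, 3, 2))   # (2 ** 3) ** 2
print(2 ** prod((3, 2)))    # same value as the left-to-right reading
```

The left fold collapses into a single power of the product of the exponents, so it adds nothing that a multiplication could not express, whereas the right fold produces the tower.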

>> People who are programmers actually had a first contact with formal
>> syntax in mathematics.
>
>> The conflation between syntax and semantics may ultimately come from
>> that place. Mathematicians design their notations deliberately, in such
>> ways that when they manipulate symbols, while observing certain rules,
>> they are actually preserving semantics. The notation directly enables
>> semantically meaningful manipulation, as a tool of thought.
>
> I suspect that people learn some things in the first programming language
> that they learn, and then expect it to be the same in others. When it isn't,
> people get surprised or confused.
>
> When I started with unix, I learned csh programming, and mostly
> avoided sh (and successors).  One reason for that is, as well as I
> knew at the time, differences in the syntax and semantics of them.
> [Fortran has always had ** exponentiation, starting with the original
> version in 1956. It always bound tighter than +-*/ but wasn't
> associative, A**B**C not allowed, -John]

When I started programming from nothing, I saw BASIC examples in a
book which was doing things like:

  10 X = 2
  20 X = X + 1

The only language with formulas that I was coming from was math.
(Though I was only in grade 6, I knew how to solve systems of linear
equations from school, because I had recently come to Canada from
Slovakia.)

So, I thought, what? How can X be equal to X + 1; you cannot solve
this absurdity!

From then I knew that the people who program computers to understand
symbols are free thinkers who make them mean anything they want.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal


#2782

From	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Date	2021-12-30 13:47 -0800
Message-ID	<21-12-033@comp.compilers>
In reply to	#2779

On Thursday, December 30, 2021 at 7:56:15 PM UTC+1, Kaz Kylheku wrote:
> When I started programming from nothing, I saw BASIC examples in a
> book which was doing things like:
>
> 10 X = 2
> 20 X = X + 1
>
> The only language with formulas that I was coming from was math.
>
> So, I thought, what? How can X be equal to X + 1; you cannot solve
> this absurdity!
>
> From then I knew that the people who program computers to understand
> symbols are free thinkers who make them mean anything they want.

"X = X + Y" means "X[t+1] = X[t] + Y[t]" where t is time. Time had to be
omitted from the notation of the BASIC programming language because otherwise
the source code would consume a much larger amount of computer memory and it
would complicate GOTO and FOR/NEXT statements.
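
One way to make that reading concrete is to keep the whole history of X instead of overwriting it. The sketch below is only an illustration of the X[t+1] = X[t] + Y[t] interpretation; run_steps is a hypothetical helper, and the list ys plays the role of Y at each time step:

```python
def run_steps(x0, ys):
    """Return the history [X[0], X[1], ...] where X[t+1] = X[t] + Y[t]."""
    xs = [x0]
    for t, y in enumerate(ys):
        xs.append(xs[t] + y)
    return xs


# "10 X = 2" followed by "20 X = X + 1" executed three times.
history = run_steps(2, [1, 1, 1])
print(history)
```

An ordinary assignment keeps only the last element of this history, which is the sense in which the time index is left implicit.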

-atom
[Interesting take.  In reality, of course, BASIC borrowed that from Fortran.  Algol
used := for assignment, different from = for equality comparison. -John]


#2783 — Re: What does = mean, was Why are ambiguous grammars usually a bad idea?

From	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Date	2021-12-30 17:10 -0800
Subject	Re: What does = mean, was Why are ambiguous grammars usually a bad idea?
Message-ID	<21-12-035@comp.compilers>
In reply to	#2782

On Friday, December 31, 2021 at 12:45:56 AM UTC+1, Jan Ziak wrote:
> On Thursday, December 30, 2021 at 7:56:15 PM UTC+1, Kaz Kylheku wrote:
> > When I started programming from nothing, I saw BASIC examples in a
> > book which was doing things like:
> >
> > 10 X = 2
> > 20 X = X + 1
> >
> > The only language with formulas that I was coming from was math.
> >
> > So, I thought, what? How can X be equal to X + 1; you cannot solve
> > this absurdity!
> >
> > From then I knew that the people who program computers to understand
> > symbols are free thinkers who make them mean anything they want.
>
> "X = X + Y" means "X[t+1] = X[t] + Y[t]" where t is time. Time had to be
> omitted from the notation of the BASIC programming language because otherwise
> the source code would consume a much larger amount of computer memory and it
> would complicate GOTO and FOR/NEXT statements.
>
> -atom
>
> [Interesting take. In reality, of course, BASIC borrowed that from Fortran.
> Algol used := for assignment, different from = for equality comparison. -John]

@John: Indeed, BASIC wasn't the 1st programming language. To generalize, I
wanted to point out that the notion of time is implicit to almost all
programming languages, of course not just BASIC. In my opinion, contrary to
Kaz's opinion, most children who will later become programmers can quite
easily understand what "X=X+1" means in a language like BASIC/Python/etc.
(Thus, I disagree with the belief that "people who program computers to
understand symbols are free thinkers who make them mean anything they want".)

-atom


#2790

From	mac <acolvin@efunct.com>
Date	2022-01-03 19:51 +0000
Message-ID	<22-01-007@comp.compilers>
In reply to	#2782

> [Interesting take.  In reality, of course, BASIC borrowed that from Fortran.  Algol
> used := for assignment, different from = for equality comparison. -John]

Indeed.
Unfortunately, assignment is probably the single most common operator.
The ASCII committee should have kept the left-arrow character instead of
replacing it with underscore.


#2792 — Re: for or against equality, was Why are ambiguous grammars usually a bad idea?

From	gah4 <gah4@u.washington.edu>
Date	2022-01-03 21:07 -0800
Subject	Re: for or against equality, was Why are ambiguous grammars usually a bad idea?
Message-ID	<22-01-010@comp.compilers>
In reply to	#2790

On Monday, January 3, 2022 at 11:58:39 AM UTC-8, mac wrote:
> > [Interesting take. In reality, of course, BASIC borrowed that from Fortran. Algol
> > used := for assignment, different from = for equality comparison. -John]

> Indeed.
> Unfortunately, assignment is probably the single most common operator.
> The ASCII committee should have kept the left-arrow character instead of
> replacing it with underscore.

The assignment statement in BASIC, at least the ones I know, has an
(optional) LET keyword, so it might say:

10  LET A=3

Most people leave it off, though.

Is PL/I the only language that uses = for both assignment and the relational operator?
Since expressions are not statements, it avoids the ambiguity that would otherwise occur.
I believe some BASICs also use = for both.

Underscore is a pretty useful character.

The two ASCII characters that don't exist in EBCDIC are ^ and ~.
Two EBCDIC characters that don't exist in ASCII are ¢ (cent)
and ¬ (logical NOT sign).  Conversion tables usually cross
map those pairs.  (PL/I, at least, uses ¬ and ¬= operators.)

[In original Dartmouth BASIC the LET was mandatory, but it was a considerably
smaller and fully compiled language than the later dialects.  On the other
hand, PL/I made a fetish of nothing being a reserved word, e.g.

  IF IF = THEN THEN ELSE = BEGIN; ELSE END = IF;

-John]


#2793 — Re: for or against equality, was Why are ambiguous grammars usually a bad idea?

From	Thomas Koenig <tkoenig@netcologne.de>
Date	2022-01-04 19:23 +0000
Subject	Re: for or against equality, was Why are ambiguous grammars usually a bad idea?
Message-ID	<22-01-012@comp.compilers>
In reply to	#2792

> [In original Dartmouth BASIC the LET was mandatory, but it was a considerably
> smaller and fully compiled language than the later dialects.  On the other
> hand, PL/I made a fetish of nothing being a reserved word, e.g.
>
>   IF IF = THEN THEN ELSE = BEGIN; ELSE END = IF;

Fortran shares this property.

This may sound slightly odd to people brought up on languages with
reserved keywords, but it has a big advantage:  You can extend the
language with new keywords without making existing programs invalid.

There is a cost to this, in several aspects:

- More CPU time needed for parsing (important earlier, now it
  can generally be neglected).

- More complexity in the parser.  This cost is paid once, and
  by the compiler developers, not the users.

- Similarity to FORTRAN may induce fear and loathing in computer
  scientists (the last remark is not to be taken too seriously :-)

[Fortran barely had tokens since it ignored spaces outside of Hollerith
strings.  Having written a few F77 parsers, I can say it was possible
to tokenize using hints from the parser about what lexical kludge to
apply, but it wasn't pleasant. The yacc parser was straightforward
other than figuring out when to send which kludge ID to the lexer.

It also meant that one character typos could cause large semantic
changes, notably DO 5 I = 1,10 is a loop while DO 5 I = 1.10
is an assignment. Legend says we lost a satellite to that one.

-John]
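
The DO example can be reproduced with a toy classifier. This is a sketch under strong assumptions: it handles only this one statement shape, ignores Hollerith constants and everything else in the language, and classify and has_top_level_comma are hypothetical helpers rather than code from any real Fortran front end. It removes blanks, as fixed-form Fortran does, and then decides between loop and assignment by looking for a comma outside parentheses:

```python
import re


def has_top_level_comma(text):
    """True if text contains a comma that is not nested inside parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False


def classify(stmt):
    # Blanks are not significant, so drop them before looking at anything.
    s = stmt.replace(" ", "").upper()
    m = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=(.+)", s)
    if m and has_top_level_comma(m.group(3)):
        return ("loop", m.group(1), m.group(2))
    # Otherwise the whole left side, "DO5I" included, is a variable name.
    name, _, value = s.partition("=")
    return ("assignment", name, value)


print(classify("DO 5 I = 1,10"))
print(classify("DO 5 I = 1.10"))
```

The decision cannot be made at the DO token; it depends on a character that appears after the equals sign, which is why the lexer needs hints from the parser.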


#2794 — Re: for or against equality, was Why are ambiguous grammars usually a bad idea?

From	gah4 <gah4@u.washington.edu>
Date	2022-01-04 13:26 -0800
Subject	Re: for or against equality, was Why are ambiguous grammars usually a bad idea?
Message-ID	<22-01-014@comp.compilers>
In reply to	#2792

On Tuesday, January 4, 2022 at 10:15:42 AM UTC-8, gah4 wrote:

(snip, our moderator wrote)

> [In original Dartmouth BASIC the LET was mandatory, but it was a considerably
> smaller and fully compiled language than the later dialects. On the other
> hand, PL/I made a fetish of nothing being a reserved word, e.g.
>
> IF IF = THEN THEN ELSE = BEGIN; ELSE END = IF;
>
> -John]

I never used anything close to the original BASIC, but did use, for some time,
the HP TSB2000 version.  HP stores programs after tokenizing, so I suspect
that even if you don't put in LET, the tokenizer will add it.

As for PL/I, it borrowed many features from COBOL, but not the use
of reserved words.  For one, they wanted people not to have to know the whole
language, and not even the words.  Stories are that COBOL programmers always
keep the list of reserved words nearby, to avoid using them.

Counting from a recent IBM web page on their COBOL compiler, there are
over 400 reserved words, many common English words that people might
like to use.  Somehow out of 50 years of programming, I have managed
never to even type in and run a COBOL program, and especially not to
write one.

As for Fortran parsing, I do remember that WATFIV reserves the sequence
'FORMAT(' at the beginning of a statement for actual FORMAT statements.
You can't assign to elements of an array named FORMAT.  That might not
be so bad, except that Fortran 66, in its run-time format feature, requires the
format data to be in an array.  And the obvious name is FORMAT!
[COBOL doesn't have that many reserved words.  See https://www.ibm.com/docs/en/i/7.1?topic=list-reserved-words
Re FORMAT statements, WATFOR/FIV punted for some reason. It's not that
hard to tell a format statement from a statement like FORMAT(I5,A4) =
42 but I realize no sane programmer would do that. -John]
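
One way to make the FORMAT distinction from the note above is to find the parenthesis that closes the one after FORMAT and look at what follows it. This is a hedged sketch, not what WATFOR/WATFIV or any other compiler did: the function name is hypothetical, and it assumes the statement has no Hollerith or character constants, which could hide parentheses or an equals sign.

```python
def is_format_assignment(stmt):
    """Return True if a statement starting with FORMAT( is really an assignment.

    Assumption: no Hollerith or quoted text, so blanks can be dropped and
    every parenthesis seen is a real one.
    """
    s = stmt.replace(" ", "").upper()
    if not s.startswith("FORMAT("):
        return False

    depth = 0
    # Start at index 6, the '(' that follows the six letters of FORMAT.
    for i, ch in enumerate(s[6:], start=6):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # An '=' right after the matching ')' means array assignment.
                return s[i + 1:i + 2] == "="
    return False  # unbalanced parentheses: not an assignment we can recognize


print(is_format_assignment("FORMAT(I5,A4) = 42"))    # True
print(is_format_assignment("FORMAT(1X,2(I5,A4))"))   # False
```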


#2781

From	gah4 <gah4@u.washington.edu>
Date	2021-12-30 13:40 -0800
Message-ID	<21-12-031@comp.compilers>
In reply to	#2776

On Wednesday, December 29, 2021 at 7:28:33 PM UTC-8, gah4 wrote:

(snip, I wrote)

> This reminds me of learning associativity of exponentiation (**)
> in Fortran IV (I believe it isn't in the Fortran 66 standard) before I
> learned it in algebra class. I suspect that there are others I learned
> from programming before learning them in math class

(snip)

> [Fortran has always had ** exponentiation, starting with the original
> version in 1956. It always bound tighter than +-*/ but wasn't
> associative, A**B**C not allowed, -John]

It was, at least, in Fortran IV for IBM 360/370:

https://atariwiki.org/wiki/attach/Fortran/IBM_FORTRAN_IV-Language_1973.pdf

My 8th grade graduation present was the above manual, though maybe
one year earlier.   I used to read IBM reference manuals like books,
from start to finish.  By the end of summer, I had run many Fortran programs.

As far as I know, IBM Fortran IV was the input to the 1966 standard,
but not all features were included.  It might also be that extensions were
added later.
[I used Fortran H on Princeton's 360/91 in a summer job I had in
college in about 1973. -John]
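
For anyone wondering why the grouping of A**B**C matters at all, here is a small worked example. Fortran 77 and later group it from the right, as A**(B**C); the sketch assumes that rule and contrasts it with left grouping. The helper names are made up for this illustration. Python's own ** operator happens to group from the right as well.

```python
from functools import reduce


def power_right(operands):
    """Evaluate a ** b ** c ... grouped from the right: a ** (b ** c)."""
    result = operands[-1]
    for base in reversed(operands[:-1]):
        result = base ** result
    return result


def power_left(operands):
    """Evaluate the same chain grouped from the left: (a ** b) ** c."""
    return reduce(lambda acc, exponent: acc ** exponent, operands)


print(power_right([2, 3, 2]))  # 2 ** (3 ** 2) = 2 ** 9 = 512
print(power_left([2, 3, 2]))   # (2 ** 3) ** 2 = 8 ** 2 = 64
print(2 ** 3 ** 2)             # Python groups from the right: 512
```

Since (a ** b) ** c is just a ** (b * c), left grouping adds nothing that multiplication cannot already express, which is the usual argument for grouping from the right.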


#2784 — Re: why do people choose a language, was Why are ambiguous grammars usually a bad idea?

From	Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Date	2021-12-30 20:19 -0800
Subject	Re: why do people choose a language, was Why are ambiguous grammars usually a bad idea?
Message-ID	<21-12-037@comp.compilers>
In reply to	#2773

On Wednesday, December 29, 2021 at 11:28:34 PM UTC+1, Kaz Kylheku wrote:
> On 2021-12-16, Roger L Costello
> > Question: Opine about why languages are usually defined and implemented with
> > ambiguous grammars.
>
> Novice programmers have historically been attracted to cryptic-looking
> languages. It is one of the main reasons for the success of languages
> like C and Perl.
> ....

I know that what I am about to write does not answer the original question
about ambiguous grammars, but I feel I have to respond to the claim that
novices are attracted to cryptic-looking languages. If that were true, then the
brainf**k language would be in the top 10 languages in use today.

People new to programming aren't attracted to C because it is cryptic, but
because - for example - in the 1990s they learned that C was used to
implement the game Doom with only a few elements of assembly
(https://en.wikipedia.org/wiki/Development_of_Doom#Programming). Doom was
implemented in C and wasn't implemented in Lisp/Pascal/Smalltalk - which
increases the popularity of C and decreases the popularity of
Lisp/Pascal/Smalltalk.

Some young programmers were attracted to Smalltalk after the year 2002 because
they watched the Squeakers movie (I believe it is this one:
https://www.imdb.com/title/tt2172065/).

In summary: Novice programmers are attracted to particular programming
languages because those languages are popular in their social networks.

-atom
[Sigh. You're probably right. Historically, novices started with a toy
language which left out more advanced but important ideas like data
structures and name scope, and gave them an unfortunately blinkered
idea of what programming involves. One time when I was a grad student
I had to explain to one of the undergrads why you really didn't want
to write all your programs in Tiny Basic. -John]
