| Started by | Kaz Kylheku <480-992-1380@kylheku.com> |
|---|---|
| First post | 2021-12-29 18:48 +0000 |
| Last post | 2021-12-30 20:19 -0800 |
| Articles | 14 — 5 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Kaz Kylheku <480-992-1380@kylheku.com> - 2021-12-29 18:48 +0000
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2021-12-29 16:05 -0800
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Kaz Kylheku <480-992-1380@kylheku.com> - 2021-12-30 18:00 +0000
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Kaz Kylheku <480-992-1380@kylheku.com> - 2021-12-30 20:08 +0000
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? gah4 <gah4@u.washington.edu> - 2021-12-29 18:41 -0800
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Kaz Kylheku <480-992-1380@kylheku.com> - 2021-12-30 18:14 +0000
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2021-12-30 13:47 -0800
Re: What does = mean, was Why are ambiguous grammars usually a bad idea? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2021-12-30 17:10 -0800
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? mac <acolvin@efunct.com> - 2022-01-03 19:51 +0000
Re: for or against equality, was Why are ambiguous grammars usually a bad idea? gah4 <gah4@u.washington.edu> - 2022-01-03 21:07 -0800
Re: for or against equality, was Why are ambiguous grammars usually a bad idea? Thomas Koenig <tkoenig@netcologne.de> - 2022-01-04 19:23 +0000
Re: for or against equality, was Why are ambiguous grammars usually a bad idea? gah4 <gah4@u.washington.edu> - 2022-01-04 13:26 -0800
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? gah4 <gah4@u.washington.edu> - 2021-12-30 13:40 -0800
Re: why do people choose a language, was Why are ambiguous grammars usually a bad idea? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2021-12-30 20:19 -0800
| From | Kaz Kylheku <480-992-1380@kylheku.com> |
|---|---|
| Date | 2021-12-29 18:48 +0000 |
| Subject | Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? |
| Message-ID | <21-12-017@comp.compilers> |
On 2021-12-16, Roger L Costello <costello@mitre.org> wrote:
> Question: Opine about why languages are usually defined and implemented with
> ambiguous grammars.

But they aren't.

Languages are processed as a stream of characters or tokens, with hidden
rules about how those relate together and the meaning that emerges.
All of the rules are hidden, including the entire grammar.

If you're only aware of some of the hidden rules, but not others, then
you see ambiguity.

But if you're only aware of some of the hidden rules, but not others,
then you are not working with the correct language.

For instance, I don't know of any mainstream language in which if/else
is actually ambiguous. They have a hidden rule like that the else goes
with the closest preceding if statement.

This is no more or less hidden than the rule which says that the token
"if" heads a phrase structure called an if statement.

I think what the question is really asking is why computer languages are
designed in layers, such as an ambiguous grammar, with rules added to
it.

That simply has to do with the convenience of specification in relation
to the available tooling.

If you have a parser generator which lets you write an ambiguous grammar
like:

  E := E + E | E - E | E * E | E / E | E ** E | (E) | id | num

and then add precedence/associativity specifications, then it behooves
you to take advantage of it, rather than breaking out separate rules
like "additive expression", "multiplicative expression", ...

When you add those rules, though, you no longer have an ambiguous
grammar.

There is another effect at play which is that designers are infatuated
with complicated grammars that have lots of hidden rules. Thus we have
languages whose programs can look ambiguous to someone who isn't an
expert in all their rules. Keeping up the full expertise can require
regular practice: constantly working with the language. (Use it or lose
it).

Thus, even though two languages we may be looking at are formally
unambiguous, one may be informally more ambiguous than the other, due to
being more loaded with hidden rules of syntax that one must internalize
to read the code.

So we can interpret the question as, why do we have all these languages
with baroque syntax which give rise to ambiguity the moment you forget
any of it?

Languages are designed this way because of the belief that there is a
notational advantage in it. If you have some hidden rule which causes
some symbols to be related in such and such a way, it means that you
have omitted the need for additional symbols which would otherwise
indicate that structure.

For instance in C, we can deduce from the hidden rules that A << B | C
means (A << B) | C which is obvious to someone who has memorized the
precedence rules and works with this stuff daily.

Yet, we tend to more or less reject the philosophy in our coding
standards; we call for disambiguating parentheses. The GNU C compiler
won't let you write A && B || C if you have -Wall warnings enabled: you
get the "suggest parentheses" warning.

(It's a kind of ironic situation: why do we have hidden rules that allow
parentheses to be omitted, only to turn around and write tooling and
coding standards which ask for them to be put in.)

Novice programmers have historically been attracted to cryptic-looking
languages. It is one of the main reasons for the success of languages
like C and Perl.

For novice programmers, syntax is a barrier before semantics, and if you
make the barrier sufficiently, though not impossibly, high, that creates
motivation. Novices feel they are really learning something and getting
ahead when all they are doing is absorbing the rules of syntax. Simply
being able to work out the syntax of some code example, or write one
that has no errors, is an accomplishment.

If you give most people a language in which the syntax is easy with few
opportunities for informal ambiguity, they will just rush through the
syntax and hit the brick wall of semantics: confronting the fact that
programming is semantically hard. Of course, because people most often
blame external factors for their failings, they will blame the language.
Since they are not heavily invested in it, they can easily move on to
something else. Maybe they will return to programming later, using a
different language, and then pin their better success on that language
rather than their own improved maturity.

Informally ambiguous languages are needed to create a kind of tar pit to
slow down newbies and keep them motivated. Then by the time they hit the
real difficulties, they are too invested in it to quit. "But I know all
this syntax after months of learning! How can it be that my program
doesn't work? I'm too far along not to stick with it and get it working.
Doggone it, I now have a self-image as a programmer to defend!"

I also believe there is one more element at play: mathematics. People
study mathematics in school, and those who go on to do programming tend
to be ones who were more exposed to it or paid more attention.

People who are programmers actually had a first contact with formal
syntax in mathematics.

The conflation between syntax and semantics may ultimately come from
that place. Mathematicians design their notations deliberately, in such
ways that when they manipulate symbols, while observing certain rules,
they are actually preserving semantics. The notation directly enables
semantically meaningful manipulation, as a tool of thought.

There is a psychological effect at play that a programming language
designed with lots of syntactic rules will somehow also serve as a tool
of thought, similarly to math notation.

It cannot be denied that, to some extent, that plan pans out.
Programmers play with the symbols and discover idioms similar to
algebraic rules. You look at C code and recognize "Duff's device"
similarly to how you might recognize some Lagrangian thing in a math
formula.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
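The point in the post above about pairing an ambiguous expression grammar with precedence/associativity declarations can be sketched in a few lines. What follows is a minimal precedence-climbing parser in Python, offered only as an illustration: the `PRECEDENCE` table, the function names and the binding powers are assumptions made here (loosely following the common convention that `**` binds tightest and associates to the right), not anything specified in the thread.

```python
import re

# Hypothetical precedence table: operator -> (binding power, associativity).
PRECEDENCE = {
    "+": (1, "left"),
    "-": (1, "left"),
    "*": (2, "left"),
    "/": (2, "left"),
    "**": (3, "right"),
}

TOKEN = re.compile(r"\s*(\*\*|[-+*/()]|\w+)")


def tokenize(text):
    """Split the input into operators, parentheses, identifiers and integers."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _primary(tokens, pos):
    """Parse (E) | id | num and return (tree, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = _expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok in PRECEDENCE or tok == ")":
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok, pos + 1


def _expr(tokens, pos, min_power):
    """Parse E op E chains, letting the table decide how operands group."""
    left, pos = _primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] in PRECEDENCE:
        op = tokens[pos]
        power, assoc = PRECEDENCE[op]
        if power < min_power:
            break
        # Left-associative operators refuse an equal-power operator on the right.
        next_min = power + 1 if assoc == "left" else power
        right, pos = _expr(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos


def parse(text):
    """Return a nested-tuple parse tree for the whole input."""
    tokens = tokenize(text)
    tree, pos = _expr(tokens, 0, 1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With the table in place each input has exactly one tree: `parse("a - b - c")` groups to the left, while `parse("a ** b ** c")` groups to the right. Changing an entry in the table changes the language's "hidden rule" without touching the grammar itself.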
| From | Jan Ziak <0xe2.0x9a.0x9b@gmail.com> |
|---|---|
| Date | 2021-12-29 16:05 -0800 |
| Message-ID | <21-12-020@comp.compilers> |
| In reply to | #2773 |
On Wednesday, December 29, 2021 at 11:28:34 PM UTC+1, Kaz Kylheku wrote:
> On 2021-12-16, Roger L Costello wrote:
> > Question: Opine about why languages are usually defined and implemented with
> > ambiguous grammars.
> But they aren't.
>
> Languages are processed as a stream of characters or tokens, with hidden
> rules about how those relate together and the meaning that emerges.
> All of the rules are hidden, including the entire grammar.
>
> If you're only aware of some of the hidden rules, but not others, then
> you see ambiguity.
>
> But if you're only aware of some of the hidden rules, but not others,
> then you are not working with the correct language.
>
> For instance, I don't know of any mainstream language in which if/else
> is actually ambiguous. They have a hidden rule like that the else goes
> with the closest preceding if statement.

When designing a grammar and implementing a parser: the grammar can
either be unambiguous by design or unambiguous by accident. The
viewpoint that "there isn't any mainstream language in which if/else
is actually ambiguous" is actually the latter option: unambiguous by
accident.

A primary reason why grammars in many mainstream languages (that don't
have a parser generated straight from a verified grammar) are
unambiguous isn't intentional design, but rather it is a consequence
of the fact that those parsers are directly implemented in a language
that is executing statements/expressions/instructions without
verifying consequences of the executions. Some examples of languages
with such execution properties: assembly language (such as: i386,
ARM), C, Haskell.

Contrary to the accidental approach, a parser generator by design cares
about consequences and it is verifying that the specification actually
is unambiguous despite the fact that in the end the parser gets compiled
down into machine instructions.

Verification means to search the whole search space (or at least a
reasonably large subspace of it) - but asm/C/Haskell will run a search
only if it is explicitly (step by step) forced by the programmer to
perform a search.

-atom
| From | Kaz Kylheku <480-992-1380@kylheku.com> |
|---|---|
| Date | 2021-12-30 18:00 +0000 |
| Message-ID | <21-12-025@comp.compilers> |
| In reply to | #2775 |
On 2021-12-30, Jan Ziak <0xe2.0x9a.0x9b@gmail.com> wrote:
> On Wednesday, December 29, 2021 at 11:28:34 PM UTC+1, Kaz Kylheku wrote:
>> On 2021-12-16, Roger L Costello wrote:
>> > Question: Opine about why languages are usually defined and implemented with
>> > ambiguous grammars.
>> But they aren't.
>>
>> Languages are processed as a stream of characters or tokens, with hidden
>> rules about how those relate together and the meaning that emerges.
>> All of the rules are hidden, including the entire grammar.
>>
>> If you're only aware of some of the hidden rules, but not others, then
>> you see ambiguity.
>>
>> But if you're only aware of some of the hidden rules, but not others,
>> then you are not working with the correct language.
>>
>> For instance, I don't know of any mainstream language in which if/else
>> is actually ambiguous. They have a hidden rule like that the else goes
>> with the closest preceding if statement.
>
> When designing a grammar and implementing a parser: the grammar can
> either be unambiguous by design or unambiguous by accident. The
> viewpoint that "there isn't any mainstream language in which if/else
> is actually ambiguous" is actually the latter option: unambiguous by
> accident.

Languages can be ambiguous at the specification level, even if a given
implementation behaves unambiguously. E.g.:

1. The implementation is completely unambiguous and says that else goes
   with closest preceding if.

2. The documentation says something else. (So the users figure this out
   and get their code working based on the implementation.)

3. (But) a new implementation appears, based on the documentation.

Or:

1. The language spec doesn't say anything about which way something is
   parsed.

2. Multiple implementations do it willy-nilly.

Clearly, for instance in C, we have semantic ambiguities like
a[i] = i++.

(A parser depending on the behavior of something like that could have a
grammar ambiguity: something is parsed in two or more different ways
based on which way the undefined construct behaves. That would be a
defect, of course.)

> A primary reason why grammars in many mainstream languages (that don't
> have a parser generated straight from a verified grammar) are
> unambiguous isn't intentional design, but rather it is a consequence
> of the fact that those parsers are directly implemented in a language
> that is executing statements/expressions/instructions without
> verifying consequences of the executions. Some examples of languages
> with such execution properties: assembly language (such as: i386,
> ARM), C, Haskell.

I don't quite follow this; you seem to be saying that Turing machines
are deterministic, and so if we implement a parser as a Turing process,
it will be "accidentally" unambiguous because of determinism.

However, it may so happen that the lack of ambiguity depends on a whole
lot of context. For instance, a Turing process parsing a language,
proceeding left to right, could decide the rule for "if/then/else" on a
case-by-case basis, influenced by everything it has parsed before.

Suppose you have a language which parses a sequence of top-level
expressions from a terminal or file, and executes each one before moving
to the next. Those expressions could be used to invoke an API in the
language implementation to change the treatment of subsequent syntax.

Sure, everything is still unambiguous, if we take into account the
entire stream of expressions from the beginning and understand its
effect on the parsing machine.

If you don't do anything of this sort, and just write, say, a recursive
descent parser which isn't influenced by any weird state flags (whether
purely internal or external too) that change the parsing, and in a safe,
very high level language in which there are few risks of nonportable or
undefined behaviors, then you end up with a "simply unambiguous"
grammar. You should be able to investigate the behavior of if/else with
several test cases and be confident that the observations hold in all
relevant contexts where that construct can appear.

You might not exactly know what the grammar is for the entire language,
but if you figure it out from the code and accurately document it, then
you're good. (Unless the thing is required to conform to some external
specification and fails.)

> Contrary to the accidental approach, a parser
> generator by design cares about consequences and it is verifying that
> the specification actually is unambiguous despite the fact that in the
> end the parser gets compiled down into machine instructions.

> Verification means to search the whole search space (or at least a
> reasonably large subspace of it) - but asm/C/Haskell will run a search
> only if it is explicitly (step by step) forced by the programmer to
> perform a search.

Yes, e.g. a LALR(1) shift-reduce parser generator generates the entire
space of LR(0) items that drive the stack machine, and then when it
populates tables, it discovers conflicts there.

With someone's hand-written parser, we cannot be sure without searching
with test cases, which are language instances, which is intractable to
do exhaustively.

Just because if/else is behaving in certain ways in certain test cases
doesn't mean that a new, more complex test case with more/different
context cannot be found in which the if/else behaves differently.

The procedural/recursive parser code doesn't declare to us what the
grammar is. It could be hiding multiple rules for if/else active in
different contexts which differ from each other. That's not an actual
ambiguity, but for the purposes of working with the language, it's an
informal ambiguity (related to my earlier observation of the user not
knowing what all hidden rules are).

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
[If you write a hand written recursive descent parser, which was typical
before yacc made it easy to use a LALR parser, it was common for the
language the parser accepted to be somewhat different from the one the
programmer intended to accept due to less than complete checking for
unexpected inputs.
On the other hand, that C example isn't ambiguous, it's deliberately
indeterminate. Early Fortran allowed the compiler to compile any
mathematically equivalent, not just numerically equivalent, version
of an expression so A*B+A*C could turn into A*(B+C) which was great
for optimizing, not so much for predictable results. -John]
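To make the "unambiguous by accident" exchange above concrete, here is a minimal sketch of a hand-written recursive-descent parser for a toy if/then/else language. Everything in it (the toy grammar, the token format, the name `parse_statement`) is a hypothetical illustration written in Python, not code from any real compiler. No rule about the dangling else is stated anywhere; the else ends up on the closest preceding if simply because the innermost active call reaches it first.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement of a toy language; return (tree, next position).

    statement := 'if' NAME 'then' statement ['else' statement] | NAME
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then' after the condition")
        condition = tokens[pos + 1]
        then_branch, pos = parse_statement(tokens, pos + 3)
        else_branch = None
        # The innermost active call sees the 'else' first and consumes it,
        # so the else attaches to the closest preceding if.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_statement(tokens, pos + 1)
        return ("if", condition, then_branch, else_branch), pos
    if tokens[pos] in ("then", "else"):
        raise SyntaxError(f"unexpected keyword {tokens[pos]!r}")
    return tokens[pos], pos + 1


tokens = "if a then if b then x else y".split()
tree, end = parse_statement(tokens)
print(tree)  # ('if', 'a', ('if', 'b', 'x', 'y'), None)
```

The behaviour is deterministic, but the grammar is only implied by the control flow. A second code path that handled `else` differently in some other context would not be visible from a handful of test cases, which is the "informal ambiguity" the post describes.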
| From | Kaz Kylheku <480-992-1380@kylheku.com> |
|---|---|
| Date | 2021-12-30 20:08 +0000 |
| Message-ID | <21-12-029@comp.compilers> |
| In reply to | #2778 |
On 2021-12-30, Kaz Kylheku <480-992-1380@kylheku.com> wrote:
John> On the other hand, that C example isn't ambiguous, it's deliberately
John> indeterminate.
The C example isn't ambiguous in its parse, but the semantics is
ambiguous. Expressions can be evaluated in multiple orders in C,
and if we follow different orders for that expression, we get different
results (in such a way that it's deemed undefined).
Here is something merely unspecified:
a() + b() + c()
Suppose a prints "a" to stdout, b prints "b" and c prints "c":
int a() { putchar('a'); return 0; }
The behavior is not undefined, but unspecified: we don't know
which of six permutations is printed, but we know it's one of them:
abc, acb, bac, bca, cab, cba.
That's a kind of ambiguity in the language (syntax + semantics).
We know that the parse is
(a() + b()) + c()
But while the meaning requires that a, b and c be executed in order
to produce the operands to the + operator, the order in which
those calls take place is not specified.
That's a clear ambiguity.
Just like if I say "pick up the dry cleaning and fill up the gas
tank", there is no grammar ambiguity; yet we don't know whether
the sequencing is required, or whether the gas tank can be
filled first. The request describes multiple possible scenarios,
including going to a gas station that does dry-cleaning, and
picking up while an attendant fills the tank.
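The six outcomes listed above can be enumerated mechanically. The sketch below is only a model, written in Python: Python itself evaluates operands left to right, so it cannot reproduce C's unspecified order directly. Instead it simulates each order in which the three calls could be made while keeping the parse (a() + b()) + c() fixed. The helper name `possible_outputs` is an assumption for illustration.

```python
from itertools import permutations


def possible_outputs():
    """Return (printed text, sum) for every order the three calls may run in."""
    outputs = []
    for order in permutations("abc"):
        printed = []

        def call(name):
            printed.append(name)  # stands in for putchar(name) in the C version
            return 0

        # Make the calls in this particular order, remembering each result.
        values = {name: call(name) for name in order}
        # The parse is fixed regardless of the order the operands were produced.
        result = (values["a"] + values["b"]) + values["c"]
        outputs.append(("".join(printed), result))
    return outputs


for text, value in possible_outputs():
    print(text, value)
```

Every order yields the same sum, 0, but a different side-effect trace: one parse, six permitted behaviours.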
> indeterminate. Early Fortran allowed the compiler to compile any
> mathematically equivalent, not just numerically equivalent, version
> of an expression so A*B+A*C could turn into A*(B+C) which was great
> for optimizing, not so much for predictable results. -John]
Why wait for early Fortran to arrive? If you write in C, you can
use gcc -ffast-math (an umbrella option for turning on a whole lot
of stuff, among it -fassociative-math and -freciprocal-math).
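Why such reassociation trades predictability for speed can be seen without any compiler flags, because floating-point addition is not associative. A minimal sketch in Python (chosen only for illustration; its floats are IEEE 754 doubles on typical platforms, which is assumed here):

```python
# The same three operands, grouped two different ways.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)

print(left_grouped)    # 0.6000000000000001
print(right_grouped)   # 0.6
print(left_grouped == right_grouped)  # False
```

A compiler that is allowed to regroup operands may therefore change the last bits of a result, which is the same kind of effect as the A*B+A*C to A*(B+C) rewrite mentioned in the quoted note.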
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2021-12-29 18:41 -0800 |
| Message-ID | <21-12-022@comp.compilers> |
| In reply to | #2773 |
On Wednesday, December 29, 2021 at 2:28:34 PM UTC-8, Kaz Kylheku wrote:

(snip)

> I also believe there is one more element at play: mathematics. People
> study mathematics in school, and those who go on to do programming tend
> to be ones who were more exposed to it or paid more attention.

This reminds me of learning associativity of exponentiation (**)
in Fortran IV (I believe it isn't in the Fortran 66 standard) before I
learned it in algebra class. I suspect that there are others I learned
from programming before learning them in math class.

> People who are programmers actually had a first contact with formal
> syntax in mathematics.

> The conflation between syntax and semantics may ultimately come from
> that place. Mathematicians design their notations deliberately, in such
> ways that when they manipulate symbols, while observing certain rules,
> they are actually preserving semantics. The notation directly enables
> semantically meaningful manipulation, as a tool of thought.

I suspect that people learn some things in the first programming language
that they learn, and then expect it to be the same in others. When it isn't,
people get surprised or confused.

When I started with unix, I learned csh programming, and mostly
avoided sh (and successors). One reason for that is, as well as I
knew at the time, differences in the syntax and semantics of them.
[Fortran has always had ** exponentiation, starting with the original
version in 1956. It always bound tighter than +-*/ but wasn't
associative, A**B**C not allowed, -John]
| From | Kaz Kylheku <480-992-1380@kylheku.com> |
|---|---|
| Date | 2021-12-30 18:14 +0000 |
| Message-ID | <21-12-026@comp.compilers> |
| In reply to | #2776 |
On 2021-12-30, gah4 <gah4@u.washington.edu> wrote:
> On Wednesday, December 29, 2021 at 2:28:34 PM UTC-8, Kaz Kylheku wrote:
>
> (snip)
>
>> I also believe there is one more element at play: mathematics. People
>> study mathematics in school, and those who go on to do programming tend
>> to be ones who were more exposed to it or paid more attention.
>
> This reminds me of learning associativity of exponentiation (**)
> in Fortran IV (I believe it isn't in the Fortran 66 standard) before I
> learned it in algebra class. I suspect that there are others I learned
> from programming before learning them in math class
In Common Lisp, the expt function is strictly binary, so it eliminates
the question of associativity. Some basic arithmetic functions are
n-ary, like (+ a b c d e f), where it is documented (and readily
understood) that, in cases where it matters, it is left-to-right
reduction.
In TXR Lisp, I made expt n-ary, so you can write
(expt x y z w)
But! The associativity is right-to-left, making that equivalent to:
(expt x (expt y (expt z w)))
This is for two reasons. One is math: (expt x y z w) defined this
way follows:
         w
        z
       y
      x
Secondly, it is more useful, because the left-to-right interpretation
is:
      y  z  w
  (((x )  )  )
and that is just
(expt x (* y z w))
which is easy enough to write if that's what you want! It's not much
more verbiage than the (expt x y z w) you may have wanted. Regardless of
the number of operands, it's just an extra set of parentheses and a *
operator.
Whereas if you want (expt x (expt y (expt z w))) and (expt x y z w)
doesn't give it to you, that *is* a lot of verbiage, whose nesting grows
with each additional argument.
The associativity rule that saves the most verbiage is the better one,
even if it is opposite to many other arithmetic functions.
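The two readings of an n-ary expt can be compared side by side. This is a Python sketch of the argument above, not TXR Lisp's implementation; the names `expt_right` and `expt_left` are made up for illustration, and the behaviour for a single argument (returning it unchanged) is an assumption of the sketch.

```python
from functools import reduce
from math import prod


def expt_right(*args):
    """Right-to-left reduction: expt_right(x, y, z, w) == x ** (y ** (z ** w))."""
    return reduce(lambda acc, base: base ** acc, reversed(args))


def expt_left(*args):
    """Left-to-right reduction: expt_left(x, y, z, w) == ((x ** y) ** z) ** w."""
    return reduce(lambda acc, exponent: acc ** exponent, args)


x, y, z, w = 2, 3, 2, 2
print(expt_right(x, y, z, w) == x ** (y ** (z ** w)))    # True: a power tower
print(expt_left(x, y, z, w) == x ** prod([y, z, w]))     # True: just x ** (y*z*w)
```

For positive integers the left-to-right version collapses to a single exponentiation with a product as the exponent, which is why the right-to-left version is the one that saves writing nested calls.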
>> People who are programmers actually had a first contact with formal
>> syntax in mathematics.
>
>> The conflation between syntax and semantics may ultimately come from
>> that place. Mathematicians design their notations deliberately, in such
>> ways that when they manipulate symbols, while observing certain rules,
>> they are actually preserving semantics. The notation directly enables
>> semantically meaningful manipulation, as a tool of thought.
>
> I suspect that people learn some things in the first programming language
> that they learn, and then expect it to be the same in others. When it isn't,
> people get surprised or confused.
>
> When I started with unix, I learned csh programming, and mostly
> avoided sh (and successors). One reason for that is, as well as I
> knew at the time, differences in the syntax and semantics of them.
> [Fortran has always had ** exponentiation, starting with the original
> version in 1956. It always bound tighter than +-*/ but wasn't
> associative, A**B**C not allowed, -John]
When I started programming from nothing, I saw BASIC examples in a
book which was doing things like:
10 X = 2
20 X = X + 1
The only language with formulas that I was coming from was math.
(Though I was only in grade 6, I knew how to solve systems of linear
equations from school, because I had recently come to Canada from
Slovakia.)
So, I thought, what? How can X be equal to X + 1; you cannot solve
this absurdity!
From then I knew that the people who program computers to understand
symbols are free thinkers who make them mean anything they want.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
| From | Jan Ziak <0xe2.0x9a.0x9b@gmail.com> |
|---|---|
| Date | 2021-12-30 13:47 -0800 |
| Message-ID | <21-12-033@comp.compilers> |
| In reply to | #2779 |
On Thursday, December 30, 2021 at 7:56:15 PM UTC+1, Kaz Kylheku wrote:
> When I started programming from nothing, I saw BASIC examples in a
> book which was doing things like:
>
> 10 X = 2
> 20 X = X + 1
>
> The only language with formulas that I was coming from was math.
>
> So, I thought, what? How can X be equal to X + 1; you cannot solve
> this absurdity!
>
> From then I knew that the people who program computers to understand
> symbols are free thinkers who make them mean anything they want.

"X = X + Y" means "X[t+1] = X[t] + Y[t]" where t is time. Time had to be
omitted from the notation of the BASIC programming language because otherwise
the source code would consume a much larger amount of computer memory and it
would complicate GOTO and FOR/NEXT statements.

-atom

[Interesting take. In reality, of course, BASIC borrowed that from Fortran.
Algol used := for assignment, different from = for equality comparison. -John]
| From | Jan Ziak <0xe2.0x9a.0x9b@gmail.com> |
|---|---|
| Date | 2021-12-30 17:10 -0800 |
| Subject | Re: What does = mean, was Why are ambiguous grammars usually a bad idea? |
| Message-ID | <21-12-035@comp.compilers> |
| In reply to | #2782 |
On Friday, December 31, 2021 at 12:45:56 AM UTC+1, Jan Ziak wrote:
> On Thursday, December 30, 2021 at 7:56:15 PM UTC+1, Kaz Kylheku wrote:
> > When I started programming from nothing, I saw BASIC examples in a
> > book which was doing things like:
> >
> > 10 X = 2
> > 20 X = X + 1
> >
> > The only language with formulas that I was coming from was math.
> >
> > So, I thought, what? How can X be equal to X + 1; you cannot solve
> > this absurdity!
> >
> > From then I knew that the people who program computers to understand
> > symbols are free thinkers who make them mean anything they want.
>
> "X = X + Y" means "X[t+1] = X[t] + Y[t]" where t is time. Time had to be
> omitted from the notation of the BASIC programming language because otherwise
> the source code would consume a much larger amount of computer memory and it
> would complicate GOTO and FOR/NEXT statements.
>
> -atom
>
> [Interesting take. In reality, of course, BASIC borrowed that from Fortran.
> Algol used := for assignment, different from = for equality comparison. -John]

@John: Indeed, BASIC wasn't the 1st programming language. To generalize, I
wanted to point out that the notion of time is implicit to almost all
programming languages, of course not just BASIC. In my opinion, contrary to
Kaz's opinion, most children who will later become programmers can quite
easily understand what "X=X+1" means in a language like BASIC/Python/etc.
(Thus, I disagree with the belief that "people who program computers to
understand symbols are free thinkers who make them mean anything they want".)

-atom
| From | mac <acolvin@efunct.com> |
|---|---|
| Date | 2022-01-03 19:51 +0000 |
| Message-ID | <22-01-007@comp.compilers> |
| In reply to | #2782 |
> [Interesting take. In reality, of course, BASIC borrowed that from Fortran. Algol
> used := for assignment, different from = for equality comparison. -John]

Indeed.
Unfortunately, assignment is probably the single most common operator.
The ASCII committee should have kept the left-arrow character instead of
replacing it with underscore.
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-01-03 21:07 -0800 |
| Subject | Re: for or against equality, was Why are ambiguous grammars usually a bad idea? |
| Message-ID | <22-01-010@comp.compilers> |
| In reply to | #2790 |
On Monday, January 3, 2022 at 11:58:39 AM UTC-8, mac wrote:
> > [Interesting take. In reality, of course, BASIC borrowed that from Fortran. Algol
> > used := for assignment, different from = for equality comparison. -John]
> Indeed.
> Unfortunately, assignment is probably the single most common operator.
> The ASCII committee should have kept the left-arrow character instead of
> replacing it with underscore.

The assignment statement in BASIC, at least the ones I know, has an
(optional) LET keyword, so it might say:

10 LET A=3

Most people leave it off, though.

Is PL/I the only language that uses = for both assignment and the
relational operator? Since expressions are not statements, it avoids
the ambiguity that would otherwise occur. I believe some BASICs also
use = for both.

Underscore is a pretty useful character.

The two ASCII characters that don't exist in EBCDIC are ^ and ~.
Two EBCDIC characters that don't exist in ASCII are ¢ (cent) and
¬ (logical NOT sign). Conversion tables usually cross map those pairs.
(PL/I, at least, uses ¬ and ¬= operators.)
[In original Dartmouth BASIC the LET was mandatory, but it was a considerably
smaller and fully compiled language than the later dialects. On the other
hand, PL/I made a fetish of nothing being a reserved word, e.g.

IF IF = THEN THEN ELSE = BEGIN; ELSE END = IF;

-John]
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2022-01-04 19:23 +0000 |
| Subject | Re: for or against equality, was Why are ambiguous grammars usually a bad idea? |
| Message-ID | <22-01-012@comp.compilers> |
| In reply to | #2792 |
> [In original Dartmouth BASIC the LET was mandatory, but it was a considerably
> smaller and fully compiled language than the later dialects. On the other
> hand, PL/I made a fetish of nothing being a reserved word, e.g.
>
> IF IF = THEN THEN ELSE = BEGIN; ELSE END = IF;

Fortran shares this property. This may sound slightly odd to people
brought up on languages with reserved keywords, but it has a big
advantage: You can extend the language with new keywords without
making existing programs invalid.

There is a cost to this, in several aspects:

- More CPU time needed for parsing (important earlier, now it can
  generally be neglected).

- More complexity in the parser. This cost is paid once, and by the
  compiler developers, not the users.

- Similarity to FORTRAN may induce fear and loathing in computer
  scientists

(the last remark is not to be taken too seriously :-)
[Fortran barely had tokens since it ignored spaces outside of Hollerith
strings. Having written a few F77 parsers, I can say it was possible to
tokenize using hints from the parser about what lexical kludge to apply,
but it wasn't pleasant. The yacc parser was straightforward other than
figuring out when to send which kludge ID to the lexer.
It also meant that one character typos could cause large semantic
changes, notably DO 5 I = 1,10 is a loop while DO 5 I = 1.10 is an
assignment. Legend says we lost a satellite to that one. -John]
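The DO 5 I = 1,10 versus DO 5 I = 1.10 example from the note above can be sketched as a toy classifier. This is a deliberately simplified Python illustration, not a real Fortran lexer: the regular expressions and the name `classify` are assumptions made here, and the sketch ignores cases such as commas nested inside parentheses or Hollerith strings that a real F77 front end has to handle.

```python
import re

# After blanks are removed, a DO loop looks like DO<label><var>=<start>,<rest>.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,=]+),(.+)$")
# Anything of the form <name>=<expr> is otherwise taken as an assignment.
ASSIGNMENT = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(statement):
    """Classify a statement the way blank-insensitive Fortran would see it."""
    squeezed = statement.replace(" ", "").upper()
    loop = DO_LOOP.match(squeezed)
    if loop:
        label, variable, start, rest = loop.groups()
        return ("do-loop", label, variable, start, rest)
    assign = ASSIGNMENT.match(squeezed)
    if assign:
        return ("assignment", assign.group(1), assign.group(2))
    return ("unknown", squeezed)


print(classify("DO 5 I = 1,10"))  # ('do-loop', '5', 'I', '1', '10')
print(classify("DO 5 I = 1.10"))  # ('assignment', 'DO5I', '1.10')
```

With blanks insignificant and no reserved words, the only thing separating a loop header from an assignment to a variable named DO5I is the comma, so the decision cannot be made until the whole statement has been scanned.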
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-01-04 13:26 -0800 |
| Subject | Re: for or against equality, was Why are ambiguous grammars usually a bad idea? |
| Message-ID | <22-01-014@comp.compilers> |
| In reply to | #2792 |
On Tuesday, January 4, 2022 at 10:15:42 AM UTC-8, gah4 wrote:
(snip, our moderator wrote)
> [In original Dartmouth BASIC the LET was mandatory, but it was a considerably
> smaller and fully compiled language than the later dialects. On the other
> hand, PL/I made a fetish of nothing being a reserved word, e.g.
>
> IF IF = THEN THEN ELSE = BEGIN; ELSE END = IF;
>
> -John]
I never used anything close to the original BASIC, but did use, for some time,
the HP TSB2000 version. HP stores programs after tokenizing, so I suspect
that even if you don't put in LET, the tokenizer will add it.
As for PL/I, it borrowed many features from COBOL, but not the use
of reserved words. For one, they wanted people not to have to know the whole
language, and not even the words. Stories are that COBOL programmers always
keep the list of reserved words nearby, to avoid using them.
Counting from a recent IBM web page on their COBOL compiler, there are
over 400 reserved words, many common English words that people might
like to use. Somehow out of 50 years of programming, I have managed
never to even type in and run a COBOL program, and especially not to
write one.
As for Fortran parsing, I do remember that WATFIV reserves the sequence
'FORMAT(' at the beginning of a statement for actual FORMAT statements.
You can't assign to elements of an array named FORMAT. That might not
be so bad, except that Fortran 66, in its run-time format feature, requires the
format data to be in an array. And the obvious name is FORMAT!
[COBOL doesn't have that many reserved words. See https://www.ibm.com/docs/en/i/7.1?topic=list-reserved-words
Re FORMAT statements, WATFOR/FIV punted for some reason. It's not that
hard to tell a format statement from a statement like FORMAT(I5,A4) =
42 but I realize no sane programmer would do that. -John]
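One way to make that distinction, sketched in Python under simplifying assumptions: find the parenthesis that closes the one after FORMAT and check whether anything follows it. The function name `is_format_statement` is hypothetical, and the sketch ignores Hollerith fields inside the format list (which can contain unbalanced parentheses and significant blanks), so it is an illustration rather than what any real compiler does.

```python
def is_format_statement(stmt: str) -> bool:
    """Guess whether a fixed-form statement is a FORMAT statement.

    Assumption: no Hollerith or character constants appear in the
    statement, so blanks can be removed and parentheses counted naively.
    """
    s = stmt.replace(" ", "").upper()
    if not s.startswith("FORMAT("):
        return False
    depth = 0
    # Start at the '(' that follows the six letters of FORMAT.
    for i in range(6, len(s)):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
            if depth == 0:
                # A FORMAT statement ends at its closing parenthesis;
                # an assignment to FORMAT(...) continues with '='.
                return i == len(s) - 1
    return False
```

Here `is_format_statement("FORMAT(I5,A4)")` is true and `is_format_statement("FORMAT(I5,A4) = 42")` is false, since the `=` after the closing parenthesis marks an assignment to an array element.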
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2021-12-30 13:40 -0800 |
| Message-ID | <21-12-031@comp.compilers> |
| In reply to | #2776 |
On Wednesday, December 29, 2021 at 7:28:33 PM UTC-8, gah4 wrote:
(snip, I wrote)
> This reminds me of learning associativity of exponentiation (**)
> in Fortran IV (I believe it isn't in the Fortran 66 standard) before I
> learned it in algebra class. I suspect that there are others I learned
> from programming before learning them in math class
(snip)
> [Fortran has always had ** exponentiation, starting with the original
> version in 1956. It always bound tighter than +-*/ but wasn't
> associative, A**B**C not allowed, -John]
It was, at least, in Fortran IV for IBM 360/370:
https://atariwiki.org/wiki/attach/Fortran/IBM_FORTRAN_IV-Language_1973.pdf
My 8th grade graduation present was the above manual, though maybe one year earlier. I used to read IBM reference manuals like books, from start to finish. By the end of summer, I had run many Fortran programs.
As far as I know, IBM Fortran IV was the input to the 1966 standard, but not all features were included. It might also be that extensions were added later.
[I used Fortran H on Princeton's 360/91 in a summer job I had in college in about 1973. -John]
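Why the grouping of A**B**C matters can be seen with small integers. The sketch below uses Python, whose `**` groups right to left, the same rule Fortran 77 and later give for A**B**C. Whether the linked IBM manual specifies that rule has not been checked here, so treat that as an assumption.

```python
a, b, c = 2, 3, 2

right_grouped = a ** (b ** c)   # 2 ** 9
left_grouped = (a ** b) ** c    # 8 ** 2
chained = a ** b ** c           # Python groups this right to left

print(right_grouped, left_grouped, chained)
```

The two groupings give 512 and 64 respectively, and the unparenthesized form matches the right-grouped value. A language that allows the chained form has to pick one grouping, and a programmer has to learn which.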
| From | Jan Ziak <0xe2.0x9a.0x9b@gmail.com> |
|---|---|
| Date | 2021-12-30 20:19 -0800 |
| Subject | Re: why do people choose a language, was Why are ambiguous grammars usually a bad idea? |
| Message-ID | <21-12-037@comp.compilers> |
| In reply to | #2773 |
On Wednesday, December 29, 2021 at 11:28:34 PM UTC+1, Kaz Kylheku wrote:
> On 2021-12-16, Roger L Costello
> > Question: Opine about why languages are usually defined and implemented with
> > ambiguous grammars.
>
> Novice programmers have historically been attracted to cryptic-looking
> languages. It is one of the main reasons for the success of languages
> like C and Perl.
> ....
I know that what I am about to write does not answer the original question about ambiguous grammars, but I feel I have to respond to the claim that novices are attracted to cryptic-looking languages. If that were true, then the brainf**k language would be in the top 10 languages in use today.
People new to programming aren't attracted to C because it is cryptic, but because, for example, in the 1990s they learned that C was used to implement the game Doom with only a few elements of assembly (https://en.wikipedia.org/wiki/Development_of_Doom#Programming). Doom was implemented in C and wasn't implemented in Lisp/Pascal/Smalltalk, which increases the popularity of C and decreases the popularity of Lisp/Pascal/Smalltalk.
Some young programmers were attracted to Smalltalk after the year 2002 because they watched the Squeakers movie (I believe it is this one: https://www.imdb.com/title/tt2172065/).
In summary: Novice programmers are attracted to particular programming languages because those languages are popular in their social networks.
-atom
[Sigh. You're probably right. Historically, novices started with a toy language which left out more advanced but important ideas like data structures and name scope, and gave them an unfortunately blinkered idea of what programming involves. One time when I was a grad student I had to explain to one of the undergrads why you really didn't want to write all your programs in Tiny Basic. -John]