Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2778
| Path | csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end |
|---|---|
| From | Kaz Kylheku <480-992-1380@kylheku.com> |
| Newsgroups | comp.compilers |
| Subject | Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? |
| Date | Thu, 30 Dec 2021 18:00:47 -0000 (UTC) |
| Organization | A noiseless patient Spider |
| Lines | 132 |
| Sender | news@iecc.com |
| Approved | comp.compilers@iecc.com |
| Message-ID | <21-12-025@comp.compilers> (permalink) |
| References | <21-12-003@comp.compilers> <21-12-017@comp.compilers> <21-12-020@comp.compilers> |
| Injection-Info | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="55875"; mail-complaints-to="abuse@iecc.com" |
| Keywords | parse, design, comment |
| Posted-Date | 30 Dec 2021 13:55:01 EST |
| X-submission-address | compilers@iecc.com |
| X-moderator-address | compilers-request@iecc.com |
| X-FAQ-and-archives | http://compilers.iecc.com |
| Xref | csiph.com comp.compilers:2778 |
Show key headers only | View raw
On 2021-12-30, Jan Ziak <0xe2.0x9a.0x9b@gmail.com> wrote: > On Wednesday, December 29, 2021 at 11:28:34 PM UTC+1, Kaz Kylheku wrote: >> On 2021-12-16, Roger L Costello wrote: >> > Question: Opine about why languages are usually defined and implemented with >> > ambiguous grammars. >> But they aren't. >> >> Languages are processed as a stream of characters or tokens, with hidden >> rules about how those relate together and the meaning that emerges. >> All of the rules are hidden, including the entire grammar. >> >> If you're only aware of some of the hidden rules, but not others, then >> you see ambiguity. >> >> But if you're only aware of some of the hidden rules, but not others, >> then you are not working with the correct language. >> >> For instance, I don't know of any mainstream language in which if/else >> is actually ambiguous. They have a hidden rule like that the else goes >> with the closest preceding if statement. > > When designing a grammar and implementing a parser: the grammar can > either be unambiguous by design or unambiguous by accident. The > viewpoint that "there isn't any mainstream language in which if/else > is actually ambiguous" is actually the latter option: unambiguous by > accident. Languages can be ambiguous at the specification level, even if a given implementation behaves unambiguouisly: E.g.: 1. The implementation is completely unambiguous and says that else goes with closest preceding if. 2. The documentation says something else. (So the users figure this out and get their code working based on the implementation.) 3. (But) a new implementation appears, based on the documentation. Or: 1. The language spec doesn't say anything about which way something is parsed. 2. Mutiple implementations do it willy-nilly. Clearly, for instance in C, we have semantic ambiguities like a[i] = i++. (A parser depending on the behavior of something like that could have a grammar ambiguity: something is parsed in two or more different ways based on which way the undefined construct behaves. That would be a defect, of course.) > A primary reason why grammars in many mainstream languages (that don't > have a parser generated straight from a verified grammar) are > unambiguous isn't intentional design, but rather it is a consequence > of the fact that those parsers are directly implemented in a language > that is executing statements/expressions/instructions without > verifying consequences of the executions. Some examples of languages > with such execution properties: assembly language (such as: i386, > ARM), C, Haskell. I don't quite follow this; you seem to be saying that Turing machines are deterministic, and so if we implement a parser as a Turing process, it will be "accidentally" unambiguous because of determinism. However, it may so happen that the lack of ambiguity depends on a whole lot of context. For instance, a Turing process parsing a language, proceeding left to right, could decide the rule for "if/then/else" on a case-by-case basis, influenced by everything it has parsed before. Suppose you have a language which parses a sequence of top-level expressions from a terminal or file, and executes each one before moving to the next. Those expressions could be used to invoke an API in the language implementation to change the treatment of subsequent syntax. Sure, everything is still unambiguous, if we take into account the entire stream of expressions from the beginning and understand its effect on the parsing machine. If you don't do anything of this sort, and just write, say, a recursive descent parser which isn't influenced by any weird state flags (whether purely internal or external too) that change the parsing, and in a safe, very high level language in which there are few risks of nonportable or undefined behaviors, then you end up with a "simply unambiguous" grammar. You should be able to investigate the behavior of if/else with several test cases and be confident that the observations hold in all relevant contexts where that construct can appear. You might not exactly know what the grammar is for the entire language, but if you figure it out from the code and accurately document it, then you're good. (Unless the thing is required to conform to some external specification and fails.) > Contrary to the accidental approach, a parser > generator by design cares about consequences and it is verifying that > the specification actually is unambiguous despite the fact that in the > end the parser gets compiled down into machine instructions. > Verification means to search the whole search space (or at least a > reasonably large subspace of it) - but asm/C/Haskell will run a search > only if it is explicitly (step by step) forced by the programmer to > perform a search. Yes, e.g. a LALR(1) shift-reduce parser generator generates the entire space of LR(0) items that drive the stack machine, and then when it populates tables, it discovers conflicts there. With someone's hand-written parser, we cannot be sure without searching with test cases, which are language instances, which is intractable to do exhaustively. Just because if/else is behaving in certain ways in certain test cases doesn't mean that a new, more complex test case with more/different context cannot be found in which the if/else behaves differently. The procedural/recursive parser code doesn't declare to us what the grammar is. It could be hiding multiple rules for if/else active in different contexts which differ from each other. That's not an actual ambiguity, but for the purposes of working with the language, it's an informal ambiguity (related to my earlier observation of the user not knowing what all hidden rules are). -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal [If you write a hand written recursive descent parser, which was typical before yacc made it easy to use a LALR parser, it was common for the language the parser accepted to be somewhat different from the one the programmer intended to accept due to less than complete checking for unexpected inputs. On the other hand, that C example isn't ambiguous, it's deliberately indeterminate. Early Fortran allowed the compiler to compile any mathematically equivalent, not just numerically equivalent, version of an expression so A*B+A*C could turn into A*(B+C) which was great for optimizing, not so much for predictable results. -John]
Back to comp.compilers | Previous | Next — Previous in thread | Next in thread | Find similar
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Kaz Kylheku <480-992-1380@kylheku.com> - 2021-12-29 18:48 +0000
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2021-12-29 16:05 -0800
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Kaz Kylheku <480-992-1380@kylheku.com> - 2021-12-30 18:00 +0000
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Kaz Kylheku <480-992-1380@kylheku.com> - 2021-12-30 20:08 +0000
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? gah4 <gah4@u.washington.edu> - 2021-12-29 18:41 -0800
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Kaz Kylheku <480-992-1380@kylheku.com> - 2021-12-30 18:14 +0000
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2021-12-30 13:47 -0800
Re: What does = mean, was Why are ambiguous grammars usually a bad idea? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2021-12-30 17:10 -0800
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? mac <acolvin@efunct.com> - 2022-01-03 19:51 +0000
Re: for or against equality, was Why are ambiguous grammars usually a bad idea? gah4 <gah4@u.washington.edu> - 2022-01-03 21:07 -0800
Re: for or against equality, was Why are ambiguous grammars usually a bad idea? Thomas Koenig <tkoenig@netcologne.de> - 2022-01-04 19:23 +0000
Re: for or against equality, was Why are ambiguous grammars usually a bad idea? gah4 <gah4@u.washington.edu> - 2022-01-04 13:26 -0800
Re: Why are ambiguous grammars usually a bad idea? Why are languages usually defined and implemented with ambiguous grammars? gah4 <gah4@u.washington.edu> - 2021-12-30 13:40 -0800
Re: why do people choose a language, was Why are ambiguous grammars usually a bad idea? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2021-12-30 20:19 -0800
csiph-web