From: Christopher F Clark
Newsgroups: comp.compilers
Subject: Re: What attributes of a programming language simplify its use?
Date: Sat, 3 Dec 2022 23:33:07 +0000
Organization: Compilers Central
Message-ID: <22-12-005@comp.compilers>
References: <22-12-001@comp.compilers>
Keywords: syntax, design
Posted-Date: 03 Dec 2022 19:27:47 EST

The discussion on reserved words versus keywords reminds me of
decisions we made while building Yacc++. It is worth noting that we
(both of its developers) worked at Pr1me Computer, where PL/I dialects
were the key programming languages used in building both the OS and
the compilers, so we were likely highly influenced by that.

As a result, Yacc++ has very few reserved words; I'm pretty sure the
number is 3 or fewer. There is only 1 that I can think of, "yy_eof",
which is reserved because it is used in the library in a place where
we have hand-written code that we don't want to modify.(*) We also
specifically reserve all yy_ words for use in the library, although
most can be used in grammars (and code) without any ill effect. In
doing so, we feel we haven't taken away any common words from the
users' vocabulary, and when the words do have special meaning, it is
generally the same meaning as in the traditional lex/flex/yacc/bison
variants.

However, we do have plenty of context-sensitive keywords.
But we structured their usage (as keywords) such that they are easily
disambiguated. Thus, left, right, and nonassoc don't have special
characters in them, as opposed to yacc, where they are %left et al.
Now, %prec we couldn't make unambiguous, so it retains the required %
spelling. Still, it is worth noting that to make that possible, we had
to require that all productions have a terminating semicolon (;),
rather than depending upon a name followed by a colon (:) to identify
the start of a rule. That also gets rid of the lexical hack required
to make the grammar LALR(1) rather than LALR(2). We could have handled
LALR(2) grammars, but in our opinion, it made the error recovery and
messages less obvious. Sometimes simplicity of implementation makes
for a simpler and more regular language.

But to continue this part of the explanation, words like fast, small,
and readable are keywords that describe different ways we lay out the
tables, and in specific contexts they have those meanings. In those
contexts, normal identifiers cannot appear. And in any context where a
normal identifier can appear, they are simply identifiers and don't
carry any significance; in the library code, where we need them to
have special meaning, we use yy_fast et al. So, in the declarations
within a grammar where we need them to have special meanings, we don't
need them to be spelled in some "special" way (i.e. you don't say
yy_fast yy_tables, you say "fast tables" and it is perfectly clear),
and you can also use "fast" and "tables" in your grammar as tokens or
non-terminals without worrying that you are using a "reserved" word.
In fact, we do so to describe the grammar of Yacc++ grammars.

Thus, we feel like we have mostly achieved a similar level of balance
as PL/I had, without creating a write-only language. Yes, you can
probably use Yacc++ to write an extensible language that diverges into
a bunch of unique and incomprehensible variants where no two
programmers are using the same language.
We haven't made that impossible. However, the freedom we have allowed
does not inherently contribute to that, nor encourage it. It simply
lets people write things slightly more naturally without a lot of
"line noise".

------

Now, given that, I want to dispel the illusion that it makes parsing
harder (beyond a very trivial amount). The "trick" (hack) that lets
the grammar deal with keywords that are not reserved is quite simple
(and we document how to do it in our manual, for users who are
designing their own languages). It should apply to most parser
generators and doesn't rely on any special feature of Yacc++, although
we have some features that make doing so easier.

So, for example, you have a list of keywords (tokens) that you want
treated like identifiers, say "if" "then" "else" à la PL/I or "left"
"right" "token" à la Yacc++, and you have a token identifier that you
want to use to define other identifiers, as in:

    token identifier, if, then, else, left, right, token;
    identifier: "a" .. "z" ("a" .. "z" | "0" .. "9")*;
    if: "if";
    then: "then";
    else: "else";
    left: "left";
    right: "right";
    token: "token";

To get the desired property, you simply define a non-terminal (we'll
call it "ident") that you use to represent identifiers in contexts
where you want the keywords to be allowed, as in:

    ident: identifier | if | then | else | left | right | token;

Now, simply use ident where you would have used identifier previously:

    rule: ident ":" ident* ";" ;

And use the keywords where they have their special meaning:

    if_stmt: if expression then stmt (else stmt)?;
    left_decl: left ident ("," ident)* ";" ;

As long as the uses are unambiguous (and the generator uses the
prefer-shift method of resolving shift-reduce conflicts, or you can
force it to with "precedence" declarations), then the grammar will
work as expected.
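
To make the idea concrete outside any particular generator, here is a
minimal sketch in Python of a hand-written recursive-descent parser
for the "rule" and "left_decl" productions above. It is not Yacc++ and
says nothing about how Yacc++ is implemented; the names KEYWORDS,
tokenize, Parser, and parse are made up for illustration, and the
keyword set is cut down to left, right, and token. Where an LR parser
would shift the keyword and then decide on the next token, this sketch
simply peeks at the token after "left".

```python
import re

# Hypothetical keyword set for this sketch (a subset of the post's examples).
KEYWORDS = {"left", "right", "token"}

# A word is a lowercase letter followed by lowercase letters or digits,
# mirroring the post's definition of "identifier"; punctuation is : ; ,
TOKEN_RE = re.compile(r"\s*([a-z][a-z0-9]*|[:;,])")


def tokenize(text):
    """Return (kind, spelling) pairs; each keyword gets its own token kind."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        word = match.group(1)
        if word in KEYWORDS:
            kind = word            # keyword token
        elif word[0].isalpha():
            kind = "identifier"    # ordinary identifier token
        else:
            kind = word            # punctuation
        tokens.append((kind, word))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i][0] if i < len(self.tokens) else "eof"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        spelling = self.tokens[self.pos][1]
        self.pos += 1
        return spelling

    def ident(self):
        # ident: identifier | left | right | token;
        if self.peek() == "identifier" or self.peek() in KEYWORDS:
            return self.expect(self.peek())
        raise SyntaxError(f"expected an identifier, found {self.peek()!r}")

    def statement(self):
        # "left" is a keyword only where it cannot be a rule name,
        # that is, when it is not followed by ":".
        if self.peek() == "left" and self.peek(1) != ":":
            return self.left_decl()
        return self.rule()

    def left_decl(self):
        # left_decl: left ident ("," ident)* ";" ;
        self.expect("left")
        names = [self.ident()]
        while self.peek() == ",":
            self.expect(",")
            names.append(self.ident())
        self.expect(";")
        return ("left_decl", names)

    def rule(self):
        # rule: ident ":" ident* ";" ;
        name = self.ident()
        self.expect(":")
        body = []
        while self.peek() != ";":
            body.append(self.ident())
        self.expect(";")
        return ("rule", name, body)


def parse(text):
    """Parse a sequence of statements and return them as tuples."""
    parser = Parser(text)
    result = []
    while parser.peek() != "eof":
        result.append(parser.statement())
    return result
```

With this sketch, parse("left left, token; token: left right x1;")
gives a left_decl naming left and token, followed by a rule called
token whose body is left, right, x1. The same spelling is a keyword in
one position and a plain name in the others.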

If it is ambiguous and you want to disallow certain keywords, simply
introduce other non-terminals, such as:

    ident_not_if: identifier | then | else | left | right | token;
    assignment: ident_not_if "=" expression; // if keyword not allowed before =

-----

Now, as I said, we have features in Yacc++ that make this easier, but
the principle doesn't require our tool. And there is much more you can
do. This is just one of the relevant grammar hacks.

(*) yy_error might also be a reserved word, used as part of error
recovery.

-- 
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris
------------------------------------------------------------------------------