Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups: comp.compilers
Subject: Re: What attributes of a programming language simplify its use?
Date: Sat, 3 Dec 2022 23:33:07 +0000
Organization: Compilers Central
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-12-005@comp.compilers>
References: <22-12-001@comp.compilers>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="45908"; mail-complaints-to="abuse@iecc.com"
Keywords: syntax, design
Posted-Date: 03 Dec 2022 19:27:47 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:3252

The discussion on reserved words versus keywords reminds me of
decisions we made while building Yacc++. It is worth noting that we
(both of its developers) worked at Pr1me computer where PL/I dialects
were the key programming language used in build both the OS and the
compilers, so we were likely highly influenced by that.

As a result, Yacc++ has very few reserved words, I'm pretty sure the
number is 3 or less. There is only 1 that I can think of "yy_eof"
which is reserved because it is used in the library in a place where
we have hand-written code that we don't want to modify.(*) And, we
specifically reserve all yy_ words for use in the library, although
most can be used in grammars (and code) without any ill effect. And in
doing so, we feel we haven't taken away any common words from the
users' vocabulary, and we have done so in a way that when the words
have special meaning, it is generally the same meaning as traditional
lex/flex/yacc/bison variants.

However, we do have plenty of context sensitive keywords. But we
structured their usage (as keywords) such that they are easily
disambiguated. Thus, left, right, nonassoc don't have special
characters in them, as opposed to yacc where they are %left et al.
Now, %prec we couldn't make unambiguous, so it retains the required %
spelling.

Still, worth noting to make that a possibility, we had to require that
all productions have a terminating semicolon (;}, rather than
depending upon name colon (:) to identify the start of the rule. That
also gets rid of the lexical hack required to make the grammar LALR(1)
not 2. We could have handled LALR(2) grammars, but in our opinion, it
made the error recovery and messages less obvious. Sometimes
simplicity of implementation makes for a simpler and more regular
language.

But to continue this part of the explanation, words like fast, small,
readable are keywords that describe different ways we layout the
tables and in specific contexts have those meanings. But in those
contexts, normal identifiers cannot appear. And, in any context where
a normal identifier can appear, they are simply identifiers and don't
carry any significance and in the library code where we need them to
have special meaning we use yy_fast et al. So, in the declarations
within a grammar where we need them to have special meanings, we don't
need them to be spelled some "special" way (i.e. you don't say yy_fast
yy_tables, you say "fast tables" and it is perfectly clear, but you
can also use "fast" and "tables" in your grammar as tokens or
non-terminals without worrying that you are using a "reserved" word.
In fact, we do so to describe the grammar of Yacc++ grammars.

Thus, we feel like we have most achieved a similar level of balance as
PL/I had, without creating a write-only language. Yes, you can
probably use Yacc++ to write an extensible language that diverges into
a bunch of unique and incomprehensible variants where no two
programmers are using the same language. We haven't made that
impossible, However, the freedom we have allowed does not inherently
contribute to that nor encourage it. It simply let's people write
things slightly more naturally without a lot of "line noise".

------

Now, given that, I want to dispell the illusion that it makes parsing
harder (beyond a very trivial amount). The "trick" (hack) that lets
the grammar deal with keywords that are not reserved is quite simple
(and we document how to do it for users who are designing their own
languages in our manual) and it should apply to most parser generators
and doesn't rely on any special feature of Yacc++, although we have
some features that make doing so easier.

So, for example you have a list of keywords (tokens) that you want
treated like identifiers, say "if" "then" "else" ala PL/I or "left"
"right" "token" ala Yacc++ and you have a token identifier than you
want to define other identifiers as in:

token identifier, if, then, else, left, right, token;
identifier: "a" .. "z" ("a" .. "z" | "0" .. "9")*;
if: "if";
then: "then";
else: "else";
left: "left";
right: "right";
token: "token";

To get the desired property, you simply define a non-terminal (we'll
call it "ident") that you use to represent identifiers in contexts
where you want the keywords to be allowed, as in:

ident: identifier | if | then | else | left | right | token;

Now, simply use ident where you would have used identifier previously,

rule: ident ":" ident* ";" ;

And, use the keywords where they have their special meaning:

if_stmt: if expression then stmt (else stmt)?;
left_decl: left ident ("," ident)* ";" ;

As long as the uses are unambiguous (and the generator uses) the
prefer shift in shift-reduce conflict method of resolution (or you can
force it to with "precedence" declarations), then the grammar will
work as expected.

If it is ambiguous and you want to disallow certain keywords, simply
introduce other non-terminals, such as

ident_not_if: identifier | then | else | left | right | token;

assignment : ident_not_if "=" expression;  // if keyword not allowed before =

-----

Now, as I said we have features in Yacc++ that make this easier, but
the principle doesn't require our tool. And, there is much more you
can do. This is just one of the relevant grammar hacks.

*) yy_error might also be a reserved word used as part of error recovery.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------