Context sensitive tokens

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups	comp.compilers
Subject	Context sensitive tokens
Date	2020-03-01 20:14 +0200
Organization	Compilers Central
Message-ID	<20-03-004@comp.compilers>

The discussion on tokens that are substrings of other tokens got me
thinking about a feature that might help make such tokens easier to
specify.  I am now looking for a name (keyword) to use to describe
these tokens.

In particular, consider the case of ">>" v. ">" in C++ templates.  In
expression contexts, you want ">>" to return the "right shift operator"
token, but in template contexts you want each ">" returned as an
"end of template angle bracket" token.  You can do this with lexer
states.  But the more of these cases you have, the more lexer states you
get, and combinatorial explosion sets in.  That is not desirable, especially
if you are creating the lexer states by hand.
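
For comparison, the lexer-state approach for this one case might look like the following minimal sketch. The state names "EXPR" and "TEMPLATE", the token name template_close, and the function lex_greater are hypothetical; the sketch assumes some other part of the front end has already switched the lexer into the right state.

```python
def lex_greater(text, pos, state):
    """Lex the token starting with the '>' at text[pos].

    state is the current lexer state, either "EXPR" or "TEMPLATE"
    (hypothetical names). Returns (token_name, new_pos).
    """
    if text[pos] != ">":
        raise ValueError(f"expected '>' at position {pos}")
    if state == "TEMPLATE":
        # Inside template arguments, each '>' closes one angle bracket,
        # even when it is immediately followed by another '>'.
        return "template_close", pos + 1
    if text.startswith(">>", pos):
        # In expression context the longer match wins: right shift.
        return "right_shift", pos + 2
    return "greater_than", pos + 1
```

Every further token that needs this treatment adds its own state distinction, which is where the explosion of states comes from.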

An alternate solution (that seems nice and simple to me) is to have
flags associated with the problematic tokens that you want returned
only in some states and not others.  The lexer then queries the
parser to determine which tokens are allowed and returns only a token
from the allowable set.
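
The lexer side of this idea can be sketched as follows, under stated assumptions: the token table TOKENS, the flag set CONTEXT_SENSITIVE, and the callback parser_accepts are hypothetical stand-ins for whatever the generated lexer and parser actually use. Only flagged tokens are checked against the parser; a flagged token that is not allowed is skipped, so the lexer falls back to the next-longest match.

```python
# Hypothetical token table, ordered longest spelling first so that the
# first match found is the longest match.
TOKENS = [(">>", "right_shift"), (">", "greater_than")]

# Only tokens carrying the flag are checked against the parser.
CONTEXT_SENSITIVE = {"right_shift"}


def next_token(text, pos, parser_accepts):
    """Return (token_name, new_pos) for the token starting at text[pos].

    parser_accepts(name) -> bool stands in for the query to the parser.
    """
    for spelling, name in TOKENS:
        if not text.startswith(spelling, pos):
            continue
        if name in CONTEXT_SENSITIVE and not parser_accepts(name):
            continue  # disallowed in this parser state; try a shorter match
        return name, pos + len(spelling)
    raise SyntaxError(f"no token matches at position {pos}")
```

With a query that rejects right_shift (as in a template context), ">>" lexes as two greater_than tokens; with a query that accepts it, ">>" lexes as one right_shift token. Unflagged tokens such as greater_than are returned regardless, so an unexpected one still reaches the parser and is reported there.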

So normally, in Yacc++, one would write:

token greater_than : ">";
token right_shift : ">>";

But since we want the right shift token to be context sensitive, we
would instead write:

token greater_than : ">";
context sensitive token right_shift : ">>";

Now, before returning a right_shift token, the lexer queries the parser
as to whether that token is legal in the current parser state.  The query
would be answered by an array of bits, indicating which tokens are legal,
that the parser would update to reflect whether each token is legal in
its current state.  (The parser knows, for each state, which tokens are
expected, so the bit mask is not hard to generate.  And the only reason
to do this for just some tokens is to make syntax error discovery
easier, by not turning all unexpected tokens into lexical syntax
errors.)  If the token is not legal, the lexer would return a different
sequence of tokens (e.g. just a greater_than token, since that was the
longest match prior to this disallowed match).  The actual
implementation is a little more subtle than that, but that captures the
idea.
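
The parser side, generating the bit mask, might be sketched like this. It is not the Yacc++ implementation (which the post says is more subtle); the state numbers, the EXPECTED sets, and the names CS_BITS, build_masks, and is_legal are all hypothetical. In a real generator the expected-token sets would come from the parser tables; each additional flagged token would simply get the next bit position.

```python
# Hypothetical bit positions for the context-sensitive tokens.
CS_BITS = {"right_shift": 0}

# Hypothetical expected-token sets per parser state.
EXPECTED = {
    0: {"identifier", "right_shift", "greater_than"},  # expression context
    1: {"identifier", "greater_than"},                  # template arguments
}


def build_masks(expected, cs_bits):
    """Build one integer bit mask per parser state.

    Bit b is set when the context-sensitive token assigned bit b is
    expected in that state. Unflagged tokens do not appear in the mask.
    """
    masks = {}
    for state, tokens in expected.items():
        mask = 0
        for tok in tokens:
            if tok in cs_bits:
                mask |= 1 << cs_bits[tok]
        masks[state] = mask
    return masks


MASKS = build_masks(EXPECTED, CS_BITS)


def is_legal(state, token):
    """Answer the lexer's query for a context-sensitive token."""
    return bool((MASKS[state] >> CS_BITS[token]) & 1)
```

A function like is_legal, bound to the parser's current state, could serve as the lexer's query for flagged tokens.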

The main question I have is what keyword(s) I should use to indicate
the tokens in question.

context sensitive
contextual
optional
expected
expectable?!??
expectation sensitive
suppressible

something else?

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------
