| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Context sensitive tokens |
| Date | 2020-03-01 20:14 +0200 |
| Organization | Compilers Central |
| Message-ID | <20-03-004@comp.compilers> |
The discussion on tokens that are substrings of other tokens got me thinking about a feature that might make such tokens easier to specify. I am now looking for a name (keyword) to describe these tokens.

In particular, consider the case of ">>" vs. ">" in C++ templates. In expression contexts, you want ">>" to return the "right shift operator" token. In template contexts, you want each ">" returned as an "end of template angle bracket" token. You can do this with lexer states. But the more of these cases you have, the more lexer states you need, and combinatorial explosion sets in. That is not desirable, especially if you are creating the lexer states by hand.

An alternate solution, which seems nice and simple to me, is to attach flags to the problematic tokens that you want returned only in some states and not in others. The lexer queries the parser to determine which tokens are allowed, and it only returns one from the allowable set. So normally, in Yacc++, one would write:

    token greater_than : ">";
    token right_shift : ">>";

But since we want the right shift token to be context sensitive, we would instead write:

    token greater_than : ">";
    context sensitive token right_shift : ">>";

Now, before returning a right_shift token, the lexer asks the parser whether that token is legal in the current parser state. The answer would be an array of bits, one per token, which the parser sets to show which tokens are legal in its current state. (For each state, the parser knows which tokens are expected, so the bit mask is not hard to generate. The only reason to do this for just some tokens is to make syntax error discovery easier: we do not want every unexpected token turned into a lexical syntax error.)

If right_shift is not legal, the lexer returns a different sequence of tokens instead. For example, it would return just a greater_than token, since that was the longest match before the disallowed match. The actual implementation is a little more subtle than that, but that captures the idea.
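The mechanism described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not Yacc++'s actual implementation or API:

- The token table, the names `expected_mask`, `next_token` and `tokenize`, and the two "parser states" are all hypothetical.
- Each token's bit index is simply its position in the table.
- A callable stands in for querying the parser before each token.

Plain tokens are always eligible, so an unexpected one still reaches the parser as a syntax error. A context-sensitive token is skipped when its bit is clear, which lets the longest remaining match win.

```python
# Hypothetical token table mirroring the post's example:
# (name, literal text, context_sensitive). A token's bit index is its position here.
TOKENS = [
    ("greater_than", ">", False),
    ("right_shift", ">>", True),
]
BIT = {name: i for i, (name, _, _) in enumerate(TOKENS)}


def expected_mask(expected_names):
    """Build the per-state bit mask the parser would expose (one bit per token)."""
    mask = 0
    for name in expected_names:
        mask |= 1 << BIT[name]
    return mask


def next_token(text, pos, mask):
    """Longest match, except context-sensitive tokens whose bit is clear are skipped."""
    best = None
    for name, literal, context_sensitive in TOKENS:
        if not text.startswith(literal, pos):
            continue
        if context_sensitive and not ((mask >> BIT[name]) & 1):
            continue  # disallowed in this parser state: fall back to a shorter match
        if best is None or len(literal) > len(best[1]):
            best = (name, literal)
    if best is None:
        raise SyntaxError(f"no token matches at position {pos}")
    return best[0], pos + len(best[1])


def tokenize(text, current_mask):
    """current_mask() stands in for asking the parser before each token."""
    tokens, pos = [], 0
    while pos < len(text):
        name, pos = next_token(text, pos, current_mask())
        tokens.append(name)
    return tokens


# Two hypothetical parser states: one inside an expression, one closing a template.
EXPRESSION_STATE = expected_mask(["greater_than", "right_shift"])
TEMPLATE_CLOSE_STATE = expected_mask(["greater_than"])

print(tokenize(">>", lambda: EXPRESSION_STATE))      # ['right_shift']
print(tokenize(">>", lambda: TEMPLATE_CLOSE_STATE))  # ['greater_than', 'greater_than']
```

For simplicity, the sketch holds one mask for the whole input. A real lexer/parser pair would re-query the mask after each shift, since the parser state changes as tokens are consumed.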
The main question I have is which keyword(s) I should use to mark the tokens in question:

- context sensitive
- contextual
- optional
- expected
- expectable?!??
- expectation sensitive
- suppressible
- something else?

-- 
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris
------------------------------------------------------------------------------