Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3100
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | One lexer rule for keywords and identifiers |
| Date | 2022-06-26 00:09 +0300 |
| Organization | Compilers Central |
| Message-ID | <22-06-080@comp.compilers> (permalink) |
| References | <22-06-075@comp.compilers> |
Two parts to my answer. First, if they are using the Yacc++ my partner and I wrote, they are following that rule. Why? Because we built it into the product. You simply declare keywords like this: keyword if, then, else, begin, end, for, while, do, defun, let; substring keyword subroutine, function, procedure; // these match any unique substrings. sensitive keyword EoF; // only accepts that exact spelling (not eof or EOF) keyword amp "&" // keywords that are not strictly alphanumeric inline keyword ge ".ge.", le ".le.", eq ".eq."; // don't put these in the symbol table but instead make rules for them... ... It also incorporates the C-typedef hack if you want that. It also deals with the PL/I issue of context-sensitive keywords, where the word sometimes is and sometimes is not a keyword. . All of this is built-in. It does that by using the idea from Frank Deremer of separating the scanner from the screener. The keyword processing is done in the screener (using the symbol table), and because it uses the symbol table, the parser can add (or remove) entries from being keywords--although that's not the technique used for PL/I-style context-sensitive ones but is the right approach for C-style typedefs. So, I would say the Dragon book is correct. Not because it makes the tables smaller, but because it says cognitive space. You don't want to be reinventing the wheel for all those variations. You just want to declare what you mean and have the tool do the "right" thing. ----- Now, the reverse side of the coin. ANTLR's lexer may be separate from the parser, but doesn't have the concept of a screener. You cannot get between the lexer and the parser to do "magic". Thus, you have to use explicit keyword tokens and if you want case insensitive ones, use regular expressions (character classes) to specify them, same thing for substring keywords (and it doesn't warn you if your substrings are ambiguous). So, while I normally laud Terence for his work on ANTLR. He gets far more right than wrong. This is a case where he got it wrong. (BTW, JavaCC which borrows heavily from ANTLRs implementation, suffers the same way). Now, being fair, ANTLR4 does have a unique feature in that regard in that if you write a keyword or any token (in quotes) in your grammar, it is processed only when that rule is being recognized. That's part of how ANTLR4 is more PEG-like. That is a different solution to the PL/I keywords as identifiers issue than what we support. And as far as I can tell, it even overrides the longest-match problem. Thus, if you have a place in your grammar where you want only ">" to be recognized and not ">>", simply put ">" in your parser at that location. -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
Back to comp.compilers | Previous | Next — Previous in thread | Find similar
Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Roger L Costello <costello@mitre.org> - 2022-06-25 12:58 +0000
Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? gah4 <gah4@u.washington.edu> - 2022-06-25 13:01 -0700
Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-06-26 06:17 +0000
Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-26 10:20 +0200
One lexer rule for keywords and identifiers Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-26 00:09 +0300
csiph-web