Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Roger L Costello Newsgroups: comp.compilers Subject: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Date: Sat, 25 Jun 2022 12:58:25 +0000 Organization: Compilers Central Lines: 44 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <22-06-075@comp.compilers> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8bit Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="78285"; mail-complaints-to="abuse@iecc.com" Keywords: history, practice, comment Posted-Date: 25 Jun 2022 12:41:29 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Thread-Topic: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Thread-Index: AdiIkmeY/cQZAc/NR0C+NL7PLIp4Bw== Xref: csiph.com comp.compilers:3096 Hi Folks, Page 101-102 of the dragon book recommends having one lexer rule for both keyword and identifiers (symtab = symbol table): [a-zA-Z][a-zA-Z0-9]* { Action: check symtab for the lexeme. If the symtab says the lexeme is a keyword, return the keyword. Else, if the symtab says the lexeme is an existing identifier, return ID and a pointer to the symtab entry. Else, add the identifier to the symtab and return ID and a pointer to the new symtab entry. } The book says the advantages of this approach are: 1. Easy to add new keywords. Simply initialize the symtab with the new keywords. No changes to the lexer rules. 2. Smaller transition diagram. Instead of several hundred states, there are less than a hundred states in the transition diagram. Those seem like pretty compelling reasons for having one lexer rule for both keywords and identifiers. So why don't compiler (lexer) writers follow that recommendation? Consider: Page 86-90 of the book "flex & bison" shows the lexer for MySQL. The lexer has a rule for every keyword. Pag 228 of "Introduction to Compiling Techniques" show a lexer for VSL (Very Simple Language). Again, the lexer has a rule for every keyword. /Roger [I think the answer to a lot of these questions contains the phrase "64K PDP-11." In the 1970s a thousand state DFA took a lot of memory. Today it's barely a rounding error. In flex&bison the main reason I did it the way I did is that it's a pedagogical example but it also meant the symbol table didn't need special cases for keywords. There is also the PL/I situation where keywords are only keywords in certain contexts so you can say IF IF = GOTO; THEN ELSE = BEGIN; ELSE END = IF; -John]