Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #3096

Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers?

From Roger L Costello <costello@mitre.org>
Newsgroups comp.compilers
Subject Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers?
Date 2022-06-25 12:58 +0000
Organization Compilers Central
Message-ID <22-06-075@comp.compilers> (permalink)

Show all headers | View raw


Hi Folks,

Page 101-102 of the dragon book recommends having one lexer rule for both
keyword and identifiers (symtab = symbol table):

[a-zA-Z][a-zA-Z0-9]*      { Action: check symtab for the lexeme.
			If the symtab says the lexeme
			is a keyword, return the keyword.
			Else, if the symtab says the lexeme
			is an existing identifier, return ID
			and a pointer to the symtab entry.
			Else, add the identifier to the symtab
			and return ID and a pointer to the
			new symtab entry. }

The book says the advantages of this approach are:

1. Easy to add new keywords. Simply initialize the symtab with the new
keywords. No changes to the lexer rules.

2. Smaller transition diagram. Instead of several hundred states, there are
less than a hundred states in the transition diagram.

Those seem like pretty compelling reasons for having one lexer rule for both
keywords and identifiers.

So why don't compiler (lexer) writers follow that recommendation?

Consider:

Page 86-90 of the book "flex & bison" shows the lexer for MySQL. The lexer has
a rule for every keyword.

Pag 228 of "Introduction to Compiling Techniques" show a lexer for VSL (Very
Simple Language). Again, the lexer has a rule for every keyword.

/Roger
[I think the answer to a lot of these questions contains the phrase "64K PDP-11."  In
the 1970s a thousand state DFA took a lot of memory.  Today it's barely a rounding error.
In flex&bison the main reason I did it the way I did is that it's a pedagogical example
but it also meant the symbol table didn't need special cases for keywords.  There is also
the PL/I situation where keywords are only keywords in certain contexts so you can say
 IF IF = GOTO; THEN ELSE = BEGIN; ELSE END = IF;
-John]

Back to comp.compilers | Previous | NextNext in thread | Find similar


Thread

Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Roger L Costello <costello@mitre.org> - 2022-06-25 12:58 +0000
  Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? gah4 <gah4@u.washington.edu> - 2022-06-25 13:01 -0700
    Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-06-26 06:17 +0000
    Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-26 10:20 +0200
  One lexer rule for keywords and identifiers Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-26 00:09 +0300

csiph-web