Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #3096

Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers?

Path csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From Roger L Costello <costello@mitre.org>
Newsgroups comp.compilers
Subject Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers?
Date Sat, 25 Jun 2022 12:58:25 +0000
Organization Compilers Central
Lines 44
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <22-06-075@comp.compilers> (permalink)
Mime-Version 1.0
Content-Type text/plain; charset="us-ascii"
Content-Transfer-Encoding 8bit
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="78285"; mail-complaints-to="abuse@iecc.com"
Keywords history, practice, comment
Posted-Date 25 Jun 2022 12:41:29 EDT
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
Thread-Topic Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers?
Thread-Index AdiIkmeY/cQZAc/NR0C+NL7PLIp4Bw==
Xref csiph.com comp.compilers:3096

Show key headers only | View raw


Hi Folks,

Page 101-102 of the dragon book recommends having one lexer rule for both
keyword and identifiers (symtab = symbol table):

[a-zA-Z][a-zA-Z0-9]*      { Action: check symtab for the lexeme.
			If the symtab says the lexeme
			is a keyword, return the keyword.
			Else, if the symtab says the lexeme
			is an existing identifier, return ID
			and a pointer to the symtab entry.
			Else, add the identifier to the symtab
			and return ID and a pointer to the
			new symtab entry. }

The book says the advantages of this approach are:

1. Easy to add new keywords. Simply initialize the symtab with the new
keywords. No changes to the lexer rules.

2. Smaller transition diagram. Instead of several hundred states, there are
less than a hundred states in the transition diagram.

Those seem like pretty compelling reasons for having one lexer rule for both
keywords and identifiers.

So why don't compiler (lexer) writers follow that recommendation?

Consider:

Page 86-90 of the book "flex & bison" shows the lexer for MySQL. The lexer has
a rule for every keyword.

Pag 228 of "Introduction to Compiling Techniques" show a lexer for VSL (Very
Simple Language). Again, the lexer has a rule for every keyword.

/Roger
[I think the answer to a lot of these questions contains the phrase "64K PDP-11."  In
the 1970s a thousand state DFA took a lot of memory.  Today it's barely a rounding error.
In flex&bison the main reason I did it the way I did is that it's a pedagogical example
but it also meant the symbol table didn't need special cases for keywords.  There is also
the PL/I situation where keywords are only keywords in certain contexts so you can say
 IF IF = GOTO; THEN ELSE = BEGIN; ELSE END = IF;
-John]

Back to comp.compilers | Previous | NextNext in thread | Find similar


Thread

Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Roger L Costello <costello@mitre.org> - 2022-06-25 12:58 +0000
  Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? gah4 <gah4@u.washington.edu> - 2022-06-25 13:01 -0700
    Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-06-26 06:17 +0000
    Re: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-26 10:20 +0200
  One lexer rule for keywords and identifiers Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-26 00:09 +0300

csiph-web