From: "Alexei A. Frounze" <alexfrunews@gmail.com>
Newsgroups: comp.compilers
Subject: Re: The dragon book says separating lexical analysis and parsing is beneficial, so why doesn't ANTLR separate them?
Date: Thu, 9 Jun 2022 18:07:46 -0700 (PDT)
Organization: Compilers Central
Message-ID: <22-06-027@comp.compilers>
References: <22-06-023@comp.compilers>
Keywords: design, performance, comment
Posted-Date: 09 Jun 2022 21:15:54 EDT

On Thursday, June 9, 2022 at 9:50:50 AM UTC-7, Roger L Costello wrote:
> 2. ...
> A large amount of time is spent reading the source program and partitioning it
> into tokens. Specialized buffering techniques for reading input characters and
> processing tokens can significantly speed up the performance of a compiler.

In any decent compiler, the time spent reading and tokenizing the source is
relatively small compared to the time spent on optimizations, unless we're
talking about JIT, which is a different kind of compiling.

> 3. Compiler portability is enhanced. Input alphabet peculiarities and other
> device-specific anomalies can be restricted to the lexical analyzer. The
> representation of special or non-standard symbols, such as the up-arrow in
> Pascal, can be isolated in the lexical analyzer.

ASCII is supported out of the box nowadays. Unicode is available, and simply
parsing and storing Unicode code points is straightforward (processing
Unicode text, especially displaying it, is problematic, but that should hardly
affect a programming language or its compiler much).
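
To illustrate what I mean by "straightforward", here is a minimal sketch in
Python, assuming UTF-8 encoded source. The names decode_source and
scan_identifier are made up for this example, and the identifier rule
(alphanumerics plus underscore) is a simplification, not any particular
language's definition.

```python
def decode_source(raw: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of Unicode code points."""
    return [ord(ch) for ch in raw.decode("utf-8")]


def scan_identifier(cps: list[int], pos: int) -> tuple[str, int]:
    """Scan an identifier starting at pos; return (text, next_pos).

    Assumed rule for this sketch: any run of alphanumeric code points
    or underscores counts as an identifier.
    """
    start = pos
    while pos < len(cps) and (chr(cps[pos]).isalnum() or cps[pos] == ord("_")):
        pos += 1
    return "".join(map(chr, cps[start:pos])), pos


raw = "größe = 1".encode("utf-8")   # 11 bytes on disk
cps = decode_source(raw)            # 9 code points in memory
name, end = scan_identifier(cps, 0)
print(name, end, len(raw), len(cps))
```

Once the bytes are decoded, the rest of the lexer only ever sees code points,
so the encoding details stay in one small place.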

> Those seem like compelling reasons for separating the lexical analysis from
> parsing,
...
> [1] Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman.

The book is old and doesn't quite reflect the current state of things, IMO.

Alex
[My impression is that the lexer can often take significant time since it has to look
at every character in the input, but the parser is fast unless you're doing something
strange like very ambiguous Earley parsing. -John]
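
The point in the note above about the lexer touching every character can be
shown with a rough sketch in Python. It measures no time at all; it only
counts how many characters a toy scanner consumes versus how many tokens a
parser would then receive. The tokenize function and its token rules are
assumptions made up for this example.

```python
def tokenize(text: str) -> tuple[list[str], int]:
    """Split text into tokens; also return how many characters were consumed."""
    tokens = []
    chars_examined = 0
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        chars_examined += 1
        if ch.isspace():
            # Whitespace is consumed but produces no token.
            i += 1
        elif ch.isalnum() or ch == "_":
            # Word or number: consume the whole run as one token.
            start = i
            i += 1
            while i < n and (text[i].isalnum() or text[i] == "_"):
                chars_examined += 1
                i += 1
            tokens.append(text[start:i])
        else:
            # Anything else is a single-character token.
            tokens.append(ch)
            i += 1
    return tokens, chars_examined


source = "total_count = total_count + 1;"
tokens, examined = tokenize(source)
print(len(source), examined, len(tokens))
```

For this input the scanner consumes all 30 characters while the parser would
only see 6 tokens, which is the imbalance the note describes.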