From: "Alexei A. Frounze"
Newsgroups: comp.compilers
Subject: Re: The dragon book says separating lexical analysis and parsing is beneficial, so why doesn't ANTLR separate them?
Date: Thu, 9 Jun 2022 18:07:46 -0700 (PDT)
Organization: Compilers Central
Message-ID: <22-06-027@comp.compilers>
References: <22-06-023@comp.compilers>

On Thursday, June 9, 2022 at 9:50:50 AM UTC-7, Roger L Costello wrote:
> 2. ...
> A large amount of time is spent reading the source program and partitioning it
> into tokens. Specialized buffering techniques for reading input characters and
> processing tokens can significantly speed up the performance of a compiler.

In any decent compiler this time is relatively small compared to the time spent on optimization, unless we're talking about JIT, which is a different kind of compiling.

> 3. Compiler portability is enhanced. Input alphabet peculiarities and other
> device-specific anomalies can be restricted to the lexical analyzer. The
> representation of special or non-standard symbols, such as the up-arrow in
> Pascal, can be isolated in the lexical analyzer.

ASCII is supported out of the box nowadays.
Unicode is available, and parsing and storing Unicode code points is straightforward (processing Unicode text, especially displaying it, is problematic, but that should hardly affect a programming language or its compiler).

> Those seem like compelling reasons for separating the lexical analysis from
> parsing, ...
>
> [1] Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman.

The book is old and doesn't quite reflect the current state of things, IMO.

Alex

[My impression is that the lexer can often take significant time, since it has to look at every character in the input, but the parser is fast unless you're doing something strange like very ambiguous Earley parsing. -John]
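For illustration only, here is a minimal sketch of a hand-written lexer that touches on two points from the thread: a lexer has to look at every input character, and once the source is held as Unicode code points, identifiers such as `größe` need no special handling. The `tokenize` function and its token kinds (IDENT, NUMBER, PUNCT) are hypothetical and not taken from any compiler discussed here. It assumes the whole source is already in memory as a Python `str`, so it deliberately ignores the input-buffering techniques the book describes.

```python
from typing import List, Tuple

# Hypothetical token representation for this sketch: (kind, text).
Token = Tuple[str, str]


def tokenize(src: str) -> List[Token]:
    """Split src into (kind, text) tokens.

    A Python str is a sequence of Unicode code points, so non-ASCII
    identifiers need no extra decoding here. The loop must look at every
    character of src at least once; there is no way to skip ahead.
    """
    tokens: List[Token] = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens but produces none
        elif ch.isalpha() or ch == "_":
            # Identifier: a letter or underscore, then letters, digits, underscores.
            start = i
            while i < n and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(("IDENT", src[start:i]))
        elif "0" <= ch <= "9":
            # Number: a run of ASCII decimal digits only, in this sketch.
            start = i
            while i < n and "0" <= src[i] <= "9":
                i += 1
            tokens.append(("NUMBER", src[start:i]))
        elif ch in "+-*/=();":
            tokens.append(("PUNCT", ch))  # single-character operators
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


if __name__ == "__main__":
    print(tokenize("größe = 42;"))
```

Running it on `größe = 42;` yields an IDENT token for `größe`, then `=`, the NUMBER `42`, and `;`. A generated, table-driven lexer (as from lex or flex) does the same per-character work, just more efficiently.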