From: gah4 <gah4@u.washington.edu>
Newsgroups: comp.compilers
Subject: Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple?
Date: Sun, 17 Jul 2022 20:39:50 -0700 (PDT)
Organization: Compilers Central
Lines: 35
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-036@comp.compilers>
References: <22-07-011@comp.compilers> <22-07-030@comp.compilers>
Keywords: lex, design, comment
Posted-Date: 18 Jul 2022 12:30:48 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-07-030@comp.compilers>
Xref: csiph.com comp.compilers:3129

On Sunday, July 17, 2022 at 10:29:53 AM UTC-7, christoph...@compiler-resources.com wrote:
> You are asking the wrong question. You are optimizing at the wrong level.
> Stop.

(snip)

> You are focusing on the trivial, the irrelevant. It is unlikely that
> having the lexer convert integers (or floats or quaternions) into a binary
> representation is a sufficiently expensive operation to make sense fretting
> about it.

Not so long ago, I wrote that the whole idea of separating lexing and parsing
was overoptimizing.  It was needed many years ago, when computer memories
were measured in kilobytes, not yet megabytes or gigabytes.  But that post
got almost no comment.

Converting to integer has lots of problems, especially in the case of overflow.
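
One well-known form of the overflow problem can be sketched as follows.
This is only an illustration, assuming a hypothetical 32-bit target and
made-up helper names (lex_eager, lex_lazy, fold_literal); it is not taken
from any particular compiler.  The lexer sees "-2147483648" as a minus sign
followed by 2147483648, so a lexer that converts and range-checks right
away rejects a literal whose negated value actually fits.

```python
import re

# Assumption: the target has 32-bit two's-complement integers.
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def split(text):
    """Split text into digit runs and single non-space characters."""
    return re.findall(r"[0-9]+|\S", text)

def is_int(lexeme):
    return lexeme[0] in "0123456789"

def lex_eager(text):
    """Lexer that converts and range-checks integers immediately."""
    out = []
    for lexeme in split(text):
        if is_int(lexeme):
            value = int(lexeme)
            if value > INT32_MAX:
                raise OverflowError("integer literal too large: " + lexeme)
            out.append(value)
        else:
            out.append(lexeme)
    return out

def lex_lazy(text):
    """Lexer that keeps integer lexemes as text for a later stage."""
    return [("INT" if is_int(t) else "OP", t) for t in split(text)]

def fold_literal(tokens):
    """Later stage: apply an optional unary minus, then range-check."""
    negative = tokens[0] == ("OP", "-")
    kind, lexeme = tokens[1] if negative else tokens[0]
    assert kind == "INT"
    value = -int(lexeme) if negative else int(lexeme)
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError("integer literal out of range")
    return value
```

With these definitions, fold_literal(lex_lazy("-2147483648")) succeeds,
while lex_eager("-2147483648") raises OverflowError, because the eager
lexer checks the magnitude before anything has seen the sign.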

In some cases, and even for some people, separating lexing and parsing might
make a compiler easier to write and/or understand. Those things are important.

But maybe not.

The STEP processor, which I have written about before, mostly doesn't
separate them.  There are some built-in rules related to where macro
matches can start and end, and those are similar to the results of
some lexical analysis.  Some of those rules speed up processing, which
was more important 45 years ago than now.

I now suspect that they could be logically separate, but implemented
by the same program.  That would allow them to be separated where that is
convenient for the user, and left combined where that is easier.
[In my experience separating the lexer from the parser makes it a lot easier
to deal with common lexical situations like skipping white space and comments.
You could certainly do that in a combined scheme but I'm not sure it would end
up any simpler. -John]
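
The point about white space and comments can be illustrated with a small
sketch.  The toy grammar (integers, +, *, parentheses, '#' comments) and
the names TOKEN_RE, tokens, and parse are assumptions made up for this
example.  The lexer discards white space and comments in one place, so the
parser rules never mention them; the parser here evaluates the expression
directly instead of building a tree, just to keep it short.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<SKIP>\s+|\#[^\n]*)   # white space and '#' comments
  | (?P<INT>[0-9]+)
  | (?P<OP>[+*()])
""", re.VERBOSE)

def tokens(text):
    """Yield (kind, lexeme) pairs, dropping white space and comments."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
        pos = m.end()
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
    yield "EOF", ""

def parse(text):
    """Recursive-descent parser/evaluator; it never sees SKIP tokens."""
    stream = tokens(text)
    tok = next(stream)

    def advance():
        nonlocal tok
        tok = next(stream)

    def atom():
        if tok[0] == "INT":
            value = int(tok[1])
            advance()
            return value
        if tok == ("OP", "("):
            advance()
            value = expr()
            if tok != ("OP", ")"):
                raise SyntaxError("expected ')'")
            advance()
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        value = atom()
        while tok == ("OP", "*"):
            advance()
            value *= atom()
        return value

    def expr():
        value = term()
        while tok == ("OP", "+"):
            advance()
            value += term()
        return value

    result = expr()
    if tok[0] != "EOF":
        raise SyntaxError(f"trailing input at {tok!r}")
    return result
```

In a combined scheme, each of atom, term, and expr would need its own
calls to skip white space and comments between the pieces it matches.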