Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: "matt.ti...@gmail.com" <matt.timmermans@gmail.com>
Newsgroups: comp.compilers
Subject: Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple?
Date: Sat, 16 Jul 2022 05:32:58 -0700 (PDT)
Organization: Compilers Central
Lines: 26
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-07-028@comp.compilers>
References: <22-07-011@comp.compilers>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="14795"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, parse, design
Posted-Date: 16 Jul 2022 13:09:10 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
In-Reply-To: <22-07-011@comp.compilers>
Xref: csiph.com comp.compilers:3125

On Thursday, 14 July 2022 at 11:10:56 UTC-4, Roger L Costello wrote:
> [...]
> It seems to me that the lexer should return to the parser the text number
> and it is the responsibility of the parser to convert the value to an
> integer data type if it desires.
>
> What do you think?

The division of the job into lexing and parsing is *not* an important
separation of concerns.  The lexer and the parser are, I expect, almost
always written by the same person at pretty much the same time, and the
requirements of the parser feed into the lexer in many detailed ways.  Their
specifications are highly coupled.

Instead, this division -- a great big line that divides one part of a
context-free grammar from the other -- is an annoying practical concession
that we make to improve performance in time and size (smaller tables, and
optimization for text), and to get around the limitations in our tools (LR(1)
is a lot less useful when the (1) is a character).

This is specifically why all the answers here are giving you less than
compelling practical justifications for parsing numbers in the lexer, and why
nobody seems to mind.  There really is no "should" or "should not" w.r.t. the
division between lexing and parsing except what is practical.
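
To make that concrete, here is a minimal sketch in Python of the two designs
side by side.  Everything in it is made up for illustration (the Token type,
the lex() and evaluate() names, and the toy grammar INT ('+' INT)*); it is not
taken from any real tool.  The only thing that moves between the two designs
is the int() call.

```python
import re
from typing import Callable, Iterable, Iterator, NamedTuple, Union


class Token(NamedTuple):
    kind: str                 # "INT" or "OP"
    value: Union[int, str]    # int if the lexer converted, else the lexeme


def lex(text: str, convert: bool) -> Iterator[Token]:
    """Toy lexer: finds digit runs and '+', silently skips everything else.

    With convert=True the lexer turns each integer lexeme into an int;
    with convert=False it hands the raw text to the parser.
    """
    for lexeme in re.findall(r"\d+|\+", text):
        if lexeme == "+":
            yield Token("OP", lexeme)
        else:
            yield Token("INT", int(lexeme) if convert else lexeme)


def evaluate(tokens: Iterable[Token], to_int: Callable) -> int:
    """Toy parser/evaluator for the grammar: INT ('+' INT)*.

    to_int is whatever the parser needs to do to get a number out of an
    INT token: nothing if the lexer already converted, int() otherwise.
    """
    toks = list(tokens)
    if not toks or toks[0].kind != "INT" or len(toks) % 2 == 0:
        raise SyntaxError("expected INT ('+' INT)*")
    total = to_int(toks[0].value)
    for op, operand in zip(toks[1::2], toks[2::2]):
        if op.kind != "OP" or operand.kind != "INT":
            raise SyntaxError("expected INT ('+' INT)*")
        total += to_int(operand.value)
    return total


# Design 1: the lexer converts, the parser uses the value as-is.
lexer_converts = evaluate(lex("12 + 30", convert=True), lambda v: v)

# Design 2: the lexer returns text, the parser converts.
parser_converts = evaluate(lex("12 + 30", convert=False), int)
```

Both calls produce the same result, and neither version is noticeably simpler
than the other, so the choice comes down to whichever is more convenient for
the tools and the language at hand.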