From: gah4
Newsgroups: comp.compilers
Subject: Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple?
Date: Sun, 17 Jul 2022 20:39:50 -0700 (PDT)
Organization: Compilers Central
Message-ID: <22-07-036@comp.compilers>
References: <22-07-011@comp.compilers> <22-07-030@comp.compilers>
Keywords: lex, design, comment

On Sunday, July 17, 2022 at 10:29:53 AM UTC-7, christoph...@compiler-resources.com wrote:
> You are asking the wrong question. You are optimizing at the wrong level.
> Stop.

(snip)

> You are focusing on the trivial, the irrelevant. It is unlikely that
> having the lexer convert integers (or floats or quaternions) into a binary
> representation is a sufficiently expensive operation to make sense fretting
> about it.

Not so long ago, I wrote that the whole idea of separating lexing and
parsing was overoptimizing. It was needed many years ago, when computer
memories were in kilobytes, not yet megabytes or gigabytes. But that got
almost no comment.

Converting to integer has lots of problems, especially in the case of
overflow.

In some cases, and even for some people, separating lexing and parsing
might make things easier to write and/or understand. Those are important.
But maybe not.

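To make the overflow point concrete, here is a minimal sketch in Python of
a lexer routine that converts a decimal lexeme while checking whether it
fits. The 32-bit signed target and the names IntToken and lex_integer are
assumptions made up for this illustration; they are not taken from any
particular compiler.

```python
from dataclasses import dataclass
from typing import Optional

INT_MAX = 2**31 - 1  # assumption: the target's int is 32-bit signed


@dataclass
class IntToken:
    text: str              # the lexeme exactly as written in the source
    value: Optional[int]   # converted value, or None if it does not fit


def lex_integer(text: str) -> IntToken:
    """Convert a decimal lexeme, checking for overflow digit by digit."""
    if not text:
        raise ValueError("empty lexeme")
    value = 0
    for ch in text:
        digit = ord(ch) - ord("0")
        if not 0 <= digit <= 9:
            raise ValueError(f"not a decimal integer: {text!r}")
        # Test before multiplying, as a fixed-width implementation must:
        # value * 10 + digit would exceed INT_MAX exactly when this holds.
        if value > (INT_MAX - digit) // 10:
            return IntToken(text, None)  # overflow: keep only the text
        value = value * 10 + digit
    return IntToken(text, value)
```

With these assumptions, lex_integer("2147483647") carries a value, while
lex_integer("2147483648") carries only its text. That second case matters
when a language lexes "-2147483648" as a minus sign followed by a literal:
the literal alone does not fit, even though the negated result would.
Keeping the text in the token lets a later phase, which knows the context,
decide what to do.
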
The STEP processor, which I have written about before, mostly doesn't
separate them. There are some built-in rules related to where macro
matches can start and end, and those are similar to the results of some
lexical analysis. Some of those rules speed up processing, which was
more important 45 years ago than now.

I now suspect that they could be separate, but implemented by the same
program. That would allow them to be separated where convenient for the
user, and also not separated where that is easier.

[In my experience separating the lexer from the parser makes it a lot
easier to deal with common lexical situations like skipping white space
and comments. You could certainly do that in a combined scheme but I'm
not sure it would end up any simpler. -John]