From: gah4
Newsgroups: comp.compilers
Subject: Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple?
Date: Tue, 19 Jul 2022 16:39:36 -0700 (PDT)
Organization: Compilers Central
Message-ID: <22-07-040@comp.compilers>
References: <22-07-011@comp.compilers> <22-07-030@comp.compilers> <22-07-036@comp.compilers>
Keywords: lex, design
Posted-Date: 20 Jul 2022 20:45:04 EDT

On Monday, July 18, 2022 at 9:30:51 AM UTC-7, gah4 wrote:

(snip, or moderator wrote)

> [In my experience separating the lexer from the parser makes it a lot easier
> to deal with common lexical situations like skipping white space and comments.
> You could certainly do that in a combined scheme but I'm not sure it would end
> up any simpler. -John]

Interesting.

As I previously noted, STEP mostly doesn't do a separate lexical analysis. It does, however, do three things before the macros see the input: convert multiple blanks to a single blank, pass apostrophed strings through whole, and remove comments delimited by double quotes.

Apostrophed strings are slightly more interesting. Internal doubled apostrophes are converted to single apostrophes, and the delimiting apostrophes are converted to a special character that isn't an input character.

One of my projects, 45 years ago, was to write macros to recognize the syntax of IBM OS/360 Fortran IV.
Direct access I/O statements use a single apostrophe to delimit the record number:

      READ(1'N) X,Y,Z

There is no way to write macros for that syntax after the previous processing, because the lone apostrophe has already been taken as the start of an apostrophed string.

Much fun figuring out all the strange things done in programming language syntax over the years.
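
For anyone who wants to see the problem concretely, here is a rough Python sketch of the three preprocessing steps described above. It is not STEP's code, only an illustration under stated assumptions: the name preprocess, the choice of "\x00" as the special delimiter character, and the decision to raise an error on an unmatched apostrophe are all hypothetical. The post does not say what STEP actually did with an unmatched apostrophe.

```python
# Hypothetical stand-in for the "special character that isn't an input character".
DELIM = "\x00"


def preprocess(line: str) -> str:
    """Collapse blanks, pass apostrophed strings whole, drop "..." comments."""
    out = []
    i, n = 0, len(line)
    while i < n:
        c = line[i]
        if c == "'":
            # Apostrophed string: copy the contents through untouched,
            # turning each doubled apostrophe into a single one.
            i += 1
            chars = []
            closed = False
            while i < n:
                if line[i] == "'":
                    if i + 1 < n and line[i + 1] == "'":
                        chars.append("'")
                        i += 2
                        continue
                    i += 1
                    closed = True
                    break
                chars.append(line[i])
                i += 1
            if not closed:
                # Assumption: treat an unmatched apostrophe as an error.
                raise ValueError("unterminated apostrophed string")
            # The delimiting apostrophes become the special character.
            out.append(DELIM + "".join(chars) + DELIM)
        elif c == '"':
            # Comment delimited by double quotes: remove it entirely.
            end = line.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 1
        elif c == " ":
            # Emit a blank only if the previous output item was not a blank.
            if not out or out[-1] != " ":
                out.append(" ")
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

With this sketch, preprocess("X = 'IT''S'") passes the string through with its delimiters replaced, while preprocess("READ(1'N) X,Y,Z") never finds a closing apostrophe. Whatever a real preprocessor does at that point, the apostrophe has been consumed as a string delimiter before any macro gets to look at it.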