From: Hans-Peter Diettrich
Newsgroups: comp.compilers
Subject: Re: What stage should entities be resolved?
Date: Mon, 14 Mar 2022 19:43:22 +0100
Organization: Compilers Central
Message-ID: <22-03-031@comp.compilers>
References: <22-03-019@comp.compilers> <22-03-025@comp.compilers> <22-03-028@comp.compilers>
Keywords: parse, design

On 3/12/22 1:11 PM, Christopher F Clark wrote:
> Contrary to what might assume from my previous posting on this topic.
> I agree with Dodi.
>
> Sometimes, the right answer is another phase. To keep your lexer
> simple, it can be useful to have a separate phase that deals with
> "character" issues, whether that is transforming UTF-8 extensions into
> unique code points (or actual characters representing glyphs possibly
> accented, i.e. resolving the combining code points into canonical
> versions) or taking sequences like &amp; or \n or whatever into single
> tokens (or characters). That *can* make the whole process simpler and
> faster.

I consider these "phases" as "filters". In my C parser I also had a
number of filter levels that handle, in detail, the various aspects of
preprocessor macro substitution and conditional compilation.
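As a rough illustration of such stacked filters (a hypothetical sketch, not the code of the C parser mentioned here), each level can be written as a generator that pulls from the level below only when asked for more. The names `char_filter`, `word_filter` and `tokens`, and the tiny escape and entity tables, are assumptions made up for this example.

```python
from typing import Iterator

# Hypothetical tables for illustration; a real language defines many more.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}
ENTITIES = {"amp": "&", "lt": "<", "gt": ">"}


def char_filter(text: str) -> Iterator[str]:
    """Lowest level: resolve backslash escapes and entities into single characters."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            yield ESCAPES[text[i + 1]]
            i += 2
            continue
        if ch == "&":
            end = text.find(";", i + 1)
            name = text[i + 1:end] if end != -1 else ""
            if name in ENTITIES:
                yield ENTITIES[name]
                i = end + 1
                continue
        # Anything else, including an unknown escape or entity, passes through.
        yield ch
        i += 1


def word_filter(chars: Iterator[str]) -> Iterator[str]:
    """Next level: pull characters on demand and group them into words."""
    word = []
    for ch in chars:
        if ch.isspace():
            if word:
                yield "".join(word)
                word = []
        else:
            word.append(ch)
    if word:
        yield "".join(word)


def tokens(text: str) -> Iterator[str]:
    """Top level: the consumer only ever asks this filter for the next token."""
    return word_filter(char_filter(text))
```

With these toy tables, `list(tokens(r"a &amp; b\nc"))` gives `['a', '&', 'b', 'c']`: the word level never sees the raw `&amp;` or `\n` sequences, only the characters they stand for.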
The parser calls the top level filter to return the next C token, which
in turn calls lower level filters until every level has returned enough
information about the next token for parsing to proceed.

Microsoft's sloppy interpretation of the preprocessor as a
self-contained stage revealed that the newer C standards disallow a
stand-alone C preprocessor. Such a separate preprocessor could
synthesize tokens like "//" that never occur in a strict (embedded)
implementation of the C standard. Even though this is not stated
explicitly in the standard, it turned out to be a side effect of the
lexer implementation.

DoDi
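The "//" hazard can be shown with a toy model. This is only a sketch under loud assumptions: the lexer, the `expand` helper and the `SLASH` macro are invented for illustration, the text output is deliberately naive (no separators between tokens), and nothing here reproduces the behavior of any particular compiler. It compares handing expanded tokens directly to the parser with writing them out as text and lexing that text again.

```python
import re
from typing import Dict, List

# Toy lexer: "//" starts a comment that runs to end of line; otherwise
# identifiers and single non-space characters are tokens.
TOKEN_RE = re.compile(r"//[^\n]*|[A-Za-z_]\w*|\S")


def lex(text: str) -> List[str]:
    """Return the tokens of text, dropping // comments."""
    return [t for t in TOKEN_RE.findall(text) if not t.startswith("//")]


def expand(toks: List[str], macros: Dict[str, List[str]]) -> List[str]:
    """Token-level, non-recursive replacement of object-like macros."""
    out: List[str] = []
    for t in toks:
        out.extend(macros.get(t, [t]))
    return out


macros = {"SLASH": ["/"]}  # stands in for: #define SLASH /
source = "a SLASH/ b"

# Embedded: the expanded tokens go straight to the parser.
embedded = expand(lex(source), macros)

# Stand-alone (naive): the expanded tokens are written out as text with no
# separators, and that text is lexed again by a later stage.
standalone_text = "".join(expand(lex(source), macros))
standalone = lex(standalone_text)
```

Here `embedded` is `['a', '/', '/', 'b']`, two separate `/` tokens, while `standalone_text` is `a//b` and `standalone` is just `['a']`: the second lexing pass reads the adjacent slashes as the start of a comment that was never in the source.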