From: Spiros Bousbouras
Newsgroups: comp.compilers
Subject: Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple?
Date: Fri, 15 Jul 2022 07:08:17 -0000 (UTC)
Organization: Cyber23 news
Message-ID: <22-07-018@comp.compilers>
References: <22-07-011@comp.compilers> <22-07-015@comp.compilers>
Keywords: lex, design, comment

On Thu, 14 Jul 2022 16:38:32 -0400
George Neuner wrote:
> On Thu, 14 Jul 2022 10:25:24 +0000, Roger L Costello
> wrote:
>
> >Hi Folks,
> >
> >A common example in books on Lex/Flex and Yacc/Bison is evaluating arithmetic
> >expressions. When the lexer encounters an integer lexeme, it casts the lexeme
> >to a binary integer and returns the value to the parser. The lexer contains a
> >rule that looks something like this:
> >
> >{INTEGER} { yylval.intval = atoi(yytext); return NUMBER; }
> >
> >But, but, but, ...
> >
> >Countless times on this list I have been told: Keep the lexer simple!
> >
> >By converting the lexeme to an integer, the lexer has assumed that the parser
> >needs/wants a binary integer, not a text number. How does the lexer know what
> >the parser needs/wants?
>
> Presumably because the same developer that wrote the parser also wrote
> the rules for the lexer.

[...]

> >It seems to me that the lexer should return to the parser the text number and
> >it is the responsibility of the parser to convert the value to an integer data
> >type if it desires.
> >
> >What do you think?
>
> The general problem is performance. The place to decide what class a
> token falls into is where you contact the text.
>
> In principle the lexer could be as simple as strtok() - just
> separating the tokens, returning a pointer directly to its input
> buffer, and punting all further processing to the parser.
>
> But ...
>
> The lexer has already scanned the text just to locate the end of it.
> If it then simply gives the text to the parser, the parser must scan
> the text AGAIN to verify that it represents a number (or whatever the
> parser happens to be looking for at that point).
>
> Then the text gets scanned YET AGAIN to convert it to a number.
>
> That is 3 scans of the text if the parser does the work versus 2 scans
> if the lexer does the work.

I don't see why it has to be this way. The lexer can return a structure
with components START, END, KIND which says that a token of kind, for
example, "integer number" was found in the input stream starting at
START and ending at END. The parser wouldn't have to verify again that
it is a number and would simply go over the characters from START to
END a second time if it needs to convert the number to a different
format. But if the parser knows that a number at this point is a syntax
error, then it can simply signal that and save the extra processing
required for conversion, i.e. the call to atoi() or whatever.

[...]

> Moving (at least gross) input classification into the lexer avoids at
> least one pass over the entirety of the input. When considered
> against the length of the average input, that can add up to a LOT of
> avoided work.

I don't think Roger was suggesting that the lexer should not do gross
input classification (like recognising that a token is a number) but
rather that perhaps it should not convert the number.

[In flex lexers you can't count on the text of the token being available
after the action routine is done, so you have to make a copy and keep
track of the copy. That is usually way more work than whatever else you
might do with it. -John]
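
For illustration, the START/END/KIND token described in this post could
look roughly like the C sketch below. It is a hand-written scanner, not
flex output, and every name in it (token_kind, struct token, next_token,
token_to_long) is made up for the example. It assumes the whole input
stays in memory as one NUL-terminated string, so START and END remain
valid after the lexer returns; as the moderator's note says, flex does
not give that guarantee for yytext. Overflow handling is left out.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token kinds, named only for this sketch. */
enum token_kind { TOK_EOF, TOK_INTEGER, TOK_OTHER };

/* A token is a classified span of the input, with no converted value. */
struct token {
    enum token_kind kind;
    const char *start;   /* first character of the lexeme */
    const char *end;     /* one past the last character of the lexeme */
};

/* Lexer side: skip blanks, classify one token, advance *cursor past it.
   The digits are scanned once here and are NOT converted. */
static struct token next_token(const char **cursor)
{
    const char *p = *cursor;
    struct token t;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_EOF;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t.kind = TOK_INTEGER;
    } else {
        p++;                      /* any other single character */
        t.kind = TOK_OTHER;
    }
    t.end = p;
    *cursor = p;
    return t;
}

/* Parser side: the second pass over START..END, made only when the
   parser actually wants a binary value. No overflow check. */
static long token_to_long(struct token t)
{
    long value = 0;

    assert(t.kind == TOK_INTEGER);
    for (const char *p = t.start; p < t.end; p++)
        value = value * 10 + (*p - '0');
    return value;
}
```

A parser built on this would call next_token() in a loop and call
token_to_long() only at points where the grammar accepts a number; where
a number is a syntax error it can report that without converting
anything.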