Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3119
| From | Spiros Bousbouras <spibou@gmail.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? |
| Date | 2022-07-15 07:08 +0000 |
| Organization | Cyber23 news |
| Message-ID | <22-07-018@comp.compilers> (permalink) |
| References | <22-07-011@comp.compilers> <22-07-015@comp.compilers> |
On Thu, 14 Jul 2022 16:38:32 -0400
George Neuner <gneuner2@comcast.net> wrote:
> On Thu, 14 Jul 2022 10:25:24 +0000, Roger L Costello
> <costello@mitre.org> wrote:
>
> >Hi Folks,
> >
> >A common example in books on Lex/Flex and Yacc/Bison is evaluating arithmetic
> >expressions. When the lexer encounters an integer lexeme, it casts the lexeme
> >to a binary integer and returns the value to the parser. The lexer contains a
> >rule that looks something like this:
> >
> >{INTEGER} { yylval.intval = atoi(yytext); return NUMBER; }
> >
> >But, but, but, ...
> >
> >Countless times on this list I have been told: Keep the lexer simple!
> >
> >By converting the lexeme to an integer, the lexer has assumed that the parser
> >needs/wants a binary integer, not a text number. How does the lexer know what
> >the parser needs/wants?
>
> Presumably because the same developer that wrote the parser also wrote
> the rules for the lexer.
[...]
> >It seems to me that the lexer should return to the parser the text number and
> >it is the responsibility of the parser to convert the value to an integer data
> >type if it desires.
> >
> >What do you think?
>
> The general problem is performance. The place to decide what class a
> token falls into is where you contact the text.
>
> In principle the lexer could be as simple as strtok() - just
> separating the tokens, returning a pointer directly to its input
> buffer, and punting all further processing to the parser.
>
> But ...
>
> The lexer has already scanned the text just to locate the end of it.
> If it then simply gives the text to the parser, the parser must scan
> the text AGAIN to verify that it represents a number (or whatever the
> parser happens to be looking for at that point).
>
> Then the text gets scanned YET AGAIN to convert it to a number.
>
> That is 3 scans of the text if the parser does the work versus 2 scans
> if the lexer does the work.
I don't see why it has to be this way. The lexer can return a structure
with components START , END , KIND which says that a token of kind , for
example , "integer number" was found in the input stream starting at START
and ending at END .The parser wouldn't have to verify again that it is
a number and would simply go over the characters from START to END a 2nd
time if it needs to convert the number to a different format. But if the
parser knows that a number at this point is a syntax error then it can
simply signal that and save the extra processing required for conversion
i.e. call atoi() or whatever.
[...]
> Moving (at least gross) input classification into the lexer avoids at
> least one pass over the entirety of the input. When considered
> against the length of the average input, that can add up to a LOT of
> avoided work.
I don't think Roger was suggesting that the lexer should not do gross input
classification (like recognise that a token is a number) but rather that
perhaps it should not convert the number.
[In flex lexers you can't count on text of the token being available
after the action routine is done, so you have to make a copy and keep
track of the copy. That is usually way more work than whatever else
you might do with it. -John]
Back to comp.compilers | Previous | Next — Previous in thread | Next in thread | Find similar
Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Roger L Costello <costello@mitre.org> - 2022-07-14 10:25 +0000
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? gah4 <gah4@u.washington.edu> - 2022-07-14 10:03 -0700
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? George Neuner <gneuner2@comcast.net> - 2022-07-14 16:38 -0400
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Spiros Bousbouras <spibou@gmail.com> - 2022-07-15 07:08 +0000
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-07-15 03:02 -0700
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-07-15 10:50 -0700
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? George Neuner <gneuner2@comcast.net> - 2022-07-17 16:52 -0400
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Thomas Koenig <tkoenig@netcologne.de> - 2022-07-18 05:44 +0000
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? George Neuner <gneuner2@comcast.net> - 2022-07-17 18:01 -0400
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-07-15 14:41 +0000
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? "matt.ti...@gmail.com" <matt.timmermans@gmail.com> - 2022-07-16 05:32 -0700
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-07-17 13:10 -0400
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? gah4 <gah4@u.washington.edu> - 2022-07-17 20:39 -0700
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? gah4 <gah4@u.washington.edu> - 2022-07-19 16:39 -0700
Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? luser droog <luser.droog@gmail.com> - 2022-07-21 14:16 -0700
Scannerless parsing was: Why does the lexer convert text integer lexemes ...? "Ev. Drikos" <drikosev@gmail.com> - 2022-07-21 13:41 +0300
Re: Scannerless parsing was: Why does the lexer convert text integer lexemes ...? "Ev. Drikos" <drikosev@gmail.com> - 2022-07-22 12:29 +0300
csiph-web