Re: Learning only one lexer made me blind to its hidden assumptions

From "Ev. Drikos" <drikosev@gmail.com>
Newsgroups comp.compilers
Subject Re: Learning only one lexer made me blind to its hidden assumptions
Date 2022-07-13 14:58 +0300
Organization Aioe.org NNTP Server
Message-ID <22-07-009@comp.compilers> (permalink)
References <22-07-006@comp.compilers>

On 07/07/2022 20:49, Roger L Costello wrote:
> ...

> Difference:
> - Flex allows overlapping regexes. It is up to Flex to use the 'correct'
> regex. Flex has rules for picking the correct one: longest match wins, regex
> listed first wins.
> - ScanGen does not allow overlapping regexes. Instead, you create one regex
> and then, if needed, you create "Except" clauses. E.g., the token is an
> Identifier, except if the token is 'Begin' or 'End' or 'Read' or 'Write'
>
> ...

As you can imagine, there are many such options. A DFA builder may offer
options (a) to behave like Flex, (b) to treat only some tokens as reserved
and others as non-reserved, and (c) to let you examine shorter matches.
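A minimal Python sketch of the two disambiguation styles described in the quoted text. The token names, regexes and helper functions (flex_style_next, scangen_style_next, tokenize) are hypothetical, and real generators compile everything into a DFA instead of trying each regex in turn as this sketch does.

```python
import re

# Hypothetical token set, loosely modelled on the quoted example.
RESERVED = {"Begin": "BEGIN", "End": "END", "Read": "READ", "Write": "WRITE"}
IDENT_RE = r"[A-Za-z][A-Za-z0-9]*"
OTHER_RULES = [("INT", r"[0-9]+"), ("WS", r"[ \t\n]+")]

# Flex style: overlapping patterns, keyword rules listed before IDENT.
FLEX_RULES = ([(tok, re.escape(word)) for word, tok in RESERVED.items()]
              + [("IDENT", IDENT_RE)] + OTHER_RULES)

# ScanGen style: non-overlapping patterns; keywords handled by an "except" table.
SCANGEN_RULES = [("IDENT", IDENT_RE)] + OTHER_RULES


def flex_style_next(text, pos):
    """Longest match wins; on a tie, the rule listed first wins."""
    best = None  # (length, token name)
    for name, pattern in FLEX_RULES:
        m = re.compile(pattern).match(text, pos)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if m and (best is None or len(m.group()) > best[0]):
            best = (len(m.group()), name)
    return best


def scangen_style_next(text, pos):
    """At most one pattern can match; IDENT is then reclassified by the except table."""
    for name, pattern in SCANGEN_RULES:
        m = re.compile(pattern).match(text, pos)
        if m:
            lexeme = m.group()
            if name == "IDENT":
                # "Identifier, except if the token is Begin, End, Read or Write"
                name = RESERVED.get(lexeme, "IDENT")
            return len(lexeme), name
    return None


def tokenize(text, next_token):
    """Run either strategy over the whole input, dropping whitespace tokens."""
    pos, tokens = 0, []
    while pos < len(text):
        hit = next_token(text, pos)
        if hit is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        length, name = hit
        if name != "WS":
            tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


print(tokenize("Begin Beginner End", flex_style_next))
# [('BEGIN', 'Begin'), ('IDENT', 'Beginner'), ('END', 'End')]
print(tokenize("Begin Beginner End", scangen_style_next))  # same tokens
```

Both strategies produce the same tokens here; what differs is where the keyword/identifier conflict is resolved: by rule order plus longest match in the Flex style, or by an explicit exception table attached to a single identifier pattern in the ScanGen style.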

Who knows what else is out there! (I don't claim to be an expert.)

> Difference:
> - Flex deals with individual characters
> - ScanGen lumps characters into character classes and deals with classes. Use
> of character classes decreases (quite significantly) the size of the
> transition table
>
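A small Python sketch of why character classes shrink the table, using a hypothetical DFA for IDENT = [A-Za-z_][A-Za-z0-9_]* and INT = [0-9]+ over an 8-bit alphabet. Characters whose transition columns are identical in every state are merged into one class, so the table is indexed by class instead of by character. The grouping method shown is an illustration, not ScanGen's actual algorithm.

```python
import string

# Hypothetical DFA: state 0 = start, 1 = inside IDENT, 2 = inside INT; -1 = error.
LETTERS = set(map(ord, string.ascii_letters + "_"))
DIGITS = set(map(ord, string.digits))
STATES = range(3)


def step(state, ch):
    if state == 0:
        return 1 if ch in LETTERS else 2 if ch in DIGITS else -1
    if state == 1:
        return 1 if ch in LETTERS or ch in DIGITS else -1
    if state == 2:
        return 2 if ch in DIGITS else -1
    return -1


# Uncompressed table: one column per possible byte value.
full_table = [[step(s, ch) for ch in range(256)] for s in STATES]


def compress(table):
    """Merge characters whose transition columns are identical into one class."""
    class_of = [0] * 256
    column_to_class = {}
    for ch in range(256):
        column = tuple(row[ch] for row in table)
        class_of[ch] = column_to_class.setdefault(column, len(column_to_class))
    small = [[None] * len(column_to_class) for _ in table]
    for ch in range(256):
        for s, row in enumerate(table):
            small[s][class_of[ch]] = row[ch]
    return class_of, small


class_of, small = compress(full_table)
# A lookup becomes small[state][class_of[ch]] instead of full_table[state][ch].
print(len(full_table) * 256)       # 768 entries
print(len(small[0]))               # 3 classes: other, digit, letter/underscore
print(len(small) * len(small[0]))  # 9 entries, plus the 256-entry class map
```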

FYI, there is also a related, controversial issue that may spark a flame war!

Bison also doesn't support character classes, and this could be one reason
why scannerless parsing sounds odd to many people. Of course, one may use
Bison down to the character level, but with many more states.

Also, if the grammar allows two consecutive identifiers, a lookahead
operator is likely necessary. (Admittedly, a better alternative to
scannerless parsing may be the different start states supported by Flex.)
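To illustrate why the lookahead is needed, here is a toy character-level recognizer in Python for a hypothetical rule PAIR = IDENT LAYOUT IDENT, where layout is optional (as it usually must be in a scannerless grammar, since most token pairs need no space between them). Without a restriction, the input "abcd" is accepted in three different ways; a negative lookahead (sometimes called a follow restriction) forbidding an identifier from being followed by a letter restores the longest-match behaviour a separate lexer provides. The brute-force enumeration stands in for a real generalized parser and is only a sketch.

```python
# Toy scannerless grammar (hypothetical): PAIR = IDENT LAYOUT IDENT,
# with IDENT = [a-z]+ and LAYOUT = ' '* (possibly empty).
LETTERS = set("abcdefghijklmnopqrstuvwxyz")


def is_ident(s):
    return len(s) > 0 and all(c in LETTERS for c in s)


def parses(text, follow_restriction=False):
    """Return every (first, second) split of text that the grammar accepts."""
    results = []
    for i in range(1, len(text)):
        first = text[:i]
        if not is_ident(first):
            break  # any longer prefix contains the same non-letter
        # Lookahead operator: an IDENT must not be followed by another letter.
        if follow_restriction and text[i] in LETTERS:
            continue
        j = i
        while j < len(text) and text[j] == " ":  # consume the optional layout
            j += 1
        if is_ident(text[j:]):
            results.append((first, text[j:]))
    return results


print(parses("abcd"))                            # three parses: ambiguous
print(parses("abcd", follow_restriction=True))   # []
print(parses("ab cd", follow_restriction=True))  # [('ab', 'cd')]
```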

When I experimented in the past with a scannerless GLR parser for SQL, I
didn't see dramatic runtime slowdowns with a few single-line and multi-line
commands. Still, I wouldn't try (or suggest) such an approach for XML processing.

> ...


Thread

Learning only one lexer made me blind to its hidden assumptions Roger L Costello <costello@mitre.org> - 2022-07-07 17:49 +0000
  Re: Learning only one lexer made me blind to its hidden assumptions luser droog <luser.droog@gmail.com> - 2022-07-12 19:49 -0700
    Re: Learning only one lexer made me blind to its hidden assumptions Juan Miguel Vilar Torres <jvilar@uji.es> - 2022-07-13 01:46 -0700
  Re: Learning only one lexer made me blind to its hidden assumptions "Ev. Drikos" <drikosev@gmail.com> - 2022-07-13 14:58 +0300
  Re: Learning only one lexer made me blind to its hidden assumptions antispam@math.uni.wroc.pl - 2022-07-13 19:52 +0000
    Re: Learning only one lexer made me blind to its hidden assumptions George Neuner <gneuner2@comcast.net> - 2022-07-14 16:46 -0400
      Re: Learning only one lexer made me blind to its hidden assumptions antispam@math.uni.wroc.pl - 2022-07-15 20:14 +0000
  Re: Learning only one lexer made me blind to its hidden assumptions Kaz Kylheku <480-992-1380@kylheku.com> - 2022-07-15 14:16 +0000

csiph-web