


Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple?

From luser droog <luser.droog@gmail.com>
Newsgroups comp.compilers
Subject Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple?
Date 2022-07-21 14:16 -0700
Organization Compilers Central
Message-ID <22-07-044@comp.compilers> (permalink)
References <22-07-011@comp.compilers> <22-07-030@comp.compilers> <22-07-036@comp.compilers> <22-07-040@comp.compilers>



On Wednesday, July 20, 2022 at 7:45:07 PM UTC-5, gah4 wrote:
> On Monday, July 18, 2022 at 9:30:51 AM UTC-7, gah4 wrote:
>
> (snip, or moderator wrote)
> > [In my experience separating the lexer from the parser makes it a lot easier
> > to deal with common lexical situations like skipping white space and comments.
> > You could certainly do that in a combined scheme but I'm not sure it would end
> > up any simpler. -John]
> Interesting. As I previously noted, STEP mostly doesn't do a separate lexical analysis.
>
> It does, however, do three things before the macros see the input: convert multiple
> blanks to a single blank, pass apostrophed strings through whole, and remove
> comments delimited by double quotes.
>
> Apostrophed strings are slightly more interesting. Internal double apostrophes
> are converted to single apostrophes, and the delimiting apostrophes are
> converted to a special character that isn't an input character.
>
> One of my projects 45 years ago was to write macros to recognize the
> syntax of IBM OS/360 Fortran IV. Direct access I/O statements use
> a single apostrophe to delimit the record number:
>
> READ(1'N) X,Y,Z
>
> There is no way to write macros for that syntax after the previous processing.
>
> Much fun figuring out all the strange things done in programming language
> syntax over the years.
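To make the pre-pass gah4 describes concrete, here is a minimal Python sketch of a STEP-like preprocessor. It is not STEP's actual implementation, which the post doesn't show. It collapses runs of blanks, removes comments delimited by double quotes, and passes apostrophed strings through whole, turning internal doubled apostrophes into single ones and replacing the delimiting apostrophes with a marker character. The name `preprocess`, the marker `\x01`, and the choice to let an unterminated string run to the end of the input are assumptions made for illustration.

```python
# Hypothetical marker for string delimiters. STEP used "a special character
# that isn't an input character"; which one is not stated.
STRING_DELIM = "\x01"

def preprocess(text: str) -> str:
    """Sketch of a STEP-like pre-pass: collapse blank runs, pass apostrophed
    strings through whole, and drop comments delimited by double quotes."""
    out = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c == "'":
            # Apostrophed string: copy its contents verbatim, '' -> '
            out.append(STRING_DELIM)
            i += 1
            while i < n:
                if text[i] == "'":
                    if i + 1 < n and text[i + 1] == "'":
                        out.append("'")  # doubled apostrophe becomes one
                        i += 2
                        continue
                    i += 1  # skip the closing apostrophe
                    break
                out.append(text[i])
                i += 1
            # Also reached when the string is unterminated: it runs to the end.
            out.append(STRING_DELIM)
        elif c == '"':
            # Comment: skip through the closing double quote (or end of input)
            end = text.find('"', i + 1)
            i = n if end == -1 else end + 1
        elif c == " ":
            # Collapse a run of blanks (even one split by a comment) to one blank
            while i < n and text[i] == " ":
                i += 1
            if not out or out[-1] != " ":
                out.append(" ")
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

Run on the direct-access statement, `preprocess("READ(1'N) X,Y,Z")` returns `"READ(1\x01N) X,Y,Z\x01"`: the record-number apostrophe is taken as the start of a string, so by the time any macro runs, `N) X,Y,Z` is already one opaque string.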

This approach appears to offer a very nice simplification for most Algol-style
languages. But collapsing or removing the white space makes it harder (or impossible)
to parse languages like Python and Haskell, which use the "offside rule": indentation
determines where multi-line constructs begin and end.
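The loss can be seen with a small Python sketch of a Python-style layout pass, which turns leading indentation into INDENT/DEDENT markers (Haskell's layout algorithm is more involved). The names `layout_tokens` and `collapse_blanks`, the `LINE:` token format, and counting only spaces as indentation are assumptions made for illustration. Running the pass before and after collapsing blank runs, as a STEP-like pre-pass would, shows the nesting information disappearing.

```python
import re
from typing import List

def collapse_blanks(source: str) -> str:
    """Mimic a pre-pass that turns every run of blanks into a single blank."""
    return re.sub(r" +", " ", source)

def layout_tokens(source: str) -> List[str]:
    """Turn leading indentation into INDENT/DEDENT markers, Python-style.

    Only spaces count as indentation in this sketch; each non-blank line is
    emitted as a single "LINE:<text>" token.
    """
    tokens: List[str] = []
    stack = [0]  # indentation widths of the currently open blocks
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines do not affect layout
        width = len(line) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        if width != stack[-1]:
            raise IndentationError(f"dedent to column {width} matches no open block")
        tokens.append("LINE:" + stripped)
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

On `"a:\n  b:\n    c\nd\n"` the pass nests `c` inside `b`. After `collapse_blanks`, both `b:` and `c` start at column 1, so `c` is no longer nested, and with the white space removed entirely there would be no INDENT at all.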

I haven't solved this completely, but I've been building my parser combinators with
an eye towards letting the syntax analysis use significant white space where it
matters, while mostly ignoring it.

It's parsers all the way down, but the parsers are designed to operate over lists,
and the infrastructure is agnostic about the type of the list elements. So you can
build the lexical analysis layer as a graph of parsers that work on lists of integers
(characters) and produce Symbol objects, and the syntax analysis layer as a graph of
parsers that work on lists of Symbols.
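A minimal Python sketch of this layering, under stated assumptions: a parser is a function from (list, position) to (value, new position), or None on failure. The post doesn't show the author's combinators or say what language they are written in, so the combinator names (`satisfy`, `many`, `then`, `alt`, `fmap`), the `Symbol` kinds, and the toy sum grammar are all hypothetical. The same combinators drive both layers: the lexical layer runs over a list of characters (one-character strings here rather than the integer codes the post mentions) and yields Symbols, and the syntax layer runs over that list of Symbols.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple

# A parser maps (input list, position) to (value, new position), or None on
# failure. Nothing here depends on the element type of the input list.
Parser = Callable[[Sequence[Any], int], Optional[Tuple[Any, int]]]

def satisfy(pred: Callable[[Any], bool]) -> Parser:
    """Match one element for which pred(element) is true."""
    def p(xs, i):
        if i < len(xs) and pred(xs[i]):
            return xs[i], i + 1
        return None
    return p

def many(p: Parser, at_least: int = 0) -> Parser:
    """Apply p repeatedly, collecting results into a list."""
    def q(xs, i):
        out = []
        while (r := p(xs, i)) is not None and r[1] > i:  # stop if no progress
            out.append(r[0])
            i = r[1]
        return (out, i) if len(out) >= at_least else None
    return q

def then(*ps: Parser) -> Parser:
    """Run parsers in order; succeed with the list of their results."""
    def q(xs, i):
        out = []
        for p in ps:
            r = p(xs, i)
            if r is None:
                return None
            out.append(r[0])
            i = r[1]
        return out, i
    return q

def alt(*ps: Parser) -> Parser:
    """Return the result of the first alternative that succeeds."""
    def q(xs, i):
        for p in ps:
            r = p(xs, i)
            if r is not None:
                return r
        return None
    return q

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the result of p with f."""
    def q(xs, i):
        r = p(xs, i)
        return None if r is None else (f(r[0]), r[1])
    return q

@dataclass
class Symbol:
    kind: str  # hypothetical token kinds: "num" or "op"
    text: str

# Lexical layer: parsers over a list of characters that produce Symbols.
blanks = many(satisfy(str.isspace))
number = fmap(many(satisfy(str.isdigit), 1), lambda ds: Symbol("num", "".join(ds)))
plus = fmap(satisfy(lambda c: c == "+"), lambda c: Symbol("op", c))
token = fmap(then(blanks, alt(number, plus)), lambda r: r[1])
lexer = fmap(then(many(token), blanks), lambda r: r[0])

# Syntax layer: the same combinators, now over a list of Symbols.
num_sym = fmap(satisfy(lambda s: s.kind == "num"), lambda s: int(s.text))
plus_sym = satisfy(lambda s: s.kind == "op" and s.text == "+")
expr = fmap(then(num_sym, many(fmap(then(plus_sym, num_sym), lambda r: r[1]))),
            lambda r: r[0] + sum(r[1]))

def evaluate(source: str) -> int:
    symbols, end = lexer(list(source), 0)
    if end != len(source):
        raise SyntaxError(f"unexpected character at {end}")
    r = expr(symbols, 0)
    if r is None or r[1] != len(symbols):
        raise SyntaxError("not a sum of integers")
    return r[0]
```

For example, `evaluate("1 + 22 + 3")` lexes to five Symbols and parses them to 26. Because positions are passed explicitly and never mutated, a failed alternative in `alt` simply backtracks by returning None.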

The Symbol object has an extra slot for stashing auxiliary data, so the white space
can be captured there and hidden from the syntax analysis (unless some handler
or predicate function wants to peek at it).
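The stash-and-hide idea might look like the following Python sketch, in which each Symbol's extra slot records the white space that preceded it. The regex-based lexer, the `"ws"` key, and the helper `indent_of` are hypothetical. The slot is declared with `compare=False`, so equality ignores it, while a predicate can still peek at it, for instance to recover the indentation an offside rule needs.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Symbol:
    kind: str
    text: str
    # Side slot for data the grammar normally ignores. compare=False keeps it
    # out of ==, so two Symbols with the same kind and text are equal even if
    # different white space preceded them.
    extra: Dict[str, Any] = field(default_factory=dict, compare=False)

# Leading white space, then a number, a name, or any other single character.
TOKEN = re.compile(r"(\s*)(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def lex(source: str) -> List[Symbol]:
    symbols: List[Symbol] = []
    pos = 0
    while True:
        m = TOKEN.match(source, pos)
        if m is None:
            break  # nothing but trailing white space remains
        ws, num, name, other = m.groups()
        if num is not None:
            sym = Symbol("num", num)
        elif name is not None:
            sym = Symbol("name", name)
        else:
            sym = Symbol("punct", other)
        sym.extra["ws"] = ws  # stash the white space instead of discarding it
        symbols.append(sym)
        pos = m.end()
    return symbols

def indent_of(sym: Symbol) -> Optional[int]:
    """Peek into the stash: the column of a symbol that begins a line, else None.

    The very first symbol has no preceding newline, so it also yields None.
    """
    ws = sym.extra.get("ws", "")
    if "\n" not in ws:
        return None
    return len(ws) - ws.rfind("\n") - 1
```

With `lex("f x =\n    x + 1\n")`, the second `x` compares equal to the first, but `indent_of` reports column 4 for it and None for the first.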



Thread

Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Roger L Costello <costello@mitre.org> - 2022-07-14 10:25 +0000
  Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? gah4 <gah4@u.washington.edu> - 2022-07-14 10:03 -0700
  Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? George Neuner <gneuner2@comcast.net> - 2022-07-14 16:38 -0400
    Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Spiros Bousbouras <spibou@gmail.com> - 2022-07-15 07:08 +0000
    Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-07-15 03:02 -0700
      Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2022-07-15 10:50 -0700
      Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? George Neuner <gneuner2@comcast.net> - 2022-07-17 16:52 -0400
        Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Thomas Koenig <tkoenig@netcologne.de> - 2022-07-18 05:44 +0000
      Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? George Neuner <gneuner2@comcast.net> - 2022-07-17 18:01 -0400
  Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-07-15 14:41 +0000
  Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? "matt.ti...@gmail.com" <matt.timmermans@gmail.com> - 2022-07-16 05:32 -0700
  Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-07-17 13:10 -0400
    Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? gah4 <gah4@u.washington.edu> - 2022-07-17 20:39 -0700
      Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? gah4 <gah4@u.washington.edu> - 2022-07-19 16:39 -0700
        Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? luser droog <luser.droog@gmail.com> - 2022-07-21 14:16 -0700
      Scannerless parsing was: Why does the lexer convert text integer lexemes ...? "Ev. Drikos" <drikosev@gmail.com> - 2022-07-21 13:41 +0300
        Re: Scannerless parsing was: Why does the lexer convert text integer lexemes ...? "Ev. Drikos" <drikosev@gmail.com> - 2022-07-22 12:29 +0300
