| From | luser droog <luser.droog@gmail.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple? |
| Date | 2022-07-21 14:16 -0700 |
| Organization | Compilers Central |
| Message-ID | <22-07-044@comp.compilers> |
| References | <22-07-011@comp.compilers> <22-07-030@comp.compilers> <22-07-036@comp.compilers> <22-07-040@comp.compilers> |
On Wednesday, July 20, 2022 at 7:45:07 PM UTC-5, gah4 wrote:
> On Monday, July 18, 2022 at 9:30:51 AM UTC-7, gah4 wrote:
> > (snip, or moderator wrote)
> > [In my experience separating the lexer from the parser makes it a lot easier
> > to deal with common lexical situations like skipping white space and comments.
> > You could certainly do that in a combined scheme but I'm not sure it would end
> > up any simpler. -John]
> Interesting. As I previously noted, STEP mostly doesn't do a separate lexical analysis.
>
> It does, however, do three things before the macros see the input: convert multiple
> blanks to a single blank, pass apostrophed strings through whole, and remove
> comments delimited by double quotes.
>
> Apostrophed strings are slightly more interesting. Internal double apostrophes
> are converted to single apostrophes, and the delimiting apostrophes are
> converted to a special character that isn't an input character.
>
> One of my projects 45 years ago, was to write macros to recognize the
> syntax of IBM OS/360 Fortran IV. Direct access I/O statements use
> a single apostrophe to delimit the record number:
>
> READ(1'N) X,Y,Z
>
> There is no way to write macros for that syntax after the previous processing.
>
> Much fun figuring out all the strange things done in programming language
> syntax over the years.

This approach appears to offer a very nice simplification for most Algol-style languages. But removing the white space entirely makes it harder (or impossible) to parse languages like Python and Haskell which use the "offside rule" to interpret the white space as delimiting multi-line constructs.

I haven't solved the above completely, but I've been building my parser combinators with an eye towards supporting significant white space in the syntax analysis, while mostly ignoring it. It's parsers all the way down, but the parsers are designed to operate over lists, so the infrastructure is agnostic as to the actual type of the elements of the list.
So, you can build the lexical analysis layer as a graph of parsers that work on lists of integers (characters) and produce Symbol objects. The syntax analysis layer can then be built as a graph of parsers that work on lists of Symbols. The symbol object has an extra slot for stashing extra data, so the white space can be captured and then hidden from the syntax analysis (unless some handler or predicate function wants to peek in there).
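
A minimal sketch of that two-layer arrangement, in Python. This is an illustration only, not the actual implementation described above: every name here (satisfy, many, seq, alt, fmap, Symbol, the "ws" key, lex, sum_expr) is hypothetical, and the toy grammar is assumed just to have something to parse. The same combinators run first over a list of character codes and then over a list of Symbols.

```python
from dataclasses import dataclass, field

# A parser is a function (items, pos) -> (value, new_pos), or None on failure.
# Nothing below cares what the items are: integers, Symbols, anything.

def satisfy(pred):
    def parse(items, pos):
        if pos < len(items) and pred(items[pos]):
            return items[pos], pos + 1
        return None
    return parse

def seq(*parsers):
    def parse(items, pos):
        out = []
        for p in parsers:
            r = p(items, pos)
            if r is None:
                return None
            value, pos = r
            out.append(value)
        return out, pos
    return parse

def alt(*parsers):
    def parse(items, pos):
        for p in parsers:
            r = p(items, pos)
            if r is not None:
                return r
        return None
    return parse

def many(p):
    def parse(items, pos):
        out = []
        while True:
            r = p(items, pos)
            if r is None or r[1] == pos:  # stop on failure or no progress
                return out, pos
            out.append(r[0])
            pos = r[1]
    return parse

def fmap(p, f):
    def parse(items, pos):
        r = p(items, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse

def some(p):  # one or more
    return fmap(seq(p, many(p)), lambda r: [r[0]] + r[1])

@dataclass
class Symbol:
    kind: str
    text: str
    extra: dict = field(default_factory=dict)  # side slot, e.g. skipped white space

def chars(codes):
    return "".join(map(chr, codes))

# Lexical layer: parsers over a list of integers (character codes).
spaces = many(satisfy(lambda c: c in (32, 9, 10)))
number = fmap(some(satisfy(lambda c: 48 <= c <= 57)), lambda cs: ("num", chars(cs)))
ident = fmap(some(satisfy(lambda c: chr(c).isalpha())), lambda cs: ("id", chars(cs)))
oper = fmap(satisfy(lambda c: chr(c) in "+-*/=()"), lambda c: ("op", chr(c)))
token = fmap(seq(spaces, alt(number, ident, oper)),
             lambda r: Symbol(r[1][0], r[1][1], {"ws": chars(r[0])}))

def lex(text):
    codes = [ord(ch) for ch in text]
    symbols, pos = many(token)(codes, 0)
    _, pos = spaces(codes, pos)  # allow trailing white space
    if pos != len(codes):
        raise ValueError(f"unrecognised input at offset {pos}")
    return symbols

# Syntax layer: the same combinators, now over a list of Symbols.
def kind(k):
    return satisfy(lambda s: s.kind == k)

def op(t):
    return satisfy(lambda s: s.kind == "op" and s.text == t)

# A predicate that peeks into the side slot: does this symbol start a new line?
starts_line = satisfy(lambda s: "\n" in s.extra["ws"])

# Toy grammar (assumed for illustration): sum := num ('+' num)*
sum_expr = fmap(seq(kind("num"), many(seq(op("+"), kind("num")))),
                lambda r: int(r[0].text) + sum(int(p[1].text) for p in r[1]))
```

For example, lex("12 +  30\n+ 4") yields five Symbols, and sum_expr never looks at the spacing between them. A layout-sensitive rule could still use something like starts_line to notice that the second "+" begins a new line, which is the kind of hook an offside-rule check would need.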