From: Roger L Costello
Newsgroups: comp.compilers
Subject: A simpler way to tokenize and parse?
Date: Fri, 24 Mar 2023 14:45:40 +0000
Organization: Compilers Central
Message-ID: <23-03-011@comp.compilers>
Keywords: Lisp, lex, comment

Hello Compiler Experts!

I am reading the book "Programming Languages: Application and Interpretation" by Shriram Krishnamurthi.

The book says that Lisp and Scheme have a primitive called "read". The book says, "The read primitive is a crown jewel of Lisp and Scheme."

Some of my notes from reading the book:

- Read does tokenizing and reading.
- Read returns a value known as an s-expression.
- The s-expression is an intermediate representation.
- The output of read is either a number or a list. That's it!

Example of tokenizing/parsing using read:

    (+ 3 4) --> read --> (list `+ 3 4) --> parse --> (add (num 3) (num 4))

The first expression, (+ 3 4), is the concrete syntax. The middle expression, (list `+ 3 4), is an s-expression; it is an intermediate representation. The last expression, (add (num 3) (num 4)), is the abstract syntax.

The book says: read is one of the great ideas of computer science.
It helps decompose a fundamentally difficult process - generalized parsing of the input stream - into two simple processes:

(1) reading the input stream into an intermediate representation
(2) parsing that intermediate representation

I've read several compiler books and none of them talked about this. They talk about creating a lexer to generate a stream of tokens and a parser that receives the tokens and arranges them into a tree data structure. Why no mention of the "crown jewel" of tokenizing/parsing? Why no mention of "one of the great ideas of computer science"?

I have done some work with Flex and Bison, and recently I've done some work building parsers using read. In my experience, the latter is much easier. Why isn't read more widely discussed and used in the compiler community? Surely the concept that read embodies is not specific to Lisp and Scheme, right?

/Roger

[Yes, it's specific to Lisp and Scheme. They have an extremely simple syntax called S expressions, made of nested parenthesized lists of space-separated tokens with some quoting. The original plan was that Lisp 2 would have M expressions that looked more like a normal language, but it's over 50 years later and they still haven't gotten around to it. -John]
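
For readers who want to see the two steps from the post side by side, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it handles just integers, symbols, and parenthesized lists (no quoting, strings, or comments), and the names read_sexpr, parse, Num, and Add are hypothetical stand-ins, not the book's code or any Lisp implementation's API.

```python
# Illustrative sketch only. read_sexpr, parse, Num and Add are hypothetical
# names chosen for this example; they mimic the post's read -> parse pipeline.
from dataclasses import dataclass
from typing import List, Union

# Intermediate representation: a number, a symbol (as a str), or a nested list.
SExpr = Union[int, str, List["SExpr"]]


@dataclass
class Num:
    n: int


@dataclass
class Add:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Num, Add]


def read_sexpr(text: str) -> SExpr:
    """Step 1: turn the character stream into a nested list (the s-expression)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_one() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read_one())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume the closing parenthesis
            return items
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        try:
            return int(tok)   # numbers become ints
        except ValueError:
            return tok        # anything else is kept as a symbol

    result = read_one()
    if pos != len(tokens):
        raise SyntaxError("trailing input after the first expression")
    return result


def parse(sexpr: SExpr) -> Expr:
    """Step 2: turn the s-expression into abstract syntax."""
    if isinstance(sexpr, int):
        return Num(sexpr)
    if isinstance(sexpr, list) and len(sexpr) == 3 and sexpr[0] == "+":
        return Add(parse(sexpr[1]), parse(sexpr[2]))
    raise SyntaxError(f"cannot parse {sexpr!r}")
```

With these definitions, read_sexpr("(+ 3 4)") yields the list ["+", 3, 4], and parse applied to that list yields Add(Num(3), Num(4)), which mirrors the concrete syntax, s-expression, and abstract syntax stages in the post. Note that the reader stays this small only because the input is already in S-expression form.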