From: Roger L Costello
Newsgroups: comp.compilers
Subject: Simple Lexer and Simple Parser [ was RE: Flex is the most powerful lexical analysis language in the world. True or False? ]
Date: Sun, 8 May 2022 13:34:03 +0000
Organization: Compilers Central
Message-ID: <22-05-022@comp.compilers>
References: <22-05-003@comp.compilers> <22-05-007@comp.compilers> <22-05-009@comp.compilers> <22-05-018@comp.compilers>

Thank you again, Chris. Terrific information.

Another question, if I may. You wrote:

> And that goes to an important point. Your lexer *should be* almost
> trivially simple (i.e. regular expressions only and not complicated
> ones). You rarely want to solve problems at the lexical level. You
> are much less likely to get good error reporting if you do. In most
> cases, your parser should be simple also.

For a while now I have been working (for fun) on a parser for XML documents. I have experimented both with making the lexer simple and with making the parser simple. If I make the lexer simple, the parser becomes complex. If I make the lexer complex (using lots of states and making heavy use of Flex's start-condition stack), the parser becomes simple. It doesn't seem possible to make both the lexer and the parser simple.

There are lots of "conditional rules" in XML.
For example, in XML the string &amp; is an "entity reference." Since & is a reserved symbol, XML documents need to use &amp; instead of &. An XML parser is expected to convert &amp; to &. However, if &amp; appears in certain contexts (within a comment or within a CDATA section), it is not converted. Thus, there is conditional processing:

    IF (&amp; is in a comment or in a CDATA section)
    THEN OUTPUT(&amp;)
    ELSE OUTPUT(&)

Flex's states/stack mechanism is ideally suited for conditional processing like this. From the section on Start Conditions in the Flex manual: "flex provides a mechanism for conditionally activating rules."

So while it would be great to have a simple lexer, I am leaning towards handling XML's conditional rules with the Flex states/stack mechanism rather than in Bison. In other words, I am leaning towards a complex lexer. I would be interested in hearing your thoughts on this.

> You don't need a flamethrower

My apologies. It wasn't my intent to throw a flame, but in hindsight I can see that I should have worded things much better. I will do better in the future.

/Roger
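The conditional rule for &amp; can be illustrated outside Flex. The following is a minimal Python sketch, not Flex code and not the author's implementation. The function name `convert_amp` and the state names are hypothetical, only &amp; is handled, and an unterminated comment or CDATA section is simply copied through to the end of the input. Each `state == ... and text.startswith(...)` test plays the role of a Flex rule prefixed with a start condition (for example `<COMMENT>"-->"`), and the list used as a stack stands in for `yy_push_state`/`yy_pop_state`, which Flex provides under `%option stack`. Because XML comments and CDATA sections do not nest, a single state variable would work just as well here.

```python
# Hypothetical sketch: emulates Flex-style start conditions by hand.
# Only &amp; is decoded; all other text, including markup delimiters,
# is copied through unchanged.

INITIAL, COMMENT, CDATA = "INITIAL", "COMMENT", "CDATA"


def convert_amp(text: str) -> str:
    """Replace &amp; with & except inside <!-- ... --> and <![CDATA[ ... ]]>."""
    out = []
    stack = [INITIAL]  # stands in for Flex's start-condition stack
    i = 0
    while i < len(text):
        state = stack[-1]
        if state == INITIAL and text.startswith("<!--", i):
            stack.append(COMMENT)  # like yy_push_state(COMMENT)
            token = "<!--"
        elif state == INITIAL and text.startswith("<![CDATA[", i):
            stack.append(CDATA)  # like yy_push_state(CDATA)
            token = "<![CDATA["
        elif state == INITIAL and text.startswith("&amp;", i):
            # The conditional rule: decode &amp; only in the INITIAL state.
            out.append("&")
            i += len("&amp;")
            continue
        elif state == COMMENT and text.startswith("-->", i):
            stack.pop()  # like yy_pop_state()
            token = "-->"
        elif state == CDATA and text.startswith("]]>", i):
            stack.pop()  # like yy_pop_state()
            token = "]]>"
        else:
            token = text[i]  # any other character is copied unchanged
        out.append(token)
        i += len(token)
    return "".join(out)
```

For example, `convert_amp("a &amp; b <!-- c &amp; d --> <![CDATA[e &amp; f]]> g &amp; h")` returns `"a & b <!-- c &amp; d --> <![CDATA[e &amp; f]]> g & h"`: the two references in ordinary text are decoded, while the ones inside the comment and the CDATA section are left alone.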