| From | Roger L Costello <costello@mitre.org> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Learning only one lexer made me blind to its hidden assumptions |
| Date | 2022-07-07 17:49 +0000 |
| Organization | Compilers Central |
| Message-ID | <22-07-006@comp.compilers> |
Hi Folks,

For months I have been immersed in learning and using Flex. Great fun indeed. But recently I have been reading a book, Crafting a Compiler with C, and reading its chapter on lexers. The chapter describes two lexer-generators: ScanGen and Lex. Oh my! Learning ScanGen opened my eyes to the hidden assumptions in Lex/Flex. Without learning ScanGen I would have continued to think that the way things are done in Lex/Flex is the only way. Below I have documented some of the differences between Lex/Flex and ScanGen.

Difference:
- Flex allows overlapping regexes. It is up to Flex to use the 'correct' regex. Flex has rules for picking the correct one: longest match wins, and among equally long matches the regex listed first wins.
- ScanGen does not allow overlapping regexes. Instead, you create one regex and then, if needed, you create "Except" clauses. E.g., the token is an Identifier, except if the token is 'Begin' or 'End' or 'Read' or 'Write'.

Difference:
- Flex regexes use juxtaposition for specifying concatenation.
- ScanGen uses '.' to specify concatenation. And oh by the way, ScanGen calls it 'catenation', not 'concatenation'.

Difference:
- Flex regexes use | for specifying alternation.
- ScanGen uses ',' to specify alternation.

Difference:
- With Flex, tossing out characters (e.g., tossing out the quotes surrounding a string) may involve writing C code to reprocess the token.
- ScanGen has a 'Toss' command to toss out a character, e.g., Quote(Toss). No token reprocessing needed.

Difference:
- Flex regexes use ^ for specifying 'not', e.g., [^ab] means any char except a and b.
- ScanGen regexes use 'Not', e.g., Not(Quote).

Difference:
- Flex deals with individual characters.
- ScanGen lumps characters into character classes and deals with classes. Use of character classes decreases (quite significantly) the size of the transition table.

Difference:
- Flex regexes use the ? meta-symbol.
- ScanGen doesn't have that. Instead, it has 'Epsilon'.

Difference:
- ScanGen has something called a Major number and a Minor number for each token.
- Flex doesn't have that concept.

[For the same reason, I don't think it's a good idea to learn only one programming language. -John]
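
To make the first difference concrete, here is a minimal Python sketch of the two disambiguation styles. It is only an imitation of the behavior described above, not Flex or ScanGen syntax, and every name in it (FLEX_RULES, flex_style_match, ID_EXCEPTIONS, except_style_match) is made up for illustration. It also assumes lowercase keywords and a simple identifier pattern.

```python
import re

# Flex-style: overlapping rules are allowed. The longest match wins,
# and among equally long matches the rule listed first wins.
FLEX_RULES = [
    ("BEGIN", r"begin"),
    ("END", r"end"),
    ("ID", r"[a-z][a-z0-9]*"),
]


def flex_style_match(text, pos=0):
    """Return (token_name, lexeme) or None, imitating Flex's two tie-break rules."""
    best = None
    for name, pattern in FLEX_RULES:
        m = re.compile(pattern).match(text, pos)
        # Only a strictly longer match replaces the current best,
        # so a tie keeps the rule that was listed earlier.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


# ScanGen-style (as described in the post): a single identifier rule,
# plus a table of exceptions consulted after the match.
ID_PATTERN = re.compile(r"[a-z][a-z0-9]*")
ID_EXCEPTIONS = {"begin": "BEGIN", "end": "END"}


def except_style_match(text, pos=0):
    """Return (token_name, lexeme) or None, using one rule plus exceptions."""
    m = ID_PATTERN.match(text, pos)
    if m is None:
        return None
    lexeme = m.group()
    # The token is an identifier, except when the lexeme is a listed keyword.
    return (ID_EXCEPTIONS.get(lexeme, "ID"), lexeme)


if __name__ == "__main__":
    for word in ("begin", "beginning", "end", "x1"):
        print(word, flex_style_match(word), except_style_match(word))
```

Both functions classify the same input the same way; the difference is where the keyword decision lives. In the first it falls out of rule order and match length, in the second it is an explicit lookup after a single unambiguous match.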