From: Roger L Costello
Newsgroups: comp.compilers
Subject: What does it mean to "move characters" in the lexer?
Date: Tue, 21 Jun 2022 10:27:15 +0000
Message-ID: <22-06-057@comp.compilers>

Hi Folks,

Page 89 of the dragon book says:

  Because a large amount of time can be consumed moving characters, specialized buffering techniques have been developed to reduce the amount of overhead to process an input character.

Page 102 of "A Retargetable C Compiler: Design and Implementation" says:

  The lexical analyzer's main activity is moving characters, so minimizing the amount of character movement helps increase speed.

And on page 103 it says:

  An important consequence of this design is that most of the input characters are accessed by *cp and many characters are never moved. Only identifiers (excluding keywords) and string literals that appear in executable code are copied out of the buffer into permanent storage.

I don't understand what they mean by "moving characters". Do they mean copying characters? Do they mean reading characters from a file into memory? Would you explain what this "character movement" thing is all about, please?
/Roger

[Keeping in mind that this was written in the 1970s, they mean copying strings of characters from one place to another. On a PDP-11. With 64K bytes of memory. It is still true that character processing in a lexer consumes a large fraction of the time in compilers. -John]
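To make "copying characters from one place to another" concrete, here is a minimal Python sketch of the difference the lcc quote describes. It is not lcc's code (which is C); the names `scan`, `is_keyword` and `KEYWORDS`, the token kinds, and the keyword subset are hypothetical. The scanner walks the buffer with an index `cp`, standing in for lcc's `*cp`. With `copy_everything=True` it copies every token's text out of the buffer; otherwise it copies only identifiers that are not keywords and string literals, and records every other token as a (start, end) pair pointing back into the buffer. It ignores lcc's further restriction to identifiers and strings in executable code, and treats every other character as one-character punctuation.

```python
KEYWORDS = frozenset({"if", "else", "while", "for", "return", "int"})  # hypothetical subset


def is_keyword(buf, start, end):
    """Compare buf[start:end] against each keyword via startswith, without slicing."""
    return any(end - start == len(kw) and buf.startswith(kw, start)
               for kw in KEYWORDS)


def scan(buf, copy_everything=False):
    """Tokenize buf; return (tokens, chars_copied).

    Each token is (kind, start, end, text). text is a new str only when the
    token's characters were copied out of buf; otherwise it is None and the
    token is just the pair (start, end) pointing back into the buffer.
    """
    tokens, copied = [], 0
    cp, n = 0, len(buf)              # cp plays the role of lcc's cp pointer
    while cp < n:
        c = buf[cp]
        if c.isspace():
            cp += 1
            continue
        start = cp
        if c.isalpha() or c == "_":
            while cp < n and (buf[cp].isalnum() or buf[cp] == "_"):
                cp += 1
            kind = "keyword" if is_keyword(buf, start, cp) else "ident"
        elif c.isdigit():
            while cp < n and buf[cp].isdigit():
                cp += 1
            kind = "number"
        elif c == '"':
            cp += 1
            while cp < n and buf[cp] != '"':
                cp += 2 if buf[cp] == "\\" else 1   # skip escaped characters
            cp = min(cp + 1, n)                     # step past the closing quote
            kind = "string"
        else:
            cp += 1                                 # single-character punctuation
            kind = "punct"
        if copy_everything or kind in ("ident", "string"):
            text = buf[start:cp]                    # characters moved to new storage
            copied += cp - start
        else:
            text = None                             # nothing moved; (start, end) suffices
        tokens.append((kind, start, cp, text))
    return tokens, copied


src = 'if (count > 10) return "ok";'
print(scan(src)[1])                           # 9
print(scan(src, copy_everything=True)[1])     # 23
```

On this input the selective scanner copies 9 characters (`count` plus the four characters of `"ok"`), while the naive scanner copies all 23 non-blank characters. Keywords, numbers and punctuation are still recognized, but their characters stay where they were read.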