Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Kaz Kylheku <480-992-1380@kylheku.com>
Newsgroups: comp.compilers
Subject: Re: What does it mean to "move characters" in the lexer?
Date: Wed, 22 Jun 2022 01:05:52 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 28
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <22-06-065@comp.compilers>
References: <22-06-057@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="24266"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, performance
Posted-Date: 21 Jun 2022 21:47:45 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:3087

On 2022-06-21, Roger L Costello wrote:
> Hi Folks,
>
> Page 89 of the dragon book says:
>
> Because a large amount of time can be consumed moving characters, specialized
> buffering techniques have been developed to reduce the amount of overhead to
> process an input character.

It's not clear what exactly this is referring to, but probably just to
the practice of making multiple copies of the same data in the
processing stack. If we put an imaginary tracer on a single character,
we may see it hopping among multiple buffers.

In the Chapter 2 lexical analyzer, getchar is used to read a character;
getchar fills a buffer in the stdio stream, and the program is sucking
it out from there one character at a time. So to build a lexeme, it has
to have its own buffer for the lexeme, which is another copy of the
data.

The technique described in Chapter 3 allows bulk reads into a buffer,
eliminating the stream library. The tokens are delimited right inside
the buffer, eliminating a copy.

[The technique seems closely related to the "flip buffer", which can be
found in places like TTY implementations of Unix kernels.
Linux had one; there was once a "struct tty_flip_buffer". At a quick
glance, it looks like today there are more than two buffers in use,
allocated and freed on the fly. There is still a which contains a
modicum of "flip" terminology.]
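
To make the contrast with the getchar approach concrete, here is a minimal sketch in C of delimiting tokens right inside a bulk-read buffer. It is not the dragon book's code: the names (struct token, next_token, count_tokens, count_tokens_in_stream, BUF_SIZE) are made up for illustration, tokens are assumed to be whitespace-delimited, and input longer than one buffer is simply not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096   /* illustrative size, not taken from the book */

/* A token is only a (start, length) view into the input buffer.
   The lexeme's characters are never copied anywhere else. */
struct token {
    const char *start;
    size_t len;
};

/* Hypothetical helper: find the next whitespace-delimited token in
   buf[pos..n). Returns the position just past the token. If no token
   remains, tok->len is set to 0. */
size_t next_token(const char *buf, size_t n, size_t pos, struct token *tok)
{
    assert(buf != NULL && tok != NULL);

    /* Skip leading whitespace. */
    while (pos < n && isspace((unsigned char)buf[pos]))
        pos++;

    /* Advance over the token itself; "begin" marks where it starts. */
    size_t begin = pos;
    while (pos < n && !isspace((unsigned char)buf[pos]))
        pos++;

    tok->start = buf + begin;
    tok->len = pos - begin;
    return pos;
}

/* Walk a buffer once and count its tokens, copying nothing. */
size_t count_tokens(const char *buf, size_t n)
{
    struct token tok;
    size_t pos = 0, count = 0;

    for (;;) {
        pos = next_token(buf, n, pos, &tok);
        if (tok.len == 0)
            break;
        count++;
    }
    return count;
}

/* Fill the buffer with a single bulk fread instead of one getchar call
   per character, then delimit tokens in place. Sketch only: anything
   past the first BUF_SIZE bytes is ignored, and a real lexer would
   have to refill the buffer and cope with tokens that straddle it. */
size_t count_tokens_in_stream(FILE *fp)
{
    static char buf[BUF_SIZE];
    size_t n = fread(buf, 1, sizeof buf, fp);
    return count_tokens(buf, n);
}
```

For example, count_tokens("a bb  ccc\n", 10) returns 3. Each character is placed in the buffer once and then only pointed at, whereas the getchar version moves it out of the stdio buffer and into a separate lexeme buffer.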