Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #730
| From | "BartC" <bc@freeuk.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Re: lexer speed, was Bison |
| Date | 2012-08-21 17:39 +0100 |
| Organization | A noiseless patient Spider |
| Message-ID | <12-08-013@comp.compilers> (permalink) |
| References | <12-08-005@comp.compilers> <12-08-006@comp.compilers> <12-08-008@comp.compilers> |
"Hans-Peter Diettrich" <DrDiettrich1@aol.com> wrote in message >> [Compilers spend a lot of time in the lexer, because that's the only >> phase that has to look at the input one character at a time. -John] > > When the source code resides in a memory buffer, the time for reading > e.g. the characters of an identifier (in the lexer) is neglectable vs. > the time spent in lookup and entering the identifier into a symbol table > (in the parser). > > Even if a lexer reads single characters from a file, most OSs maintain > their own file buffer, so that little overhead is added over the > program-buffered solution. > > I really would like to see some current benchmarks about the behaviour > of current compilers and systems. I was recently testing a new language with a bare lexer for a near-identical syntax. This test, which excluded file input and any identifer lookups (just tokenising), ran at 17M chars/second, 3M tokens/second, and some 800K lines/second, on a low-end desktop PC. At that rate, tokenising the entire compiler might take 25msec. If that represented the bulk of the compilation time, then I'd be quite happy! But I suspect it will be just a fraction. While lexers do have to deal with a character at a time, the processing might be quite simple (eg. merely six instructions for each character of an identifier in my version). Oh, I should mention that this lexer is written in an interpreted, dynamically typed language. That means a native code version might process some 20M tokens per second (3 or 4 msec to scan the entire compiler). That doesn't really suggest it's going to be a bottleneck! > [The benchmarks I did were a while ago, but they showed a large > fraction of time in the lexer. ... -John] Testing an older compiler (which doesn't lend itself to isolating just the bare tokeniser), showed it spent about 50% of compilation time in tokenising, lookups and parsing. This generates simple byte-code, so with a more sophisticated one with more passes, optimisation and so on, it would likely be even less. (But it's also possible my compilers are highly inefficient apart from the tokenising parts..) -- Bartc
Back to comp.compilers | Previous | Next — Previous in thread | Next in thread | Find similar
Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support hsad005@gmail.com - 2012-08-17 11:22 -0700
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-18 10:13 +0100
Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-20 01:01 +0100
Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-20 16:14 +0100
Re: lexer speed, was Bison BGB <cr88192@hotmail.com> - 2012-08-20 14:14 -0500
Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-21 07:40 +0100
Re: lexer speed, was Bison "BartC" <bc@freeuk.com> - 2012-08-21 17:39 +0100
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-20 13:35 +0000
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support BGB <cr88192@hotmail.com> - 2012-08-21 14:45 -0500
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support "BartC" <bc@freeuk.com> - 2012-08-22 14:04 +0100
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support BGB <cr88192@hotmail.com> - 2012-08-26 19:37 -0500
Bison deterministic LALR parser for Java/C++ "BartC" <bc@freeuk.com> - 2012-08-29 22:03 +0100
speeding up C recompilation, was Re: Bison deterministic LALR BGB <cr88192@hotmail.com> - 2012-09-04 13:45 -0500
Re: C include handling, was Bison deterministic LALR Marco van de Voort <marcov@toad.stack.nl> - 2012-09-05 08:40 +0000
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support hsad005@gmail.com - 2012-08-18 02:09 -0700
csiph-web