Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #732
| From | "BartC" <bc@freeuk.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support |
| Date | 2012-08-22 14:04 +0100 |
| Organization | A noiseless patient Spider |
| Message-ID | <12-08-015@comp.compilers> (permalink) |
| References | <12-08-005@comp.compilers> <12-08-006@comp.compilers> <12-08-009@comp.compilers> <12-08-014@comp.compilers> |
"BGB" <cr88192@hotmail.com> wrote in message > On 8/20/2012 8:35 AM, Anton Ertl wrote: >> Code generation and optimization do not change the relation between >> the time spent in scanning and in parsing. Moreover, if the compiler >> spends most of the time in code generation, optimization and/or >> "more", there is even less reason to worry about parsing speed. > > > (sorry in advance for sort of wandering off on a tangent...). > > > a major thing that I think skews things so much in the case of C is just > how much stuff can get pulled in by the preprocessor, the bulk of which > ends up quickly being discarded by subsequent stages (and much of the > rest is only minimally processed by later stages). I think much of that is the fault of the language and over-reliance on the pre-processor. Instead of relying on neat language features to minimise the sizes of headers, everything has to be spelled out in a long-winded way. Lack of proper namespaces (in lists of enumerations for example) means identifiers are long and unwieldy, which doesn't help. > if we spend 1000ms preprocessing and parsing the code, and 10ms or 100ms > compiling it, where has most of the time gone?... But even given all that, there are ways of dealing with huge header files so that it is not necessary to repeatedly tokenise and parse the same headers over and over again (for recompiling the same module, or compiling many modules all sharing the same headers). I've no idea whether many C compilers actually bother though; perhaps it's easier to just recommend a faster computer.. > OTOH, in my scripting language (which doesn't use headers), I have > generally been looking at millisecond-range compile times. Well, that's more like it. Compilation time simply shouldn't be an issue, not for compiling a single module anyway. And has never been a problem for me, ever, no matter what hardware I was using, partly thanks to using my own tools. I only get a delay when there are interdependencies and I just recompile everything for simplicity. Then I might have to twiddle my thumbs for 5-10 seconds. But then, I now have to rely on some external tools.. > although admittedly most of my compilers have tended to be pretty dumb, > typically working like (for the upper end): > split into tokens and parse the code directly into an AST; > perform basic simplifications on the AST (evaluating constant > expressions); > perform basic type-analysis / inference (generally forward propagation > driven by declarations and assignment operations); > flatten this out into a bytecode-style format (binary serialization is > optional). > > this is then currently followed by a backend which translates the > bytecode into threaded-code and runs this (generally also pretty fast, > and generally functions/methods are translated on call). I have two current compilers, one native code, and this one for bytecode for a dynamically typed, non-type-inferred language: source -> lexer/parser -> types -> code generator -> optim -> binary bytecode file The type pass does very little here, mainly checking l-values and reducing constant expressions. The optim pass does very little too, just reducing some common bytecode combinations into one. Nevertheless, the resulting bytecode, even with a straightforward, non JIT-ing interpreter, can make a basic lexer run at some 3M tokens/second as I mentioned in another post. The native code compiler is more involved (I hate these ones! But I need one to implement the interpreter): source -> lexer/parser -> names -> types -> intermediate code generator -> target code generator -> optim -> asm source file The optim stage does some peephole stuff, but haven't gone as far as having some variables allocated to registers. Last time I checked, it was perhaps 40-50% slower than gcc-O1, averaged over ~20 integer/floating-point-intensive benchmarks. That might do me. -- Bartc [I've seen C compilers that keep preparsed versions of headers. Dunno what they do with #if. Also see Microsoft's C# and other .NET languages, that put all of the type info in the objects, so you can use the object as a compiled include file. -John]
Back to comp.compilers | Previous | Next — Previous in thread | Next in thread | Find similar
Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support hsad005@gmail.com - 2012-08-17 11:22 -0700
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-18 10:13 +0100
Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-20 01:01 +0100
Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-20 16:14 +0100
Re: lexer speed, was Bison BGB <cr88192@hotmail.com> - 2012-08-20 14:14 -0500
Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-21 07:40 +0100
Re: lexer speed, was Bison "BartC" <bc@freeuk.com> - 2012-08-21 17:39 +0100
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-20 13:35 +0000
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support BGB <cr88192@hotmail.com> - 2012-08-21 14:45 -0500
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support "BartC" <bc@freeuk.com> - 2012-08-22 14:04 +0100
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support BGB <cr88192@hotmail.com> - 2012-08-26 19:37 -0500
Bison deterministic LALR parser for Java/C++ "BartC" <bc@freeuk.com> - 2012-08-29 22:03 +0100
speeding up C recompilation, was Re: Bison deterministic LALR BGB <cr88192@hotmail.com> - 2012-09-04 13:45 -0500
Re: C include handling, was Bison deterministic LALR Marco van de Voort <marcov@toad.stack.nl> - 2012-09-05 08:40 +0000
Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support hsad005@gmail.com - 2012-08-18 02:09 -0700
csiph-web