Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!news.iecc.com!.POSTED!nerds-end From: BGB Newsgroups: comp.compilers Subject: Re: Bison =?UTF-8?B?ZGV0ZXJtaW5pc+KAi3RpYyBMQUxSKDEpIHBhcnNlciBm?= =?UTF-8?B?b3IgSmF2YS9DKysgKGtpbmQgb2YgY29tcGxleCBsYW5nYXVnZSkgd2l0aG91dCA=?= =?UTF-8?B?J2xleGFyIGhhY2snIHN1cHBvcnQ=?= Date: Tue, 21 Aug 2012 14:45:33 -0500 Organization: albasani.net Lines: 142 Sender: johnl@iecc.com Approved: comp.compilers@iecc.com Message-ID: <12-08-014@comp.compilers> References: <12-08-005@comp.compilers> <12-08-006@comp.compilers> <12-08-009@comp.compilers> NNTP-Posting-Host: news.iecc.com Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Trace: leila.iecc.com 1345597159 14928 64.57.183.58 (22 Aug 2012 00:59:19 GMT) X-Complaints-To: abuse@iecc.com NNTP-Posting-Date: Wed, 22 Aug 2012 00:59:19 +0000 (UTC) Keywords: C, performance Posted-Date: 21 Aug 2012 20:59:19 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:731 On 8/20/2012 8:35 AM, Anton Ertl wrote: > Hans-Peter Diettrich writes: >>> [Get it working in bison, then in the unlikely event that's not fast >>> enough, profile your compiler to see where it's spending its time and >>> fix what needs to be fixed. Although in theory GLR can be very slow, >>> in practice the ambiguities are generally resolved within a few tokens >>> and the performance is fine. compilers always spend way more time in >>> the lexer than the parser anyway. Writing RD parsers by hand can be >>> fun, but you never know what language it actually parses. -John] >> >> There exist parser generators for several models. I also doubt that - >> except in misdesigned C-ish languages - a compiler spends significant >> time in the lexer. This may be true for dummy parsers, which do >> nothing but syntax checks, but not for compilers with code generation, >> optimization and more. > > Code generation and optimization do not change the relation between > the time spent in scanning and in parsing. Moreover, if the compiler > spends most of the time in code generation, optimization and/or > "more", there is even less reason to worry about parsing speed. (sorry in advance for sort of wandering off on a tangent...). a major thing that I think skews things so much in the case of C is just how much stuff can get pulled in by the preprocessor, the bulk of which ends up quickly being discarded by subsequent stages (and much of the rest is only minimally processed by later stages). actually, a fairly large portion of the time may actually be spent in the preprocessor, which may basically sit around, for each source line: splitting it into tokens; checking if any of the identifiers in the source-line are names for macros (and expanding as necessary); repeating the above until no more macro expansions are seen; recomposing the line and spitting it into the output. then, for example, there are typically large numbers of typedefs and structure definitions, very few of which may end up actually being used by the code in question (most just get put into a table and then forgotten). then, after all this, the compiler is typically left looking at a fairly small chunk of (actual) code. if we spend 1000ms preprocessing and parsing the code, and 10ms or 100ms compiling it, where has most of the time gone?... OTOH, in my scripting language (which doesn't use headers), I have generally been looking at millisecond-range compile times. although admittedly most of my compilers have tended to be pretty dumb, typically working like (for the upper end): split into tokens and parse the code directly into an AST; perform basic simplifications on the AST (evaluating constant expressions); perform basic type-analysis / inference (generally forward propagation driven by declarations and assignment operations); flatten this out into a bytecode-style format (binary serialization is optional). this is then currently followed by a backend which translates the bytecode into threaded-code and runs this (generally also pretty fast, and generally functions/methods are translated on call). my code-generator backend was a little more involved, basically doing its own type-analysis, mapping the stack to registers and temporaries, and using peephole optimization. this backend fell into disuse mostly because I was having a hard time really debugging it, and it was eating up a lot of my time trying to do so (threaded code generally being a simpler option). unlike many other compilers though, my stuff is generally based on a stack-machine model, so pretty much everything (from intermediate values, to handling local variables and environment scopes) is handled using a stack-machine model. for example, in my bytecode format, local variables (and function arguments, and parent scopes) are identified by their position on the variable stack (with declarations pushing new variables onto the stack, and other operations popping them back off). note that my bytecode is lexically scoped, but tends to see the entire lexical environment as a big stack (which may actually be split across multiple call frames, or have parts be allocated on the heap as well). there are also the dynamic, object ("this" scope), and toplevel environments, but lookups into these are given by name or by name+signature, rather than by an explicit stack index as in the case of the lexical environment. note that packages/import build on delegation, which is built on the object scope (packages are actually implemented as objects, and imports are actually lexical variables used to delegate to these objects). but, in general, I just seem to find the stack model generally a lot easier to understand and work with. my code generator also tied register allocation and lifetime to these stacks as well, although later on it had added temporaries which were identified by handle and had an independent lifetime, rather than being tied to the stack-machine model (and were, ironically, much closer to how the CPU stack was actually being utilized by the code generator, as the evaluation stack and local/environment stack were not mapped 1:1 with the CPU stack, whereas the temp-vars were keyed 1:1 with their physical storage). however, it was introducing temporaries and registers which existed independently of the stack-machine model which is where the pain began. if I ever get back to working on or writing a new native code generator, I will probably just put the temporaries on their own stack, thus any temporary entering its lifetime is pushed onto the temp-var stack, and popped off when its lifetime ends, mostly in the name of being less of a PITA to work with. note that, in any case, the actual stack-frame (at the machine-level) is basically a fixed-size region which is divided up and allocated (as appropriate) for things like holding variables and evaluation-stack entries (and treated as an array of machine-words allocated using a bitmap). note that some minor ABI hacks were made to allow for things like lexical scoping, but otherwise it was using the same ABI as C (C style block functions being callable directly as C function pointers). currently, the threaded-code uses a more narrowly defined subset of the C ABI, basically where all calls have exactly 2 arguments: the VM context and the current threaded-opcode context. this, combined with most of the logic being interpreter-style and written in C, does a bit of harm to the performance though (but, it is still "fast enough" for general application scripting tasks). but, hopefully with hope and luck, I will be able to justify the effort to be able to return to using a full-featured JIT. granted, I could always just be like, "hell with it, I am just going to do it..." (whether or not it is worthwhile), which sort of ends up being invoked occasionally (for things where I can't really justify doing, but want to do them anyways...).