Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #732

Re: Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support

From "BartC" <bc@freeuk.com>
Newsgroups comp.compilers
Subject Re: Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support
Date 2012-08-22 14:04 +0100
Organization A noiseless patient Spider
Message-ID <12-08-015@comp.compilers> (permalink)
References <12-08-005@comp.compilers> <12-08-006@comp.compilers> <12-08-009@comp.compilers> <12-08-014@comp.compilers>

Show all headers | View raw


"BGB" <cr88192@hotmail.com> wrote in message
> On 8/20/2012 8:35 AM, Anton Ertl wrote:

>> Code generation and optimization do not change the relation between
>> the time spent in scanning and in parsing.  Moreover, if the compiler
>> spends most of the time in code generation, optimization and/or
>> "more", there is even less reason to worry about parsing speed.
>
>
> (sorry in advance for sort of wandering off on a tangent...).
>
>
> a major thing that I think skews things so much in the case of C is just
> how much stuff can get pulled in by the preprocessor, the bulk of which
> ends up quickly being discarded by subsequent stages (and much of the
> rest is only minimally processed by later stages).

I think much of that is the fault of the language and over-reliance on the
pre-processor. Instead of relying on neat language features to minimise the
sizes of headers, everything has to be spelled out in a long-winded way.
Lack of proper namespaces (in lists of enumerations for example) means
identifiers are long and unwieldy, which doesn't help.

> if we spend 1000ms preprocessing and parsing the code, and 10ms or 100ms
> compiling it, where has most of the time gone?...

But even given all that, there are ways of dealing with huge header files so
that it is not necessary to repeatedly tokenise and parse the same headers
over and over again (for recompiling the same module, or compiling many
modules all sharing the same headers).

I've no idea whether many C compilers actually bother though; perhaps it's
easier to just recommend a faster computer..

> OTOH, in my scripting language (which doesn't use headers), I have
> generally been looking at millisecond-range compile times.

Well, that's more like it. Compilation time simply shouldn't be an issue,
not for compiling a single module anyway.

And has never been a problem for me, ever, no matter what hardware I was
using, partly thanks to using my own tools.

I only get a delay when there are interdependencies and I just recompile
everything for simplicity. Then I might have to twiddle my thumbs for 5-10
seconds. But then, I now have to rely on some external tools..

> although admittedly most of my compilers have tended to be pretty dumb,
> typically working like (for the upper end):
> split into tokens and parse the code directly into an AST;
> perform basic simplifications on the AST (evaluating constant
> expressions);
> perform basic type-analysis / inference (generally forward propagation
> driven by declarations and assignment operations);
> flatten this out into a bytecode-style format (binary serialization is
> optional).
>
> this is then currently followed by a backend which translates the
> bytecode into threaded-code and runs this (generally also pretty fast,
> and generally functions/methods are translated on call).

I have two current compilers, one native code, and this one for bytecode for
a dynamically typed, non-type-inferred language:

source -> lexer/parser -> types -> code generator -> optim -> binary
bytecode file

The type pass does very little here, mainly checking l-values and reducing
constant  expressions. The optim pass does very little too, just reducing
some common bytecode combinations into one. Nevertheless, the resulting
bytecode, even with a straightforward, non JIT-ing  interpreter, can make a
basic lexer run at some 3M tokens/second as I mentioned in another post.

The native code compiler is more involved (I hate these ones! But I need one
to implement the interpreter):

source -> lexer/parser -> names -> types -> intermediate code generator ->
target code generator -> optim -> asm source file

The optim stage does some peephole stuff, but haven't gone as far as having
some variables allocated to registers. Last time I checked, it was
perhaps 40-50% slower than gcc-O1, averaged over ~20
integer/floating-point-intensive benchmarks. That might do me.

--
Bartc
[I've seen C compilers that keep preparsed versions of headers.  Dunno
what they do with #if.  Also see Microsoft's C# and other .NET languages,
that put all of the type info in the objects, so you can use the object
as a compiled include file. -John]

Back to comp.compilers | Previous | NextPrevious in thread | Next in thread | Find similar


Thread

Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support hsad005@gmail.com - 2012-08-17 11:22 -0700
  Re: Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-18 10:13 +0100
    Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-20 01:01 +0100
      Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-20 16:14 +0100
      Re: lexer speed, was Bison BGB <cr88192@hotmail.com> - 2012-08-20 14:14 -0500
        Re: lexer speed, was Bison Hans-Peter Diettrich <DrDiettrich1@aol.com> - 2012-08-21 07:40 +0100
      Re: lexer speed, was Bison "BartC" <bc@freeuk.com> - 2012-08-21 17:39 +0100
    Re: Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support anton@mips.complang.tuwien.ac.at (Anton Ertl) - 2012-08-20 13:35 +0000
      Re: Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support BGB <cr88192@hotmail.com> - 2012-08-21 14:45 -0500
        Re: Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support "BartC" <bc@freeuk.com> - 2012-08-22 14:04 +0100
          Re: Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support BGB <cr88192@hotmail.com> - 2012-08-26 19:37 -0500
            Bison deterministic LALR parser for Java/C++ "BartC" <bc@freeuk.com> - 2012-08-29 22:03 +0100
              speeding up C recompilation, was Re: Bison deterministic LALR BGB <cr88192@hotmail.com> - 2012-09-04 13:45 -0500
              Re: C include handling, was Bison deterministic LALR Marco van de Voort <marcov@toad.stack.nl> - 2012-09-05 08:40 +0000
  Re: Bison determinis​tic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support hsad005@gmail.com - 2012-08-18 02:09 -0700

csiph-web