From: Christopher F Clark
Newsgroups: comp.compilers
Subject: Re: Compiler bootstrapping and the standard header files
Date: Fri, 20 Mar 2020 06:21:56 -0400
Organization: Compilers Central
Message-ID: <20-03-021@comp.compilers>
References: <20-03-018@comp.compilers> <20-03-019@comp.compilers>
Keywords: practice

Dodi gets the answer basically right. I'm going to say something similar in slightly different words.

The good news is that you are doing this for C, which was designed to be a relatively simple language to port, although you may want to use a restricted dialect of C to make it simpler still. The simpler the dialect, the less runtime library you need (to get the bootstrap working).

First, there are 3 interacting parts. They are all interconnected, but still separate.

  The compiler itself
  The header files
  The supporting runtime library

There are also two bits of terminology you need to learn from cross-compiling.

  host
  target

So, the machine you are compiling on and the compiler you are compiling with are considered the host. The machine that the program will run on and the runtime library that supports it are considered the target.

There are diagrams that illustrate this. They are called T-diagrams. Here is an ASCII rendition (excuse my drawing skills).

+---------------------------+
|  host   headers   target  |
+-----+               +-----+---------------------+
      |   compiler    |  host   headers   target  |
      +---------------+-----+               +-----+
                            |   compiler    |
                            +---------------+

Here the target in the first T is the host in the second T. Everything else can be different. You can nest this diagram as many times as you like. The typical bootstrapping process nests 3 Ts.

From that diagram you can see that the headers must match both the host (compiler) and the target (runtime).

Let's now illustrate that with a couple of different scenarios.

The first, simplest scenario is that you want to run the resulting program in the same environment (same target machine, same target runtime library) as the host environment. This is the way you bootstrap a new version of the compiler using the same runtime library. This "new version" might be the new compiler you are building from scratch.

So, you take the program you want to compile (this will be the new version of the compiler) and plug it into the host box of the first T. The host compiler takes this program and the header files which match that compiler and target runtime library and compiles it to a target program that uses the target runtime library. You now have a new executable program (after linking) that you can run on the target machine. This new executable program just happens to be your new [version of the] compiler.

So, now you can take the source code of the program again and compile it with the new compiler (using the headers which match that new compiler and target runtime library). If you repeat this process twice (that is 3 T boxes), the code generated should be the same. This is why you nest 3 Ts: the first build is produced by the old compiler, so its code may legitimately differ, but the second and third builds both come from your new compiler's own code generator and should match. There may be timestamps or similar artifacts that differ, which you have to filter out, but otherwise any differences are bugs in the compiler.
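
The chaining rule behind the T-diagrams and the final comparison can be sketched in a few lines of Python. This is only an illustration, not how any particular compiler's build does it: the Stage record, the function names, and the ISO-style timestamp pattern are all assumptions made up for the example. A real build would have to filter whatever artifacts its own toolchain actually embeds.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One T in the chain (hypothetical record for illustration)."""
    compiler: str  # name of the compiler doing the work in this T
    host: str      # environment the compiler itself runs in
    target: str    # environment its output runs in


def chain_is_valid(stages):
    """Each T's target must be the host of the next T."""
    return all(a.target == b.host for a, b in zip(stages, stages[1:]))


# Assumed artifact format: an embedded ISO-style build timestamp.
TIMESTAMP = re.compile(rb"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def normalized_digest(binary: bytes) -> str:
    """Hash the build output with the assumed timestamps filtered out."""
    return hashlib.sha256(TIMESTAMP.sub(b"", binary)).hexdigest()


def stages_agree(second_build: bytes, third_build: bytes) -> bool:
    """After filtering, any remaining difference points at a compiler bug."""
    return normalized_digest(second_build) == normalized_digest(third_build)
```

For a native bootstrap, all three Stage records would share one host and one target, so chain_is_valid holds trivially. The check only becomes interesting once a cross-compiler enters the chain.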

The scenario gets a bit more complicated if you are building a cross-compiler (targeting a different machine than the host machine, or even just a different and incompatible runtime library on the host machine). In that case, your first host and target are the same machine, but your second target is a different machine (different runtime library). You may or may not be able to build a native (host) compiler on that second machine. It is quite common for embedded machines to lack the facilities (e.g. file systems) you need to run a compiler on them. You don't need a compiler to run on the chip that runs your car engine or toaster. You just need a compiler that can generate code for that chip. However, if you are building a native compiler for that chip, then you need the 3 step T diagram.

Hopefully, you can figure out from this that:

When compiling your compiler with some other compiler, you use the header files from that compiler (the ones that go with its runtime library). You will note that cross-compilers (e.g. compilers that run on an x86 but compile code for an ARM machine) may use different header files than the compiler from the same vendor that targets the host machine. The header files must match both the compiler and the target runtime, and target runtimes for different machines (even for the "same" compiler) can differ due to linker and OS dependencies.

When compiling your compiler with your own compiler, you must use the header files for your compiler. You may even have two different copies, if you are developing your own runtime library: one that matches the original compiler's runtime library, so you can use that, and one that matches your own runtime library, so you have something to "ship".

-------

Finally, I am going to illustrate this process with one of the first compilers I worked on. In 1978 I worked at Softech and we had a contract to build a cross-compiler from Multics to the Interdata 8/32 for the Jovial language (a new dialect called J73/C).
We wanted our compiler to be written in Jovial, but there was no Jovial compiler on the Honeywell Multics machines. So, Carl Martin, my mentor at that time, wrote a translator (effectively a macro package) that translated a subset of Jovial into PL/I, with PL/I semantics, so you could only use a subset of Jovial where the semantics of it and PL/I aligned. But that was OK, because you don't [shouldn't] need a lot of sophisticated semantics to write a compiler.

So, then we wrote our first compiler in that subset. We then ran it through the translator (1st T diagram) and got out an equivalent PL/I program. Then we ran that PL/I program through the Multics PL/I compiler (2nd T diagram) and got out a native Multics executable. Now we had a program that you could run on the Multics machine that would compile Jovial and output Interdata 8/32 code (3rd T diagram). We also had a version that generated code for the Multics machine.

I don't know if we ever built a native Jovial compiler on the Interdata machine (that would have been a 4th T diagram). In theory we could have, but the compiler was targeting embedded applications, so I don't know what OS support there was.

******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris