From: Christopher F Clark
Newsgroups: comp.compilers
Subject: What attributes of a programming language simplify its implementation?
Date: Fri, 30 Sep 2022 12:46:28 +0100
Organization: Compilers Central
Message-ID: <22-09-026@comp.compilers>
Keywords: design

I answered this question on Quora, but I think it is relevant to this community (and I know I'll get discussion as a result).

What attributes of a programming language simplify its implementation?

1. Simple semantics.

That's it. Simple semantics. ("Simple" meaning whatever is easy to implement. Not mathematical elegance. Not consistency.)

How do you get there?

Have a very simple set of types. BASIC had numbers, strings, and arrays. Don't worry about type conversions and floating point versus integer. Sweep that all under the rug. Whatever your implementation does, that's what it does. (Even simpler is what a lot of shells do: you have just "strings", and if the strings happen to be numbers when you pass them to the "add function", the + operator, it does arithmetic. If they aren't, whatever it does is the definition.)

Do an interpreter rather than a compiler. Don't try to get "efficient" machine code. Just get code that works for your simple cases. See the paragraph above. Whatever your interpreter does, that's what it does. Don't get fancy.

The original C compilers were almost like BASIC, just slightly more complex.
And even though they were compilers, not interpreters, you got whatever code they generated. It just happened (well, actually a lot of theory went into making it "just happen") to easily match the machine/assembly language of the machines of that era.

Even the stuff that was added to C was often added to keep the implementation simple. Header files are a good example. They let you put together slightly more complex programs, but they only work if the programmer uses them right. If you have inconsistent, conflicting header files, you get "undefined behavior", a code word for "whatever the implementor decided to do". Maybe (if you are lucky) you get an error, but maybe you get code that just doesn't work.

------------------------------

But static typing? No. It doesn't help. Simplicity of implementation wants you to throw away all those types. What static typing gives you is reliable and well-defined programs, not a simple implementation.

Ahead-of-time compilation, same thing. It does not make the implementation easier. It has other attributes, but simplicity of implementation is not necessarily one of them. (In some cases it can be simpler, but not always. An interpreter is almost always simpler than any compiler for the same amount of functionality.)

------------------------------

*Edit added:*

By the way, that's how many introductory compiler classes are structured. Take a relatively simple language (C or Pascal are popular choices; Lisp dialects are even simpler) and then throw things out. One type, "int", which is a fixed-width (e.g. 32-bit) signed integer, no conversions. Allow only one function, "main". Allow only one arithmetic operation, "add" (+). Allow only one comparison, "equal" (==). If you are generating code rather than doing an interpreter, pick the simplest architecture you can (e.g. MIPS) and then only allow constants of 16 bits so you don't need hi/lo.
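
To give a feel for how little is left after that much pruning, here is a minimal sketch in Python of an interpreter for such a subset. It is an illustration only, not taken from any particular course. The names `wrap32`, `tokenize`, and `evaluate` are hypothetical. Several choices are assumptions of the sketch: the single function "main" is reduced to one expression, "16 bits" is read as the unsigned range 0..65535, and == may appear at most once and yields 1 or 0.

```python
import re

# One token is an unsigned decimal constant, "==", or "+".
TOKEN = re.compile(r"\s*([0-9]+|==|\+)")

def wrap32(n):
    """Wrap n to a signed 32-bit integer, the only type in the toy language."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n

def tokenize(src):
    """Split the source text into a list of token strings."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(src):
    """Evaluate: sum [ '==' sum ], where sum is: constant { '+' constant }."""
    tokens = tokenize(src)
    pos = 0

    def constant():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected a constant")
        value = int(tokens[pos])
        if value > 0xFFFF:  # assumption: "16 bits" means 0..65535
            raise SyntaxError("constant does not fit in 16 bits")
        pos += 1
        return value

    def total():
        nonlocal pos
        acc = constant()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            acc = wrap32(acc + constant())
        return acc

    result = total()
    if pos < len(tokens) and tokens[pos] == "==":
        pos += 1
        result = 1 if result == total() else 0
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result
```

With these definitions, `evaluate("1 + 2 == 3")` yields 1 and `evaluate("1 + 2 == 4")` yields 0. Anything the sketch does not handle (subtraction, variables, a second comparison) is simply rejected, which is the whole point of the exercise.
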
Now you have a simple enough language that a student can likely get it working in one semester (or even one quarter).

Believe it or not, that's actually how a lot of "real" compilers are written. You do a "spike": that is, you pick one *exceptionally* simple case and get it working end-to-end. Then you build around that. If something looks hard, you do a new spike that makes that issue as simple as possible and get that working.

------------------------------

Even C++ was built that way. It started with a working C compiler as a base(*). Then Stroustrup added, feature by feature (probably using C macros), the things he wanted to make it object-oriented, to make it "C with classes". He didn't start with multiple inheritance and templates and the STL.

You can even see the results of that in the design of C++. I suspect the weird way that constructors take parameters as ctor_name(arg1, arg2, arg3) comes from that. Ctors were probably initially turned into macros, and that was C's syntax for macros. The fact that it makes certain declarations ambiguous wasn't noticed because in the "spike" they worked as intended. The complexity of the other case (how you sometimes can't tell a function declaration from a constructor call) was ignored until later.

Similarly, the fact that you need to use "new" and "delete" instead of "malloc" and "free". The same thing. In a spike, that made it easy. Fixing malloc and free to know when things had ctors and to initialize them properly would have been more work. Adding new functions that did so was easier. Thus simplicity of implementation ruled, and the complexity for users was not factored in.

I could go on. Even later, when C++ had a standards committee, things were added one feature at a time. The STL didn't exist until after C++ had templates. The move semantics rules were a patch to fix up a case where things that were initially simple didn't do what users wanted. But again, they were done as a "spike": add only one feature at a time.
And sometimes, one has to add new features or specifications to fix up the interaction of the features which slowly accreted.

*) And starting with a C compiler as a base gave Stroustrup a simple model to start with. Writing C code is easier than writing assembly code, even for a PDP-11. Again, simplify as much as possible to make one's implementation easy.

Lots of "lisp" interpreters are written in lisp, because that's an easy way to express lisp's semantics. You then have a small program written in lisp that you need to hand-implement. Once that program works, you bootstrap your way up to the whole interpreter you want.

When we did a Jovial compiler at my first job, we started with PL/I macros that gave us the subset of Jovial that we needed. We didn't worry about the cases where the PL/I semantics weren't exactly the same as Jovial's; we weren't going to use those features anyway.

Again, sweep any hard semantics under the rug and don't worry about them. Make your implementation simple and accept whatever semantics it gives you. Label anything that doesn't work the way you want in your implementation "undefined behavior".

------------------------------

By the way, Richard P Gabriel famously wrote about this, coining the phrase "Worse is better". Here is a link to a Wikipedia article derived from his ideas.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris