From: Christopher F Clark
Newsgroups: comp.compilers
Subject: Re: Bounds checking, Optimization techniques and undefined behavior
Date: Mon, 6 May 2019 05:05:52 -0400
Organization: Compilers Central
Message-ID: <19-05-035@comp.compilers>
References: <19-04-021@comp.compilers> <19-04-023@comp.compilers>
 <19-04-037@comp.compilers> <19-04-039@comp.compilers>
 <19-04-042@comp.compilers> <19-04-044@comp.compilers>
 <19-04-047@comp.compilers> <19-05-004@comp.compilers>
 <19-05-006@comp.compilers> <19-05-016@comp.compilers>
 <19-05-020@comp.compilers> <19-05-024@comp.compilers>
 <19-05-025@comp.compilers> <19-05-028@comp.compilers>
 <19-05-029@comp.compilers>
Keywords: standards, errors, C++

While I join this conversation with trepidation as it is generating
more heat than light, I need to make a few points.

1) “Built in” Array bounds checking can be done in C. It actually can
   be done in assembly language. And, by built in, I mean when the
   system enforces it even when the user doesn’t explicitly write it.
   (More on this below.)

2) C++ with vectors supports exactly the kind of for-loops suggested.
   In fact, it seems to be the preferred idiom these days.

I know item 1 for a fact, because while I was at DEC circa 1995, one
of my tasks was working on “Third degree”.
Third degree was a tool written by DEC Labs for the Alpha computers to
replace “Purify” (aka “Saber C”), for which Rational Software wanted
$10 million to port plus $100k per instance. A similar tool was (is?)
available for Intel PCs called “Bounds Checker”. Valgrind does roughly
the same stuff if I understand it right. And those aren’t the only
tools to do so. Malloc-like libraries that do some of the checking are
also quite common.

So, don’t blame C/C++ or the relevant compilers that you aren’t
getting your array bounds checked, or that you have dangling pointers
or garbage that isn’t deallocated. It can be done. That includes if
you want slices. I don’t know any tool that does that explicitly, but
it wouldn’t be hard to write.

Anyway, the point I wanted to make is that Third degree didn’t even
need the original source code. Third degree worked from the object
files (or from an executable file). It decompiled them, instrumented
them, and checked for all the common errors. The basic ideas that it
was based upon were all developed in the 1960s (or perhaps before).

It isn’t necessarily the declaration in source code that gives an
object its bounds. An object naturally has bounds. You just have to
understand the underlying semantic model. Again, this was realized in
the 1960s if not before.

If your model says that something handed back by the allocator is an
object and that you cannot reference beyond it, you can create the
requisite fat pointer. You can find all the places in the code where
the fat pointer is used. You can instrument them with whatever checks
you need. You can even apply optimizations (and Third degree did) that
eliminate checks that are redundant because they cannot fail (or, vice
versa, that will always fail). You can do that for arithmetic too.

You don’t need support from the compiler. You just need to understand
the semantics you want. Now, that doesn’t mean you can solve the
Halting problem.
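
To make the fat-pointer idea above concrete, here is a minimal sketch
in C. It is not how Third degree was implemented (that tool worked on
object code); the names fat_ptr, fat_alloc, fat_add, fat_in_bounds,
fat_store, fat_load and fat_free are hypothetical and exist only for
this illustration. It assumes the model stated above: whatever the
allocator hands back is one object, and no access may go beyond it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fat pointer: the raw address plus the bounds of the
   object the allocator handed back. */
typedef struct {
    unsigned char *base;   /* start of the allocated object */
    size_t         size;   /* size of the object in bytes */
    size_t         offset; /* current position relative to base */
} fat_ptr;

/* Wrap malloc so that every allocation carries its own bounds. */
static fat_ptr fat_alloc(size_t size) {
    fat_ptr p = { malloc(size), size, 0 };
    if (p.base == NULL) {
        p.size = 0;        /* failed allocation: treat as an empty object */
    }
    return p;
}

/* Pointer arithmetic only moves the offset; nothing is checked until
   the pointer is actually used for an access. */
static fat_ptr fat_add(fat_ptr p, size_t delta) {
    p.offset += delta;
    return p;
}

/* The inserted check: true only if [offset, offset + len) lies
   entirely inside the object. Written to avoid overflow in the test. */
static bool fat_in_bounds(fat_ptr p, size_t len) {
    return p.offset <= p.size && len <= p.size - p.offset;
}

/* Instrumented store: refuses to write outside the object. */
static bool fat_store(fat_ptr p, unsigned char value) {
    if (!fat_in_bounds(p, 1)) {
        return false;
    }
    p.base[p.offset] = value;
    return true;
}

/* Instrumented load: refuses to read outside the object. */
static bool fat_load(fat_ptr p, unsigned char *out) {
    if (!fat_in_bounds(p, 1)) {
        return false;
    }
    *out = p.base[p.offset];
    return true;
}

/* Free the object and shrink its bounds to zero, so later accesses
   through this same fat pointer are rejected. */
static void fat_free(fat_ptr *p) {
    free(p->base);
    p->base = NULL;
    p->size = 0;
}
```

With a 4-byte object from fat_alloc(4), a store through fat_add(p, 3)
succeeds and a store through fat_add(p, 4) is rejected. Note that
fat_free only invalidates the one fat pointer it is given; copies made
earlier still carry the old bounds, which is one reason real tools
track object state separately.
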
Code that generates code and jumps to it is significantly harder to
make work. But, even that can be solved if you know what semantics you
want to support. Debuggers have long inserted “trap instructions” into
code that replace bits of code to be checked, so that if the code is
executed, it goes and does some other code sequence that has the same
semantics but with additional checks. Think aspect-oriented
programming at the machine code level.

You can even get that if your model of generated code is C. You can
generate C that applies the checks you want (and only those you want).
There is nothing in the language that prevents you from doing that. In
fact, the nature of C is to make that possible. That’s why C is the
modern portable assembly language.

Undefined behavior is just there because not every underlying machine
supports exactly the same semantics, and if you want a universal
assembler, you have to deal with the fact that the minimal coding
sequence may have different results given the different underlying
semantics. That holds even if you stay in the x86 family. The best
code for an 8086 is not the best code for an amd64 machine, and the
semantic models are different despite there being a portable subset
that would work on both.

So, the hard part is defining a balance that one presents to the
users. The balance C struck is that the code will be simple and
“assembly like”, and you will get something as close to what you would
have hand-written in assembler (without checks) as the compiler can
give you. That includes assuming that the rules of arithmetic apply
and that x*4/4 is still x despite overflows (so that macros where the
two constant 4s come from different places, but you still want them to
cancel, work). But if the compiler cannot detect the overflow and it
happens, you get whatever the machine gives you. That’s exactly the
code I want.
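
A small sketch of the x*4/4 case, under stated assumptions: the macro
names SCALE_UP and SCALE_DOWN and the two function names are made up
for illustration, standing in for constants that arrive from unrelated
places. In the signed version overflow is undefined, so a compiler is
allowed (though not required) to cancel the constants and return x
unchanged. In the unsigned version wraparound is defined, so the
multiply and divide have to be honored.

```c
#include <stdint.h>

/* Hypothetical constants; imagine they come from two unrelated headers. */
#define SCALE_UP   4
#define SCALE_DOWN 4

/* Signed arithmetic: overflow in x * 4 is undefined behavior, so the
   compiler may assume it never happens and simplify this to `return x;`.
   Whether it does depends on the compiler and optimization level. */
static int32_t roundtrip_signed(int32_t x) {
    return x * SCALE_UP / SCALE_DOWN;
}

/* Unsigned arithmetic: the product is reduced modulo 2^32 (the cast
   makes that explicit), so large inputs do not round-trip and the
   compiler must keep the multiply and the divide. */
static uint32_t roundtrip_unsigned(uint32_t x) {
    return (uint32_t)(x * (uint32_t)SCALE_UP) / (uint32_t)SCALE_DOWN;
}
```

For small values both functions return their argument. For the
unsigned version, an input of 0x40000000 gives 0, because the product
wraps to 0 before the division. Calling the signed version with a
value large enough to overflow is exactly the case where you get
whatever the compiler and machine give you.
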

I know that there will be places in my code where something goes
wrong, no matter how carefully I (and my teammates) craft it; we are
human and make mistakes. Thus, we know how to practice defensive
programming: programming that checks that a pointer that in theory
should never be null is actually not null. And those checks still
catch things, things like the ECC code on the chip missing a
double-bit error caused by stray radiation or a transistor that
switched too slowly. The code was correct, but quantum mechanics got
in the way.

And that’s reality. Mathematically correct doesn’t catch the fact that
under carefully controlled conditions of heat and pressure the system
does what it pleases. And, yes, I have fixed bugs like that, where the
source code was correct, but the resulting system didn’t work for
reasons outside the programmer’s control. But, of course, I have fixed
many more bugs where the source code simply wasn’t correct, or bugs
where the spec changed and the code hadn’t been updated yet.

Now, most people can’t deal with that. They want to imagine that their
world is safe and that they can write simple code that will “just
work”. My stint at Google showed me that point of view in spades.

So, you can turn on -Wall, which many people do, and many teams
require. Then you find you cannot write the code (at least not and
have it pass *all* compilers with no warnings or errors).

    int32_t sum, increment, max;
    sum = 0;
    max = something_far_less_than_maxint16;
    for something {
        increment = some_small_value; // such that sum + increment will
                                      // not overflow, not even an int16
        if (max <= increment + sum) break;
        sum += increment; // with -Wall, you get a warning about
                          // possible overflow
    }

The compile-time checks are simply not clever enough to realize that
this cannot overflow.
You cannot even fix it with:

    int64_t added;
    added = sum + increment;
    if (max <= added) break;
    sum = added; // -Wall still warns here about potential overflow
                 // (loss of data)

There are no casts you can do. No coding tricks, short of applying a
compiler-specific pragma to suppress the warning.

Until we have much smarter compilers, this is our fate. Kvetching
about C is not going to fix that.

Of fast, easy to write, and always correct, you can pick two. It’s not
the fault of C that you cannot get all three.

-- 
*****************************************************************************
Chris Clark                email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.   Web Site: http://world.std.com/~compres
23 Bailey Rd               voice: (508) 435-5016
Berlin, MA 01503 USA       twitter: @intel_chris
-----------------------------------------------------------------------------