From: David Brown
Newsgroups: comp.compilers
Subject: Re: Bounds checking, Optimization techniques and undefined behavior
Date: Wed, 8 May 2019 10:16:29 +0200
Message-ID: <19-05-062@comp.compilers>
Keywords: C, standards

On 08/05/2019 02:42, Andy Walker wrote:
> On 07/05/2019 10:04, David Brown wrote:
>> It will not be slower in C - because the compiler knows that "i" is
>> never changed in the loop.  (I'd like some way to have that enforced in
>> C, but I don't know of any good method.)
>
>     There's rather little you can "enforce" in C unless the compiler
> is on your side!  See below.
>
>> Just for a laugh, try this code: [largely snipped for brevity -- ANW]
>> #include <stdio.h>
>>          int *pi = &i;
>>          int *pj = &j;
>>          printf("pi = %p, pj[-1] = %p\n", [...]
>
>     There is sanction in the Standard for adding one to a pointer to
> a scalar, but not for subtracting one;  so this is UB [N1570, section
> 6.5.6, paras 7, 8].
>

Yes, it is UB.  Even if "i" and "j" are swapped (or the compiler's
allocation is different) so that you would be using pj[1], you are not
allowed to dereference pj[1], merely to calculate it and use it for
certain comparisons.

>> [...] With no optimisation and a literal translation of
>> the code, the result will be 48.
>
>     Yes, provided that the compiler doesn't enforce the Standard!
>
>>                     With optimisation, the compiler knows
>> that access through "pj" cannot possibly affect "i" without invoking
>> undefined behaviour - so it can simplify the loop to "sum = 55;" and
>> shows that result.
>
>     Quite.  That's the sort of way that UB comes back to bite you.

It is not the UB that is biting you, as such - it is the bug in the
code.  When your code is wrong (as mine is), according to the language
rules, then you can never expect a "correct" answer.  This sample
demonstrates how the compiler can assume that UB does not occur and
give faster and more efficient code, though it makes the result of the
wrong code unstable.

> You optimise some code, and the answer changes.

Yes - that happens when your code has a bug.  I have many times seen
questions from programmers along the lines of "my code works with
optimisation disabled, but fails when optimisation is enabled" - it
means they have a bug in their code.

> There are worse
> possibilities, even without going beyond reasonableness;  take, for
> example Bart's "struct" with four integer members that he wants to
> treat as an array.  Suppose further that he tries to write to the i'th
> member of that array, even after checking that 0 <= i <= 3.
> A non-checking compiler will allow that, even though it's UB if i > 0.  An
> optimising compiler may quite well, however, deduce that "therefore"
> i = 0, and carry that value of "i" forward in the analysis.  If the
> decision whether or not to "rm -rf /" depends on the value of "i",
> you can be well and truly up the creek without a paddle.

I note that in Bart's compiler, if he had four local "int" variables,
they would be aligned at 64-bit spacing.  He could quite reasonably have
the same thing in his struct.  Then trying to access the struct elements
as an array is bound to fail.  There are good reasons why this sort of
thing is UB.

And yes, if your decision to "rm -rf /" depends on UB, you are in
trouble.  You are equally in trouble if you have /any/ bug in your code,
even if the code has fully defined behaviour according to the language.
There is /nothing/ special about UB in that respect - it is just a bug.
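
For illustration, here is a minimal sketch of one way to index the four
members of such a struct without stepping a pointer from one member to
the next.  Everything named here ("struct quad", "quad_member",
"quad_scale_all") is a hypothetical stand-in, not code from this thread.
Each member is reached by name, so the result should not depend on any
padding the compiler places between members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct standing in for the four-member struct discussed
   above; the names are illustrative assumptions. */
struct quad {
    int a, b, c, d;
};

/* Map an index to a member by name, with no pointer arithmetic across
   members.  Returns NULL when i is outside 0..3. */
static int *quad_member(struct quad *q, int i)
{
    assert(q != NULL);
    switch (i) {
    case 0:  return &q->a;
    case 1:  return &q->b;
    case 2:  return &q->c;
    case 3:  return &q->d;
    default: return NULL;
    }
}

/* Multiply every member by factor and return the sum of the results. */
static int quad_scale_all(struct quad *q, int factor)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        int *p = quad_member(q, i);
        if (p != NULL) {
            *p *= factor;
            sum += *p;
        }
    }
    return sum;
}
```

With this shape the range check and the member selection are the same
step, so an out-of-range index yields NULL rather than an access the
language does not define.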