From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.compilers
Subject: Re: Bounds checking, Optimization techniques and undefined behavior
Date: Thu, 9 May 2019 21:56:24 +0200
Organization: A noiseless patient Spider
Lines: 88
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <19-05-076@comp.compilers>
References: <19-04-021@comp.compilers> <19-04-023@comp.compilers> <19-04-037@comp.compilers> <19-04-039@comp.compilers> <19-04-042@comp.compilers> <19-04-044@comp.compilers> <19-04-047@comp.compilers> <19-05-004@comp.compilers> <19-05-006@comp.compilers> <19-05-016@comp.compilers> <19-05-020@comp.compilers> <19-05-024@comp.compilers> <19-05-025@comp.compilers> <19-05-028@comp.compilers> <19-05-029@comp.compilers> <19-05-034@comp.compilers> <19-05-045@comp.compilers> <19-05-058@comp.compilers> <19-05-062@comp.compilers> <19-05-073@comp.compilers>
Keywords: standards, errors
Posted-Date: 10 May 2019 10:44:20 EDT
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Content-Language: en-GB

On 09/05/2019 02:15, Bart wrote:
> On 08/05/2019 09:16, David Brown wrote:
>> On 08/05/2019 02:42, Andy Walker wrote:
>
>>>
>>>      Quite.  That's the sort of way that UB comes back to bite you.
>>
>> It is not the UB that is biting you, as such - it is the bug in the
>> code.  When your code is wrong (as mine is), according to the language
>> rules, then you can never expect a "correct" answer.  This sample
>> demonstrates how the compiler can assume that UB does not occur and give
>> faster and more efficient code, though it makes the result of the wrong
>> code unstable.
>
> This is actually a little worrying. The code includes these lines:
>
>                  if (i == 3) {
>                          pj[-1] = 5;
>                  }
>
> The programmer didn't just add them for no reason. Why then should a
> compiler just blithely ignore them?

It is a bug in the code, since it does not follow the rules of the
language.  So either the programmer put them in for no reason (i.e., by
mistake), or he/she put them in for the wrong reasons - there being no
possibility of valid reasons for incorrect code.

The compiler may do the best it can with bad code.  That can include
ignoring it, but can also include attempting the write while still
assuming it has no influence on "i".  When trying this code, I first had
"pj[1]" rather than "pj[-1]".  The compiler generated the write, which
led to an OS error message telling me the program had a "stack smashing"
fault.  Arguably, that is the second best thing the compiler could do.
(The best thing would be a compile-time error message, which gcc did not
give.)
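
As a minimal sketch (my own illustration, not the sample from earlier in
the thread; the name sum_with_reset is hypothetical): if the intention
really is to change the loop counter through a pointer, the way to say
so within the rules of the language is to take the address of "i"
itself.  Then the compiler has to honour the write, at any optimisation
level.

```c
/* Sums the loop counter, but jumps the counter ahead once via a pointer.
   Because pi is derived from &i, the write is well-defined and the
   compiler must assume *pi aliases i. */
int sum_with_reset(void)
{
    int i;
    int *pi = &i;       /* explicit, valid pointer to the counter */
    int total = 0;

    for (i = 0; i < 10; i++) {
        total += i;
        if (i == 3) {
            *pi = 5;    /* defined behaviour: i becomes 5, then i++ gives 6 */
        }
    }
    return total;       /* 0+1+2+3 + 6+7+8+9 */
}
```

Here the result is 36 whether optimisation is enabled or not.  Writing
through a pointer derived from some other object and hoping it lands on
"i" gives no such guarantee.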

>
> Either it should do what the code says, or explain why it's ignoring
> your instructions and doing something else instead. (I couldn't get gcc,
> clang or msvc to say anything about it. BTW msvc gives either 55 or 57,
> rather than 55 or 48.)
>

Often it is very difficult to give error messages for this kind of
thing without also giving lots of false positives in other situations.
It can be hard to understand how the compiler can see something odd
that it can use for optimisation, yet not be able to warn about it.
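
A small sketch of why this is so, using a different kind of undefined
behaviour (signed overflow) as an assumed stand-in; the function name is
hypothetical.  Optimising compilers such as gcc and clang will typically
reduce the function below to "return 1", because the only input for
which the comparison could be false is one where "x + 1" overflows, and
that is undefined.

```c
/* For every x where x + 1 does not overflow, the comparison is true.
   A compiler that assumes overflow never happens can fold this to 1. */
int next_is_greater(int x)
{
    return x + 1 > x;
}
```

The compiler has not detected a bug here.  It has only used the fact
that a correct program never calls this with INT_MAX.  A warning on this
pattern would fire on practically every "x + 1" in every program, almost
always for code that is perfectly fine.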

>
>>> You optimise some code, and the answer changes.
>>
>> Yes - that happens when your code has a bug.
>>
>> I have many times seen questions from programmers along the lines of "my
>> code works with optimisation disabled, but fails when optimisation is
>> enabled" - it means they have a bug in their code.
>
> And a few times I've found it to be a bug in a compiler.
>

I have occasionally found bugs in compilers - the possibility cannot be
excluded.  (I have reported a couple in gcc.)  But in the great majority
of cases, it is the user code that is at fault.

>>> There are worse
>>> possibilities, even without going beyond reasonableness;  take, for
>>> example Bart's "struct" with four integer members that he wants to
>>> treat as an array.  Suppose further that he tries to write to the i'th
>>> member of that array, even after checking that 0 <= i <= 3.  A non-
>>> checking compiler will allow that, even though it's UB if i > 0.  An
>>> optimising compiler may quite well, however, deduce that "therefore"
>>> i = 0, and carry that value of "i" forward in the analysis.  If the
>>> decision whether or not to "rm -rf /" depends on the value of "i",
>>> you can be well and truly up the creek without a paddle.
>
>> I note that in Bart's compiler, if he had four local "int" variables,
>> they would be aligned at 64-bit spacing.  He could quite reasonably have
>> the same thing in his struct.  Then trying to access the struct elements
>> as an array is bound to fail.
>
> No, struct layouts have to match C's rules for those, or at least be
> compatible with other compilers on the same platform.

C allows you to have whatever padding you want, but if you want
compatibility with other compilers or ABIs, then of course you need to
follow their layout.
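
If you want to know what layout a particular compiler has chosen, you
can ask it rather than assume.  The following is a minimal sketch with
hypothetical names (struct quad, quad_is_packed, quad_member): one
function reports whether the four members are laid out like an int[4],
and another gives indexed access to the members without any pointer
arithmetic across them.

```c
#include <stddef.h>

struct quad {
    int a, b, c, d;
};

/* Returns 1 if this implementation lays out struct quad with no padding,
   i.e. with the same offsets and size as int[4]; otherwise returns 0. */
int quad_is_packed(void)
{
    return offsetof(struct quad, b) == 1 * sizeof(int)
        && offsetof(struct quad, c) == 2 * sizeof(int)
        && offsetof(struct quad, d) == 3 * sizeof(int)
        && sizeof(struct quad) == 4 * sizeof(int);
}

/* Indexed access that names each member explicitly, so it does not
   depend on the layout at all.  Returns NULL for an index outside 0..3. */
int *quad_member(struct quad *q, int i)
{
    switch (i) {
    case 0:  return &q->a;
    case 1:  return &q->b;
    case 2:  return &q->c;
    case 3:  return &q->d;
    default: return NULL;
    }
}
```

Note that a positive answer from quad_is_packed() only tells you about
the layout.  It does not by itself make it valid to index from &q->a
into the other members; the switch version avoids the question, and a
compiler can usually turn it into efficient code anyway.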