Language standards vs. implementation, was Re: A right alternative to IEEE-754's format

Started by	"Walter Banks" <walter@bytecraft.com>
First post	2018-04-10 11:07 -0400
Last post	2018-05-05 20:28 +0200
Articles	4 — 4 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Language standards vs. implementation, was Re: A right alternative to IEEE-754's format "Walter Banks" <walter@bytecraft.com> - 2018-04-10 11:07 -0400
    Re: Language standards vs. implementation, was Re: A right alternative to IEEE-754's format Gene Wirchenko <genew@telus.net> - 2018-04-10 11:07 -0700
      Re: Language standards vs. implementation, was Re: A right alternative to IEEE-754's format Martin Ward <martin@gkc.org.uk> - 2018-04-12 09:52 +0100
        Re: Language standards vs. implementation, was Re: A right alternative to IEEE-754's format albert@cherry.spenarnc.xs4all.nl (Albert van der Horst) - 2018-05-05 20:28 +0200

#2043 — Language standards vs. implementation, was Re: A right alternative to IEEE-754's format

From	"Walter Banks" <walter@bytecraft.com>
Date	2018-04-10 11:07 -0400
Subject	Language standards vs. implementation, was Re: A right alternative to IEEE-754's format
Message-ID	<18-04-018@comp.compilers>

 [[ this thread is copied from comp.arch because your moderator found it interesting ]]

On 2018-04-01 11:52 AM, Tim Rentsch wrote:
> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>
>> A responsible software maintainer does not change behaviour that
>> users make use of.  See, e.g.,
>> <https://felipec.wordpress.com/2013/10/07/the-linux-way/>.
>
> Interesting article.  Thank you for posting it.
>
>> Unfortunately, there's an epidemic of irresponsibility among C
>> compiler maintainers.
>
> I can't completely agree with this reaction.  In some ways, sure, but
> for choices that are allowed because of undefined behavior the
> question is not so black-and-white.  Some of the responsibility
> belongs to the ISO C standard (and the people who produce it).
> Unfortunately it's a difficult problem;  I know there is interest in
> the ISO C group to find a middle ground, somewhere between
> unspecified behavior and undefined behavior, but it isn't easy to
> find that.  For example, consider this reasonable-sounding rule: no
> library interface should ever result in undefined behavior, not
> counting things like bad pointer inputs (and null pointers should
> never be in the set of bad inputs).  But what about printf()?  In
> printf() we have an interface with large parts of the input domain
> that give undefined behavior.  POSIX takes advantage of this to
> define the behavior of positional format specifications, which are
> quite useful in some contexts.  But, and here is the important part,
> formats other than those allowed in the POSIX spec /are still
> undefined behavior/.  Moreover that freedom is important, to allow
> further extensions to be added at some later date.
>
> I should add that I am mostly on your side.  I think what compiler
> writers are doing with so-called "aggressive optimization" belongs
> more to the problem set than the solution set.  But solving the
> problem has to include getting changes made to the ISO C standard, so
> that compiler writers have no choice if they want their stuff to be
> conforming.  I know doing that is not an easy task; ultimately though
> it seems unavoidable if we are to get things to improve.
>

As someone who has done a significant amount of compiler development,
I can say there is really never enough testing. Once past sanity tests
and detailed test suites, a lot can be gained just by running regression
tests even if the tests are not executed. Detailed metrics alone can be
a very revealing indication of significant compiler problems.

This can be especially true while developing optimizations: a surprising
number of new optimizations do not have the intended effect on old
functioning programs.

w..

#2052

From	Gene Wirchenko <genew@telus.net>
Date	2018-04-10 11:07 -0700
Message-ID	<18-04-033@comp.compilers>
In reply to	#2043

On Tue, 10 Apr 2018 11:07:44 -0400 (EDT), "Walter Banks"
<walter@bytecraft.com> wrote:

[snip]

>This can be especially true while developing optimization a surprising
>number of new optimizations do not have the intended effect on old
>functioning programs.

     I am a compiler non-expert.  Could you give some non-trivial
examples (or point to some), please?

Sincerely,

Gene Wirchenko

#2062

From	Martin Ward <martin@gkc.org.uk>
Date	2018-04-12 09:52 +0100
Message-ID	<18-04-044@comp.compilers>
In reply to	#2052

On 10/04/18 19:07, Gene Wirchenko wrote:
> On Tue, 10 Apr 2018 11:07:44 -0400 (EDT), "Walter Banks"
> <walter@bytecraft.com> wrote:
>
> [snip]
>
>> This can be especially true while developing optimization a surprising
>> number of new optimizations do not have the intended effect on old
>> functioning programs.
>
>      I am a compiler non-expert.  Could you give some non-trivial
> examples (or point to some), please?

As I understand it, a major cause is the 199 or so cases
of "undefined behaviour" in the C standard. People write programs
which rely on the compiler doing a particular thing,
then an optimisation is introduced which "exploits" the undefined
behaviour (usually to delete code or tests), and the program
stops working as expected.

These posts give some examples:

https://blog.regehr.org/archives/213

https://blog.regehr.org/archives/759

Gcc may optimize out tests for buffer overflows
because of integer overflows:

https://lwn.net/Articles/278137/

Quote:

     if (buffer + len >= buffer_end)
	die_a_gory_death("len is out of range\n");

Here, the programmer is trying to ensure that len (which might come from
an untrusted source) fits within the range of buffer. There is a
problem, though, in that if len is very large, the addition could cause
an overflow, yielding a pointer value which is less than buffer. So a
more diligent programmer might check for that case by changing the code
to read:

     if (buffer + len >= buffer_end || buffer + len < buffer)
	loud_screaming_panic("len is out of range\n");

This code should catch all cases; ensuring that len is within range.
There is only one little problem: recent versions of GCC will optimize
out the second test (returning the if statement to the first form shown
above), making overflows possible again. So any code which relies upon
this kind of test may, in fact, become vulnerable to a buffer overflow
attack.

This behavior is allowed by the C standard, which states that, in a
correct program, pointer addition will not yield a pointer value outside
of the same object. So the compiler can assume that the test for
overflow is always false and may thus be eliminated from the expression.
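
One way to write the same test without ever forming an out-of-range
pointer is to compare lengths instead of pointers. The following is only
a sketch under stated assumptions: buffer and buffer_end point into the
same array with buffer <= buffer_end, len is an unsigned size_t, and the
name len_out_of_range is made up here (it is not from the LWN article).
It keeps the >= of the quoted test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, same intent as `buffer + len >= buffer_end`.
 * It never computes buffer + len, so no out-of-range pointer exists
 * for the optimizer to reason about. */
bool len_out_of_range(const char *buffer, const char *buffer_end,
                      size_t len)
{
    /* Assumed precondition: both pointers refer to the same array. */
    assert(buffer <= buffer_end);

    /* Subtracting two pointers into the same array is well defined. */
    size_t room = (size_t)(buffer_end - buffer);

    /* Plain unsigned comparison; a huge len cannot wrap anything. */
    return len >= room;
}
```

A caller would then write something like
if (len_out_of_range(buffer, buffer_end, len)) die_a_gory_death(...);
and there is no second "overflow" test left for the compiler to remove.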

[The desire for efficiency over mathematical analysis takes us
back to the other topic ("language design after Algol 60") :-)]

--
			Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

#2093

From	albert@cherry.spenarnc.xs4all.nl (Albert van der Horst)
Date	2018-05-05 20:28 +0200
Message-ID	<18-05-005@comp.compilers>
In reply to	#2062

In article <18-04-044@comp.compilers>, Martin Ward  <martin@gkc.org.uk> wrote:

[ discussing undefined behavior in C ]

>Gcc may optimize out tests for buffer overflows
>because of integer overflows:
>
>https://lwn.net/Articles/278137/
>
>Quote:
>
>     if (buffer + len >= buffer_end)
>       die_a_gory_death("len is out of range\n");
>
>Here, the programmer is trying to ensure that len (which might come from
>an untrusted source) fits within the range of buffer. There is a
>problem, though, in that if len is very large, the addition could cause
>an overflow, yielding a pointer value which is less than buffer. So a
>more diligent programmer might check for that case by changing the code
>to read:
>
>     if (buffer + len >= buffer_end || buffer + len < buffer)
>       loud_screaming_panic("len is out of range\n");
>

The diligent programmer gets nervous as he sees "buffer[len]" in his
code and realises that that may lead to problems if len is out of
range.
So he adds code of the sort
    &buffer[len] >= buffer_end                 &buffer[len] <buffer
He looks at this code and doesn't get nervous!
He looks more like a complete moron than a diligent programmer.

The reasonable solution is of course
   if ( len < 0 || len > sizeof(buffer) )
       panic("security breach: attempted out of buffer processing");

That makes perfect sense and will not be thrown out by any compiler.
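
A slightly more defensive variant of that check might look like the
sketch below. The assumptions are mine, not from the post: len arrives
as a signed long, and the capacity is passed in explicitly, because
sizeof(buffer) only gives the buffer size when buffer is an array in
scope rather than a pointer. The name len_is_bad is hypothetical.
Whether len equal to the capacity is acceptable depends on how the
buffer is used; this follows the > in the check above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when an untrusted, signed len does not
 * fit in a buffer of `capacity` bytes. */
bool len_is_bad(long len, size_t capacity)
{
    /* Reject negatives first, while len is still a signed value. */
    if (len < 0)
        return true;

    /* len is non-negative here, so converting it to an unsigned type
     * cannot turn it into a huge value by accident. */
    return (unsigned long)len > capacity;
}
```

With a real array this would be called as
if (len_is_bad(len, sizeof buffer)) panic(...);
and it involves only integer comparisons, with no pointer arithmetic.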

>This code should catch all cases; ensuring that len is within range.
>There is only one little problem: recent versions of GCC will optimize
>out the second test (returning the if statement to the first form shown
>above), making overflows possible again. So any code which relies upon
>this kind of test may, in fact, become vulnerable to a buffer overflow
>attack.

There is another problem: someone tries to break your program and
you try to execute the code without warning the authorities.

If GCC smokes out code like that, they have my blessing.

<SNIP>

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
