Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2205 > unrolled thread

Re: Optimization techniques

Started byDavid Brown <david.brown@hesbynett.no>
First post2019-04-25 21:58 +0200
Last post2019-05-07 16:38 +0200
Articles 20 on this page of 87 — 14 participants

Back to article view | Back to comp.compilers

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Optimization techniques David Brown <david.brown@hesbynett.no> - 2019-04-25 21:58 +0200
    Re: Optimization techniques Kaz Kylheku <847-115-0292@kylheku.com> - 2019-04-26 00:18 +0000
      Re: Optimization techniques David Brown <david.brown@hesbynett.no> - 2019-04-28 23:49 +0200
        Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-04-29 00:31 +0100
          Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-04-29 17:08 +0200
            Re: Optimization techniques and undefined behavior Christian Gollwitzer <auriocus@gmx.de> - 2019-04-29 18:10 +0200
              Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-04-30 14:46 +0200
                Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-01 13:53 +0100
                  Re: Optimization techniques and undefined behavior Andy Walker <anw@cuboid.co.uk> - 2019-05-02 11:29 +0100
                    Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-03 00:48 +0100
                      Re: Optimization techniques and undefined behavior Martin Ward <martin@gkc.org.uk> - 2019-05-03 10:52 +0100
                        Re: Optimization techniques and undefined behavior George Neuner <gneuner2@comcast.net> - 2019-05-04 17:44 -0400
                          Re: Bounds checking, Optimization techniques and undefined behavior George Neuner <gneuner2@comcast.net> - 2019-05-05 17:10 -0400
                        Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-06 08:14 +0200
                        Re: Optimization techniques and undefined behavior Gene Wirchenko <genew@telus.net> - 2019-05-11 22:25 -0700
                      Re: not a lot of memory, was Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-03 12:45 +0100
                      Re: Optimization techniques and undefined behavior Andy Walker <anw@cuboid.co.uk> - 2019-05-03 13:29 +0100
                        Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-03 23:10 +0100
                          Re: Optimization techniques and undefined behavior Andy Walker <anw@cuboid.co.uk> - 2019-05-04 10:45 +0100
                            Re: Bounds checking, Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-05 11:14 +0100
                              Re: Bounds checking, Optimization techniques and undefined behavior Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-05-05 20:44 +0200
                                Re: Bounds checking, Optimization techniques and undefined behavior Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-05-06 10:15 +0200
                                  Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-07 11:04 +0200
                                    Re: Bounds checking, Optimization techniques and undefined behavior "Nuno Lopes" <nuno.lopes@ist.utl.pt> - 2019-05-07 22:38 +0100
                                    Re: Bounds checking, Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-08 01:14 +0100
                                      Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-08 09:55 +0200
                                        Re: Bounds checking, Optimization techniques and undefined behavior "Derek M. Jones" <derek@_NOSPAM_knosof.co.uk> - 2019-05-08 19:08 +0100
                                    Re: Bounds checking, Optimization techniques and undefined behavior Andy Walker <anw@cuboid.co.uk> - 2019-05-08 01:42 +0100
                                      Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-08 10:16 +0200
                                        Re: Bounds checking, Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-09 01:15 +0100
                                          Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-09 21:56 +0200
                                    Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-08 10:03 +0200
                                      Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-09 09:19 +0200
                                      Re: Bounds checking, Optimization techniques and undefined behavior Kaz Kylheku <847-115-0292@kylheku.com> - 2019-05-10 03:38 +0000
                                    Re: Bounds checking, Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-08 14:37 +0100
                                Re: Bounds checking, Optimization techniques and undefined behavior Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2019-05-06 05:05 -0400
                              Re: Bounds checking, Optimization techniques and undefined behavior George Neuner <gneuner2@comcast.net> - 2019-05-05 17:38 -0400
                                Re: Bounds checking, Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-06 13:07 +0100
                                  Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-07 14:01 +0200
                              Re: Bounds checking, Optimization techniques and undefined behavior Andy Walker <anw@cuboid.co.uk> - 2019-05-06 01:15 +0100
                                Re: Bounds checking, Optimization techniques and undefined behavior Andy Walker <anw@cuboid.co.uk> - 2019-05-06 14:40 +0100
                                  Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-07 15:05 +0200
                                    Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-08 10:18 +0200
                              Re: Bounds checking, Optimization techniques and undefined behavior Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2019-05-06 05:39 -0700
                                Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-07 15:42 +0200
                              Re: Bounds checking, Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-06 16:32 +0200
                          Re: Optimization techniques and undefined behavior George Neuner <gneuner2@comcast.net> - 2019-05-04 17:59 -0400
                  Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-02 16:51 +0200
                    Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-02 20:04 +0100
                      Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-03 17:23 +0200
                        Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-03 21:10 +0100
                        Re: Optimization techniques and undefined behavior Martin Ward <martin@gkc.org.uk> - 2019-05-06 13:25 +0100
                          Re: Optimization techniques and undefined behavior "Derek M. Jones" <derek@_NOSPAM_knosof.co.uk> - 2019-05-06 16:32 +0100
                          Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-07 16:03 +0200
                            Re: Optimization techniques and undefined behavior Martin Ward <martin@gkc.org.uk> - 2019-05-08 13:16 +0100
                            Re: Optimization techniques and undefined behavior George Neuner <gneuner2@comcast.net> - 2019-05-08 15:13 -0400
                        Re: Optimization techniques and undefined behavior "Robin Vowels" <robin51@dodo.com.au> - 2019-05-07 01:22 +1000
                          Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-07 16:05 +0200
                    Re: Optimization techniques and undefined behavior Christian Gollwitzer <auriocus@gmx.de> - 2019-05-02 22:22 +0200
            Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-04-29 18:15 +0100
              Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-04-30 15:48 +0200
                Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-01 12:40 +0100
                  Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-02 17:27 +0200
                    Re: Optimization techniques and undefined behavior Bart <bc@freeuk.com> - 2019-05-02 18:59 +0100
                      Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-07 16:16 +0200
                Re: Optimization techniques and undefined behavior Martin Ward <martin@gkc.org.uk> - 2019-05-02 14:54 +0100
        Re: Optimization techniques and runtime checks Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-04-29 22:36 +0200
          Re: Optimization techniques and runtime checks David Brown <david.brown@hesbynett.no> - 2019-05-07 16:29 +0200
            Re: Optimization techniques and runtime checks Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-05-08 02:27 +0200
              Re: Optimization techniques and runtime checks David Brown <david.brown@hesbynett.no> - 2019-05-08 10:31 +0200
                Re: Optimization techniques and runtime checks Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-05-08 22:50 +0200
                Re: Optimization techniques and runtime checks "Robin Vowels" <robin51@dodo.com.au> - 2019-05-11 19:26 +1000
                Re: Optimization techniques and runtime checks Gene Wirchenko <genew@telus.net> - 2019-05-11 22:43 -0700
                  Re: Optimization techniques and runtime checks David Brown <david.brown@hesbynett.no> - 2019-05-12 20:17 +0200
            Re: Optimization techniques and runtime checks Bart <bc@freeuk.com> - 2019-05-08 14:58 +0100
              Re: Optimization techniques and runtime checks David Brown <david.brown@hesbynett.no> - 2019-05-08 23:02 +0200
                Re: Optimization techniques and runtime checks Bart <bc@freeuk.com> - 2019-05-09 18:28 +0100
                  Re: Optimization techniques and runtime checks David Brown <david.brown@hesbynett.no> - 2019-05-09 22:07 +0200
        Re: Optimization techniques Gene Wirchenko <genew@telus.net> - 2019-04-30 18:24 -0700
          Re: Optimization techniques David Brown <david.brown@hesbynett.no> - 2019-05-01 09:20 +0200
            Re: Optimization techniques Kaz Kylheku <847-115-0292@kylheku.com> - 2019-05-02 17:40 +0000
            Re: Optimization techniques and error detection Gene Wirchenko <genew@telus.net> - 2019-05-03 10:16 -0700
            Re: Optimization techniques "Robin Vowels" <robin51@dodo.com.au> - 2019-05-07 01:42 +1000
    Re: Optimization techniques Kaz Kylheku <847-115-0292@kylheku.com> - 2019-04-26 02:26 +0000
      Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-04-29 00:12 +0200
        Re: Optimization techniques and undefined behavior Kaz Kylheku <847-115-0292@kylheku.com> - 2019-05-02 17:18 +0000
          Re: Optimization techniques and undefined behavior David Brown <david.brown@hesbynett.no> - 2019-05-07 16:38 +0200

Page 3 of 5 — ← Prev page 1 2 [3] 4 5  Next page →


#2274 — Re: Bounds checking, Optimization techniques and undefined behavior

FromAndy Walker <anw@cuboid.co.uk>
Date2019-05-06 14:40 +0100
SubjectRe: Bounds checking, Optimization techniques and undefined behavior
Message-ID<19-05-039@comp.compilers>
In reply to#2267
On 06/05/2019 01:15, our esteemed moderator wrote:
> [In the struct { int a,b,c,d; } S example it is my understanding that &S and &S.a
> have to be the same,

	Not "the same", as they have different types;  but yes, they must
compare equal.

>		       the four ints have to be in the order declared,

	Yes, and they must be "sequentially allocated".  Whether that means
the same as "contiguously allocated" [like array members] is not entirely
clear, as the C Standard doesn't define these terms.  Structures can contain
padding, in general, but [AFAICT] not in a way that affects this debate.
At least it seems to be the case that &S.a+1 must compare equal to &S.b,
see N1570, section 6.5.9 para 6, and footnote 109;  but I can't quite make
even this case watertight.

>								       and they
> all have the same alignment so it is valid abeit poor form to treat them as
> an array. -John]

	For the purposes of pointer arithmetic [and therefore arrays] and
comparison, non-array objects are treated as arrays with a single element.
So &S+0 and &S+1 are valid, though the latter can't be dereferenced;
likewise &S.a+0 and &S.a+1, and they can be compared with &S.b, etc.  But
I can see no support in the Standard for &S.a+2 being anything other than
UB;  it points neither to "the" element of S.a, nor one past.  That's even
though [or if!] &S.a+1 == &S.b and &S.b+1 == &S.c;  Section 6.5.6, paras
8 and 9.  As George Neuner says, if you want to use S also as an array,
then you should have declared it as a union.

	In Real Life, you will surely get away with such things, and with
much technically undefined pointer arithmetic that temporarily strays
outside an array [without dereference].  It's sanctioned by dubious custom,
but not by the Standard.  [Yes, I've done it myself.  I'm suitably ashamed
and am hereby rapped on the knuckles.]

--
Andy Walker,
Nottingham.

[toc] | [prev] | [next] | [standalone]


#2282 — Re: Bounds checking, Optimization techniques and undefined behavior

FromDavid Brown <david.brown@hesbynett.no>
Date2019-05-07 15:05 +0200
SubjectRe: Bounds checking, Optimization techniques and undefined behavior
Message-ID<19-05-047@comp.compilers>
In reply to#2274
On 06/05/2019 15:40, Andy Walker wrote:
> On 06/05/2019 01:15, our esteemed moderator wrote:
>> [In the struct { int a,b,c,d; } S example it is my understanding that
>> &S and &S.a
>> have to be the same,
>
>     Not "the same", as they have different types;  but yes, they must
> compare equal.
>

No they don't - not in C.  They are incompatible pointers, and comparing
them is a constraint violation.  That means a compiler has to complain
about trying to evaluate "&S == &S.a".

[My draft of C11 says in section 6.7.2.1: "A pointer to a structure
object, suitably converted, points to its initial member (or if that
member is a bit-field, then to the unit in which it resides), and vice
versa. There may be unnamed padding within a structure object, but not
at its beginning." -John]

If you don't like that behaviour, then maybe standard C is not the
It also means that even though the values of the pointers may be the
same, using one to access the data of the other is not valid - and the
compiler can assume it does not happen.

If you don't like that behaviour, then maybe standard C is not the
language you want.  There are plenty of C compilers that will let you
mix and match pointers and accesses despite incompatible types (you
still need appropriate casts) - such as using the "-fno-strict-alias"
flag in gcc.  But you have to actively choose the non-standard C.



>>                the four ints have to be in the order declared,
>
>     Yes, and they must be "sequentially allocated".  Whether that means
> the same as "contiguously allocated" [like array members] is not entirely
> clear, as the C Standard doesn't define these terms.  Structures can
> contain
> padding, in general, but [AFAICT] not in a way that affects this debate.

There are no restrictions on the padding that can be added between
fields in a struct.  A conforming compiler /can/ add extra padding, and
it can be inconsistent between different parts.  So for the four-int
struct here, "a" could be at offset 0, "b" at offset 4 (assuming 4-byte
int), "c" at offset 12, and "d" at offset 60.  Clearly such an
arrangement would be highly inefficient and it would take a particularly
perverse compiler writer to do anything other than the obvious 4 ints in
a row.  But the only requirements of C are that the first field is at
offset 0, and subsequent fields are at increasing offsets.

A conceivable case is that a compiler could add padding in a struct
beyond the requirements of alignment if cache line alignment made the
results faster.  Such padding can't be added in arrays.


> At least it seems to be the case that &S.a+1 must compare equal to &S.b,
> see N1570, section 6.5.9 para 6, and footnote 109;  but I can't quite make
> even this case watertight.
>

It is 6.5.9p7 that makes this comparison legal.  The wording here is a
bit odd (no one claims the C standards are always clear), but it means
that S.a and S.b can be views as single-element arrays of 1 int.  And
thus &S.a + 1 is a pointer to just beyond the end of S.a, and will
compare equal to &S.b if there is no padding.  (There /could/ be
padding, but it is quite unlikely.)

And even if "&S.a + 1 == &S.b" is true, and "&S.b + 1 == &S.c" is true,
evaluating "&S.a + 2 == &S.c" is undefined behaviour.

You might not agree that these rules are good in a language, but these
are the rules of C.

[toc] | [prev] | [next] | [standalone]


#2298 — Re: Bounds checking, Optimization techniques and undefined behavior

FromDavid Brown <david.brown@hesbynett.no>
Date2019-05-08 10:18 +0200
SubjectRe: Bounds checking, Optimization techniques and undefined behavior
Message-ID<19-05-063@comp.compilers>
In reply to#2282
On 07/05/2019 15:05, David Brown wrote:
> On 06/05/2019 15:40, Andy Walker wrote:
>> On 06/05/2019 01:15, our esteemed moderator wrote:
>>> [In the struct { int a,b,c,d; } S example it is my understanding that
>>> &S and &S.a
>>> have to be the same,
>>
>>     Not "the same", as they have different types;  but yes, they must
>> compare equal.
>>
>
> No they don't - not in C.  They are incompatible pointers, and comparing
> them is a constraint violation.  That means a compiler has to complain
> about trying to evaluate "&S == &S.a".
>
> [My draft of C11 says in section 6.7.2.1: "A pointer to a structure
> object, suitably converted, points to its initial member (or if that
> member is a bit-field, then to the unit in which it resides), and vice
> versa. There may be unnamed padding within a structure object, but not
> at its beginning." -John]


Yes.  But in this sample, the pointer was not "suitably converted".  You
are allowed to write "(int*) &S == &S.a", and it will always be true.

[toc] | [prev] | [next] | [standalone]


#2273 — Re: Bounds checking, Optimization techniques and undefined behavior

FromJan Ziak <0xe2.0x9a.0x9b@gmail.com>
Date2019-05-06 05:39 -0700
SubjectRe: Bounds checking, Optimization techniques and undefined behavior
Message-ID<19-05-038@comp.compilers>
In reply to#2263
On Sunday, May 5, 2019 at 8:01:05 PM UTC+2, Bart wrote:
> But how do they get there? Take this:
>
>      int A[10], *p;
>      p = &A[3];
>
> You intend p to refer to the 4-element slice A[3..6], but how does the
> language know that? How can it stop code from writing to p[5]?
>
> Or you intend to index p[-2] to get at the preceding elements. Actually
> using negative indexing is quite common, but surely all array bounds in
> C are presumed to start from 0?

How are you suggesting to implement malloc() and free() in C if all memory
accesses through pointers are bounds checked? An implementation of free(p)
might need to access memory at ((size_t*)p)[-1] to read metadata of the memory
block such as the block size. This memory access if outside of the bounds of
the "p" passed to free().

One solution is to introduce unsafe code regions and unsafe functions like in
Rust.

Another solution would be to implement memory allocation functions in a non-C
language. For example, older versions of the Go programming language were
implementing memory management in a non-Go language (which happens to be C).
(Newer versions of Go are implementing memory management in Go by using unsafe
pointers and in assembly.)

(I didn't read all posts in this discussion so it is possible that you already
answered this question.)

Sincerely
Jan
[There's all sorts of stuff in the C library that you can't write in
standard C.  How would you write a C version of longjmp()?
This isn't a new issue and the approaches you suggest are the ones
people use. -John]

[toc] | [prev] | [next] | [standalone]


#2283 — Re: Bounds checking, Optimization techniques and undefined behavior

FromDavid Brown <david.brown@hesbynett.no>
Date2019-05-07 15:42 +0200
SubjectRe: Bounds checking, Optimization techniques and undefined behavior
Message-ID<19-05-048@comp.compilers>
In reply to#2273
On 06/05/2019 14:39, Jan Ziak wrote:
> On Sunday, May 5, 2019 at 8:01:05 PM UTC+2, Bart wrote:
>> But how do they get there? Take this:
>>
>>      int A[10], *p;
>>      p = &A[3];
>>
>> You intend p to refer to the 4-element slice A[3..6], but how does the
>> language know that? How can it stop code from writing to p[5]?
>>
>> Or you intend to index p[-2] to get at the preceding elements. Actually
>> using negative indexing is quite common, but surely all array bounds in
>> C are presumed to start from 0?
>
> How are you suggesting to implement malloc() and free() in C if all memory
> accesses through pointers are bounds checked? An implementation of free(p)
> might need to access memory at ((size_t*)p)[-1] to read metadata of the memory
> block such as the block size. This memory access if outside of the bounds of
> the "p" passed to free().

Currently, it is impossible to implement malloc() and free() in strictly
conforming standard C.

The folks working on new documents formalising "pointer provenance" are
taking this into account, and trying to make it possible.

>
> One solution is to introduce unsafe code regions and unsafe functions like in
> Rust.

That would be conceivable, but I think inappropriate for C.  The kind of
tracking and pointer provenance that C does now is at the compiler level
- it does not involve run-time checking.  Having "safe" and "unsafe"
areas would imply different kinds of guarantees and checks - the kind
that C avoids for efficiency.  (If you want checking, use a different
language.)

A more likely choice is to introduce a way to remove the provenance from
a pointer, and also its type information.

>
> Another solution would be to implement memory allocation functions in a non-C
> language. For example, older versions of the Go programming language were
> implementing memory management in a non-Go language (which happens to be C).
> (Newer versions of Go are implementing memory management in Go by using unsafe
> pointers and in assembly.)
>
> (I didn't read all posts in this discussion so it is possible that you already
> answered this question.)
>
> Sincerely
> Jan
> [There's all sorts of stuff in the C library that you can't write in
> standard C.  How would you write a C version of longjmp()?
> This isn't a new issue and the approaches you suggest are the ones
> people use. -John]
>

There is not a lot that can't be written in standard C.
setjmp()/longjmp() is one case, as are malloc()/free(), and the
offsetof() macro.  And clearly printf, file I/O, etc., need external
code to do the actual work.  But generally the majority of any standard
C library is written in C.

[toc] | [prev] | [next] | [standalone]


#2275 — Re: Bounds checking, Optimization techniques and undefined behavior

FromDavid Brown <david.brown@hesbynett.no>
Date2019-05-06 16:32 +0200
SubjectRe: Bounds checking, Optimization techniques and undefined behavior
Message-ID<19-05-040@comp.compilers>
In reply to#2263
On 05/05/2019 12:14, Bart wrote:
> On 04/05/2019 10:45, Andy Walker wrote:
>> On 03/05/2019 23:10, Bart wrote:
>
>>> C is just a mess; it has arrays of sorts, but people generally use raw
>>> pointers without associated bounds. Maybe that's one reason why your C
>>> didn't have it. Or did it somehow manage it if enabled?
>>
>>      This isn't really a problem with C, the language.  It's clear in
>> the reference manual right back to K&R C and in the various standards
>> that pointers always have associated bounds.
>
> But how do they get there? Take this:
>
>     int A[10], *p;
>     p = &A[3];
>
> You intend p to refer to the 4-element slice A[3..6], but how does the
> language know that? How can it stop code from writing to p[5]?

C does not have types representing slices.  And even its array types are
a bit limited - decaying to pointers when they are used as lvalues in
expressions, and thus losing their array properties (like size).  This
pointer decay is very convenient in most uses of arrays, but it does
mean there is less checking of them and you can't copy them by value.
You can wrap an array in a struct to get a fixed array.

So a compiler (or additional static error checker) can stop you from
writing p[7], but not from writing p[5].

>
> Or you intend to index p[-2] to get at the preceding elements.

Again, p[-2] is valid (according to the rules of the language), but
p[-4] is not.  gcc (and some other compilers) can warn you here.

> Actually
> using negative indexing is quite common, but surely all array bounds in
> C are presumed to start from 0?

That is true for arrays.  But the [] operation is not an operation on
arrays - it is an operation on pointers.  As long as "p" points to a
compatibly typed item within an object (like an array), and "p[x]"
points to another compatibly typed item within the same object, then
"p[x]" is valid and means the same as "*(p + x)".


Using negative indexing is very /uncommon/ in C, but it does occur and
is valid (if used correctly, of course).

>
> Or this:
>
>   struct {int a,b,c,d;} S;
>
>   p = &S.a;
>
> You intend p to be used to access a,b,c,d as an int[4] array, but p's
> bounds will say it's only one element long.
>

In C, a struct of 4 int's and an array of 4 ints are not the same thing.

> Or this:
>
>    int *p = malloc(sizeof(int)*1000);
>
>    int *q = p+400;
>
> You are allocating one pool of memory than sub-allocating that into
> smaller objects, here into a 20-element array headed by q. But how does
> the language know that?

It doesn't know it.

C can't express everything you might want to say - nor can any language.
 Sometimes it is possible to say a bit more, but with awkward and
inconvenient syntax - which may or may not be worth the effort.

Some static analysers (like pc-lint) can let you give more information
in the form of special comments, and then they can spot more bugs.

>
> Or maybe p allocates memory for a 2D array, and q refers to one row of
> that; you can't detect accesses beyond that row.
>

Compilers might be able to spot problems in some cases, and static or
run-time analysers might be able to spot others.

No language can stop you making any mistakes, and no tool can spot all
errors.  C does not have every feature that a language could have - it
is intended to have relatively few safety features, and for the
programmer to take responsibility for writing correct code.


> I can see how the language can figure out the overall bounds of the
> block of memory a pointer refers to. But that can be a long way from
> knowing the precise bounds of any array or sub-array, when represented
> via a pointer to the first element.

That is not a feature supported by C.

>
> So the problems are either that bounds transgressions can't be detected
> properly as 'fat pointers' only have broad info about allocations, or
> they will give false alarms when you want to do some pointer-offset
> manipulations that C normally allows.

C does not normally use "fat pointers" at all.  But if you want to use
fat pointers, you can make your own struct types holding them.  And you
are free to make your own slice and sub-array types from them, with
whatever run-time checking you want.  Their usage is going to be a bit
ugly in C, since you can't have operator overloading or other neat
syntax (got to C++ if you want that), but it will work.  C only gives
you the low-level tools here - if you want something more advanced and
higher level, you need to make it yourself.

>
>   It's just that the K&R
>> compiler with Unix didn't check, perhaps for decent reasons, and that
>> the relatively few attempts to produce compilers that do have not been
>> all that successful.  You can't do the checking unless every pointer
>> "knows" which object it is supposed to be pointing into, which rather
>> implies "fat" pointers, which makes all pointer operations take longer
>> and require more code.  It very likely means that arrays too need to
>> be implemented differently, with proper descriptors.
>
> Which C is not going to have without breaking just about every existing
> program. Besides, most array work in C doesn't actually use arrays, but
> explicit pointers.
>

C implementations don't (usually) use fat pointers, for efficiency
reasons.  But a compiler can track the provenance of pointers and do a
good deal of static analysis on them.  The C and C++ committees are both
in the process of considering formal definitions of pointer provenance
to make this clearer, and hopefully compilers will add more static
checking of them.  With link-time optimisation allowing more swapping of
information between modules, this will get a good deal more powerful in
the future.

>   As discussed over
>> in "comp.lang.misc" recently, that's not a Bad Thing;  it gives you
>> extra facilities virtually free, as well as the added security.  But
>> it does result in larger executables and slower execution -- unless you
>> really do have hardware support [cf Burroughs] -- so historically it's
>> not been popular.
>
> With language support, it need have no cost. For example, suppose that
> array A did carry its bounds with it (or are statically known), then in
> code like this:
>
>     for i in A do       # (iterate over bounds not values)
>         A[i] := 0
>     end
>
> the compiler knows it doesn't need to bounds-check each access. Or here:

That works in C++, but not C.  (Yes, it would be nice to have it in C.
Certainly for any new programming language I would hope to see something
like that.)

A key problem with languages that have "proper" arrays is that it can be
very difficult to handle array parameters, and functions that will
handle arrays of different sizes.  Array types have their size as part
of their type, so a function that takes an array of 10 ints has a
different type signature from one that takes an array of 20 ints.  C
requires you to pass this size information in another way, such as via
an extra parameter - you get the flexibility to choose your solution,
but the responsibility to implement it yourself.  Some languages can
provide "hidden parameters" to handle the situation, which would make
certain kinds of checks easier, and is not a bad idea.

>
>     forall x in A do    # (iterate over values)
>         print x
>     end
>
> [Every pointer in C points either to a static object or to one that
> comes from a handful of routines like malloc() or mmap() so it
> shouldn't be too hard to do bounds checking within the allocated
> object, give or take unsafe typecasts which I don't think are common
> in application code.  But you are certainly correct that C isn't
> expressive enough to do slices or the like. -John]

Pointers can also point to non-static local data ("stack" data, in most
implementations).
[Oh, of course.  But the bounds of those are known, too. -John]

[toc] | [prev] | [next] | [standalone]


#2262 — Re: Optimization techniques and undefined behavior

FromGeorge Neuner <gneuner2@comcast.net>
Date2019-05-04 17:59 -0400
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-027@comp.compilers>
In reply to#2259
On Fri, 3 May 2019 23:10:53 +0100, Bart <bc@freeuk.com> wrote:


>I just haven't found overflow on numbers the huge problem that people
>say it is.

It was much more of a problem on 16-bit machines.  The vast majority
of *typical* integer uses in programs will not overflow 32 bits.  But
even with 64 bits, there still are situations - i.e. untrusted user
input - where prudence demands that you test.

George

[toc] | [prev] | [next] | [standalone]


#2244 — Re: Optimization techniques and undefined behavior

FromDavid Brown <david.brown@hesbynett.no>
Date2019-05-02 16:51 +0200
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-008@comp.compilers>
In reply to#2240
On 01/05/2019 14:53, Bart wrote:
> On 30/04/2019 13:46, David Brown wrote:
>> On 29/04/2019 18:10, Christian Gollwitzer wrote:
>
>>> P5
>>> 100 200
>>> 255
>>> ..jdk hlhdhqkd.. here comes the binary data
>>>
>>>
>>> The 100 and 200 are the width and height of the image data, the 255 is
>>> the highest possible value (for 16 bit it would be different).
>>> Obviously, you'd read in the width and height, then multiply them to
>>> compute the memory needed for the data, and  - oops - how do you make
>>> sure that no overflow occurs? In the past, there had been security
>>> problems in image libraries with exactly this kind of problem: integer
>>> overflow due to unreasonable image sizes.
>>
>> It is really incredibly simple (at least in this case).  Do the
>> multiplications using types that won't overflow.  That might be an
>> unsigned type if its range is suitable (not because it has defined
>> overflow behaviour, but use it if it has enough range) or a bigger
>> signed integer type.
>
> That's just kicking the can further down the road.
>

Yes (especially for the use of a larger signed type).  That's fine.  C
let's you easily kick the can a /long/ way down the road.  How often do
you need integers that will overflow a 64-bit type?

> If you have two unknown values A and B, and need to multiply, you won't
> know if the result will overflow.
>

First off, how often do you actually have unknown values to deal with?
Usually you know something at least.

> Using a type that is double the width might help, unless A and B are
> already using the widest type. But if they are int32, and you do the
> calculation like this:
>
>      (int64)A * (int64)B

(You only need the cast on one of these, not both.  But it's fine to put
it on both if you prefer.  And since we are discussing C, why not use
the C types - int64_t ?)

>
> then suppose you had to work out A*B*C; do you use:
>
>      (int128)*(int64)A * (int64)*B) * (int128)C ?
>
> (Real example: imagebytes := width * height * bytes_per_pixel)
>

bytes_per_pixel is not going to be more than, say, 16 - that's for
32-bit per colour, including alpha channel.  No picture format is going
to have more.  Width and height will also be limited - the highest
resolution image sensors are in the range of 120 megapixels.  If you are
talking about a panorama picture sewn together from 5 billion such
camera pictures, you might have a point - and that's just using 64-bit
types.

Looking at it from another viewpoint, are you really wanting to work
with a picture that takes more than 2 ^ 63 bytes storage?  If not, there
is going to be no overflow.  It really is rare that 128 bit types are
needed (although it is nice if a language supports them for those rare
occasions.  Standard C does not require it, though many PC C compilers
support them).


> It doesn't really scale. And it doesn't help here:
>
>      int32 A, B, C;
>
>      C = (int64)A * (int64)B;
>
> as you will need to check the int64 intermediate result for overflow. It
> all gets very messy.
>

Again, /think/ about what your values are and what the code you are
writing is doing.

(And one thing we can be sure of here - having two's complement wrapping
arithmetic certainly will not help.)

> I think a lot of this can be handled with application code dealing with
> validation; it doesn't really benefit from UB in the language.
>
> In the example posed, you have the additional problem that the input can
> be this:
>
>     P5
>     389000000000000000000000000000 9200000000000000000000000000
>
> with both dimensions exceeding int64. You need to get past that first,
> and that might indicate some error before you get around to doing any
> multiplying.
>
> These are all real, practical problems that are starting to get a long
> way from UB in a language.

These are all imaginary problems that are starting to get a long way
from reality.

>
>> There are plenty of cases where it is difficult to write code that is
>> efficient even on poorer compilers, and correct code so that it works on
>> good compilers.  No one claims that programming is always easy.  And
>> sometimes the best solution is code that is not portable or correct by
>> the standards, but works well with the implementations you need to use.
>> Most code, after all, is only ever compiled on the one compiler.
>>
>>
>>> The simplest thing would be to reject any width or height > 100,000,
>>> claiming that noone can handle this images, but what about an image of
>>> size 200,000 x 3 ? If C++ would provide an easy way to detect / branch
>>> on overflow, for example throw an exception, then this could be handled
>>> easily. I can't see how you can claim that your code never overflows,
>>> unless you don't handle untrusted user input data.
>>>
>
>> All C++ compilers with any self-respect support 64-bit integer types.
>> Do you think it's reasonable to reject any image dimension greater than
>> 2,000,000,000 ?  I do.
>
> I remember being shown a typesetting machine some decades ago, which
> (IIRC) was fed PostScript code and produced an image on film at some
> 16,000 dpi.
>
> 2 billion pixels (or dots) would be only 8 square inches of image. While
> A4 scanners working at 9600dpi (1-bit depth) would result in an image of
> some 9 billion pixels. So 2 billion pixels is not really that far out of
> the ball-park.

I was talking about a /dimension/ of 2 billion - that is, a width or
height of 2 billion.  Not an /area/ of 2 billion - such resolutions only
occur in very niche situations, but they are not inconceivable.  At the
extraordinary high resolution used by your typesetter (which is a lot
higher than any printing process produces - you are talking about 0.16µm
dots), 64-bit types are enough to handle a printout that is 3 km square.

[toc] | [prev] | [next] | [standalone]


#2250 — Re: Optimization techniques and undefined behavior

FromBart <bc@freeuk.com>
Date2019-05-02 20:04 +0100
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-014@comp.compilers>
In reply to#2244
On 02/05/2019 15:51, David Brown wrote:
> On 01/05/2019 14:53, Bart wrote:

>> That's just kicking the can further down the road.
>>
>
> Yes (especially for the use of a larger signed type).  That's fine.  C
> let's you easily kick the can a /long/ way down the road.  How often do
> you need integers that will overflow a 64-bit type?

Most of your post seems to take the premise that 64-bit types have such
a wide range that you can forget about overflow problems.

But the principle is actually the same: two values A and B represented
within N-bit types could overflow when performing arithmetic. It doesn't
matter if N is 32 or 64.

The subject in UB, and whether the possibility of overflow can just be
ignored by a language, a language that deals with low-level
machine-sized types.

You seem OK with a C compiler assuming that overflow cannot happen so
that it can generate slightly faster benchmarks.

I prefer a language acknowledging that it could happen, and stipulating
exactly what does happen.


>> If you have two unknown values A and B, and need to multiply, you won't
>> know if the result will overflow.
>>
>
> First off, how often do you actually have unknown values to deal with?
> Usually you know something at least.

The example here was reading values from a file. So they are external
data, and will be unknown.

> bytes_per_pixel is not going to be more than, say, 16 - that's for
> 32-bit per colour, including alpha channel.

It shouldn't be, but you might be reading it from a file.

> (And one thing we can be sure of here - having two's complement wrapping
> arithmetic certainly will not help.)

No; neither is unsigned number wrapping. But having gcc do something
weird instead, at odds with nearly every other compiler and language, is
even less help.

[toc] | [prev] | [next] | [standalone]


#2256 — Re: Optimization techniques and undefined behavior

FromDavid Brown <david.brown@hesbynett.no>
Date2019-05-03 17:23 +0200
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-021@comp.compilers>
In reply to#2250
On 02/05/2019 21:04, Bart wrote: > On 02/05/2019 15:51, David Brown
wrote: >> On 01/05/2019 14:53, Bart wrote: > >>> That's just kicking
the can further down the road.  >>> >> >> Yes (especially for the use
of a larger signed type).  That's fine.  C >> let's you easily kick
the can a /long/ way down the road.  How often do >> you need integers
that will overflow a 64-bit type?  > > Most of your post seems to take
the premise that 64-bit types have such > a wide range that you can
forget about overflow problems.

I have never suggested "forgetting about overflow problems".  I have
suggested that you make the effort to write correct code.  If you are
overflowing, you have a bug - either in the implementation of the
function you want, or in its design or specification.  One of the ways
you can often be sure that your calculations will not overflow is by
using bigger types - but it is certainly not the only way.

>
> But the principle is actually the same: two values A and B represented
> within N-bit types could overflow when performing arithmetic. It doesn't
> matter if N is 32 or 64.
>

Correct.  But unless you are writing silly and unrealistic test code, or
working with something like cryptography that requires very large
numbers (in which case you are in a different type of coding
altogether), your integers won't overflow 64-bit types.  And certainly,
using two's complement wrapping types will not give you the correct
answer in anything but a /tiny/ fraction of situations where UB
overflowing types fail.  The key point is that UB overflow arithmetic
does not fail in appreciable numbers of cases where wrapping overflow
would succeed.  (And as has already been established, UB overflow gives
more efficient code and better error checking and debugging.)

> The subject in UB, and whether the possibility of overflow can just be
> ignored by a language, a language that deals with low-level
> machine-sized types.
>

The answer is yes, it can.

In assembly programming - as low as you get - overflow is almost always
ignored.

> You seem OK with a C compiler assuming that overflow cannot happen so
> that it can generate slightly faster benchmarks.
>

I am not the slightest bit interested in performance in benchmarks.  I
am interested first in /correctness/ of correctly written and valid
code, and then in the efficiency of that code.  For /real/ code, on real
targets, doing real work.  And yes, I am happy that my compiler can
assume my calculations on signed integers don't overflow.  If I need to
have specific behaviour on overflow, I don't use plain signed integers.

> I prefer a language acknowledging that it could happen, and stipulating
> exactly what does happen.
>

Then you might as well give up programming, because that can't be done.
There are lots of situations where behaviour is undefined, in /all/
programming languages.  The lower level and more efficient the language,
the more such situations you get - but none are entirely free of
undefined behaviour.  And the more you fight undefined behaviour, the
more limited your coding will be.  Accept it, realise that it is part of
the world of programming, and you will get on much better.

Every function in computing (and this is fundamental to computation - it
applies to everything from a Turing machine to a quantum computer,
regardless of programming language) can be described in terms of
starting with a pre-condition, and establishing a post-condition.  The
inputs to a function - its parameters, and any accessible global state -
must satisfy the pre-condition.  Then the function guarantees to
establish the post-condition.  The function does not say what will
happen if the pre-conditions are not fulfilled - calling the function in
that situation is undefined behaviour.  "Undefined behaviour" simply
means "the behaviour is not defined" - there are no rules or
instructions to handle the situation.

A programming language might say that the precondition for its signed
integer addition "function" is "true" - that is, it works for any input
values.  Another might say that the precondition is that its inputs must
sum to a valid value for the result type.  (Yes, it's perfectly okay for
the pre-condition to be very like the operation itself.)

You can prefer that languages have "true" preconditions for some
functions, but they certainly won't have it for all.


>
>>> If you have two unknown values A and B, and need to multiply, you won't
>>> know if the result will overflow.
>>>
>>
>> First off, how often do you actually have unknown values to deal with?
>> Usually you know something at least.
>
> The example here was reading values from a file. So they are external
> data, and will be unknown.
>

Only a fool uses unknown data from outside without checking them.  Check
that the data makes sense, then use it.  Don't use it first then check
for carnage afterwards.

Or if you do want to use the data without much checking (say, for a
quick test code, or for use with files you already know are valid), then
don't worry about undefined behaviour on invalid data, because you don't
care what results you would get for bad files.

>> bytes_per_pixel is not going to be more than, say, 16 - that's for
>> 32-bit per colour, including alpha channel.
>
> It shouldn't be, but you might be reading it from a file.
>

The data will be valid before you start using it - because you will have
checked the unknown data.

>> (And one thing we can be sure of here - having two's complement wrapping
>> arithmetic certainly will not help.)
>
> No; neither is unsigned number wrapping. But having gcc do something
> weird instead, at odds with nearly every other compiler and language, is
> even less help.

You are, as usual, very keen to pick out gcc as though it was something
special here.  Compilers have been assuming signed integer overflow
never happens for over 20 years (that's just from my own personal
experience), long before gcc was that smart.

(Yes, I agree that wrapping unsigned arithmetic won't help here.  I have
regularly said that most unsigned overflows are also bugs, and that
undefined behaviour there would also make sense.  Maybe you "forgot"
that in order to try to make another anti-gcc remark?)

[toc] | [prev] | [next] | [standalone]


#2258 — Re: Optimization techniques and undefined behavior

FromBart <bc@freeuk.com>
Date2019-05-03 21:10 +0100
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-023@comp.compilers>
In reply to#2256
On 03/05/2019 16:23, David Brown wrote:
> On 02/05/2019 21:04, Bart wrote: > On 02/05/2019 15:51, David Brown

>> The subject is UB, and whether the possibility of overflow can just be
>> ignored by a language, a language that deals with low-level
>> machine-sized types.
>>
>
> The answer is yes, it can.
>
> In assembly programming - as low as you get - overflow is almost always
> ignored.

Fine. In that case I wanted it ignored for other kinds of languages too.
And that includes a low level language at the level of C.

>> You seem OK with a C compiler assuming that overflow cannot happen so
>> that it can generate slightly faster benchmarks.
>>
>
> I am not the slightest bit interested in performance in benchmarks.

I mean the people behind the compilers are.

>> I prefer a language acknowledging that it could happen, and stipulating
>> exactly what does happen.
>>
>
> There are lots of situations where behaviour is undefined, in /all/
> programming languages.  The lower level and more efficient the language,
> the more such situations you get - but none are entirely free of
> undefined behaviour.  And the more you fight undefined behaviour, the
> more limited your coding will be.  Accept it, realise that it is part of
> the world of programming, and you will get on much better.

I think you've just used too much C.

There's nothing wrong with a language saying it expects to target twos
complement machines with power-of-two word sizes, and that it expects
overflow on such types to be well-defined on those machines.

There's nothing to fight. Certainly I'm not interesting in fighting C.

(Actually, I've dropped C as a possible target from my compilers; there
was simply too much of a struggle to keep C compilers happy, with UB on
innocuous operations being one small part of it. And it was holding me
back from using advanced features difficult to express in C.)

 > Then you might as well give up programming, because that can't be done.

Don't forget I normally devise my own languages and write my own
compilers, so I can largely do what I like.

> Only a fool uses unknown data from outside without checking them.  Check
> that the data makes sense, then use it.  Don't use it first then check
> for carnage afterwards.

I understand that your work is mostly concerned with small embedded
systems. Most of mine for about 15 years involved applications that had
to deal with user input.

> You are, as usual, very keen to pick out gcc as though it was something
> special here.

True. It's gcc /and/ clang (which tries to copy what gcc does), where
I've observed this behaviour, that is not only at odds with other
compilers and languages, but also with themselves.

[toc] | [prev] | [next] | [standalone]


#2272 — Re: Optimization techniques and undefined behavior

FromMartin Ward <martin@gkc.org.uk>
Date2019-05-06 13:25 +0100
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-037@comp.compilers>
In reply to#2256
On 03/05/19 16:23, David Brown wrote:
> There are lots of situations where behaviour is undefined, in /all/
> programming languages.  The lower level and more efficient the language,
> the more such situations you get - but none are entirely free of
> undefined behaviour.

There are many language in which all behaviour is defined:
for example, Go has no undefined behaviour. Some behaviour
may be "unspecified" (eg it could do X, or could do Y,
but cannot do anything else).
There is no logical requirement for leaving any behaviour undefined:
other than laziness by the language designers, or an inability
to reach agreement on a suitable behaviour. I believe Java
also has no undefined behaviour.

> You can prefer that languages have "true" preconditions for some
> functions, but they certainly won't have it for all.

Preconditions are used in proofs of correctness and in
the specification of functions. In a correctness proof, you prove
that for any initial state satisfying the given precondition,
the program is guaranteed to terminate in a state which
satisfies the given postcondition. For built-in functions
and operations in a language definition there is no reason why
the precondition cannot be "true", or the language could
specify that an error message is returned or an exception raised
if the function's precondition is not met.

> I have regularly said that most unsigned overflows are also bugs,
 > and that undefined behaviour there would also make sense.

This would appear to make it extremely difficult to implement
two's complement arithmetic in C: currently one can cast
the signed values to unsigned, calculate the result,
and cast back to signed. (Note however that negative numbers are not
guaranteed to be stored in two's complement format!)

How would you implement arithmetic MOD 2^32, MOD 2^64 or MOD 2^128
if unsigned overflow was undefined?

--
			Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

[toc] | [prev] | [next] | [standalone]


#2277 — Re: Optimization techniques and undefined behavior

From"Derek M. Jones" <derek@_NOSPAM_knosof.co.uk>
Date2019-05-06 16:32 +0100
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-042@comp.compilers>
In reply to#2272
Martin,

> There are many language in which all behaviour is defined:
> for example, Go has no undefined behaviour. Some behaviour

Ada does not have any undefined behavior, but they are changing that ;-)
https://shape-of-code.coding-guidelines.com/2016/04/01/ada-updated-to-support-undefined-behavior/

--
Derek M. Jones
blog:shape-of-code.coding-guidelines.com

[toc] | [prev] | [next] | [standalone]


#2284 — Re: Optimization techniques and undefined behavior

FromDavid Brown <david.brown@hesbynett.no>
Date2019-05-07 16:03 +0200
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-049@comp.compilers>
In reply to#2272
On 06/05/2019 14:25, Martin Ward wrote:
> On 03/05/19 16:23, David Brown wrote:
>> There are lots of situations where behaviour is undefined, in /all/
>> programming languages.  The lower level and more efficient the language,
>> the more such situations you get - but none are entirely free of
>> undefined behaviour.
>
> There are many language in which all behaviour is defined:
> for example, Go has no undefined behaviour. Some behaviour
> may be "unspecified" (eg it could do X, or could do Y,
> but cannot do anything else).

If I write a function "square_root" in Go that takes a non-negative
input and returns its square root, then calling that function with a
negative input is undefined behaviour.

It does not matter if it always has the same effect - if the behaviour
has not been specified, it is undefined.

Extrapolate that to any other aspect of any language, library, function,
etc.

> There is no logical requirement for leaving any behaviour undefined:
> other than laziness by the language designers, or an inability
> to reach agreement on a suitable behaviour. I believe Java
> also has no undefined behaviour.
>

See above.


It is certainly that case that many languages have /less/ undefined
behaviour than C.  It is certainly the case that many languages define
the behaviour of particular things that are famously undefined in C,
such as signed integer overflow (in many cases, the definitions are
stupid and worse than useless - in other cases they may be useful but
are inefficient).

And it is certainly the case that less undefined behaviour can be useful
- C is not the best choice of language for many tasks.


>> You can prefer that languages have "true" preconditions for some
>> functions, but they certainly won't have it for all.
>
> Preconditions are used in proofs of correctness and in
> the specification of functions. In a correctness proof, you prove
> that for any initial state satisfying the given precondition,
> the program is guaranteed to terminate in a state which
> satisfies the given postcondition. For built-in functions
> and operations in a language definition there is no reason why
> the precondition cannot be "true", or the language could
> specify that an error message is returned or an exception raised
> if the function's precondition is not met.
>

Preconditions and postconditions are not just for proofs - they are
specifications for the functions, and you should understand them for any
operation or function you use.

Certainly for many built-in functions and operations, it is possible to
have a "true" precondition.  But there /are/ reasons why you don't
always want that.  Different languages have different purposes.  Checks
and exceptions are not free.  Some languages have checks and exceptions
built in to save the programmer the effort, others require them to be
written manually to give the programmer full control.

>> I have regularly said that most unsigned overflows are also bugs,
>> and that undefined behaviour there would also make sense.
>
> This would appear to make it extremely difficult to implement
> two's complement arithmetic in C: currently one can cast
> the signed values to unsigned, calculate the result,
> and cast back to signed. (Note however that negative numbers are not
> guaranteed to be stored in two's complement format!)

Correct.

>
> How would you implement arithmetic MOD 2^32, MOD 2^64 or MOD 2^128
> if unsigned overflow was undefined?
>

When saying that undefined behaviour for unsigned overflow would make
sense and be useful, I have usually also elaborated by saying you would
need another method to get wrapping behaviour on the few occasions when
you need it.  I should have included this here too.

[toc] | [prev] | [next] | [standalone]


#2300 — Re: Optimization techniques and undefined behavior

FromMartin Ward <martin@gkc.org.uk>
Date2019-05-08 13:16 +0100
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-065@comp.compilers>
In reply to#2284
On 07/05/19 15:03, David Brown wrote:
> If I write a function "square_root" in Go that takes a non-negative
> input and returns its square root, then calling that function with a
> negative input is undefined behaviour.

https://golang.org/pkg/math/#Sqrt

Sqrt returns the square root of x.

Special cases are:

Sqrt(+Inf) = +Inf
Sqrt(±0) = ±0
Sqrt(x < 0) = NaN
Sqrt(NaN) = NaN

Your "square_root" function might have "undefined behaviour" in the
informal sense that the documentation you provide for your function is
incomplete and does not specify what the function returns for a
negative value.  But that is not the kind of "undefined behaviour" we
are discussing in this thread: where the *language* says that
"anything could happen" under certain circumstances.  Your
implementation of "square_root" will not have "undefined behaviour" in
this technical sense (unless you have perversely implemented a data
race condition!)

There is a reason why the IEEE floating point standard
includes NaN as a result instead of "undefined behaviour".

--
			Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

[toc] | [prev] | [next] | [standalone]


#2305 — Re: Optimization techniques and undefined behavior

FromGeorge Neuner <gneuner2@comcast.net>
Date2019-05-08 15:13 -0400
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-070@comp.compilers>
In reply to#2284
On Tue, 7 May 2019 16:03:04 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>If I write a function "square_root" in Go that takes a non-negative
>input and returns its square root, then calling that function with a
>negative input is undefined behaviour.

That is splitting a fine hair.


>It does not matter if it always has the same effect - if the behaviour
>has not been specified, it is undefined.

*You* specified the behavior by implementing the function.  If you
want to argue that there will never be wide spread agreement on the
behavior, then ok ... but if you are want to argue semantics, then
nothing that can be implemented can truly be "undefined".

And if you really want to argue about "wide agreement", then language
standards don't qualify - most programmers do not actively *agree*
with the standard but simply choose to go along with it.  For most the
"choice" is made either by necessity - e.g., a business requirement to
use the language - or simply by inertia or apathy [too big a deal to
switch to a more "agreeable" language].

One data point: I will use C or C++ because I'm paid to do so, not
because I like them or because I agree with the decisions made by
their standards committees.


Programming is NOT mathematics [or even mathematical in general] and
there is no reason other than sanity that computer code should have to
obey accepted physical or mathematical rules.  And in fact, it
doesn't: if it did, we wouldn't be having arguments about signed vs
unsigned vs saturating vs wrap-around integers or how to deal with the
difference between real numbers and floating point arithmetic.


>Extrapolate that to any other aspect of any language, library, function, etc.

When I do, I come to a completely different conclusion.

YMMV,
George

[toc] | [prev] | [next] | [standalone]


#2276 — Re: Optimization techniques and undefined behavior

From"Robin Vowels" <robin51@dodo.com.au>
Date2019-05-07 01:22 +1000
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-041@comp.compilers>
In reply to#2256
From: "David Brown" <david.brown@hesbynett.no>
Sent: Saturday, May 04, 2019 1:23 AM

> You are, as usual, very keen to pick out gcc as though it was something
> special here.  Compilers have been assuming signed integer overflow
> never happens for over 20 years (that's just from my own personal
> experience), long before gcc was that smart.

Some compilers might, but I have not met one that does ignore
overflow.

Even in evaluating compile-time expressions, an overflow
is treated as an error, and a diagnostic is issued.
[clang and gcc definitely generate code that ignores integer overflows
at runtime. I just checked. -John]

[toc] | [prev] | [next] | [standalone]


#2285 — Re: Optimization techniques and undefined behavior

FromDavid Brown <david.brown@hesbynett.no>
Date2019-05-07 16:05 +0200
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-050@comp.compilers>
In reply to#2276
On 06/05/2019 17:22, Robin Vowels wrote:
> From: "David Brown" <david.brown@hesbynett.no>
> Sent: Saturday, May 04, 2019 1:23 AM
>
>> You are, as usual, very keen to pick out gcc as though it was something
>> special here.  Compilers have been assuming signed integer overflow
>> never happens for over 20 years (that's just from my own personal
>> experience), long before gcc was that smart.
>
> Some compilers might, but I have not met one that does ignore
> overflow.
>

I have met several (other than gcc).  These were not PC/x86 compilers.

> Even in evaluating compile-time expressions, an overflow
> is treated as an error, and a diagnostic is issued.

That is usually the case, and is helpful.

> [clang and gcc definitely generate code that ignores integer overflows
> at runtime. I just checked. -John]

[toc] | [prev] | [next] | [standalone]


#2251 — Re: Optimization techniques and undefined behavior

FromChristian Gollwitzer <auriocus@gmx.de>
Date2019-05-02 22:22 +0200
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-05-015@comp.compilers>
In reply to#2244
Am 02.05.19 um 16:51 schrieb David Brown:
> On 01/05/2019 14:53, Bart wrote:
>> On 30/04/2019 13:46, David Brown wrote:
>>> It is really incredibly simple (at least in this case).  Do the
>>> multiplications using types that won't overflow.  That might be an
>>> unsigned type if its range is suitable (not because it has defined
>>> overflow behaviour, but use it if it has enough range) or a bigger
>>> signed integer type.
>>
>> That's just kicking the can further down the road.
>>
>
> Yes (especially for the use of a larger signed type).  That's fine.  C
> let's you easily kick the can a /long/ way down the road.  How often do
> you need integers that will overflow a 64-bit type?
>
>> If you have two unknown values A and B, and need to multiply, you won't
>> know if the result will overflow.
>>
> First off, how often do you actually have unknown values to deal with?
> Usually you know something at least.

Well exactly that was my point. In order to parse the file format, you
get unknown values, and similar errors led to buffer overflows in the past:

	https://www.kb.cert.org/vuls/id/523889/

That one was really nasty, because libpng is contained in a lot of
security critical software (browsers, web servers handling uploads...).
So, this IS a real problem, maybe just not common in the code that you
write.


> bytes_per_pixel is not going to be more than, say, 16 - that's for
> 32-bit per colour, including alpha channel.  No picture format is going
> to have more.  Width and height will also be limited - the highest
> resolution image sensors are in the range of 120 megapixels.  If you are
> talking about a panorama picture sewn together from 5 billion such
> camera pictures, you might have a point - and that's just using 64-bit
> types.


To get back to reality. I chose the PGM format just as an example,
because it is so simple that we could actually post the source code of
the parser in this message. However, in my previous job, I had to deal
with multidimensional scientific data sets. The uncompressed file
formats typically have a binary (or even ASCII) header and a block of
memory (the so called "raw" format), and there may be up to 5 dimensions.

A format close to PGm is NRRD:
http://teem.sourceforge.net/nrrd/format.html Software that reads/writes
this can be found here: http://fiji.sc/

The total data set size can be as large as your computer's main memory,
we had machines with up to 1 TB RAM and data sets regularly with up to
100 GB in size.

So let's say you have a 4D data set, then you get, as input

x
y
z
w
datatype

In some formats, if it's only 2 dimensions, then the 3rd and 4th are
simply set to 1. Also, there may be oddly shaped "images" with 100,000
pixels in one dimension and only 2 or 3 in another dimension. So the
"real" task is:

* Read in x, y, z, w and the data type identifier. Each of x,y,z,w must
be checked to be within the range of size_t.

* Then compute the number of bytes by multiplying these numbers. If it
is smaller then the number of bytes in main memory, alloc memory and
load the file.

The easiest way to do that is with unlimited integers, like in Python:

memsize = x*y*z*w*sizeof(datatype)

if memsize > mainmemory:
	error "Out of memory"

In C++, you need to jump through strange hoops to detect the overflow.
You simply can't limit it such that it will not overflow for 64 bit -
because that would mean you restrict each dimension to 65535, which is
definitely NOT enough. Instead, what would be needed is a function
safe_multiply(x,y) which returns x*y if there is no overflow and throws
an error otherwise, so you could write:

try {
	size_t memsize = safe_multiply(x,y,z,w,sizeof(datatype);
} catch (overflow_error & err) {
	std::cerr<<"Not enough memory";
}


Even that safe_multiply() is extremely hard to write, unless your
compiler supports 128 bit math as an extension.

	Christian

[toc] | [prev] | [next] | [standalone]


#2229 — Re: Optimization techniques and undefined behavior

FromBart <bc@freeuk.com>
Date2019-04-29 18:15 +0100
SubjectRe: Optimization techniques and undefined behavior
Message-ID<19-04-045@comp.compilers>
In reply to#2226
On 29/04/2019 16:08, David Brown wrote:
> On 29/04/2019 01:31, Bart wrote:

>> then you don't want the compiler being clever about overflow.
>
> I /do/ want a result consistent with a single expression, or splitting
> up the expression.

Then the choice is between both ways giving you 1500000000, or both
giving you -647483648.

The former is going to be difficult, since the intermediate 32-bit value
has lost some information. The latter is very easy, and involves dumping
the UB nonsense.

> Questions about what the compiler will do with overflows, like how
> consistent it will be, are as sensible as asking how many miles per
> gallon you get from your car when it has no tires.  You would not drive
> your car without tires - that would be a mistake, a bug in your driving
> procedure.  I don't write signed integer expressions that overflow -
> barring bugs in my coding.  And thus I don't care what the compiler does
> about them, and have no interest in their consistency.

If the gcc people designed cars, either the car wouldn't have an engine
because, since you're always going to end up at your start point,
there's no point in driving it; or it wouldn't have any brakes since you
are never going to have an accident.

> I want the compiler to give me the right answer to valid questions - I
> don't expect it to give me any consistent answer to invalid questions.

What is the question? Hint: it's not the result of 1500000000*2/2, it's
the result of 1500000000*2/2 when the 1500000000 is represented as a
32-bit twos complement binary value, and intermediate calculations are
done to the same precision.

>> but I've just
>> tried 20 or so combinations of compilers and optimise flags, all give a
>> result of -647483648 - except gcc which gave 1500000000. And even gcc
>> gave -647483648 with some versions and some optimisation levels.
>>
>
> Do you understand what "optimising compiler" means?  It means the
> compiler should try to give you code that runs as efficiently as
> possible given valid inputs.  C does not impose any requirements on code
> efficiency, but compiler users certainly do - so a C compiler is not
> going to go out of its way to give poorer quality code.  So given "x * 2
> / 2;", a compiler will do one of two things - return "x" unchanged, or
> carry out the operations using the most obvious assembly instructions.
> A good compiler will thus give you 1500000000 in this case, as that is
> the most efficient implementation consistent with the source code.

And so it will be inconsistent with (in my tests) most other compilers.
My tests were done both with optimisation and without. clang-O3,
gcc81-O3, gcc81-03, and gcc51-O3 gave 1500000000.

All other compilers I tried, including VC, clang-O0 and gcc51-O0, gave
the -647483648 figure. As would my own compilers for other languages (if
using int32, but they now use int64 and the same behaviour is observed
when x is 6000000000000000000).

> It is not a correct answer for standard C signed arithmetic, because
> there is no correct answer.

This is the nub of the issue: *C* has decided that such arithmetic is
undefined. But this is exactly the same 32-bit operation that can be
done in a dozen other languages, probably on most machines that support
32-bit multiply, and most do not make it undefined.

So it is largely a peculiarity of C.


   It is not a correct answer in normal
> mathematics, or almost any real-world problem you might want to model.
> It is, however, correct if you have defined your signed arithmetic to be
> wrapping.  It is fine - but IMHO almost entirely useless and
> counter-productive - for a programming language to define signed
> arithmetic in that way.  C does not define it that way, but other
> languages (and particular C compilers) can do so.

This is contradictory - so a C compiler can choose to make something
that C as deemed undefined behaviour, defined?

>> It is certainly what you might expect on such hardware.
>>
>>> Why do you think a guaranteed wrong and meaningless answer is
>>> better than undefined behaviour?
>>
>> Is it really meaningless? Try the above using x=x*2. It will still
>> overflow and produce a result of -1294967296, apparently incorrect. But
>> print the same bit-pattern using "%u" format, and you get 3000000000 -
>> the right answer. You can predict what's going to happen, /if/ you can
>> predict what a compiler is going to do. Unfortunately with ones like
>> gcc, you can't.
>
> Again, it does not matter if you can predict what value you get if
> something is nonsensical.  It matters that you can predict what values
> are valid inputs, of course, but not what the outputs are when the
> inputs are invalid.  When garbage goes in, I don't care what colour of
> garbage comes out - if I care what is going to come out, I am careful
> about what I put in.

Sorry, but to me a result of 1500000000 would be garbage, as it is
highly misleading. If I didn't intend 1500000000*2/2 to overflow, but
the result was a perfect 1500000000, how would I know there was a bug?

[toc] | [prev] | [next] | [standalone]


Page 3 of 5 — ← Prev page 1 2 [3] 4 5  Next page →

Back to top | Article view | comp.compilers


csiph-web