Groups > comp.std.c > #6505 > unrolled thread

Does reading an uninitialized object have undefined behavior?

Started by	Keith Thompson <Keith.S.Thompson+u@gmail.com>
First post	2023-07-20 22:16 -0700
Last post	2023-08-16 21:08 +0000
Articles	20 on this page of 48 — 8 participants


Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-20 22:16 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 16:33 +0100
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-21 11:56 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 20:54 +0100
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-21 14:26 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 23:39 +0100
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-12 17:00 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-13 23:41 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-15 21:06 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-15 22:40 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 23:13 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-17 07:08 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-18 12:44 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-19 05:04 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-19 01:36 -0700
Re: Does reading an uninitialized object have undefined behavior? Richard Damon <Richard@Damon-Family.org> - 2023-08-19 09:18 -0400
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-19 11:12 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-18 20:20 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-19 05:23 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-18 22:56 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-18 12:52 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-26 19:25 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-27 08:31 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 04:35 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-30 19:53 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 17:40 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-31 18:18 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-05 05:39 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-05 17:03 -0700
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-09-07 17:09 +0200
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-07 17:19 +0100
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-09-08 23:12 +0200
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-08 22:31 +0100
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-22 06:40 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-07-22 06:03 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-07-25 21:53 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 11:11 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-21 17:42 +0000
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-07-24 07:53 +0200
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-07-25 21:57 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-03 13:13 -0700
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-03 15:20 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-05 01:15 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 09:19 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 19:51 +0000
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 20:03 +0000
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 13:43 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 21:08 +0000


#6547

From	Martin Uecker <ma.uecker@gmail.com>
Date	2023-08-18 12:52 -0700
Message-ID	<a3199783-d8b7-4065-836b-08f647a6808en@googlegroups.com>
In reply to	#6541

On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:
> Martin Uecker <ma.u...@gmail.com> writes: 
> 
> [some unrelated passages removed]
> > On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
> > 
> >> Martin Uecker <ma.u...@gmail.com> writes:
> [...]
> >>> One could still consider the idea that "indeterminate" is an 
> >>> abstract property that yields UB during read even for types 
> >>> that do not have trap representations. There is no wording 
> >>> in the C standard to support this, but I would not call this 
> >>> idea "fundamentally wrong". You are right that this is different 
> >>> to pointer provenance, which is about values. What it would
> >>> have in common with pointer provenance is that there is hidden 
> >>> state in the abstract machine associated with memory that 
> >>> is not part of the representation. With effective types there 
> >>> is another example of this. 
> >>
> >> I understand that you want to consider a broader topic, and that, 
> >> in the realm of that broader topic, something like provenance 
> >> could have a role to play. I think it is worth responding to 
> >> that thesis, and am expecting to do so in a separate reply (or 
> >> new thread?) although probably not right away. 
> > 
> > I would love to hear your comments, because some people 
> > want to have such an abstract notion of "indeterminate" and
> > some already believe that this is how the standard should 
> > be understood already today.
> I've been thinking about this, and am close (I think) to having 
> something to say in response. Before I do that, though, let me
> ask this: what problem or problems are motivating the question? 
> What problems do you (or "some people") want to solve? I don't 
> want just examples here; I'm hoping to get a full list.

There are essentially two main interests driving this. First, there
is some interest in precisely formulating the semantics of C.
The provenance proposal came out of this.

Second, there is the issue of safety problems caused by
uninitialized reads, together with compiler support for zero
initialization etc. So there are various people who want to
change the semantics for uninitialized variables completely
in the interest of safety.

So far, there was no consensus in WG14 that the rules should
be changed or what the new rules should be.

Martin


#6556

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-08-26 19:25 -0700
Message-ID	<868r9xz0ek.fsf@linuxsc.com>
In reply to	#6547

Martin Uecker <ma.uecker@gmail.com> writes:

> On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:
>
>> Martin Uecker <ma.u...@gmail.com> writes:
>>
>> [some unrelated passages removed]
>>
>>> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
>>>
>>>> Martin Uecker <ma.u...@gmail.com> writes:
>>
>> [...]
>>
>>>>> One could still consider the idea that "indeterminate" is an
>>>>> abstract property that yields UB during read even for types
>>>>> that do not have trap representations.  There is no wording
>>>>> in the C standard to support this, but I would not call this
>>>>> idea "fundamentally wrong".  You are right that this is different
>>>>> to pointer provenance, which is about values.  What it would
>>>>> have in common with pointer provenance is that there is hidden
>>>>> state in the abstract machine associated with memory that
>>>>> is not part of the representation.  With effective types there
>>>>> is another example of this.
>>>>
>>>> I understand that you want to consider a broader topic, and that,
>>>> in the realm of that broader topic, something like provenance
>>>> could have a role to play.  I think it is worth responding to
>>>> that thesis, and am expecting to do so in a separate reply (or
>>>> new thread?) although probably not right away.
>>>
>>> I would love to hear your comments, because some people
>>> want to have such an abstract notion of "indeterminate" and
>>> some already believe that this is how the standard should
>>> be understood already today.
>>
>> I've been thinking about this, and am close (I think) to having
>> something to say in response.  Before I do that, though, let me
>> ask this:  what problem or problems are motivating the question?
>> What problems do you (or "some people") want to solve?  I don't
>> want just examples here;  I'm hoping to get a full list.
>
> There are essentially two main interests driving this.  First,
> there is some interest in precisely formulating the semantics of C.
> The provenance proposal came out of this.
>
> Second, there is the issue of safety problems caused by
> uninitialized reads, together with compiler support for zero
> initialization etc.  So there are various people who want to
> change the semantics for uninitialized variables completely
> in the interest of safety.

This response doesn't answer my question.  What are the problems,
specifically, that people want to solve?  If there isn't a good
understanding of what the problem is, there is little hope of
finding a solution, let alone reaching agreement on whether a
proposed change does in fact solve the problem.  If we don't know
where we're going, any choice of road is equally good.

That said, I understand that you are asking not on your own behalf
but on behalf (perhaps indirectly) of others, and the others might
not know what the problem(s) are that they want to solve.  I think
it's worth asking the question explicitly, What is the problem
that we want to solve here?  Start by simply trying to write a
clear statement of what the problem is;  proceed on to looking for
a solution only after there is agreement (and I don't mean just a
majority vote) about what problem it is the group wants to solve.

(Note added after writing:  I didn't realize when I started how
difficult this subject is and how much there is to say about it.
I hope readers will appreciate the amount of effort that has
been invested, and get some value out of what has been produced,
even if it spends too much time on some less important issues.)

(Also, after having written the whole posting, I see that there
are some aspects that I didn't relate to the indeterminate
question and so didn't address.  If you want me to say more about
formalizing semantics or the issue of safety for uninitialized
variables, I really need some specifics before I can talk about
those.)

(One further thought:  on reading through my comments one last
time, I may have more to say about uninitialized variables.  But
I am deferring that for now, to get this beast out the door.)

> So far, there was no consensus in WG14 that the rules should
> be changed or what the new rules should be.

That's because they don't know what problem it is that they want
to solve.

Consider the question of what happens with padding bits/bytes,
and unnamed members, in structs (unions too of course, but for
now we consider only structs).  The C standard says these bits of
memory take unspecified values whenever there is a store to any
member of the struct (and maybe also at other times, but let's
ignore that).  I understand why this decision was made, namely,
to give more freedom to implementations as to how such operations
are actualized.  But it leaves behind a problem.  Speaking as a
developer, I want the values of these bits to be stable, at least
in certain cases (and I want to be able to choose which cases
those are).  The C language doesn't give me any way to do that,
at least not one that isn't horribly inconvenient.  In making the
decision about padding bits/bytes, the C committee answered the
/question/ but didn't address the /problem/.  I expect that
something similar is going on with the current discussions.
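
A minimal sketch of the padding issue described above.  The struct
and function names are hypothetical, chosen only for illustration,
and whether struct record has any padding at all depends on the ABI:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct; on most ABIs there are padding bytes between
   'tag' and 'value', though the standard does not require that. */
struct record {
    char tag;
    long value;
};

/* Number of bytes in struct record that belong to no member. */
size_t record_padding_bytes(void)
{
    return sizeof(struct record) - sizeof(char) - sizeof(long);
}

/* Build a record.  The memset zeroes the padding, but each later store
   to a member lets the padding bytes take unspecified values again, so
   the zeroing is not something the language promises to preserve. */
struct record make_record(char tag, long value)
{
    struct record r;
    memset(&r, 0, sizeof r);
    r.tag = tag;
    r.value = value;
    return r;
}

/* Portable comparison: member by member.  A memcmp over the whole
   object would also compare the padding bytes, whose values are
   unspecified after the stores above. */
int record_equal(const struct record *a, const struct record *b)
{
    return a->tag == b->tag && a->value == b->value;
}
```

The member-wise comparison is the inconvenient workaround; a memcmp
of two such objects may well report equality on a given compiler, but
nothing in the standard guarantees it.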

To better understand the landscape, let's look at three different
kinds of undefined behavior.  The illustrating constructions are
signed integer arithmetic, obsolete pointer values, and violating
effective type rules.

Situations where arithmetic on signed integers overflows might be
called /practical/ undefined behavior.  Certainly it would be
possible to require a better-defined semantics (such as giving an
unspecified result), but presumably overflow doesn't come up very
often, it's not clear how useful the "better" result would be,
and the cost in some hardware environments might be prohibitive.
Furthermore there is a fairly easy workaround to avoid overflow:
simply convert to unsigned types, do the operations, and then
convert back.  Overflow being undefined behavior isn't absolutely
necessary but in practical terms it's acceptable.  (I acknowledge
that some people have different views on that last statement.)
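
The unsigned-arithmetic workaround might look like the sketch below.
The function name is made up for illustration.  Note one assumption:
converting an out-of-range unsigned result back to int is
implementation-defined (or raises an implementation-defined signal);
GCC and Clang document modular reduction for that conversion.

```c
#include <limits.h>

/* Add two ints without undefined behavior on overflow.  The addition
   itself is done in unsigned arithmetic, which wraps modulo
   UINT_MAX + 1 by definition.  The final conversion back to int is
   implementation-defined when the value does not fit; this sketch
   assumes the common choice of modular (two's complement) reduction. */
int wrapping_add(int a, int b)
{
    unsigned int ua = (unsigned int)a;
    unsigned int ub = (unsigned int)b;
    return (int)(ua + ub);
}
```

On an implementation that reduces modulo 2^N, wrapping_add(INT_MAX, 1)
yields INT_MIN rather than undefined behavior.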

An obsolete pointer value is a pointer to an object after the end
of the object's lifetime.  Attempting to make use of an obsolete
pointer value, in any way whatsoever including simply loading it
by means of lvalue conversion, is undefined behavior.  We can
imagine narrowing the scope a bit so simply loading an obsolete
pointer value or comparing one for equality could be better
defined, but any attempt to dereference an obsolete pointer value
is what might be called an /essential/ undefined behavior.  The
problem here is both practical and theoretical:  there is no way
to be sure the underlying hardware will be able to carry out the
asked-for operation (without a machine check, etc), and even if
there were, there is no way to describe what happens in a way
that can be expressed (usefully) in terms that relate to what's
going on in the abstract machine.  There simply is no practical,
useful, sensible way to define the behavior of dereferencing an
obsolete pointer value.

At the other end of the spectrum, violating effective type rules is
what might be called /gratuitous/ undefined behavior.  There is no
particular hardware motivation for choosing UB.  And there is no
problem defining the semantics of a cross-type access, which can be
done definedly in the same way as accessing union members.  So there
is no reason to think that adding cross-type restrictions is
necessary.  An argument can be made that cross-type restrictions
are /desirable/, because they allow code transformations that
improve performance in some cases.
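
As a sketch of the kind of transformation meant here (the function
name is hypothetical):

```c
/* Under the effective type rules an int object and a float object
   cannot be the same object, so a compiler is allowed to assume the
   store through fp leaves *ip unchanged and to return the constant 1
   without reloading *ip.  Passing two pointers into the same object
   would be the cross-type access under discussion. */
int store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}
```

When called with pointers to distinct objects, as the rules require,
the function returns 1 whether or not the compiler applies the
optimization.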

Incidentally, it might seem like effective type rules are similar in
some way to NaT bits or pointer provenance.  They aren't.  NaT bits
are hardware indicators that actually exist, and pointer provenances
are attached to values, not to objects.  Neither of those conditions
holds for effective types.  The seeming similarity to hidden memory
bits is a red herring.

(Also, effective type rules are a lot more complicated than they
seem at first blush, and have some peculiar properties as a result.
They seem to work okay if not looked at too closely, but a closer
look shows some serious shortcomings.  But I digress.)

There are two significant problems with undefined behavior.  The
smaller of the two is that there are no distinctions between the
different classes of undefined behavior.  There is no way around
having some sort of undefined behavior for obsolete pointer values,
but cross-typing rules are a completely different story.  Yet the C
standard puts all the different kinds of undefined behaviors into
the same absolute category.  Sometimes people use compiler options
to turn off, for example, so-called "strict aliasing", and of course
the C standard allows us to do that.  But compilers aren't required
to provide such an option, and if they do the option may not do
exactly what we expect it to do, because there is no standard
specification for it.  The C standard should define officially
sanctioned mechanisms -- as for example standard #pragma's -- to
give standard-defined semantics to certain constructs of undefined
behavior that resemble, eg, -fno-strict-aliasing.

(Let me add in passing that this should be done for some cases of
unspecified behavior as well.  To give one example, the C standard
should provide a way to direct a C compiler to maintain the values
of padding bits and bytes and unnamed members, taking away the
freedom for such things to assume unspecified values.)

The second problem is basically The Law of Unintended Consequences
smashing into The Law of Least Astonishment.  As compiler writers
have gotten more and more clever at exploiting the implications of
"undefined behavior", we see more and more cases of code that looks
reasonable being turned into mush by overly clever "optimizing"
compilers.  There is obviously something wrong with the way this
trend is going -- ever more clever "optimizations", followed by ever
more arcane compiler options to work around the problems caused by
the too-clever compilers.  This problem must be addressed by the C
standard, for if it is not the ecosystem will transform into a
confused state that is exactly what the C standard was put in place
to avoid.  (I do have some ideas about how to address this issue,
but I want to make sure everyone appreciates the extent of the
problem before we start talking about solutions.)

Before leaving the sub-topic of undefined behavior, let me mention
two success stories.  The first is 'restrict':  the performance
implications are local, the choice is under control of the program
(and programmer), and the default choice is to play safe.  Good
show.  The second is the improved sequencing rules introduced in
C11.  A thorny problem, and since C11 handled very deftly.  These
parts of the C language and C standard should be held up as examples
when considering how to go forward on other problems.
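
To illustrate the 'restrict' properties listed above, a small sketch
(the function name and signature are made up for this example):

```c
#include <stddef.h>

/* dst[i] += src[i] for i in [0, n).  The restrict qualifiers are a
   promise made by the caller that the two arrays do not overlap for
   the duration of the call, which lets the compiler reorder or
   vectorize the loads and stores.  The promise is local to this
   function, and removing the qualifiers never changes the meaning of
   a correct program; it only makes the compiler more conservative. */
void add_into(size_t n, double *restrict dst, const double *restrict src)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

The opt-in is visible at the declaration, and code that never writes
'restrict' keeps the safe default.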

And now on to the question of "indeterminate".  Following that, a
somewhat philosophical perspective concerning the nature of the C
standard and the people who work on it.

First an observation.  The idea of "indeterminate values" is
actually two ideas in one:  non-valid abstract /values/ (like
obsolete pointers), and "uninitialized" /objects/ (in quotes
because in some circumstances objects can become "uninitialized"
even after they have been stored into.)  The word "indeterminate"
isn't really right for either of these ideas.  I understand why it
was used in the first C standard, and in that context it seems
okay, but going forward a better word (or words) should be found.
I will keep using it here but please don't get overly attached to
the word, lest it confuse the discussion.

My very strong sense is that some general notion of indeterminate
values (or objects) is a solution in search of a problem.  Let's
look at some different kinds of undefined behavior, while also
considering the lens of "indeterminate values (or objects)".

One: signed integer overflow.  Could this situation somehow produce
an "indeterminate value" that could be stored so it could wreak
havoc later?  Two problems:  no sensible developer is going to want
the bad behavior deferred rather than happening right away, and
besides anything an "indeterminate value" could do can already be
done by virtue of the generating condition being undefined itself.

Two: obsolete pointers.  These values are not indeterminate.  They
start off as valid, become obsolete when their pointed-to object
ends its lifetime, and are always obsolete thereafter.  It isn't
hard to make a formal model for "obsoleteness" (ignoring problems
such as converting pointers to and from integers, and other C-isms).
Of course the formal model doesn't map nicely onto real computer
hardware, because pointers would have far too many bits (and maybe
other problems as well, but let's ignore that).  So we pretend the
extra bits are there, even though they aren't, with a strange
consequence that two pointer objects can have the same object
representation but still be different in that one is obsolete and
the other isn't.  Also a pointer can start off with a non-valid
value, meaning "not null and points to no object".  Here again the
badness remains until a valid pointer value is put into the object;
a pointer object with a non-valid value doesn't ever magically
become valid without having been assigned or stored into.  (Note
that the same formal model for obsolete pointers can accommodate
non-valid pointers, which are simply obsolete at the start.)
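
A toy version of such a formal model, purely as an illustration.  All
names are hypothetical, and the "hidden id" stands for the extra bits
that real pointers do not have:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_IDS = 16 };

/* Hidden state of the toy abstract machine: which ids are live. */
static bool live[MAX_IDS];
static size_t next_id;

/* An abstract pointer value: the visible address plus a hidden id
   that actual hardware would not store. */
struct abs_ptr {
    size_t addr;   /* stands in for the object representation */
    size_t id;     /* hidden, not part of the representation */
};

/* Start an object's lifetime at addr and return a valid pointer. */
struct abs_ptr model_alloc(size_t addr)
{
    assert(next_id < MAX_IDS);
    struct abs_ptr p = { addr, next_id++ };
    live[p.id] = true;
    return p;
}

/* End the lifetime; every pointer carrying this id is obsolete from
   now on and never becomes valid again. */
void model_free(struct abs_ptr p)
{
    live[p.id] = false;
}

/* A never-valid pointer is simply obsolete from the start: it gets a
   fresh id that was never marked live. */
struct abs_ptr model_invalid(size_t addr)
{
    assert(next_id < MAX_IDS);
    struct abs_ptr p = { addr, next_id++ };
    return p;
}

bool is_obsolete(struct abs_ptr p)
{
    return !live[p.id];
}
```

Allocating, freeing, and allocating again at the same address gives
two abstract pointers with equal addr fields, one obsolete and one
not, which is the strange consequence mentioned above.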

Three: effective type rules.  Broken.  One of the weakest areas of
the C standard.  This framework may have started off as not a bad
idea in C90, but looking at it now it's clear that we've gotten
ahead of our skis.  The rules are sorely in need of a top-to-bottom
reformulation, similar at least in spirit to what was done with
sequencing rules
in C11.  Also there should be a standard-defined way of allowing
cross-type interference, with defined behavior, like what was
explained above.  I expect a well-done reformulation of cross-type
(non-)interference rules would have no notion of assigning "magic
state" to objects, and so have no need of any idea of "indeterminate
objects (or values)".

Four: uninitialized objects.  Here we have a question:  Why?  What
problem are we hoping to solve?  Presumably the point of having
uninitialized objects be "indeterminate" is so that reading them
is undefined behavior.  Let's explore that.

I realize of course that any object having a trap representation
(called a non-value representation in the C23 draft) causes
undefined behavior if read using a type in which the object
representation corresponds to a trap representation.  Obviously
there is good reason to say trying to read a trap representation
is undefined behavior.  Some types, notably unsigned char, don't
have any trap representations.  Should reading an uninitialized
object using such a type be undefined behavior?  Speaking as a
developer, I don't see any benefit.  An implementation would have
to go out of its way to do anything other than deliver a valid
unspecified value;  if there is to be undefined behavior, it is
/contrived/ undefined behavior.  Consider:

A:  such UB could allow trapping on any use of an uninitialized
object.  But UB does not guarantee that, and if someone wants
it there are tools like valgrind to get it (and without any
special language support needed to do so).

B:  such UB could allow "optimizations" by clever compiler writers.
The result would be more unexpected code scramblings and more
arcane compiler options to disable them.  A better way to provide
such imagined benefits is by adding one or more new language
constructs, along lines similar to the 'restrict' qualifier, to
selectively enable such performance changes.

C:  future hardware developments might need or take advantage of
such UB.  If and when such things happen it's better to add
specific wording to reflect the new hardware behaviors.  The last
sentence of 6.3.2.1 p2, added in C11, provides an excellent example
of how to accommodate such new hardware developments.
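
For concreteness, the unsigned char case discussed above can be
sketched as follows.  The function name is hypothetical.  Whether the
read is guaranteed to be well defined is exactly what this thread is
debating, so this is an illustration of the question and not code to
rely on:

```c
#include <stdlib.h>

/* Read one byte of freshly allocated, never-written storage through
   unsigned char, a type that has no trap representations.  On the
   reading argued for in this post, the result is some valid but
   unspecified value.  Returns 0 on success, -1 if allocation failed
   (in which case *out is left untouched). */
int read_uninitialized_byte(unsigned char *out)
{
    unsigned char *p = malloc(1);
    if (p == NULL)
        return -1;
    *out = p[0];   /* unspecified value at best; nothing was stored */
    free(p);
    return 0;
}
```

An implementation that simply loads the byte gives the "valid
unspecified value" behavior with no extra effort.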

The notion of indeterminate objects is a solution in search of a
problem.  To
make progress, first agree on a particular problem.  Only after
that point should possible solutions be considered;  I would be
surprised if some general notion of indeterminateness ever turned
out to be the solution of choice.

Now I would like to offer a perspective on how to view work that
is done in writing the C standard.

In some respects the ISO C committee resembles the US Supreme
Court.  They consider issues, draw conclusions, and ultimately
issue "rulings" in the form of ISO-approved standards documents.
Like the Supreme Court, their decisions are final and cannot be
appealed.

However, the Supreme Court ultimately draws its authority from
how the public views its rulings.  If the rulings get too far out
of line with what the general public believes, confidence in the
Court will decline and its opinions will carry less weight.  (I
don't mean to make a political statement here - I am simply
repeating some analysis I have read recently regarding current
attitudes towards the Court.)

The same is true of the ISO C committee.  They can make whatever
decisions they want, and those decisions will end up being what
goes into the C standard.  At the same time, it's important - I
would say very important - to keep the confidence of people for
whom the C standard is regarded as an important document.  If
that confidence is lost then the C standard will be on its way
to becoming irrelevant.

Unfortunately I have the sense that this trend has already
started.  The most important constituency for the C language (and
so for the C standard) is developers.  Many developers, but in
particular and very especially C developers, want stability.  I
understand the desire to want to "improve" the language.  Getting
agreement on a change has to mean more than a majority vote -- it
needs to be not just accepted but enthusiastically approved and
with overwhelming support.  Too much of what is planned for C23
is coming from the implementation community without regard for
what is beneficial to the development community.  I see the
reported desire for general "indeterminate"-ness as part of this
trend.  It is my hope that those people who are part of the ISO C
committee reflect on this perspective and reconsider where the C
language should go for the next C standard.


#6557

From	Spiros Bousbouras <spibou@gmail.com>
Date	2023-08-27 08:31 +0000
Message-ID	<5+eRe7cp3yQjL4=AX@bongo-ra.co>
In reply to	#6556

On Sat, 26 Aug 2023 19:25:55 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Sometimes people use compiler options
> to turn off, for example, so-called "strict aliasing", and of course
> the C standard allows us to do that.  But compilers aren't required
> to provide such an option, and if they do the option may not do
> exactly what we expect it to do, because there is no standard
> specification for it.  The C standard should define officially
> sanctioned mechanisms -- as for example standard #pragma's -- to
> give standard-defined semantics to certain constructs of undefined
> behavior that resemble, eg, -fno-strict-aliasing.

Surely the starting point for this should be the documentation of the
compilers to specify precisely what  -fno-strict-aliasing  does. If
a consensus emerges out of these precise specifications or C programmers
indicate that they prefer the specification of some particular compiler
then this can become part of the standard. Adding a relevant  #pragma
should be trivial.

> The second problem is basically The Law of Unintended Consequences
> smashing into The Law of Least Astonishment.  As compiler writers
> have gotten more and more clever at exploiting the implications of
> "undefined behavior", we see more and more cases of code that looks
> reasonable being turned into mush by overly clever "optimizing"
> compilers.  There is obviously something wrong with the way this
> trend is going -- ever more clever "optimizations", followed by ever
> more arcane compiler options to work around the problems caused by
> the too-clever compilers.  This problem must be addressed by the C
> standard, for if it is not the ecosystem will transform into a
> confused state that is exactly what the C standard was put in place
> to avoid.  (I do have some ideas about how to address this issue,
> but I want to make sure everyone appreciates the extent of the
> problem before we start talking about solutions.)

Without specific examples , it's impossible to comment on this. Why did
the "reasonable" code have the undefined behaviour ? Could the result
the programmer was aiming for have been achieved with defined behaviour
? For example it has been pointed out on  comp.lang.c  that it's
impossible to write a  malloc()  implementation in conforming C. This is
certainly a weakness which should be addressed with some appropriate
#pragma .

> Before leaving the sub-topic of undefined behavior, let me mention
> two success stories.  The first is 'restrict':  the performance
> implications are local, the choice is under control of the program
> (and programmer), and the default choice is to play safe.  Good
> show.

From my point of view , restrict  is not a success because the
specification of  restrict  is the one part of the C1999 standard I have
given up trying to understand. I understand the underlying idea but the
specifics elude me. I remember many years ago someone asked on this
group about some code involving  restrict  and a member of the standard
committee replied and I found the reply counterintuitive. So I have
decided to not use  restrict  in my own code taking also into account
that I don't need the microoptimisations which  restrict  is intended to
allow. But for all I know , people who do need these optimisations find
the specification of  restrict  in the standard perfectly adequate.

-- 
It is not widely known that the "CPC" in "Amstrad CPC" actually stands
for "cool people club".


#6559

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-08-29 04:35 -0700
Message-ID	<86sf82ulmb.fsf@linuxsc.com>
In reply to	#6557

Spiros Bousbouras <spibou@gmail.com> writes:

> On Sat, 26 Aug 2023 19:25:55 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Sometimes people use compiler options to turn off, for example,
>> so-called "strict aliasing", and of course the C standard allows
>> us to do that.  But compilers aren't required to provide such an
>> option, and if they do the option may not do exactly what we
>> expect it to do, because there is no standard specification for
>> it.  The C standard should define officially sanctioned
>> mechanisms -- as for example standard #pragma's -- to give
>> standard-defined semantics to certain constructs of undefined
>> behavior that resemble, eg, -fno-strict-aliasing.
>
> Surely the starting point for this should be the documentation of
> the compilers to specify precisely what -fno-strict-aliasing does.
> [...]

Not at all.  It's easy to write a specification that says what we
want to do, along similar lines to what is said in the footnote
about union member access in section 6.5.2.3:

   If the member used to access the contents of a union object
   is not the same as the member last used to store a value in
   the object, the appropriate part of the object representation
   of the value is reinterpreted as an object representation in
   the new type as described in 6.2.6 (a process sometimes called
   "type punning").  This might be a trap representation.

That behavior should be the default, for all accesses.  For cases
where a developer wants to give permission to the compiler to
optimize based on cross-type non-interference assumptions, there
should be a #pragma to do something similar to what effective type
rules do now.  The effective type rules are in need of re-writing
anyway, and making type punning be the default doesn't break any
programs, because compilers are already free to ignore the
implications of violating effective type conditions.
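
As a rough illustration of the reinterpretation the footnote describes,
here is a minimal sketch in C11. The names (union pun, bits_via_union,
bits_via_memcpy) are made up for this example, and it assumes that float
and uint32_t have the same size on the implementation at hand.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Hypothetical union, used only for this illustration. */
union pun {
    float    f;
    uint32_t u;
};

/* Store through one member, then read through the other.  The bytes of
   the float are reinterpreted as a uint32_t, as in the 6.5.2.3 footnote. */
uint32_t bits_via_union(float value)
{
    union pun p;
    p.f = value;
    return p.u;
}

/* The same reinterpretation, done by copying the object representation. */
uint32_t bits_via_memcpy(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Both functions should return the same value for any given argument. The
proposal above would, as I read it, extend the union behavior to accesses
through pointers as well.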

>> The second problem is basically The Law of Unintended Consequences
>> smashing into The Law of Least Astonishment.  As compiler writers
>> have gotten more and more clever at exploiting the implications of
>> "undefined behavior", we see more and more cases of code that looks
>> reasonable being turned into mush by overly clever "optimizing"
>> compilers.  There is obviously something wrong with the way this
>> trend is going -- ever more clever "optimizations", followed by
>> ever more arcane compiler options to work around the problems
>> caused by the too-clever compilers.  This problem must be addressed
>> by the C standard, for if it is not the ecosystem will transform
>> into a confused state that is exactly what the C standard was put
>> in place to avoid.  (I do have some ideas about how to address this
>> issue, but I want to make sure everyone appreciates the extent of
>> the problem before we start talking about solutions.)
>
> Without specific examples , it's impossible to comment on this.
> [...]

I feel that so much has been written about this issue that it
isn't necessary for me to elaborate.

> For example it has been pointed out on comp.lang.c that it's
> impossible to write a malloc() implementation in conforming
> C.  This is certainly a weakness which should be addressed with
> some appropriate #pragma .

There isn't any reason to think malloc() should be writable in
completely portable C.  That's the point of putting malloc() in
the system library in the first place.  By the way, with type
punning semantics mentioned above being the default, and with the
alignment features added in C11, I think it is possible to write
malloc() in portable C without needing any additional language
changes.  But even if it isn't that is no cause for concern;  one
of the principal reasons for having a system library is to
provide functionality that the core language cannot express (or
cannot express conveniently).

>> Before leaving the sub-topic of undefined behavior, let me mention
>> two success stories.  The first is 'restrict':  the performance
>> implications are local, the choice is under control of the program
>> (and programmer), and the default choice is to play safe.  Good
>> show.
>
> From my point of view , restrict is not a success because the
> specification of restrict is the one part of the C1999 standard I
> have given up trying to understand.  I understand the underlying
> idea but the specifics elude me.  [...]

I agree the formal definition of restrict is rather daunting.  In
practice though I think using restrict with confidence is not
overly difficult.  My working model for restrict is something
like this:

   1.  Use restrict only in the declarations of function
       parameters.

   2.  For a declaration like  const T *restrict foo  ,
       the compiler may assume that any objects that can be
       accessed through 'foo' will not be modified.

   3.  For a declaration like  T *restrict bas  ,
       the compiler may assume that any changes to objects
       that can be accessed through 'bas' will be done
       using 'bas' or a pointer value derived from 'bas'
       (and in particular that no changes will happen
       other than through 'bas' or 'bas'-derived pointer
       values).

Is this summary description helpful?
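
A minimal sketch of the working model, assuming C99 or later. The
function name scale_add and its parameters are hypothetical, chosen only
for illustration.

```c
#include <stddef.h>

/* Point 1: restrict appears only on function parameters.
   Point 2: objects reachable through 'src' are assumed not to be
            modified during the call.
   Point 3: objects reachable through 'dst' are assumed to be modified
            only through 'dst' (or pointers derived from it).
   The caller promises that the two arrays do not overlap. */
void scale_add(size_t n, float *restrict dst,
               const float *restrict src, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Calling scale_add with two distinct arrays is fine. Calling it with
overlapping arrays breaks the promise and the behavior is undefined.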


#6560

From	Spiros Bousbouras <spibou@gmail.com>
Date	2023-08-30 19:53 +0000
Message-ID	<KvVxh3+WExIyDnM+5@bongo-ra.co>
In reply to	#6559

On Tue, 29 Aug 2023 04:35:40 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Spiros Bousbouras <spibou@gmail.com> writes:
> 
> > On Sat, 26 Aug 2023 19:25:55 -0700
> > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >
> >> Sometimes people use compiler options to turn off, for example,
> >> so-called "strict aliasing", and of course the C standard allows
> >> us to do that.  But compilers aren't required to provide such an
> >> option, and if they do the option may not do exactly what we
> >> expect it to do, because there is no standard specification for
> >> it.  The C standard should define officially sanctioned
> >> mechanisms -- as for example standard #pragma's -- to give
> >> standard-defined semantics to certain constructs of undefined
> >> behavior that resemble, eg, -fno-strict-aliasing.
> >
> > Surely the starting point for this should be the documentation of
> > the compilers to specify precisely what -fno-strict-aliasing does.
> > [...]
> 
> Not at all.  It's easy to write a specification that says what we
> want to do, along similar lines to what is said in the footnote
> about union member access in section 6.5.2.3
> 
>    If the member used to access the contents of a union object
>    is not the same as the member last used to store a value in
>    the object, the appropriate part of the object representation
>    of the value is reinterpreted as an object representation in
>    the new type as described in 6.2.6 (a process sometimes called
>    "type punning").  This might be a trap representation.

Works for me, but it would be good to know that this is how compiler
writers actually understand -fno-strict-aliasing. Is there any compiler
documentation which says something like this?

> That behavior should be the default, for all accesses.  For cases
> where a developer wants to give permission to the compiler to
> optimize based on cross-type non-interference assumptions, there
> should be a #pragma to do something similar to what effective type
> rules do now.  The effective type rules are in need of re-writing
> anyway, and making type punning be the default doesn't break any
> programs, because compilers are already free to ignore the
> implications of violating effective type conditions.

[...]

> > For example it has been pointed out on comp.lang.c that it's
> > impossible to write a malloc() implementation in conforming
> > C.  This is certainly a weakness which should be addressed with
> > some appropriate #pragma .
> 
> There isn't any reason to think malloc() should be writable in
> completely portable C.  That's the point of putting malloc() in
> the system library in the first place.  By the way, with type
> punning semantics mentioned above being the default, and with the
> alignment features added in C11, I think it is possible to write
> malloc() in portable C without needing any additional language
> changes.  But even if it isn't that is no cause for concern;  one
> of the principal reasons for having a system library is to
> provide functionality that the core language cannot express (or
> cannot express conveniently).

One might want to experiment with different allocation algorithms,
and it seems to me that this sort of thing is within the "remit" of
C. So ideally one should be able to write it in C and prove, starting
from the standard or precise specifications in compiler documentation,
that it works correctly. I don't necessarily mean prove the correctness
of the whole code but certain key parts.

Another application I have in mind is languages which get translated
to C and support garbage collection. Again one might want to use the
standard malloc() to allocate a large block of memory and use different
parts of this memory for different types of objects.

If with the semantics you propose these things are possible, I'm happy.
I'm not bothered which is the default as long as there is a precise
specification from which you can reason that you get the desired behaviour.
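
To make the "one large block, many object types" idea concrete, here is
a minimal sketch in C11 of a bump allocator layered on malloc(). All of
the names (struct arena, arena_init, arena_alloc, arena_free_all) are
hypothetical. It relies on the block from malloc() having no declared
type, so each region takes its effective type from the first store into
it, and it says nothing about the harder case of reusing a region for a
different type later.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one malloc'ed block handed out in pieces. */
struct arena {
    unsigned char *base;  /* start of the block, or NULL */
    size_t size;          /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
};

/* Returns nonzero on success. */
int arena_init(struct arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Hands out n bytes aligned for any object type, or NULL if the
   request does not fit.  malloc() already returns maximally aligned
   storage, so rounding the offset up is enough. */
void *arena_alloc(struct arena *a, size_t n)
{
    const size_t align = alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;

    if (n == 0 || start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Releases the whole block at once. */
void arena_free_all(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

A caller would take an int from one part of the block and a double from
another, each written before it is read.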

> >> Before leaving the sub-topic of undefined behavior, let me mention
> >> two success stories.  The first is 'restrict':  the performance
> >> implications are local, the choice is under control of the program
> >> (and programmer), and the default choice is to play safe.  Good
> >> show.
> >
> > From my point of view , restrict is not a success because the
> > specification of restrict is the one part of the C1999 standard I
> > have given up trying to understand.  I understand the underlying
> > idea but the specifics elude me.  [...]
> 
> I agree the formal definition of restrict is rather daunting.  In
> practice though I think using restrict with confidence is not
> overly difficult.  My working model for restrict is something
> like this:
> 
>    1.  Use restrict only in the declarations of function
>        parameters.
> 
>    2.  For a declaration like  const T *restrict foo  ,
>        the compiler may assume that any objects that can be
>        accessed through 'foo' will not be modified.

Wouldn't that also be the case with just const T * foo?

>    3.  For a declaration like  T *restrict bas  ,
>        the compiler may assume that any changes to objects
>        that can be accessed through 'bas' will be done
>        using 'bas' or a pointer value derived from 'bas'
>        (and in particular that no changes will happen
>        other than through 'bas' or 'bas'-derived pointer
>        values).
> 
> Is this summary description helpful?

It seems clear enough but, as I've said, I don't have any use for
restrict anyway, and it's not worth it for me to expend the additional
mental effort to confirm that my code obeys the additional restrictions
of restrict. If I call a function with a preexisting interface which
involves restrict, then it seems easy enough to obey the restrictions.

-- 
Carrie also narrates the film, providing useful guidelines for those
challenged by its intricacies. Sample: "Later that day, Big and I
arrived home."
  http://www.rogerebert.com/reviews/sex-and-the-city-2-2010


#6561

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-08-30 17:40 -0700
Message-ID	<86zg28t563.fsf@linuxsc.com>
In reply to	#6560

Spiros Bousbouras <spibou@gmail.com> writes:

> On Tue, 29 Aug 2023 04:35:40 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Spiros Bousbouras <spibou@gmail.com> writes:
>>
>>> On Sat, 26 Aug 2023 19:25:55 -0700
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>>
>>>> Sometimes people use compiler options to turn off, for example,
>>>> so-called "strict aliasing", and of course the C standard allows
>>>> us to do that.  But compilers aren't required to provide such an
>>>> option, and if they do the option may not do exactly what we
>>>> expect it to do, because there is no standard specification for
>>>> it.  The C standard should define officially sanctioned
>>>> mechanisms -- as for example standard #pragma's -- to give
>>>> standard-defined semantics to certain constructs of undefined
>>>> behavior that resemble, eg, -fno-strict-aliasing.
>>>
>>> Surely the starting point for this should be the documentation of
>>> the compilers to specify precisely what -fno-strict-aliasing does.
>>> [...]
>>
>> Not at all.  It's easy to write a specification that says what we
>> want to do, along similar lines to what is said in the footnote
>> about union member access in section 6.5.2.3
>>
>>    If the member used to access the contents of a union object
>>    is not the same as the member last used to store a value in
>>    the object, the appropriate part of the object representation
>>    of the value is reinterpreted as an object representation in
>>    the new type as described in 6.2.6 (a process sometimes called
>>    "type punning").  This might be a trap representation.
>
> Works for me but it would be good to know that this is how compiler
> writers actually understand  -fno-strict-aliasing .  [...]

No, it wouldn't.  Implementations follow the C standard, not
the other way around.  Looking at what implementations do for
the -fno-strict-aliasing flag is worse than a waste of time.

>>> For example it has been pointed out on comp.lang.c that it's
>>> impossible to write a malloc() implementation in conforming
>>> C.  This is certainly a weakness which should be addressed with
>>> some appropriate #pragma .
>>
>> There isn't any reason to think malloc() should be writable in
>> completely portable C.  That's the point of putting malloc() in
>> the system library in the first place.  By the way, with type
>> punning semantics mentioned above being the default, and with the
>> alignment features added in C11, I think it is possible to write
>> malloc() in portable C without needing any additional language
>> changes.  But even if it isn't that is no cause for concern;  one
>> of the principal reasons for having a system library is to
>> provide functionality that the core language cannot express (or
>> cannot express conveniently).
>
> One might want to experiment with different allocation algorithms
> and it seems to me that this sort of thing is within the "remit" of
> C. So ideally one should be able to write it in C [...]

You're conflating writing something in C and writing something
in completely portable C.  It's already possible to do these
things writing in C.

>>> From my point of view , restrict is not a success because the
>>> specification of restrict is the one part of the C1999 standard I
>>> have given up trying to understand.  I understand the underlying
>>> idea but the specifics elude me.  [...]
>>
>> I agree the formal definition of restrict is rather daunting.  In
>> practice though I think using restrict with confidence is not
>> overly difficult.  My working model for restrict is something
>> like this:
>>
>>    1.  Use restrict only in the declarations of function
>>        parameters.
>>
>>    2.  For a declaration like  const T *restrict foo  ,
>>        the compiler may assume that any objects that can be
>>        accessed through 'foo' will not be modified.
>
> Wouldn't that also be the case with just  const T * foo ?

No.
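
A small sketch of why, with a hypothetical function read_twice. Without
restrict, the const only says the function will not write through 'p'.
It does not say that the object stays unchanged, because another pointer
may legitimately refer to the same (non-const) object.

```c
/* No restrict here: 'p' and 'q' are allowed to point to the same int. */
int read_twice(const int *p, int *q)
{
    int first = *p;
    *q = first + 1;      /* modifies *p whenever q == p */
    return *p - first;   /* 1 if p and q alias, 0 otherwise */
}
```

Because the aliasing call is well defined, the compiler has to reload *p
after the store. Declaring the parameter as const int *restrict p would
make the aliasing call undefined and so permit reusing 'first'.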

>>    3.  For a declaration like  T *restrict bas  ,
>>        the compiler may assume that any changes to objects
>>        that can be accessed through 'bas' will be done
>>        using 'bas' or a pointer value derived from 'bas'
>>        (and in particular that no changes will happen
>>        other than through 'bas' or 'bas'-derived pointer
>>        values).
>>
>> Is this summary description helpful?
>
> It seems clear enough but , as I've said , I don't have any use
> for restrict anyway and it's not worth it for me to expend the
> additional mental effort to confirm that my code obeys the
> additional restrictions of restrict.  [...]

If you don't want to use restrict that is quite okay.  Part of
why I call restrict a success is that it can be ignored, with
only minimal effort, by any developer who doesn't want to use it.


#6562

From	Spiros Bousbouras <spibou@gmail.com>
Date	2023-08-31 18:18 +0000
Message-ID	<S+3le+7=sc9SPmPL3@bongo-ra.co>
In reply to	#6561

On Wed, 30 Aug 2023 17:40:52 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Spiros Bousbouras <spibou@gmail.com> writes:
> 
> > On Tue, 29 Aug 2023 04:35:40 -0700
> > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >
> >> Spiros Bousbouras <spibou@gmail.com> writes:

[...]

> >> Not at all.  It's easy to write a specification that says what we
> >> want to do, along similar lines to what is said in the footnote
> >> about union member access in section 6.5.2.3
> >>
> >>    If the member used to access the contents of a union object
> >>    is not the same as the member last used to store a value in
> >>    the object, the appropriate part of the object representation
> >>    of the value is reinterpreted as an object representation in
> >>    the new type as described in 6.2.6 (a process sometimes called
> >>    "type punning").  This might be a trap representation.
> >
> > Works for me but it would be good to know that this is how compiler
> > writers actually understand  -fno-strict-aliasing .  [...]
> 
> No, it wouldn't.  Implementations follow the C standard, not
> the other way around.  Looking at what implementations do for
> the -fno-strict-aliasing flag is worse than a waste of time.

Actually the influence goes in both directions. In theory the standard is the
ultimate authority; in practice it is whatever C compilers one has access to.
For now the standard doesn't have something like -fno-strict-aliasing, so if
one needs it then looking at what implementations do is the only option. But
even the standard committee should look at it, and at whether C programmers
find it useful, to decide what along such lines (if anything) should go into
the standard.

> >> There isn't any reason to think malloc() should be writable in
> >> completely portable C.  That's the point of putting malloc() in
> >> the system library in the first place.  By the way, with type
> >> punning semantics mentioned above being the default, and with the
> >> alignment features added in C11, I think it is possible to write
> >> malloc() in portable C without needing any additional language
> >> changes.  But even if it isn't that is no cause for concern;  one
> >> of the principal reasons for having a system library is to
> >> provide functionality that the core language cannot express (or
> >> cannot express conveniently).
> >
> > One might want to experiment with different allocation algorithms
> > and it seems to me that this sort of thing is within the "remit" of
> > C. So ideally one should be able to write it in C [...]
> 
> You're conflating writing something in C and writing something
> in completely portable C.  It's already possible to do these
> things writing in C.

I wrote:

    One might want to experiment with different allocation algorithms and it
    seems to me that this sort of thing is within the "remit" of C. So
    ideally one should be able to write it in C and prove, starting from the
    standard or precise specifications in compiler documentation, that it
    works correctly. I don't necessarily mean prove the correctness of the
    whole code but certain key parts.

This doesn't conflate anything. One can do the writing, but can one do the
proving or something close?

-- 
vlaho.ninja/prog


#6563

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-09-05 05:39 -0700
Message-ID	<867cp4pzdu.fsf@linuxsc.com>
In reply to	#6562

Spiros Bousbouras <spibou@gmail.com> writes:

> On Wed, 30 Aug 2023 17:40:52 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

>> You're conflating writing something in C and writing something
>> in completely portable C.  It's already possible to do these
>> things writing in C.
>
> I wrote
>
>     One might want to experiment with different allocation
>     algorithms and it seems to me that this sort of thing is
>     within the "remit" of C. So ideally one should be able to
>     write it in C and prove , starting from the standard or
>     precise specifications in compiler documentation , that it
>     works correctly.  I don't necessarily mean prove the
>     correctness of the whole code but certain key parts.
>
> This doesn't conflate anything.  One can do the writing but
> can one do the proving or something close?

A substitute for malloc()/free() can be written in standard C.

A substitute for malloc()/free() can not be written in completely
portable standard C.

I hope this clarifies my earlier comments.


#6565

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-09-05 17:03 -0700
Message-ID	<861qfcp3q5.fsf@linuxsc.com>
In reply to	#6547

Martin Uecker <ma.uecker@gmail.com> writes:

[...]

> There are essentially two main interests driving this.  First,
> there is some interest to precisely formulate the semantics for
> C.  The provenance proposal came out of this.
>
> Second, there is the issue of safety problems caused by
> uninitialized reads, together with compiler support for zero
> initialization etc.  So there are various people who want to
> change the semantics for uninitialized variables completely
> in the interest of safety.
>
> So far, there was no consensus in WG14 that the rules should
> be changed or what the new rules should be.

I have a second reply here, which I hope will come closer to
being relevant to the issues of interest.

What I think is being looked for is a way to describe the
language semantics in areas such as cross-type interference and
what is meant when an uninitialized object is read.  I thought
about this question both while I was writing the longer earlier
reply and then more deeply afterwards.

What I think is most important is that these areas in particular
are not about language semantics in the same way as, for example,
array indexing.  Rather they are about what transformations a
compiler is allowed to do in the presence of various combinations
of program constructs.  That difference means the C standard
should express the rules in a way that more directly reflects
what's going on.  More specifically, the standard should say or
explain what can be done, not by describing language semantics
(which is indirect), but explicitly in terms of what compiler
transformations are allowed (which is direct).  Note that there
is precedent for this idea, in how the C standard talks about
looping constructs and when they may be assumed to terminate.

To give an example, take uninitialized objects, either automatic
variables without an initializer, or memory allocated by malloc or
added by realloc.  The most natural semantics for such situations
is to say that newly "created" memory gets an unspecified object
representation at the start of its lifetime.  (Yes I know that C
in its current form lets automatic objects be "uninitialized"
whenever their declaration points are reached, but let's ignore
that for now.)  Now suppose a program has a read access where it
is easy to deduce that the object being read is still in the
"unspecified object representation" initial state.  To simplify
the discussion, suppose the type of the access is a pointer type,
and so is known to have trap representations (the name is changed
in the C23 draft, but the idea is what's important).

What is a compiler allowed to do in such circumstances?  One thing
it might reasonably be allowed to do is to cause the program to be
terminated if it ever reaches such an access.  Or there might be
an option to initialize the pointer to NULL.  Or, if a suitable
compiler option were invoked, the construct might be flagged with
a fatal error (or of course a warning).  There are all sorts of
actions a developer might want the compiler to take, and a
compiler could offer many of those options, as choices selected
under control of command line switches (or equivalent).  I think a
few points are worth making.

One, there must be some sort of default action that all compilers
have to support.  The default action in this case might be to
issue a non-fatal diagnostic.

Two, there must be a way for the developer to tell the compiler to
"proceed blindly" - saying, in effect, I accept that the compiled
code might misbehave, but let me take that risk, and generate code
like it's going to work.  (In other words, for the read access, go
ahead and load whatever unspecified object representation happens
to be there.)  A "proceed blindly" choice probably shouldn't be
the default, but it must be available.

Three, the consequence must never be "undefined behavior", unless
there is an explicit stipulation to that effect.  The stipulation
might take the form of a #pragma, or a compiler option, or a code
decoration using "attribute" (whatever the syntax for such things
is).

I know my comments here are somewhat sketchy, but hopefully a
general sense of the ideas gets across.  The suggestions should at
least serve to stimulate further discussion.


#6566

From	Jakob Bohm <jb-usenet@wisemo.com.invalid>
Date	2023-09-07 17:09 +0200
Message-ID	<b4qdnRse5OVYemT5nZ2dnZfqn_idnZ2d@giganews.com>
In reply to	#6565

On 2023-09-06 02:03, Tim Rentsch wrote:
> Martin Uecker <ma.uecker@gmail.com> writes:
> 
> [...]
> 
>> There are essentially two main interests driving this.  First,
>> there is some interest to precisely formulate the semantics for
>> C.  The provenance proposal came out of this.
>>
>> Second, there is the issue of safety problems caused by
>> uninitialized reads, together with compiler support for zero
>> initialization etc.  So there are various people who want to
>> change the semantics for uninitialized variables completely
>> in the interest of safety.
>>
>> So far, there was no consensus in WG14 that the rules should
>> be changed or what the new rules should be.
> 
> I have a second reply here, which I hope will come closer to
> being relevant to the issues of interest.
> 
> What I think is being looked for is a way to describe the
> language semantics in areas such as cross-type interference and
> what is meant when an uninitialized object is read.  I thought
> about this question both while I was writing the longer earlier
> reply and then more deeply afterwards.
> 
> What I think is most important is that these areas in particular
> are not about language semantics in the same way as, for example,
> array indexing.  Rather they are about what transformations a
> compiler is allowed to do in the presence of various combinations
> of program constructs.  That difference means the C standard
> should express the rules in a way that more directly reflects
> what's going on.  More specifically, the standard should say or
> explain what can be done, not by describing language semantics
> (which is indirect), but explicitly in terms of what compiler
> transformations are allowed (which is direct).  Note that there
> is precedent for this idea, in how the C standard talks about
> looping constructs and when they may be assumed to terminate.
> 
> To give an example, take uninitialized objects, either automatic
> variables without an initializer, or memory allocated by malloc or
> added by realloc.  The most natural semantics for such situations
> is to say that newly "created" memory gets an unspecified object
> representation at the start of its lifetime.  (Yes I know that C
> in its current form lets automatic objects be "uninitialized"
> whenever their declaration points are reached, but let's ignore
> that for now.)  Now suppose a program has a read access where it
> is easy to deduce that the object being read is still in the
> "unspecified object representation" initial state.  To simplify
> the discussion, suppose the type of the access is a pointer type,
> and so is known to have trap representations (the name is changed
> in the C23 draft, but the idea is what's important).
> 
> What is a compiler allowed to do in such circumstances?  One thing
> it might reasonably be allowed to do is to cause the program to be
> terminated if it ever reaches such an access.  Or there might be
> an option to initialize the pointer to NULL.  Or, if a suitable
> compiler option were invoked, the construct might be flagged with
> a fatal error (or of course a warning).  There are all sorts of
> actions a developer might want the compiler to take, and a
> compiler could offer many of those options, as choices selected
> under control of command line switches (or equivalent).  I think a
> few points are worth making.
> 
> One, there must be some sort of default action that all compilers
> have to support.  The default action in this case might be to
> issue a non-fatal diagnostic.
> 
> Two, there must be a way for the developer to tell the compiler to
> "proceed blindly" - saying, in effect, I accept that the compiled
> code might misbehave, but let me take that risk, and generate code
> like it's going to work.  (In other words, for the read access, go
> ahead and load whatever unspecified object representation happens
> to be there.)  A "proceed blindly" choice probably shouldn't be
> the default, but it must be available.
> 
> Three, the consequence must never be "undefined behavior", unless
> there is an explicit stipulation to that effect.  The stipulation
> might take the form of a #pragma, or a compiler option, or a code
> decoration using "attribute" (whatever the syntax for such things
> is).
> 

Agreed so far!

As a developer of programs in C with practical but not infinite
portability, I very much abhor the mad optimizations that use
language lawyering to state that any code path that might,
hypothetically, exceed the boundaries of standard-enforced behavior
is allowed to be arbitrarily mangled to get a faster bad result.

For example, I have one function which intentionally reads an
uninitialized variable to get a somewhat arbitrary value of a type
with no known trap representation.  I have a number of other
programs which extensively process a block of data before deciding
in some other way if the data is garbage or useful.  This is done
for sound technical reasons but requires that the compiler doesn't
plant landmines all over virgin land.

As another example, I have speed critical code that relies on running
on 2s complement machines with wraparound on signed integer overflow,
and that code is being very clear and explicit in doing so, but there
is no C90 notation to tell all ISO-C implementations that this is the
intention, thus it is explicit only in comments, not in the tokens
passed to the C compiler.

> I know my comments here are somewhat sketchy, but hopefully a
> general sense of the ideas gets across.  The suggestions should at
> least serve to stimulate further discussion.
> 

I am writing from a similar perspective.

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


#6567

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2023-09-07 17:19 +0100
Message-ID	<87sf7qnefn.fsf@bsb.me.uk>
In reply to	#6566

Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

> As another example, I have speed critical code that relies on running
> on 2s complement machines with wraparound on signed integer overflow, and
> that code is being very clear and explicit in doing so, but there
> is no C90 notation to tell all ISO-C implementations that this is the
> intention, thus it is explicit only in comments, not in the tokens
> passed to the C compiler.

You can tell the compiler you want 2s complement by using the intN_t
types if you can find one that suits your portability requirements.

And can you not use unsigned arithmetic, re-interpreting as signed for
those places where it matters?  The "overflow" can only happen in
the arithmetic, not in the re-interpretation.
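A minimal sketch of the unsigned-arithmetic suggestion, assuming a two's complement target where int and unsigned int have the same size and no padding bits. The helper name wrap_add is hypothetical, and memcpy is only one possible way to do the re-interpretation; everything used here is available in C90.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: add two ints with wraparound.
 * The addition is done in unsigned int, where the result is
 * defined to wrap modulo UINT_MAX + 1. */
static int wrap_add(int a, int b)
{
    unsigned int u;
    int r;

    /* Assumption: same size, so the object representations line up. */
    assert(sizeof r == sizeof u);

    u = (unsigned int)a + (unsigned int)b;  /* cannot overflow */
    memcpy(&r, &u, sizeof r);               /* re-interpret the bits as int */
    return r;
}
```

Under those assumptions wrap_add(INT_MAX, 1) gives INT_MIN, and no signed overflow ever takes place.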

I know this is a deviation from the topic, so feel free to ignore if you
don't want to get into it.

-- 
Ben.


#6568

From	Jakob Bohm <jb-usenet@wisemo.com.invalid>
Date	2023-09-08 23:12 +0200
Message-ID	<p5KdnX4UMaaDE2b5nZ2dnZeNn_pj4p2d@giganews.com>
In reply to	#6567

On 2023-09-07 18:19, Ben Bacarisse wrote:
> Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
> 
>> As another example, I have speed critical code that relies on running
>> on 2s complement machines with wraparound on signed integer overflow, and
>> that code is being very clear and explicit in doing so, but there
>> is no C90 notation to tell every ISO C implementation that this is the
>> intention, thus it is explicit only in comments, not in the tokens
>> passed to the C compiler.
> 
> You can tell the compiler you want 2s complement by using the intN_t
> types if you can find one that suits your portability requirements.
> 
> And can you not use unsigned arithmetic, re-interpreting as signed for
> those places where it matters?  The "overflow" can only happen in
> the arithmetic, not in the re-interpretation.
> 
> I know this is a deviation from the topic, so feel free to ignore if you
> don't want to get into it.
> 

The code in question has, as an explicit design condition, that the compiler
implements signed versions with wraparound for each unsigned int type.

The code cannot rely on the intN_t types because they were not part of
C90 and thus do not exist as separate types in some targeted compilers.

In the world of C90 compilers, stdint.h was a non-standard system header
that provided convenience names for the most closely matching C90 types
on the platform, and some platforms simply didn't provide that header,
instead documenting how each C90 type mapped to data sizes.

Excessive casting where directly using the desired type seems possible
is highly counter-intuitive and thus it is inherently wrong for an
optimizer to presume the right to mangle code using types such as "int",
"short int", "long int" and "signed char".

Once again this comes down to a language drift from "undefined" meaning
"not defined by this standard" to "an extremely toxic trap condition".

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


#6569

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2023-09-08 22:31 +0100
Message-ID	<87fs3omjxj.fsf@bsb.me.uk>
In reply to	#6568

Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

> On 2023-09-07 18:19, Ben Bacarisse wrote:
>> Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
>> 
>>> As another example, I have speed critical code that relies on running
>>> on 2s complement machines with wraparound on signed integer overflow, and
>>> that code is being very clear and explicit in doing so, but there
>>> is no C90 notation to tell every ISO C implementation that this is the
>>> intention, thus it is explicit only in comments, not in the tokens
>>> passed to the C compiler.
>> You can tell the compiler you want 2s complement by using the intN_t
>> types if you can find one that suits your portability requirements.
>> And can you not use unsigned arithmetic, re-interpreting as signed for
>> those places where it matters?  The "overflow" can only happen in
>> the arithmetic, not in the re-interpretation.
>> I know this is a deviation from the topic, so feel free to ignore if you
>> don't want to get into it.
>
> The code in question has, as an explicit design condition, that the compiler
> implements signed versions with wraparound for each unsigned int type.
>
> The code cannot rely on the intN_t types because they were not part of
> C90 and thus do not exist as separate types in some targeted
> compilers.

Ah, I didn't know targeting C90 was still a thing.  I've been out of
the business for many years.

> Excessive casting where directly using the desired type seems possible
> is highly counter-intuitive and thus it is inherently wrong for an
> optimizer to presume the right to mangle code using types such as "int",
> "short int", "long int" and "signed char".

I wasn't suggesting casts as they don't remove the undefined behaviour.
But you have a design that suits your needs so it's all good.

-- 
Ben.


#6514

From	Kaz Kylheku <864-117-4973@kylheku.com>
Date	2023-07-22 06:40 +0000
Message-ID	<20230721233227.651@kylheku.com>
In reply to	#6507

On 2023-07-21, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> 6.3.2.1 p2:
>
>   "[...] If the lvalue designates an object of automatic storage
>   duration that could have been declared with the register storage class
>   (never had its address taken), and that object is uninitialized (not
>   declared with an initializer and no assignment to it has been
>   performed prior to use), the behavior is undefined."
>
> seems to cover it.  The restriction on not having its address taken
> seems odd.

Wording like that looks like someone's solo documentation effort,
not peer-reviewed by an expert committee.

That looks as if the intent is to allow some diagnoses of uses of
uninitialized variables, while discouraging others.

However, it doesn't seem a good idea to be constraining
implementations in how clever they can be in identifying
an erroneous situation.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


#6515

From	Martin Uecker <ma.uecker@gmail.com>
Date	2023-07-22 06:03 -0700
Message-ID	<21265efa-1bfe-4049-950f-45b75f0b4f71n@googlegroups.com>
In reply to	#6514

On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:
> On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote: 
> > 6.3.2.1 p2: 
> > 
> > "[...] If the lvalue designates an object of automatic storage 
> > duration that could have been declared with the register storage class 
> > (never had its address taken), and that object is uninitialized (not 
> > declared with an initializer and no assignment to it has been 
> > performed prior to use), the behavior is undefined." 
> > 
> > seems to cover it. The restriction on not having its address taken
> > seems odd.
> Wording like that looks like someone's solo documentation effort,
> not peer-reviewed by an expert committee.
> 
> That looks as if the intent is to allow some diagnoses of uses of 
> uninitialized variables, while discouraging others. 
> 
> However, it doesn't seem a good idea to be constraining 
> implementations in how clever they can be in identifying 
> an erroneous situation.

I personally like this rule (but I am speaking only for myself; there is
no full consensus about the exact interpretation of the standard
nor about what it should say).  I will try to explain why.

In C, we can also access objects using character pointers. This
should work in all cases, even for non-value (trap) representations,
and is also used a lot in practice to copy uninitialized or partially
initialized objects. If one makes all reads of objects with
indeterminate representation have undefined behavior, then
this would not work anymore.

If one wants to allow this (and a lot of real-world programs rely
on this), then one has to invent rules for how this works with an
abstract (provenance-based) notion of indeterminate values.
This turns out to be difficult.

But if we keep this rule, it becomes very simple: On the one
hand, all reads of uninitialized automatic variables whose
address is not taken are undefined behavior. This is the most
useful behavior for detecting bugs and/or optimization.

On the other hand, taking an address and working with a character
pointer to copy or manipulate an object is always defined; one
simply gets unspecified representation bytes (which may be
a non-value representation for some type, and it is UB to
read them using an lvalue of this type). So low-level operations
with partially initialized objects work as expected without having
to introduce complicated rules.

It will cost a tiny bit of optimization opportunities, but avoid
a lot of trouble.
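
A small sketch of the second case, under the reading of the rule described above (which, as said, is not a consensus interpretation). The struct and function names are hypothetical. The object's address is taken and it is copied only through unsigned char lvalues, so the bytes of the unset member are merely unspecified; only the initialized member is ever read as an int.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int a; int b; };

/* Copy n bytes through unsigned char lvalues, as memcpy does conceptually. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n-- > 0)
        *d++ = *s++;
}

/* Hypothetical demo: copy a partially initialized struct. */
static int copy_partial(void)
{
    struct pair p, q;

    p.a = 42;                      /* p.b is deliberately left unset */
    copy_bytes(&q, &p, sizeof p);  /* address taken; bytes of p.b are unspecified */
    return q.a;                    /* only the initialized member is read as int */
}
```

By contrast, a plain "int x; return x;" with no address taken falls under the first case and is undefined.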

Martin


#6520

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-07-25 21:53 -0700
Message-ID	<864jlrs28d.fsf@linuxsc.com>
In reply to	#6515

Martin Uecker <ma.uecker@gmail.com> writes:

> On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:
>
>> On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
>>
>>> 6.3.2.1 p2:
>>>
>>> "[...] If the lvalue designates an object of automatic storage
>>> duration that could have been declared with the register storage
>>> class (never had its address taken), and that object is
>>> uninitialized (not declared with an initializer and no
>>> assignment to it has been performed prior to use), the behavior
>>> is undefined."
>>>
>>> seems to cover it.  The restriction on not having its address
>>> taken seems odd.
>>
>> [...]
>
> I personally like this rule (but I am speaking only for myself; there is
> no full consensus about the exact interpretation of the standard
> nor about what it should say).  I will try to explain why.  [...]

It's a good rule.  I agree with your comments.  I guess it's
possible the wording could be improved, but compared to other
parts of the C standard the clarity of this passage is closer to
the top than it is to the bottom.


#6536

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-08-16 11:11 -0700
Message-ID	<86r0o26en6.fsf@linuxsc.com>
In reply to	#6514

Kaz Kylheku <864-117-4973@kylheku.com> writes:

> On 2023-07-21, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>
>> 6.3.2.1 p2:
>>
>>   "[...] If the lvalue designates an object of automatic storage
>>   duration that could have been declared with the register storage class
>>   (never had its address taken), and that object is uninitialized (not
>>   declared with an initializer and no assignment to it has been
>>   performed prior to use), the behavior is undefined."
>>
>> seems to cover it.  The restriction on not having its address taken
>> seems odd.
>
> Wording like that looks like someone's solo documentation effort,
> not peer-reviewed by an expert committee.
>
> That looks as if the intent is to allow some diagnoses of uses of
> uninitialized variables, while discouraging others.

That isn't at all what this passage is about.


#6508

From	Kaz Kylheku <864-117-4973@kylheku.com>
Date	2023-07-21 17:42 +0000
Message-ID	<20230721002225.404@kylheku.com>
In reply to	#6505

On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> N3096 is the last public draft of the upcoming C23 standard.
>
> N3096 J.2 says:
>
>     The behavior is undefined in the following circumstances:
>     [...]
>     (11) The value of an object with automatic storage duration is
>          used while the object has an indeterminate representation
>          (6.2.4, 6.7.10, 6.8).

Personally, I think that the root cause of this whole issue is
the defective definition of indeterminate value.

Indeterminacy must be an abstract concept that is not encoded
in the bits of the object; it is a matter of provenance.

An indeterminate integer could have a valid bit pattern,
such as all zero, yet the implementation should be free to terminate
with a diagnostic (or behave in other ways) when it is accessed.

It should not be possible to tell whether an object is indeterminate
by looking at its bits.

An implementation can track this with metadata. Translation-time
flow-analysis data can catch some uses of uninitialized objects;
that's how we get classic uninitialized variable warnings.

An implementation can track uninitialized bits at run-time with
hidden metadata. The Valgrind debugging tool does this; for
every bit, whose value is necessarily always 0 or 1, it tracks
whether the bit is initialized.
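
As a toy illustration of the idea (not how Valgrind is implemented), the status can live beside the value rather than in its bits. All names here are hypothetical, and the tracking is per object rather than per bit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shadow cell: the flag is metadata kept beside the value. */
struct tracked_int {
    int bits;         /* always holds some valid bit pattern */
    int initialized;  /* 0 = indeterminate, 1 = a value has been stored */
};

/* Mark the cell indeterminate; its bits are all zero, a valid int. */
static void tracked_reset(struct tracked_int *t)
{
    assert(t != NULL);
    t->bits = 0;
    t->initialized = 0;
}

static void tracked_store(struct tracked_int *t, int v)
{
    assert(t != NULL);
    t->bits = v;
    t->initialized = 1;
}

/* Returns 0 and writes *out on success, -1 if the cell is indeterminate. */
static int tracked_load(const struct tracked_int *t, int *out)
{
    assert(t != NULL && out != NULL);
    if (!t->initialized)
        return -1;    /* an implementation could diagnose or terminate here */
    *out = t->bits;
    return 0;
}
```

After tracked_reset the bits are a perfectly valid zero, yet tracked_load still refuses the access; storing zero afterwards leaves the bits unchanged and makes the load succeed.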

That poor definition of indeterminate value should go.

Otherwise the standard is contradicting itself and doing
silly things like asserting that using an indeterminate value 
is undefined behavior if it is a local variable with automatic
storage.

A reasonable definition of indeterminate might be:

  indeterminate

  an abstract status indicating that a value is invalid,
  irrespective of the content of the bits which constitute
  that value.

  An improperly obtained value is indeterminate(1).

  A previously valid value may lapse into indeterminate status.(2)

  Any use of an indeterminate value is undefined behavior.

  --
  (1) For example, a value obtained by accessing an uninitialized
  object defined in automatic storage, or in an uninitialized
  region of memory obtained from malloc.

  (2) For example, a pointer to an object becomes indeterminate
  if that object is deallocated.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


#6516

From	Jakob Bohm <jb-usenet@wisemo.com.invalid>
Date	2023-07-24 07:53 +0200
Message-ID	<hV2dneL6E4fNjSP5nZ2dnZeNn_pg4p2d@giganews.com>
In reply to	#6508

On 2023-07-21 19:42, Kaz Kylheku wrote:
> On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> N3096 is the last public draft of the upcoming C23 standard.
>>
>> N3096 J.2 says:
>>
>>      The behavior is undefined in the following circumstances:
>>      [...]
>>      (11) The value of an object with automatic storage duration is
>>           used while the object has an indeterminate representation
>>           (6.2.4, 6.7.10, 6.8).
> 
> Personally, I think that the root cause of this whole issue is
> the defective definition of indeterminate value.
> 

The problem is much deeper than that.  It all boils down to the
obsession in the official C community with abusing the concept of
"undefined" to cover everything from "arbitrary natural semantics
of the hardware" to "optimizing away code unexpectedly".  It would
be highly beneficial for a cleanup in C30, or even a corrective TR, to
split up the concept into explicit cases that vary for each
situation.  For example, runtime error reporting should be very
different from optimizing away code that may encounter runtime
errors on different hardware than the one it is actually run on.

From a simplified conceptual machine model that resembles a modern
von Neumann architecture with only floating point types having
actual trap representations, a lot of rules that have at various
times been rephrased using the word "undefined" seem utterly absurd,
and applying the current meaning of "undefined" back to the actual
machines that inspired them will tend to cause even more absurdities.

For example, the ability of the IA64 CPUs to raise an actual trap
exception in response to reading an uninitialized register is very
different from aggressively optimizing away code that might use an
unknown stray value, especially with the aggressive optimization
settings required by the IA64 Explicitly Parallel design.

Some of the things that "undefined" in the current text could map
to:

- anyof(A,B,C) = An implementation specific and possibly uncontrolled
  choice between A, B and C (with no others permitted).
- Continuing as if nothing happened.
- Aborting execution, possibly with an error indication.
- raise(X) where X is specified in the standard.
- An implementation specific value to be listed in the
  implementation documentation.
- A standard specified value.
- Executing machine code at a specified memory address in accordance
  with the actual machine behavior (this is common for calling
  a function pointer that isn't set to a C function of proper type).
- Causing the code to be eliminated (think assume(0);).
- Reserved for future standardization in future editions.
- Reserved for standardization in other ISO documents (such as POSIX
  or C++).
- Reserved for implementation specific behavior to be listed in the
  implementation documentation.

For example, the effect of calling assert() with a false value is
"anyof(continuing as if nothing, abort with error)", with it being
implementation defined how to force either choice (many
implementations will use the status of the DEBUG define).
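
The two outcomes can be seen side by side in standard C, where the switch is the state of the NDEBUG macro each time <assert.h> is included (build systems often derive it from their own debug settings). A rough sketch; the function names are hypothetical:

```c
/* Checks disabled: with NDEBUG defined, assert(e) expands to ((void)0). */
#define NDEBUG
#include <assert.h>

static int half_unchecked(int x)
{
    assert(x % 2 == 0);   /* not evaluated; execution simply continues */
    return x / 2;
}

/* Checks enabled: <assert.h> may be included again, and assert is
 * redefined according to the current state of NDEBUG. */
#undef NDEBUG
#include <assert.h>

static int half_checked(int x)
{
    assert(x % 2 == 0);   /* on failure: diagnostic on stderr, then abort() */
    return x / 2;
}
```

Calling half_unchecked(7) continues as if nothing happened, while half_checked(7) would abort with an error indication.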

There should also be a way for limits.h (one of the few headers
required in free-standing implementations) to specify via new
standard defines if the implementation conforms to common sets
of implementation specific behaviors such as "twos complement int
with wraparound", "ones complement int with wraparound", "sign
and magnitude int with wraparound", "unsigned with wraparound",
"IEEE nnnn floating point with/without overflow exceptions",
"negative int division by positive int rounds towards zero"
(and the other possibilities for division special cases) etc. etc.
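
Some of this can be approximated today with preprocessor arithmetic. The sketch below is only an illustration: the HYP_ macro names are hypothetical (nothing like them exists in limits.h), and it assumes that the preprocessor evaluates signed expressions the same way the target does at run time. Wraparound on signed overflow cannot be probed this way, since the probe itself would have to overflow.

```c
#include <assert.h>

/* Hypothetical feature macros, derived from the low bits of -1:
 * two's complement gives 3, ones' complement 2, sign and magnitude 1. */
#if (-1 & 3) == 3
#define HYP_INT_TWOS_COMPLEMENT 1
#else
#define HYP_INT_TWOS_COMPLEMENT 0
#endif

/* Does negative / positive division round towards zero? */
#if (-7 / 2) == -3
#define HYP_DIV_TRUNCATES 1
#else
#define HYP_DIV_TRUNCATES 0
#endif

/* Cross-check the preprocessor's answer against run-time int arithmetic. */
static int int_is_twos_complement(void)
{
    int minus_one = -1;

    assert(((minus_one & 3) == 3) == (HYP_INT_TWOS_COMPLEMENT != 0));
    return HYP_INT_TWOS_COMPLEMENT;
}
```

Code that depends on a property could then refuse to build with #error when the corresponding macro is 0, which puts the intention into the tokens rather than only into comments.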

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


#6521

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2023-07-25 21:57 -0700
Message-ID	<86zg3jqngv.fsf@linuxsc.com>
In reply to	#6516

Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

> On 2023-07-21 19:42, Kaz Kylheku wrote:
>
>> On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>
>>> N3096 is the last public draft of the upcoming C23 standard.
>>>
>>> N3096 J.2 says:
>>>
>>>      The behavior is undefined in the following circumstances:
>>>      [...]
>>>      (11) The value of an object with automatic storage duration is
>>>           used while the object has an indeterminate representation
>>>           (6.2.4, 6.7.10, 6.8).
>>
>> Personally, I think that the root cause of this whole issue is
>> the defective definition of indeterminate value.
>
> The problem is much deeper than that.  It all boils down to the
> obsession in the official C community to abuse the concept of
> "undefined" to cover everything from "arbitrary natural semantics
> of the hardware" to "optimizing away code unexpectedly".  [...]

This discussion looks interesting but it seems better that
there be a separate thread to take it up.
