Re: Does reading an uninitialized object have undefined behavior?

From Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups comp.std.c
Subject Re: Does reading an uninitialized object have undefined behavior?
Date 2023-08-26 19:25 -0700
Organization A noiseless patient Spider
Message-ID <868r9xz0ek.fsf@linuxsc.com>
References (6 earlier) <fcb2be8f-b346-421f-9804-5f94c93266b0n@googlegroups.com> <864jkz7hrm.fsf@linuxsc.com> <e043af84-3153-4097-9505-666869fcf727n@googlegroups.com> <867cpu5h8w.fsf@linuxsc.com> <a3199783-d8b7-4065-836b-08f647a6808en@googlegroups.com>


Martin Uecker <ma.uecker@gmail.com> writes:

> On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:
>
>> Martin Uecker <ma.u...@gmail.com> writes:
>>
>> [some unrelated passages removed]
>>
>>> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
>>>
>>>> Martin Uecker <ma.u...@gmail.com> writes:
>>
>> [...]
>>
>>>>> One could still consider the idea that "indeterminate" is an
>>>>> abstract property that yields UB during read even for types
>>>>> that do not have trap representations.  There is no wording
>>>>> in the C standard to support this, but I would not call this
>>>>> idea "fundamentally wrong".  You are right that this is different
>>>>> to pointer provenance which is about values.  What it would
>>>>> have in common with pointer provenance is that there is hidden
>>>>> state in the abstract machine associated with memory that
>>>>> is not part of the representation.  With effective types there
>>>>> is another example of this.
>>>>
>>>> I understand that you want to consider a broader topic, and that,
>>>> in the realm of that broader topic, something like provenance
>>>> could have a role to play.  I think it is worth responding to
>>>> that thesis, and am expecting to do so in a separate reply (or
>>>> new thread?) although probably not right away.
>>>
>>> I would love to hear your comments, because some people
>>> want to have such an abstract notion of "indeterminate" and
>>> some already believe that this is how the standard should
>>> be understood already today.
>>
>> I've been thinking about this, and am close (I think) to having
>> something to say in response.  Before I do that, though, let me
>> ask this:  what problem or problems are motivating the question?
>> What problems do you (or "some people") want to solve?  I don't
>> want just examples here;  I'm hoping to get a full list.
>
> There are essentially two main interests driving this.  First,
> there is some interest to precisely formulate the semantics for C.
> The provenance proposal came out of this.
>
> Second, there is the issue of safety problems caused by
> uninitialized reads, together with compiler support for zero
> initialization etc.  So there are various people who want to
> change the semantics for uninitialized variables completely
> in the interest of safety.

This response doesn't answer my question.  What are the problems,
specifically, that people want to solve?  If there isn't a good
understanding of what the problem is, there is little hope of
finding a solution, let alone reaching agreement on whether a
proposed change does in fact solve the problem.  If we don't know
where we're going, any choice of road is equally good.

That said, I understand that you are asking not on your own behalf
but on behalf (perhaps indirectly) of others, and the others might
not know what the problem(s) are that they want to solve.  I think
it's worth asking the question explicitly, What is the problem
that we want to solve here?  Start by simply trying to write a
clear statement of what the problem is;  proceed on to looking for
a solution only after there is agreement (and I don't mean just a
majority vote) about what problem it is the group wants to solve.

(Note added after writing:  I didn't realize when I started how
difficult this subject is and how much there is to say about it.
I hope readers will appreciate the amount of effort that has
been invested, and get some value out of what has been produced,
even if it spends too much time on some less important issues.)

(Also, after having written the whole posting, I see that there
are some aspects that I didn't relate to the indeterminate
question and so didn't address.  If you want me to say more about
formalizing semantics or the issue of safety for uninitialized
variables, I really need some specifics before I can talk about
those.)

(One further thought:  on reading through my comments one last
time, I may have more to say about uninitialized variables.  But
I am deferring that for now, to get this beast out the door.)


> So far, there was no consensus in WG14 that the rules should
> be changed or what the new rules should be.

That's because they don't know what problem it is that they want
to solve.

Consider the question of what happens with padding bits/bytes,
and unnamed members, in structs (unions too of course, but for
now we consider only structs).  The C standard says these bits of
memory take unspecified values whenever there is a store to any
member of the struct (and maybe also at other times, but let's
ignore that).  I understand why this decision was made, namely,
to give more freedom to implementations as to how such operations
are actualized.  But it leaves behind a problem.  Speaking as a
developer, I want the values of these bits to be stable, at least
in certain cases (and I want to be able to choose which cases
those are).  The C language doesn't give me any way to do that,
at least not one that isn't horribly inconvenient.  In making the
decision about padding bits/bytes, the C committee answered the
/question/ but didn't address the /problem/.  I expect that
something similar is going on with the current discussions.

To better understand the landscape, let's look at three different
kinds of undefined behavior.  The illustrating constructions are
signed integer arithmetic, obsolete pointer values, and violating
effective type rules.

Situations where arithmetic on signed integers overflows might be
called /practical/ undefined behavior.  Certainly it would be
possible to require a better-defined semantics (such as giving an
unspecified result), but presumably overflow doesn't come up very
often, it's not clear how useful the "better" result would be,
and the cost in some hardware environments might be prohibitive.
Furthermore there is a fairly easy workaround to avoid overflow:
simply convert to unsigned types, do the operations, and then
convert back.  Overflow being undefined behavior isn't absolutely
necessary but in practical terms it's acceptable.  (I acknowledge
that some people have different views on that last statement.)

An obsolete pointer value is a pointer to an object after the end
of the object's lifetime.  Attempting to make use of an obsolete
pointer value, in any way whatsoever including simply loading it
by means of lvalue conversion, is undefined behavior.  We can
imagine narrowing the scope a bit so simply loading an obsolete
pointer value or comparing one for equality could be better
defined, but any attempt to dereference an obsolete pointer value
is what might be called an /essential/ undefined behavior.  The
problem here is both practical and theoretical:  there is no way
to be sure the underlying hardware will be able to carry out the
asked-for operation (without a machine check, etc), and even if
there were, there is no way to describe what happens in a way
that can be expressed (usefully) in terms that relate to what's
going on in the abstract machine.  There simply is no practical,
useful, sensible way to define the behavior of dereferencing an
obsolete pointer value.

At the other end of the spectrum, violating effective type rules is
what might be called /gratuitous/ undefined behavior.  There is no
particular hardware motivation for choosing UB.  And there is no
problem defining the semantics of a cross-type access, which can be
done definedly in the same way as accessing union members.  So there
is no reason to think that adding cross-type restrictions is
necessary.  An argument can be made that cross-type restrictions
are /desirable/, because they allow code transformations that
improve performance in some cases.

Incidentally, it might seem like effective type rules are similar in
some way to NaT bits or pointer provenance.  They aren't.  NaT bits
are hardware indicators that actually exist, and pointer provenances
are attached to values, not to objects.  Neither of those conditions
holds for effective types.  The seeming similarity to hidden memory
bits is a red herring.

(Also, effective type rules are a lot more complicated than they
seem at first blush, and have some peculiar properties as a result.
They seem to work okay if not looked at too closely, but a closer
look shows some serious shortcomings.  But I digress.)

There are two significant problems with undefined behavior.  The
smaller of the two is that there are no distinctions between the
different classes of undefined behavior.  There is no way around
having some sort of undefined behavior for obsolete pointer values,
but cross-typing rules are a completely different story.  Yet the C
standard puts all the different kinds of undefined behaviors into
the same absolute category.  Sometimes people use compiler options
to turn off, for example, so-called "strict aliasing", and of course
the C standard allows us to do that.  But compilers aren't required
to provide such an option, and if they do the option may not do
exactly what we expect it to do, because there is no standard
specification for it.  The C standard should define officially
sanctioned mechanisms -- as for example standard #pragma's -- to
give standard-defined semantics to certain constructs of undefined
behavior that resemble, eg, -fno-strict-aliasing.

(Let me add in passing that this should be done for some cases of
unspecified behavior as well.  To give one example, the C standard
should provide a way to direct a C compiler to maintain the values
of padding bits and bytes and unnamed members, taking away the
freedom for such things to assume unspecified values.)

The second problem is basically The Law of Unintended Consequences
smashing into The Law of Least Astonishment.  As compiler writers
have gotten more and more clever at exploiting the implications of
"undefined behavior", we see more and more cases of code that looks
reasonable being turned into mush by overly clever "optimizing"
compilers.  There is obviously something wrong with the way this
trend is going -- ever more clever "optimizations", followed by ever
more arcane compiler options to work around the problems caused by
the too-clever compilers.  This problem must be addressed by the C
standard, for if it is not the ecosystem will transform into a
confused state that is exactly what the C standard was put in place
to avoid.  (I do have some ideas about how to address this issue,
but I want to make sure everyone appreciates the extent of the
problem before we start talking about solutions.)

Before leaving the sub-topic of undefined behavior, let me mention
two success stories.  The first is 'restrict':  the performance
implications are local, the choice is under control of the program
(and programmer), and the default choice is to play safe.  Good
show.  The second is the improved sequencing rules introduced in
C11.  A thorny problem, and one that C11 handled very deftly.  These
parts of the C language and C standard should be held up as examples
when considering how to go forward on other problems.

And now on to the question of "indeterminate".  Following that, a
somewhat philosophical perspective concerning the nature of the C
standard and the people who work on it.

First an observation.  The idea of "indeterminate values" is
actually two ideas in one:  non-valid abstract /values/ (like
obsolete pointers), and "uninitialized" /objects/ (in quotes
because in some circumstances objects can become "uninitialized"
even after they have been stored into.)  The word "indeterminate"
isn't really right for either of these ideas.  I understand why it
was used in the first C standard, and in that context it seems
okay, but going forward a better word (or words) should be found.
I will keep using it here but please don't get overly attached to
the word, lest it confuse the discussion.

My very strong sense is that some general notion of indeterminate
values (or objects) is a solution in search of a problem.  Let's
look at some different kinds of undefined behavior, while also
considering the lens of "indeterminate values (or objects)".

One: signed integer overflow.  Could this situation somehow produce
an "indeterminate value" that could be stored so it could wreak
havoc later?  Two problems:  no sensible developer is going to want
the bad behavior deferred rather than happening right away, and
besides anything an "indeterminate value" could do can already be
done by virtue of the generating condition being undefined itself.

Two: obsolete pointers.  These values are not indeterminate.  They
start off as valid, become obsolete when their pointed-to object
ends its lifetime, and are always obsolete thereafter.  It isn't
hard to make a formal model for "obsoleteness" (ignoring problems
such as converting pointers to and from integers, and other C-isms).
Of course the formal model doesn't map nicely onto real computer
hardware, because pointers would have far too many bits (and maybe
other problems as well, but let's ignore that).  So we pretend the
extra bits are there, even though they aren't, with a strange
consequence that two pointer objects can have the same object
representation but still be different in that one is obsolete and
the other isn't.  Also a pointer can start off with a non-valid
value, meaning "not null and points to no object".  Here again the
badness remains until a valid pointer value is put into the object;
a pointer object with a non-valid value doesn't ever magically
become valid without having been assigned or stored into.  (Note
that the same formal model for obsolete pointers can accommodate
non-valid pointers, which are simply obsolete at the start.)

Three: effective type rules.  Broken.  One of the weakest areas of
the C standard.  This framework may have started off as not a bad
idea in C90, but looking at it now it's clear that we've gotten
ahead of our skis, sorely in need of a top-to-bottom reformulation,
similar at least in spirit with what was done with sequencing rules
in C11.  Also there should be a standard-defined way of allowing
cross-type interference, with defined behavior, like what was
explained above.  I expect a well-done reformulation of cross-type
(non-)interference rules would have no notion of assigning "magic
state" to objects, and so have no need of any idea of "indeterminate
objects (or values)".

Four: uninitialized objects.  Here we have a question:  Why?  What
problem are we hoping to solve?  Presumably the point of having
uninitialized objects be "indeterminate" is so that reading them
is undefined behavior.  Let's explore that.

I realize of course that any object having a trap representation
(called a non-value representation in the C23 draft) causes
undefined behavior if read using a type in which the object
representation corresponds to a trap representation.  Obviously
there is good reason to say trying to read a trap representation
is undefined behavior.  Some types, notably unsigned char, don't
have any trap representations.  Should reading an uninitialized
object using such a type be undefined behavior?  Speaking as a
developer, I don't see any benefit.  An implementation would have
to go out of its way to do anything other than deliver a valid
unspecified value;  if there is to be undefined behavior, it is
/contrived/ undefined behavior.  Consider:

A:  such UB could allow trapping on any use of an uninitialized
object.  But UB does not guarantee that, and if someone wants
it there are tools like valgrind to get it (and without any
special language support needed to do so).

B:  such UB could allow "optimizations" by clever compiler writers.
The result would be more unexpected code scramblings and more
arcane compiler options to disable them.  A better way to provide
such imagined benefits is by adding one or more new language
constructs, along lines similar to the 'restrict' qualifier, to
selectively enable such performance changes.

C:  future hardware developments might need or take advantage of
such UB.  If and when such things happen it's better to add
specific wording to reflect the new hardware behaviors.  The last
sentence of 6.3.2.1 p2, added in C11, provides an excellent example
of how to accommodate such new hardware developments.

The notion of indeterminate objects is a solution in search of a
problem.  To make progress, first agree on a particular problem.
Only after that point should possible solutions be considered;  I
would be surprised if some general notion of indeterminateness ever
turned out to be the solution of choice.

Now I would like to offer a perspective on how to view work that
is done in writing the C standard.

In some respects the ISO C committee resembles the US Supreme
Court.  They consider issues, draw conclusions, and ultimately
issue "rulings" in the form of ISO-approved standards documents.
Like the Supreme Court, their decisions are final and cannot be
appealed.

However, the Supreme Court ultimately draws its authority from
how the public views its rulings.  If the rulings get too far out
of line with what the general public believes, confidence in the
Court will decline and its opinions will carry less weight.  (I
don't mean to make a political statement here - I am simply
repeating some analysis I have read recently regarding current
attitudes towards the Court.)

The same is true of the ISO C committee.  They can make whatever
decisions they want, and those decisions will end up being what
goes into the C standard.  At the same time, it's important - I
would say very important - to keep the confidence of people for
whom the C standard is regarded as an important document.  If
that confidence is lost then the C standard will be on its way
to becoming irrelevant.

Unfortunately I have the sense that this trend has already
started.  The most important constituency for the C language (and
so for the C standard) is developers.  Many developers, but in
particular and very especially C developers, want stability.  I
understand the desire to want to "improve" the language.  Getting
agreement on a change has to mean more than a majority vote -- it
needs to be not just accepted but enthusiastically approved and
with overwhelming support.  Too much of what is planned for C23
is coming from the implementation community without regard for
what is beneficial to the development community.  I see the
reported desire for general "indeterminate"-ness as part of this
trend.  It is my hope that those people who are part of the ISO C
committee reflect on this perspective and reconsider where the C
language should go for the next C standard.
