| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Newsgroups | comp.std.c |
| Subject | Re: Does reading an uninitialized object have undefined behavior? |
| Date | 2023-08-26 19:25 -0700 |
| Organization | A noiseless patient Spider |
| Message-ID | <868r9xz0ek.fsf@linuxsc.com> |
| References | (6 earlier) <fcb2be8f-b346-421f-9804-5f94c93266b0n@googlegroups.com> <864jkz7hrm.fsf@linuxsc.com> <e043af84-3153-4097-9505-666869fcf727n@googlegroups.com> <867cpu5h8w.fsf@linuxsc.com> <a3199783-d8b7-4065-836b-08f647a6808en@googlegroups.com> |
Martin Uecker <ma.uecker@gmail.com> writes:

> On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:
>
>> Martin Uecker <ma.u...@gmail.com> writes:
>>
>> [some unrelated passages removed]
>>
>>> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
>>>
>>>> Martin Uecker <ma.u...@gmail.com> writes:
>>
>> [...]
>>
>>>>> One could still consider the idea that "indeterminate" is an
>>>>> abstract property that yields UB during read even for types
>>>>> that do not have trap representations. There is no wording
>>>>> in the C standard to support this, but I would not call this
>>>>> idea "fundamentally wrong". You are right that this is different
>>>>> to provenance which is about values. What it would
>>>>> have in common with pointer provenance is that there is hidden
>>>>> state in the abstract machine associated with memory that
>>>>> is not part of the representation. With effective types there
>>>>> is another example of this.
>>>>
>>>> I understand that you want to consider a broader topic, and that,
>>>> in the realm of that broader topic, something like provenance
>>>> could have a role to play. I think it is worth responding to
>>>> that thesis, and am expecting to do so in a separate reply (or
>>>> new thread?) although probably not right away.
>>>
>>> I would love to hear your comments, because some people
>>> want to have such an abstract of "indeterminate" and
>>> some already believe that this is how the standard should
>>> be understood already today.
>>
>> I've been thinking about this, and am close (I think) to having
>> something to say in response. Before I do that, though, let me
>> ask this: what problem or problems are motivating the question?
>> What problems do you (or "some people") want to solve? I don't
>> want just examples here; I'm hoping to get a full list.
>
> There are essentially two main interests driving this. First,
> there is some interest to precisely formulate the semantics for C.
> The provenance proposal came out of this.
>
> Second, there is the issue of safety problems caused by
> uninitialized reads, together with compiler support for zero
> initialization etc. So there are various people who want to
> change the semantics for uninitialized variables completely
> in the interest of safety.

This response doesn't answer my question. What, specifically, are the problems that people want to solve? If there isn't a good understanding of what the problem is, there is little hope of finding a solution, let alone of reaching agreement on whether a proposed change does in fact solve the problem. If we don't know where we're going, any choice of road is equally good.

That said, I understand that you are asking not on your own behalf but on behalf (perhaps indirectly) of others, and the others might not know what problem(s) they want to solve. I think it's worth asking the question explicitly: What is the problem that we want to solve here? Start simply by trying to write a clear statement of what the problem is; proceed to looking for a solution only after there is agreement (and I don't mean just a majority vote) about which problem the group wants to solve.

(Note added after writing: I didn't realize when I started how difficult this subject is and how much there is to say about it. I hope readers will appreciate the amount of effort that has been invested, and get some value out of what has been produced, even if it spends too much time on some less important issues.)

(Also, having written the whole posting, I see that there are some aspects that I didn't relate to the indeterminate question and so didn't address. If you want me to say more about formalizing semantics or the issue of safety for uninitialized variables, I really need some specifics before I can talk about those.)

(One further thought: on reading through my comments one last time, I may have more to say about uninitialized variables.
But I am deferring that for now, to get this beast out the door.)

> So far, there was no consensus in WG14 that the rules should
> be changed or what the new rules should be.

That's because they don't know what problem it is that they want to solve.

Consider the question of what happens with padding bits/bytes, and unnamed members, in structs (unions too of course, but for now we consider only structs). The C standard says these bits of memory take unspecified values whenever there is a store to any member of the struct (and maybe also at other times, but let's ignore that). I understand why this decision was made, namely, to give implementations more freedom in how such operations are carried out. But it leaves behind a problem. Speaking as a developer, I want the values of these bits to be stable, at least in certain cases (and I want to be able to choose which cases those are). The C language doesn't give me any way to do that, at least not one that isn't horribly inconvenient. In making the decision about padding bits/bytes, the C committee answered the /question/ but didn't address the /problem/. I expect that something similar is going on with the current discussions.

To better understand the landscape, let's look at three different kinds of undefined behavior, illustrated by signed integer arithmetic, obsolete pointer values, and violations of the effective type rules.

Situations where arithmetic on signed integers overflows might be called /practical/ undefined behavior. Certainly it would be possible to require better-defined semantics (such as giving an unspecified result), but presumably overflow doesn't come up very often, it's not clear how useful the "better" result would be, and the cost in some hardware environments might be prohibitive. Furthermore there is a fairly easy workaround to avoid overflow: simply convert to unsigned types, do the operations, and then convert back.
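A minimal sketch of that unsigned workaround follows. It is an illustration added here, not code from the post, and the helper name `wrapping_add` is hypothetical. Unsigned addition is defined to wrap modulo UINT_MAX + 1, so the arithmetic itself is never undefined. Converting an out-of-range result back to `int` is implementation-defined rather than undefined (C17 6.3.1.3p3); GCC, for example, documents that the value is reduced modulo 2^N, which yields the familiar two's-complement wraparound.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: add two ints with wraparound instead of UB.
   Both operands are converted to unsigned int, where addition wraps
   modulo UINT_MAX + 1 and is therefore always defined. */
int wrapping_add(int a, int b)
{
    unsigned int sum = (unsigned int)a + (unsigned int)b;

    /* Converting back to int is implementation-defined when sum > INT_MAX
       (C17 6.3.1.3p3); GCC documents reduction modulo 2^N. */
    return (int)sum;
}
```

On an implementation that reduces modulo 2^N, such as GCC, `wrapping_add(INT_MAX, 1)` gives `INT_MIN` rather than invoking undefined behavior.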
Overflow being undefined behavior isn't absolutely necessary, but in practical terms it's acceptable. (I acknowledge that some people have different views on that last statement.)

An obsolete pointer value is a pointer to an object after the end of the object's lifetime. Attempting to make use of an obsolete pointer value in any way whatsoever, including simply loading it by means of lvalue conversion, is undefined behavior. We can imagine narrowing the scope a bit so that simply loading an obsolete pointer value, or comparing one for equality, could be better defined, but any attempt to dereference an obsolete pointer value is what might be called /essential/ undefined behavior. The problem here is both practical and theoretical: there is no way to be sure the underlying hardware will be able to carry out the requested operation (without a machine check, etc), and even if there were, there is no way to describe what happens in terms that relate (usefully) to what's going on in the abstract machine. There simply is no practical, useful, sensible way to define the behavior of dereferencing an obsolete pointer value.

At the other end of the spectrum, violating effective type rules is what might be called /gratuitous/ undefined behavior. There is no particular hardware motivation for choosing UB. And there is no problem defining the semantics of a cross-type access, which can be given defined behavior in the same way as accessing union members. So there is no reason to think that adding cross-type restrictions is necessary. An argument can be made that cross-type restrictions are /desirable/, because they allow code transformations that improve performance in some cases.

Incidentally, it might seem like effective type rules are similar in some way to NaT bits or pointer provenance. They aren't. NaT bits are hardware indicators that actually exist, and pointer provenances are attached to values, not to objects.
Neither of those conditions holds for effective types. The seeming similarity to hidden memory bits is a red herring. (Also, effective type rules are a lot more complicated than they seem at first blush, and have some peculiar properties as a result. They seem to work okay if not looked at too closely, but a closer look shows some serious shortcomings. But I digress.)

There are two significant problems with undefined behavior. The smaller of the two is that there are no distinctions between the different classes of undefined behavior. There is no way around having some sort of undefined behavior for obsolete pointer values, but cross-typing rules are a completely different story. Yet the C standard puts all the different kinds of undefined behavior into the same absolute category. Sometimes people use compiler options to turn off, for example, so-called "strict aliasing", and of course the C standard allows us to do that. But compilers aren't required to provide such an option, and if they do, the option may not do exactly what we expect, because there is no standard specification for it. The C standard should define officially sanctioned mechanisms (for example, standard #pragmas) that give standard-defined semantics to certain constructs of undefined behavior, along the lines of -fno-strict-aliasing. (Let me add in passing that this should be done for some cases of unspecified behavior as well. To give one example, the C standard should provide a way to direct a C compiler to maintain the values of padding bits and bytes and unnamed members, taking away the freedom for such things to assume unspecified values.)

The second problem is basically The Law of Unintended Consequences smashing into The Law of Least Astonishment. As compiler writers have gotten more and more clever at exploiting the implications of "undefined behavior", we see more and more cases of code that looks reasonable being turned into mush by overly clever "optimizing" compilers.
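A well-known illustration of reasonable-looking code being turned into mush (an example chosen here, not one given in the post) is a hand-written overflow check. Because signed overflow is undefined, an optimizing compiler may assume that `x + 100` never wraps and compile the first function below as if it always returned false; options such as GCC's and Clang's `-fwrapv` make signed overflow wrap and keep the test meaningful. The second function asks the same question without ever evaluating an overflowing addition. Both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Looks like an overflow check, but it only "works" if x + 100 wraps.
   Signed overflow is undefined, so an optimizer may assume the sum
   never wraps and reduce this function to "return false". */
bool add100_overflows_naive(int x)
{
    return x + 100 < x;
}

/* Well-defined version: compare against the limit before adding,
   so no overflowing arithmetic is ever evaluated. */
bool add100_overflows(int x)
{
    return x > INT_MAX - 100;
}
```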
There is obviously something wrong with the way this trend is going: ever more clever "optimizations", followed by ever more arcane compiler options to work around the problems caused by the too-clever compilers. This problem must be addressed by the C standard, for if it is not, the ecosystem will descend into exactly the kind of confusion that the C standard was put in place to avoid. (I do have some ideas about how to address this issue, but I want to make sure everyone appreciates the extent of the problem before we start talking about solutions.)

Before leaving the sub-topic of undefined behavior, let me mention two success stories. The first is 'restrict': the performance implications are local, the choice is under the control of the program (and programmer), and the default choice is to play safe. Good show. The second is the improved sequencing rules introduced in C11: a thorny problem, handled very deftly since C11. These parts of the C language and C standard should be held up as examples when considering how to go forward on other problems.

And now on to the question of "indeterminate", followed by a somewhat philosophical perspective concerning the nature of the C standard and the people who work on it.

First an observation. The idea of "indeterminate values" is actually two ideas in one: non-valid abstract /values/ (like obsolete pointers), and "uninitialized" /objects/ (in quotes because in some circumstances objects can become "uninitialized" even after they have been stored into). The word "indeterminate" isn't really right for either of these ideas. I understand why it was used in the first C standard, and in that context it seems okay, but going forward a better word (or words) should be found. I will keep using it here, but please don't get overly attached to the word, lest it confuse the discussion.

My very strong sense is that some general notion of indeterminate values (or objects) is a solution in search of a problem.
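One circumstance in which an object can become "uninitialized" after being stored into is the padding case discussed earlier: after a store to a member, the padding bytes of the struct take unspecified values (C17 6.2.6.1p6). The sketch below, with a hypothetical `struct rec` and hypothetical helper names, shows the practical consequence: comparing two structs with `memcmp` may report a difference even when every member is equal, so portable code compares member by member. Whether `struct rec` has any padding at all is implementation-dependent.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical record type. Common ABIs place padding between 'tag'
   and 'value', but the standard does not promise any particular layout. */
struct rec {
    char tag;
    int value;
};

/* Unreliable: padding bytes may hold unspecified values (C17 6.2.6.1p6),
   so two records with equal members can still compare unequal here. */
bool rec_equal_bytes(const struct rec *a, const struct rec *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}

/* Reliable: compares only the named members, ignoring any padding. */
bool rec_equal(const struct rec *a, const struct rec *b)
{
    return a->tag == b->tag && a->value == b->value;
}

/* Zeroing the whole object first is a common attempt at stable padding,
   but the member stores that follow may leave the padding unspecified. */
void rec_init(struct rec *r, char tag, int value)
{
    memset(r, 0, sizeof *r);
    r->tag = tag;
    r->value = value;
}
```

The comment in `rec_init` is the crux: no standard mechanism lets the program ask for the zeroed padding to be preserved across member stores.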
Let's look at some different kinds of undefined behavior, while also considering the lens of "indeterminate values (or objects)".

One: signed integer overflow. Could this situation somehow produce an "indeterminate value" that could be stored so it could wreak havoc later? Two problems: no sensible developer is going to want the bad behavior deferred rather than happening right away, and besides, anything an "indeterminate value" could do can already be done by virtue of the generating condition being undefined itself.

Two: obsolete pointers. These values are not indeterminate. They start off as valid, become obsolete when their pointed-to object ends its lifetime, and are always obsolete thereafter. It isn't hard to make a formal model for "obsoleteness" (ignoring problems such as converting pointers to and from integers, and other C-isms). Of course the formal model doesn't map nicely onto real computer hardware, because pointers would need far too many bits (and maybe there are other problems as well, but let's ignore that). So we pretend the extra bits are there, even though they aren't, with the strange consequence that two pointer objects can have the same object representation but still be different, in that one is obsolete and the other isn't. Also a pointer can start off with a non-valid value, meaning "not null and points to no object". Here again the badness remains until a valid pointer value is put into the object; a pointer object with a non-valid value doesn't ever magically become valid without having been assigned or stored into. (Note that the same formal model for obsolete pointers can accommodate non-valid pointers, which are simply obsolete at the start.)

Three: effective type rules. Broken. One of the weakest areas of the C standard.
This framework may have started off as not a bad idea in C90, but looking at it now it's clear that we've gotten ahead of our skis; it is sorely in need of a top-to-bottom reformulation, similar at least in spirit to what was done with the sequencing rules in C11. Also there should be a standard-defined way of allowing cross-type interference, with defined behavior, like what was explained above. I expect a well-done reformulation of cross-type (non-)interference rules would have no notion of assigning "magic state" to objects, and so would have no need of any idea of "indeterminate objects (or values)".

Four: uninitialized objects. Here we have a question: Why? What problem are we hoping to solve? Presumably the point of having uninitialized objects be "indeterminate" is so that reading them is undefined behavior. Let's explore that.

I realize of course that reading an object causes undefined behavior if its object representation is a trap representation (called a non-value representation in the C23 draft) for the type used to read it. Obviously there is good reason to say that trying to read a trap representation is undefined behavior.

Some types, notably unsigned char, don't have any trap representations. Should reading an uninitialized object using such a type be undefined behavior? Speaking as a developer, I don't see any benefit. An implementation would have to go out of its way to do anything other than deliver a valid unspecified value; if there is to be undefined behavior, it is /contrived/ undefined behavior. Consider:

A: such UB could allow trapping on any use of an uninitialized object. But UB does not guarantee that, and if someone wants it there are tools like valgrind to get it (without needing any special language support).

B: such UB could allow "optimizations" by clever compiler writers. The result would be more unexpected code scramblings and more arcane compiler options to disable them.
A better way to provide such imagined benefits is to add one or more new language constructs, along lines similar to the 'restrict' qualifier, to selectively enable such performance changes.

C: future hardware developments might need or take advantage of such UB. If and when such things happen, it's better to add specific wording to reflect the new hardware behaviors. The last sentence of 6.3.2.1 p2, added in C11, provides an excellent example of how to accommodate such new hardware developments.

The notion of indeterminate objects is a solution in search of a problem. To make progress, first agree on a particular problem. Only after that point should possible solutions be considered; I would be surprised if some general notion of indeterminateness ever turned out to be the solution of choice.

Now I would like to offer a perspective on how to view the work that is done in writing the C standard. In some respects the ISO C committee resembles the US Supreme Court. They consider issues, draw conclusions, and ultimately issue "rulings" in the form of ISO-approved standards documents. Like the Supreme Court, their decisions are final and cannot be appealed. However, the Supreme Court ultimately draws its authority from how the public views its rulings. If the rulings get too far out of line with what the general public believes, confidence in the Court will decline and its opinions will carry less weight. (I don't mean to make a political statement here; I am simply repeating some analysis I have read recently regarding current attitudes toward the Court.)

The same is true of the ISO C committee. They can make whatever decisions they want, and those decisions will end up being what goes into the C standard. At the same time, it's important (I would say very important) to keep the confidence of the people who regard the C standard as an important document. If that confidence is lost, the C standard will be on its way to becoming irrelevant.
Unfortunately I have the sense that this trend has already started.

The most important constituency for the C language (and so for the C standard) is developers. Many developers, but in particular and very especially C developers, want stability. I understand the desire to "improve" the language. But getting agreement on a change has to mean more than a majority vote: a change needs to be not just accepted but enthusiastically approved, with overwhelming support. Too much of what is planned for C23 is coming from the implementation community without regard for what is beneficial to the development community. I see the reported desire for general "indeterminate"-ness as part of this trend.

It is my hope that those people who are part of the ISO C committee reflect on this perspective and reconsider where the C language should go for the next C standard.