comp.std.c, thread #6505 (unrolled)
| Started by | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| First post | 2023-07-20 22:16 -0700 |
| Last post | 2023-08-16 21:08 +0000 |
| Articles | 20 on this page (of 48 total); 8 participants |
Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-20 22:16 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 16:33 +0100
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-21 11:56 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 20:54 +0100
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-21 14:26 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 23:39 +0100
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-12 17:00 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-13 23:41 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-15 21:06 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-15 22:40 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 23:13 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-17 07:08 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-18 12:44 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-19 05:04 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-19 01:36 -0700
Re: Does reading an uninitialized object have undefined behavior? Richard Damon <Richard@Damon-Family.org> - 2023-08-19 09:18 -0400
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-19 11:12 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-18 20:20 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-19 05:23 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-18 22:56 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-18 12:52 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-26 19:25 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-27 08:31 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 04:35 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-30 19:53 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 17:40 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-31 18:18 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-05 05:39 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-05 17:03 -0700
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-09-07 17:09 +0200
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-07 17:19 +0100
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-09-08 23:12 +0200
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-08 22:31 +0100
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-22 06:40 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-07-22 06:03 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-07-25 21:53 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 11:11 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-21 17:42 +0000
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-07-24 07:53 +0200
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-07-25 21:57 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-03 13:13 -0700
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-03 15:20 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-05 01:15 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 09:19 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 19:51 +0000
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 20:03 +0000
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 13:43 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 21:08 +0000
| From | Martin Uecker <ma.uecker@gmail.com> |
|---|---|
| Date | 2023-08-18 12:52 -0700 |
| Message-ID | <a3199783-d8b7-4065-836b-08f647a6808en@googlegroups.com> |
| In reply to | #6541 |
On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:
> Martin Uecker <ma.u...@gmail.com> writes:
>
> [some unrelated passages removed]
>
>> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
>>
>>> Martin Uecker <ma.u...@gmail.com> writes:
> [...]
>>>> One could still consider the idea that "indeterminate" is an
>>>> abstract property that yields UB during read even for types
>>>> that do not have trap representations. There is no wording
>>>> in the C standard to support this, but I would not call this
>>>> idea "fundamentally wrong". You are right that this is different
>>>> to provenance which is about values. What it would
>>>> have in common with pointer provenance is that there is hidden
>>>> state in the abstract machine associated with memory that
>>>> is not part of the representation. With effective types there
>>>> is another example of this.
>>>
>>> I understand that you want to consider a broader topic, and that,
>>> in the realm of that broader topic, something like provenance
>>> could have a role to play. I think it is worth responding to
>>> that thesis, and am expecting to do so in a separate reply (or
>>> new thread?) although probably not right away.
>>
>> I would love to hear your comments, because some people
>> want to have such an abstract notion of "indeterminate" and
>> some already believe that this is how the standard should
>> be understood already today.
> I've been thinking about this, and am close (I think) to having
> something to say in response. Before I do that, though, let me
> ask this: what problem or problems are motivating the question?
> What problems do you (or "some people") want to solve? I don't
> want just examples here; I'm hoping to get a full list.

There are essentially two main interests driving this. First, there is interest in formulating the semantics of C precisely. The provenance proposal came out of this.

Second, there is the issue of safety problems caused by uninitialized reads, together with compiler support for zero initialization etc. So there are various people who want to change the semantics for uninitialized variables completely in the interest of safety.

So far, there has been no consensus in WG14 that the rules should be changed or what the new rules should be.

Martin
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-26 19:25 -0700 |
| Message-ID | <868r9xz0ek.fsf@linuxsc.com> |
| In reply to | #6547 |
Martin Uecker <ma.uecker@gmail.com> writes:

> On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:
>
>> Martin Uecker <ma.u...@gmail.com> writes:
>>
>> [some unrelated passages removed]
>>
>>> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
>>>
>>>> Martin Uecker <ma.u...@gmail.com> writes:
>>
>> [...]
>>
>>>>> One could still consider the idea that "indeterminate" is an
>>>>> abstract property that yields UB during read even for types
>>>>> that do not have trap representations. There is no wording
>>>>> in the C standard to support this, but I would not call this
>>>>> idea "fundamentally wrong". You are right that this is different
>>>>> to provenance which is about values. What it would
>>>>> have in common with pointer provenance is that there is hidden
>>>>> state in the abstract machine associated with memory that
>>>>> is not part of the representation. With effective types there
>>>>> is another example of this.
>>>>
>>>> I understand that you want to consider a broader topic, and that,
>>>> in the realm of that broader topic, something like provenance
>>>> could have a role to play. I think it is worth responding to
>>>> that thesis, and am expecting to do so in a separate reply (or
>>>> new thread?) although probably not right away.
>>>
>>> I would love to hear your comments, because some people
>>> want to have such an abstract notion of "indeterminate" and
>>> some already believe that this is how the standard should
>>> be understood already today.
>>
>> I've been thinking about this, and am close (I think) to having
>> something to say in response. Before I do that, though, let me
>> ask this: what problem or problems are motivating the question?
>> What problems do you (or "some people") want to solve? I don't
>> want just examples here; I'm hoping to get a full list.
>
> There are essentially two main interests driving this. First,
> there is interest in formulating the semantics of C precisely.
> The provenance proposal came out of this.
>
> Second, there is the issue of safety problems caused by
> uninitialized reads, together with compiler support for zero
> initialization etc. So there are various people who want to
> change the semantics for uninitialized variables completely
> in the interest of safety.

This response doesn't answer my question. What are the problems, specifically, that people want to solve? If there isn't a good understanding of what the problem is, there is little hope of finding a solution, let alone reaching agreement on whether a proposed change does in fact solve the problem. If we don't know where we're going, any choice of road is equally good.

That said, I understand that you are asking not on your own behalf but on behalf (perhaps indirectly) of others, and the others might not know what problem(s) they want to solve. I think it's worth asking the question explicitly: What is the problem that we want to solve here? Start by simply trying to write a clear statement of what the problem is; proceed to looking for a solution only after there is agreement (and I don't mean just a majority vote) about what problem the group wants to solve.

(Note added after writing: I didn't realize when I started how difficult this subject is and how much there is to say about it. I hope readers will appreciate the amount of effort that has been invested, and get some value out of what has been produced, even if it spends too much time on some less important issues.)

(Also, after having written the whole posting, I see that there are some aspects that I didn't relate to the indeterminate question and so didn't address. If you want me to say more about formalizing semantics or the issue of safety for uninitialized variables, I really need some specifics before I can talk about those.)

(One further thought: on reading through my comments one last time, I may have more to say about uninitialized variables. But I am deferring that for now, to get this beast out the door.)

> So far, there has been no consensus in WG14 that the rules should
> be changed or what the new rules should be.

That's because they don't know what problem it is that they want to solve.

Consider the question of what happens with padding bits/bytes, and unnamed members, in structs (unions too of course, but for now we consider only structs). The C standard says these bits of memory take unspecified values whenever there is a store to any member of the struct (and maybe also at other times, but let's ignore that). I understand why this decision was made, namely, to give more freedom to implementations as to how such operations are actualized. But it leaves behind a problem. Speaking as a developer, I want the values of these bits to be stable, at least in certain cases (and I want to be able to choose which cases those are). The C language doesn't give me any way to do that, at least not one that isn't horribly inconvenient. In making the decision about padding bits/bytes, the C committee answered the /question/ but didn't address the /problem/. I expect that something similar is going on with the current discussions.

To better understand the landscape, let's look at three different kinds of undefined behavior. The illustrating constructions are signed integer arithmetic, obsolete pointer values, and violating effective type rules.

Situations where arithmetic on signed integers overflows might be called /practical/ undefined behavior. Certainly it would be possible to require better-defined semantics (such as giving an unspecified result), but presumably overflow doesn't come up very often, it's not clear how useful the "better" result would be, and the cost in some hardware environments might be prohibitive. Furthermore there is a fairly easy workaround to avoid overflow: simply convert to unsigned types, do the operations, and then convert back.
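A minimal sketch of the unsigned-conversion workaround just described. This is an illustrative example, not code from the thread, and the function names are hypothetical. Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1, so the operations themselves never overflow; converting an out-of-range result back to int is implementation-defined rather than undefined (C17 6.3.1.3p3), and GCC, for one, documents that it wraps. The last function shows the other common approach, testing before operating.

```c
#include <limits.h>

/* Hypothetical helper: int addition that wraps instead of having UB.
   Unsigned arithmetic is defined modulo UINT_MAX + 1, so ua + ub can
   never overflow. Converting a result above INT_MAX back to int is
   implementation-defined (C17 6.3.1.3p3), not undefined; GCC documents
   that it reduces the value modulo 2^N, i.e. it wraps. */
int wrapping_add(int a, int b)
{
    unsigned ua = (unsigned)a;
    unsigned ub = (unsigned)b;
    return (int)(ua + ub);
}

/* The same technique applied to multiplication. */
int wrapping_mul(int a, int b)
{
    return (int)((unsigned)a * (unsigned)b);
}

/* Alternative: detect overflow before it would happen, with no
   implementation-defined conversion at all. Returns nonzero if a + b
   would overflow int. */
int add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```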
Overflow being undefined behavior isn't absolutely necessary, but in practical terms it's acceptable. (I acknowledge that some people have different views on that last statement.)

An obsolete pointer value is a pointer to an object after the end of the object's lifetime. Attempting to make use of an obsolete pointer value in any way whatsoever, including simply loading it by means of lvalue conversion, is undefined behavior. We can imagine narrowing the scope a bit so that simply loading an obsolete pointer value, or comparing one for equality, could be better defined, but any attempt to dereference an obsolete pointer value is what might be called an /essential/ undefined behavior. The problem here is both practical and theoretical: there is no way to be sure the underlying hardware will be able to carry out the asked-for operation (without a machine check, etc.), and even if there were, there is no way to describe what happens in a way that can be expressed (usefully) in terms that relate to what's going on in the abstract machine. There simply is no practical, useful, sensible way to define the behavior of dereferencing an obsolete pointer value.

At the other end of the spectrum, violating effective type rules is what might be called /gratuitous/ undefined behavior. There is no particular hardware motivation for choosing UB. And there is no problem defining the semantics of a cross-type access, which can be given defined behavior in the same way as accessing union members. So there is no reason to think that adding cross-type restrictions is necessary. An argument can be made that cross-type restrictions are /desirable/, because they allow code transformations that improve performance in some cases.

Incidentally, it might seem like effective type rules are similar in some way to NaT bits or pointer provenance. They aren't. NaT bits are hardware indicators that actually exist, and pointer provenances are attached to values, not to objects. Neither of those conditions holds for effective types. The seeming similarity to hidden memory bits is a red herring. (Also, effective type rules are a lot more complicated than they seem at first blush, and have some peculiar properties as a result. They seem to work okay if not looked at too closely, but a closer look shows some serious shortcomings. But I digress.)

There are two significant problems with undefined behavior. The smaller of the two is that there are no distinctions between the different classes of undefined behavior. There is no way around having some sort of undefined behavior for obsolete pointer values, but cross-typing rules are a completely different story. Yet the C standard puts all the different kinds of undefined behaviors into the same absolute category. Sometimes people use compiler options to turn off, for example, so-called "strict aliasing", and of course the C standard allows us to do that. But compilers aren't required to provide such an option, and if they do the option may not do exactly what we expect it to do, because there is no standard specification for it. The C standard should define officially sanctioned mechanisms -- as for example standard #pragma's -- to give standard-defined semantics to certain constructs of undefined behavior that resemble, eg, -fno-strict-aliasing. (Let me add in passing that this should be done for some cases of unspecified behavior as well. To give one example, the C standard should provide a way to direct a C compiler to maintain the values of padding bits and bytes and unnamed members, taking away the freedom for such things to assume unspecified values.)

The second problem is basically The Law of Unintended Consequences smashing into The Law of Least Astonishment. As compiler writers have gotten more and more clever at exploiting the implications of "undefined behavior", we see more and more cases of code that looks reasonable being turned into mush by overly clever "optimizing" compilers.
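To illustrate the kind of case described above (an example chosen here, not one given in the post; the function names are hypothetical): because signed overflow is undefined, an optimizer may assume that x + 1 > x always holds for a signed x, and GCC and Clang at typical optimization levels commonly compile such a test down to a constant. The second function expresses the same intent by comparing against INT_MAX, with no undefined behavior to exploit.

```c
#include <limits.h>
#include <stdbool.h>

/* Looks reasonable, but x + 1 has undefined behavior when x == INT_MAX,
   so an optimizer may treat the comparison as always true and reduce
   the whole function to "return true". */
bool next_is_larger_naive(int x)
{
    return x + 1 > x;
}

/* Same intent, no undefined behavior: compare against the limit
   instead of relying on what overflow would do. */
bool next_is_larger(int x)
{
    return x < INT_MAX;
}
```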
There is obviously something wrong with the way this trend is going -- ever more clever "optimizations", followed by ever more arcane compiler options to work around the problems caused by the too-clever compilers. This problem must be addressed by the C standard, for if it is not, the ecosystem will transform into a confused state that is exactly what the C standard was put in place to avoid. (I do have some ideas about how to address this issue, but I want to make sure everyone appreciates the extent of the problem before we start talking about solutions.)

Before leaving the sub-topic of undefined behavior, let me mention two success stories. The first is 'restrict': the performance implications are local, the choice is under control of the program (and programmer), and the default choice is to play safe. Good show. The second is the improved sequencing rules introduced in C11: a thorny problem, handled very deftly since C11. These parts of the C language and C standard should be held up as examples when considering how to go forward on other problems.

And now on to the question of "indeterminate". Following that, a somewhat philosophical perspective concerning the nature of the C standard and the people who work on it.

First an observation. The idea of "indeterminate values" is actually two ideas in one: non-valid abstract /values/ (like obsolete pointers), and "uninitialized" /objects/ (in quotes because in some circumstances objects can become "uninitialized" even after they have been stored into). The word "indeterminate" isn't really right for either of these ideas. I understand why it was used in the first C standard, and in that context it seems okay, but going forward a better word (or words) should be found. I will keep using it here, but please don't get overly attached to the word, lest it confuse the discussion.

My very strong sense is that some general notion of indeterminate values (or objects) is a solution in search of a problem.
Let's look at some different kinds of undefined behavior, while also considering them through the lens of "indeterminate values (or objects)".

One: signed integer overflow. Could this situation somehow produce an "indeterminate value" that could be stored so it could wreak havoc later? Two problems: no sensible developer is going to want the bad behavior deferred rather than happening right away, and besides, anything an "indeterminate value" could do can already be done by virtue of the generating condition being undefined itself.

Two: obsolete pointers. These values are not indeterminate. They start off as valid, become obsolete when their pointed-to object ends its lifetime, and are always obsolete thereafter. It isn't hard to make a formal model for "obsoleteness" (ignoring problems such as converting pointers to and from integers, and other C-isms). Of course the formal model doesn't map nicely onto real computer hardware, because pointers would have far too many bits (and maybe other problems as well, but let's ignore that). So we pretend the extra bits are there, even though they aren't, with the strange consequence that two pointer objects can have the same object representation but still be different in that one is obsolete and the other isn't. Also, a pointer can start off with a non-valid value, meaning "not null and points to no object". Here again the badness remains until a valid pointer value is put into the object; a pointer object with a non-valid value doesn't ever magically become valid without having been assigned or stored into. (Note that the same formal model for obsolete pointers can accommodate non-valid pointers, which are simply obsolete at the start.)

Three: effective type rules. Broken. One of the weakest areas of the C standard. This framework may have started off as not a bad idea in C90, but looking at it now it's clear that we've gotten ahead of our skis: the framework is sorely in need of a top-to-bottom reformulation, similar at least in spirit to what was done with sequencing rules in C11. Also there should be a standard-defined way of allowing cross-type interference, with defined behavior, like what was explained above. I expect a well-done reformulation of cross-type (non-)interference rules would have no notion of assigning "magic state" to objects, and so have no need of any idea of "indeterminate objects (or values)".

Four: uninitialized objects. Here we have a question: Why? What problem are we hoping to solve? Presumably the point of having uninitialized objects be "indeterminate" is so that reading them is undefined behavior. Let's explore that.

I realize, of course, that reading an object whose representation is a trap representation (called a non-value representation in the C23 draft) for the type used to read it causes undefined behavior. Obviously there is good reason to say trying to read a trap representation is undefined behavior. Some types, notably unsigned char, don't have any trap representations. Should reading an uninitialized object using such a type be undefined behavior? Speaking as a developer, I don't see any benefit. An implementation would have to go out of its way to do anything other than deliver a valid unspecified value; if there is to be undefined behavior, it is /contrived/ undefined behavior. Consider:

A: such UB could allow trapping on any use of an uninitialized object. But UB does not guarantee that, and if someone wants it there are tools like valgrind to get it (and without any special language support needed to do so).

B: such UB could allow "optimizations" by clever compiler writers. The result would be more unexpected code scramblings and more arcane compiler options to disable them.
A better way to provide such imagined benefits is by adding one or more new language constructs, along lines similar to the 'restrict' qualifier, to selectively enable such performance changes.

C: future hardware developments might need or take advantage of such UB. If and when such things happen, it's better to add specific wording to reflect the new hardware behaviors. The last sentence of 6.3.2.1p2, added in C11, provides an excellent example of how to accommodate such new hardware developments.

The notion of indeterminate objects is a solution in search of a problem. To make progress, first agree on a particular problem. Only after that point should possible solutions be considered; I would be surprised if some general notion of indeterminateness ever turned out to be the solution of choice.

Now I would like to offer a perspective on how to view the work that is done in writing the C standard. In some respects the ISO C committee resembles the US Supreme Court. They consider issues, draw conclusions, and ultimately issue "rulings" in the form of ISO-approved standards documents. Like the Supreme Court, their decisions are final and cannot be appealed. However, the Supreme Court ultimately draws its authority from how the public views its rulings. If the rulings get too far out of line with what the general public believes, confidence in the Court will decline and its opinions will carry less weight. (I don't mean to make a political statement here - I am simply repeating some analysis I have read recently regarding current attitudes towards the Court.)

The same is true of the ISO C committee. They can make whatever decisions they want, and those decisions will end up being what goes into the C standard. At the same time, it's important - I would say very important - to keep the confidence of people who regard the C standard as an important document. If that confidence is lost, then the C standard will be on its way to becoming irrelevant. Unfortunately I have the sense that this trend has already started.

The most important constituency for the C language (and so for the C standard) is developers. Many developers, but in particular and very especially C developers, want stability. I understand the desire to "improve" the language. Getting agreement on a change has to mean more than a majority vote -- it needs to be not just accepted but enthusiastically approved, with overwhelming support. Too much of what is planned for C23 is coming from the implementation community without regard for what is beneficial to the development community. I see the reported desire for general "indeterminate"-ness as part of this trend.

It is my hope that those people who are part of the ISO C committee reflect on this perspective and reconsider where the C language should go for the next C standard.
| From | Spiros Bousbouras <spibou@gmail.com> |
|---|---|
| Date | 2023-08-27 08:31 +0000 |
| Message-ID | <5+eRe7cp3yQjL4=AX@bongo-ra.co> |
| In reply to | #6556 |
On Sat, 26 Aug 2023 19:25:55 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

> Sometimes people use compiler options
> to turn off, for example, so-called "strict aliasing", and of course
> the C standard allows us to do that. But compilers aren't required
> to provide such an option, and if they do the option may not do
> exactly what we expect it to do, because there is no standard
> specification for it. The C standard should define officially
> sanctioned mechanisms -- as for example standard #pragma's -- to
> give standard-defined semantics to certain constructs of undefined
> behavior that resemble, eg, -fno-strict-aliasing.

Surely the starting point for this should be the documentation of the compilers to specify precisely what -fno-strict-aliasing does. If a consensus emerges out of these precise specifications, or C programmers indicate that they prefer the specification of some particular compiler, then this can become part of the standard. Adding a relevant #pragma should be trivial.

> The second problem is basically The Law of Unintended Consequences
> smashing into The Law of Least Astonishment. As compiler writers
> have gotten more and more clever at exploiting the implications of
> "undefined behavior", we see more and more cases of code that looks
> reasonable being turned into mush by overly clever "optimizing"
> compilers. There is obviously something wrong with the way this
> trend is going -- ever more clever "optimizations", followed by ever
> more arcane compiler options to work around the problems caused by
> the too-clever compilers. This problem must be addressed by the C
> standard, for if it is not the ecosystem will transform into a
> confused state that is exactly what the C standard was put in place
> to avoid. (I do have some ideas about how to address this issue,
> but I want to make sure everyone appreciates the extent of the
> problem before we start talking about solutions.)

Without specific examples, it's impossible to comment on this. Why did the "reasonable" code have undefined behaviour? Could the result the programmer was aiming for have been achieved with defined behaviour? For example, it has been pointed out on comp.lang.c that it's impossible to write a malloc() implementation in conforming C. This is certainly a weakness which should be addressed with some appropriate #pragma.

> Before leaving the sub-topic of undefined behavior, let me mention
> two success stories. The first is 'restrict': the performance
> implications are local, the choice is under control of the program
> (and programmer), and the default choice is to play safe. Good
> show.

From my point of view, restrict is not a success, because its specification is the one part of the C1999 standard I have given up trying to understand. I understand the underlying idea but the specifics elude me. I remember many years ago someone asked on this group about some code involving restrict, a member of the standard committee replied, and I found the reply counterintuitive. So I have decided not to use restrict in my own code, taking also into account that I don't need the micro-optimisations which restrict is intended to allow. But for all I know, people who do need these optimisations find the specification of restrict in the standard perfectly adequate.

-- 
It is not widely known that the "CPC" in "Amstrad CPC" actually stands for "cool people club".
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-29 04:35 -0700 |
| Message-ID | <86sf82ulmb.fsf@linuxsc.com> |
| In reply to | #6557 |
Spiros Bousbouras <spibou@gmail.com> writes:
> On Sat, 26 Aug 2023 19:25:55 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Sometimes people use compiler options to turn off, for example,
>> so-called "strict aliasing", and of course the C standard allows
>> us to do that. But compilers aren't required to provide such an
>> option, and if they do the option may not do exactly what we
>> expect it to do, because there is no standard specification for
>> it. The C standard should define officially sanctioned
>> mechanisms -- as for example standard #pragma's -- to give
>> standard-defined semantics to certain constructs of undefined
>> behavior that resemble, eg, -fno-strict-aliasing.
>
> Surely the starting point for this should be the documentation of
> the compilers to specify precisely what -fno-strict-aliasing does.
> [...]
Not at all. It's easy to write a specification that says what we
want to do, along similar lines to what is said in the footnote
about union member access in section 6.5.2.3:
    If the member used to access the contents of a union object
    is not the same as the member last used to store a value in
    the object, the appropriate part of the object representation
    of the value is reinterpreted as an object representation in
    the new type as described in 6.2.6 (a process sometimes called
    "type punning"). This might be a trap representation.
That behavior should be the default, for all accesses. For cases
where a developer wants to give permission to the compiler to
optimize based on cross-type non-interference assumptions, there
should be a #pragma to do something similar to what effective type
rules do now. The effective type rules are in need of re-writing
anyway, and making type punning be the default doesn't break any
programs, because compilers are already free to ignore the
implications of violating effective type conditions.
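The union footnote quoted above can be made concrete with a short C11 sketch (not from the thread; the function names are hypothetical). It assumes float is a 32-bit type (IEEE 754 binary32 on most systems), which the standard does not require. Both functions reinterpret the bytes of a float as a uint32_t in ways the current standard already permits: reading a different union member, and copying the object representation with memcpy. The pointer-cast form, mentioned only in a comment, is the one the effective type rules leave undefined and that -fno-strict-aliasing is commonly used to rescue.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide, the same size as uint32_t. */
static_assert(sizeof(float) == sizeof(uint32_t), "float is assumed to be 32 bits");

/* Type punning through a union: store via one member, read via another.
   The stored bytes are reinterpreted as a uint32_t (C11 6.5.2.3 footnote). */
uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Copying the object representation with memcpy is also well defined. */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* By contrast, returning *(uint32_t *)&f would access a float object
   through a uint32_t lvalue, which the effective type rules (6.5p7)
   make undefined. */
```

On an IEEE 754 system both functions should return 0x3F800000 for 1.0f.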
>> The second problem is basically The Law of Unintended Consequences
>> smashing into The Law of Least Astonishment. As compiler writers
>> have gotten more and more clever at exploiting the implications of
>> "undefined behavior", we see more and more cases of code that looks
>> reasonable being turned into mush by overly clever "optimizing"
>> compilers. There is obviously something wrong with the way this
>> trend is going -- ever more clever "optimizations", followed by
>> ever more arcane compiler options to work around the problems
>> caused by the too-clever compilers. This problem must be addressed
>> by the C standard, for if it is not the ecosystem will transform
>> into a confused state that is exactly what the C standard was put
>> in place to avoid. (I do have some ideas about how to address this
>> issue, but I want to make sure everyone appreciates the extent of
>> the problem before we start talking about solutions.)
>
> Without specific examples , it's impossible to comment on this.
> [...]
I feel that so much has been written about this issue that it
isn't necessary for me to elaborate.
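Spiros asks for specific examples and Tim does not give one here. A frequently cited illustration (not from this thread) is an overflow check written as `x + 1 < x`: since signed overflow is undefined, an optimizing compiler may treat the condition as always false and remove the check. The sketch below (hypothetical function names) keeps the naive form only in a comment and shows a check that never performs an overflowing operation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Fragile form, shown only for contrast (do not use):
       bool increment_overflows(int x) { return x + 1 < x; }
   For x == INT_MAX the addition overflows, which is undefined, so a
   compiler may assume it never happens and compile the test to false. */

/* Well-defined form: compare against the limits before adding. */
bool add_would_overflow(int a, int b) {
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    return a < INT_MIN - b;       /* a + b would fall below INT_MIN */
}

/* Stores a + b in *sum and returns true, or returns false without adding. */
bool checked_add(int a, int b, int *sum) {
    assert(sum != NULL);
    if (add_would_overflow(a, b))
        return false;
    *sum = a + b;
    return true;
}
```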
> For example it has been pointed out on comp.lang.c that it's
> impossible to write a malloc() implementation in conforming
> C. This is certainly a weakness which should be addressed with
> some appropriate #pragma .
There isn't any reason to think malloc() should be writable in
completely portable C. That's the point of putting malloc() in
the system library in the first place. By the way, with type
punning semantics mentioned above being the default, and with the
alignment features added in C11, I think it is possible to write
malloc() in portable C without needing any additional language
changes. But even if it isn't that is no cause for concern; one
of the principal reasons for having a system library is to
provide functionality that the core language cannot express (or
cannot express conveniently).
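As a rough illustration of the C11 alignment features Tim mentions, here is a minimal sketch (not from the thread; `pool_alloc` and `POOL_SIZE` are hypothetical) of an allocator that hands out chunks of a static pool, each aligned for any fundamental type via `alignas(max_align_t)`. It is standard C11 but is not a drop-in malloc(): it never frees, and whether storage carved from an `unsigned char` array may then hold objects of other types without violating the effective type rules is precisely the portability question under discussion.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define POOL_SIZE 4096  /* arbitrary size chosen for illustration */

/* The pool is aligned for any fundamental type (C11 max_align_t). */
static alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Returns a pointer to size bytes aligned to alignof(max_align_t),
   or NULL if the pool is exhausted. Nothing is ever freed. */
void *pool_alloc(size_t size) {
    const size_t align = alignof(max_align_t);
    assert((align & (align - 1)) == 0);  /* alignments are powers of two */

    /* Round the current offset up to the next multiple of align. */
    size_t start = (pool_used + align - 1) & ~(align - 1);
    if (start > POOL_SIZE || size > POOL_SIZE - start)
        return NULL;

    pool_used = start + size;
    return &pool[start];
}
```

Because every offset is rounded up to a multiple of the pool's own alignment, each returned pointer inherits that alignment.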
>> Before leaving the sub-topic of undefined behavior, let me mention
>> two success stories. The first is 'restrict': the performance
>> implications are local, the choice is under control of the program
>> (and programmer), and the default choice is to play safe. Good
>> show.
>
> From my point of view , restrict is not a success because the
> specification of restrict is the one part of the C1999 standard I
> have given up trying to understand. I understand the underlying
> idea but the specifics elude me. [...]
I agree the formal definition of restrict is rather daunting. In
practice though I think using restrict with confidence is not
overly difficult. My working model for restrict is something
like this:
1. Use restrict only in the declarations of function
parameters.
2. For a declaration like const T *restrict foo ,
the compiler may assume that any objects that can be
accessed through 'foo' will not be modified.
3. For a declaration like T *restrict bas ,
the compiler may assume that any changes to objects
that can be accessed through 'bas' will be done
using 'bas' or a pointer value derived from 'bas'
(and in particular that no changes will happen
other than through 'bas' or 'bas'-derived pointer
values).
Is this summary description helpful?
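Tim's three rules can be illustrated with a small sketch (the function `add_arrays` is hypothetical, not from the thread). restrict appears only on parameters (rule 1); the inputs are `const double *restrict` (rule 2); the output is `double *restrict` (rule 3). Together these let the compiler assume that stores through `dst` never change what `a` or `b` point at, so it may, for example, vectorize the loop without checking for overlap. The caller's half of the bargain is that `dst` must not overlap either input; `a` and `b` may point to the same array, since neither is written.

```c
#include <assert.h>
#include <stddef.h>

/* Rule 1: restrict appears only on the function parameters.
   Rule 2: a and b are const double *restrict, so the compiler may assume
           the objects they reach are not modified during the call.
   Rule 3: dst is double *restrict, so the compiler may assume every
           modification of an object reached through dst is made via dst.
   Caller's obligation: dst must not overlap a or b. */
void add_arrays(size_t n, double *restrict dst,
                const double *restrict a, const double *restrict b)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```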
| From | Spiros Bousbouras <spibou@gmail.com> |
|---|---|
| Date | 2023-08-30 19:53 +0000 |
| Message-ID | <KvVxh3+WExIyDnM+5@bongo-ra.co> |
| In reply to | #6559 |
On Tue, 29 Aug 2023 04:35:40 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Spiros Bousbouras <spibou@gmail.com> writes:
>
> > On Sat, 26 Aug 2023 19:25:55 -0700
> > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >
> >> Sometimes people use compiler options to turn off, for example,
> >> so-called "strict aliasing", and of course the C standard allows
> >> us to do that. But compilers aren't required to provide such an
> >> option, and if they do the option may not do exactly what we
> >> expect it to do, because there is no standard specification for
> >> it. The C standard should define officially sanctioned
> >> mechanisms -- as for example standard #pragma's -- to give
> >> standard-defined semantics to certain constructs of undefined
> >> behavior that resemble, eg, -fno-strict-aliasing.
> >
> > Surely the starting point for this should be the documentation of
> > the compilers to specify precisely what -fno-strict-aliasing does.
> > [...]
>
> Not at all. It's easy to write a specification that says what we
> want to do, along similar lines to what is said in the footnote
> about union member access in section 6.5.2.3
>
> If the member used to access the contents of a union object
> is not the same as the member last used to store a value in
> the object, the appropriate part of the object representation
> of the value is reinterpreted as an object representation in
> the new type as described in 6.2.6 (a process sometimes called
> "type punning"). This might be a trap representation.
Works for me but it would be good to know that this is how compiler
writers actually understand -fno-strict-aliasing . Is there any
compiler documentation which says something like this ?
> That behavior should be the default, for all accesses. For cases
> where a developer wants to give permission to the compiler to
> optimize based on cross-type non-interference assumptions, there
> should be a #pragma to do something similar to what effective type
> rules do now. The effective type rules are in need of re-writing
> anyway, and making type punning be the default doesn't break any
> programs, because compilers are already free to ignore the
> implications of violating effective type conditions.
[...]
> > For example it has been pointed out on comp.lang.c that it's
> > impossible to write a malloc() implementation in conforming
> > C. This is certainly a weakness which should be addressed with
> > some appropriate #pragma .
>
> There isn't any reason to think malloc() should be writable in
> completely portable C. That's the point of putting malloc() in
> the system library in the first place. By the way, with type
> punning semantics mentioned above being the default, and with the
> alignment features added in C11, I think it is possible to write
> malloc() in portable C without needed any additional language
> changes. But even if it isn't that is no cause for concern; one
> of the principal reasons for having a system library is to
> provide functionality that the core language cannot express (or
> cannot express conveniently).
One might want to experiment with different allocation algorithms and it
seems to me that this sort of thing is within the "remit" of C. So
ideally one should be able to write it in C and prove , starting from the
standard or precise specifications in compiler documentation , that it
works correctly. I don't necessarily mean prove the correctness of the
whole code but certain key parts.
Another application I have in mind is languages which get translated to C
and support garbage collection. Again one might want to use the standard
malloc() to allocate a large block of memory and use different parts of
this memory for different types of objects. If with the semantics you
propose these things are possible , I'm happy. I'm not bothered which is
the default as long as there is a precise specification from which you
can reason that you get the desired behaviour.
> >> Before leaving the sub-topic of undefined behavior, let me mention
> >> two success stories. The first is 'restrict': the performance
> >> implications are local, the choice is under control of the program
> >> (and programmer), and the default choice is to play safe. Good
> >> show.
> >
> > From my point of view , restrict is not a success because the
> > specification of restrict is the one part of the C1999 standard I
> > have given up trying to understand. I understand the underlying
> > idea but the specifics elude me. [...]
>
> I agree the formal definition of restrict is rather daunting. In
> practice though I think using restrict with confidence is not
> overly difficult. My working model for restrict is something
> like this:
>
> 1. Use restrict only in the declarations of function
> parameters.
>
> 2. For a declaration like const T *restrict foo ,
> the compiler may assume that any objects that can be
> accessed through 'foo' will not be modified.
Wouldn't that also be the case with just const T * foo ?
> 3. For a declaration like T *restrict bas ,
> the compiler may assume that any changes to objects
> that can be accessed through 'bas' will be done
> using 'bas' or a pointer value derived from 'bas'
> (and in particular that no changes will happen
> other than through 'bas' or 'bas'-derived pointer
> values).
>
> Is this summary description helpful?
It seems clear enough but , as I've said , I don't have any use for
restrict anyway and it's not worth it for me to expend the additional
mental effort to confirm that my code obeys the additional restrictions
of restrict . If I call a function with a preexisting interface which
involves restrict then it seems easy enough to obey the restrictions.
--
Carrie also narrates the film, providing useful guidelines for those
challenged by its intricacies. Sample: "Later that day, Big and I arrived
home."
http://www.rogerebert.com/reviews/sex-and-the-city-2-2010
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-30 17:40 -0700 |
| Message-ID | <86zg28t563.fsf@linuxsc.com> |
| In reply to | #6560 |
Spiros Bousbouras <spibou@gmail.com> writes:
> On Tue, 29 Aug 2023 04:35:40 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Spiros Bousbouras <spibou@gmail.com> writes:
>>
>>> On Sat, 26 Aug 2023 19:25:55 -0700
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>>
>>>> Sometimes people use compiler options to turn off, for example,
>>>> so-called "strict aliasing", and of course the C standard allows
>>>> us to do that. But compilers aren't required to provide such an
>>>> option, and if they do the option may not do exactly what we
>>>> expect it to do, because there is no standard specification for
>>>> it. The C standard should define officially sanctioned
>>>> mechanisms -- as for example standard #pragma's -- to give
>>>> standard-defined semantics to certain constructs of undefined
>>>> behavior that resemble, eg, -fno-strict-aliasing.
>>>
>>> Surely the starting point for this should be the documentation of
>>> the compilers to specify precisely what -fno-strict-aliasing does.
>>> [...]
>>
>> Not at all. It's easy to write a specification that says what we
>> want to do, along similar lines to what is said in the footnote
>> about union member access in section 6.5.2.3
>>
>> If the member used to access the contents of a union object
>> is not the same as the member last used to store a value in
>> the object, the appropriate part of the object representation
>> of the value is reinterpreted as an object representation in
>> the new type as described in 6.2.6 (a process sometimes called
>> "type punning"). This might be a trap representation.
>
> Works for me but it would be good to know that this is how compiler
> writers actually understand -fno-strict-aliasing . [...]
No, it wouldn't. Implementations follow the C standard, not
the other way around. Looking at what implementations do for
the -fno-strict-aliasing flag is worse than a waste of time.
>>> For example it has been pointed out on comp.lang.c that it's
>>> impossible to write a malloc() implementation in conforming
>>> C. This is certainly a weakness which should be addressed with
>>> some appropriate #pragma .
>>
>> There isn't any reason to think malloc() should be writable in
>> completely portable C. That's the point of putting malloc() in
>> the system library in the first place. By the way, with type
>> punning semantics mentioned above being the default, and with the
>> alignment features added in C11, I think it is possible to write
>> malloc() in portable C without needed any additional language
>> changes. But even if it isn't that is no cause for concern; one
>> of the principal reasons for having a system library is to
>> provide functionality that the core language cannot express (or
>> cannot express conveniently).
>
> One might want to experiment with different allocation algorithms
> and it seems to me that this sort of thing is within the "remit" of
> C. So ideally one should be able to write it in C [...]
You're conflating writing something in C and writing something
in completely portable C. It's already possible to do these
things writing in C.
>>> From my point of view , restrict is not a success because the
>>> specification of restrict is the one part of the C1999 standard I
>>> have given up trying to understand. I understand the underlying
>>> idea but the specifics elude me. [...]
>>
>> I agree the formal definition of restrict is rather daunting. In
>> practice though I think using restrict with confidence is not
>> overly difficult. My working model for restrict is something
>> like this:
>>
>> 1. Use restrict only in the declarations of function
>> parameters.
>>
>> 2. For a declaration like const T *restrict foo ,
>> the compiler may assume that any objects that can be
>> accessed through 'foo' will not be modified.
>
> Wouldn't that also be the case with just const T * foo ?
No.
>> 3. For a declaration like T *restrict bas ,
>> the compiler may assume that any changes to objects
>> that can be accessed through 'bas' will be done
>> using 'bas' or a pointer value derived from 'bas'
>> (and in particular that no changes will happen
>> other than through 'bas' or 'bas'-derived pointer
>> values).
>>
>> Is this summary description helpful?
>
> It seems clear enough but , as I've said , I don't have any use
> for restrict anyway and it's not worth it for me to expend the
> additional mental effort to confirm that my code obeys the
> additional restrictions of restrict. [...]
If you don't want to use restrict that is quite okay. Part of
why I call restrict a success is that it can be ignored, with
only minimal effort, by any developer who doesn't want to use it.
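Tim's short "No." to the question about plain `const T *` can be unpacked with a sketch (hypothetical functions, not from the thread). const on a pointer only forbids modification through that pointer; some other non-const pointer may still reach the same object. In `sum_twice` the compiler has to re-read `*p` after the store through `q`, because `p` and `q` may point to the same int. In `sum_twice_restrict` rule 2 applies: the object reached through `p` may be assumed unmodified during the call, so the compiler may reuse the first load, and calling it with `p == q` would be undefined.

```c
#include <assert.h>

/* const alone: if p and q point to the same int, the store through q
   changes *p, so the second read of *p must observe the new value. */
int sum_twice(const int *p, int *q) {
    int first = *p;
    *q = 42;            /* may modify *p when p and q alias */
    return first + *p;  /* must re-read *p */
}

/* With restrict, the caller promises that the object reached through p
   is not modified during the call, so the compiler may reuse 'first'
   for the second read. Passing p == q would break that promise. */
int sum_twice_restrict(const int *restrict p, int *q) {
    int first = *p;
    *q = 42;
    return first + *p;
}
```

For example, with `int x = 1;` the call `sum_twice(&x, &x)` returns 43 and leaves `x` equal to 42.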
| From | Spiros Bousbouras <spibou@gmail.com> |
|---|---|
| Date | 2023-08-31 18:18 +0000 |
| Message-ID | <S+3le+7=sc9SPmPL3@bongo-ra.co> |
| In reply to | #6561 |
On Wed, 30 Aug 2023 17:40:52 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Spiros Bousbouras <spibou@gmail.com> writes:
>
> > On Tue, 29 Aug 2023 04:35:40 -0700
> > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >
> >> Spiros Bousbouras <spibou@gmail.com> writes:
[...]
> >> Not at all. It's easy to write a specification that says what we
> >> want to do, along similar lines to what is said in the footnote
> >> about union member access in section 6.5.2.3
> >>
> >> If the member used to access the contents of a union object
> >> is not the same as the member last used to store a value in
> >> the object, the appropriate part of the object representation
> >> of the value is reinterpreted as an object representation in
> >> the new type as described in 6.2.6 (a process sometimes called
> >> "type punning"). This might be a trap representation.
> >
> > Works for me but it would be good to know that this is how compiler
> > writers actually understand -fno-strict-aliasing . [...]
>
> No, it wouldn't. Implementations follow the C standard, not
> the other way around. Looking at what implementations do for
> the -fno-strict-aliasing flag is worse than a waste of time.
Actually the influence goes in both directions. In theory the standard is the
ultimate authority , in practice whatever C compilers one has access to. For
now the standard doesn't have something like -fno-strict-aliasing so if one
needs it then looking at what implementations do is the only option. But even
the standard committee should look at it, and at whether C programmers find it
useful, to decide what along such lines (if anything) should go into the
standard.
> >> There isn't any reason to think malloc() should be writable in
> >> completely portable C. That's the point of putting malloc() in
> >> the system library in the first place. By the way, with type
> >> punning semantics mentioned above being the default, and with the
> >> alignment features added in C11, I think it is possible to write
> >> malloc() in portable C without needed any additional language
> >> changes. But even if it isn't that is no cause for concern; one
> >> of the principal reasons for having a system library is to
> >> provide functionality that the core language cannot express (or
> >> cannot express conveniently).
> >
> > One might want to experiment with different allocation algorithms
> > and it seems to me that this sort of thing is within the "remit" of
> > C. So ideally one should be able to write it in C [...]
>
> You're conflating writing something in C and writing something
> in completely portable C. It's already possible to do these
> things writing in C.
I wrote
    One might want to experiment with different allocation algorithms and it
    seems to me that this sort of thing is within the "remit" of C. So
    ideally one should be able to write it in C and prove , starting from the
    standard or precise specifications in compiler documentation , that it
    works correctly. I don't necessarily mean prove the correctness of the
    whole code but certain key parts.
This doesn't conflate anything. One can do the writing but can one do the
proving or something close ?
--
vlaho.ninja/prog
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-09-05 05:39 -0700 |
| Message-ID | <867cp4pzdu.fsf@linuxsc.com> |
| In reply to | #6562 |
Spiros Bousbouras <spibou@gmail.com> writes:
> On Wed, 30 Aug 2023 17:40:52 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
>> You're conflating writing something in C and writing something
>> in completely portable C. It's already possible to do these
>> things writing in C.
>
> I wrote
>
> One might want to experiment with different allocation
> algorithms and it seems to me that this sort of thing is
> within the "remit" of C. So ideally one should be able to
> write it in C and prove , starting from the standard or
> precise specifications in compiler documentation , that it
> works correctly. I don't necessarily mean prove the
> correctness of the whole code but certain key parts.
>
> .This doesn't conflate anything. One can do the writing but
> can one do the proving or something close ?
A substitute for malloc()/free() can be written in standard C.
A substitute for malloc()/free() can not be written in completely
portable standard C.
I hope this clarifies my earlier comments.
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-09-05 17:03 -0700 |
| Message-ID | <861qfcp3q5.fsf@linuxsc.com> |
| In reply to | #6547 |
Martin Uecker <ma.uecker@gmail.com> writes:
[...]
> There are essentially two main interests driving this. First,
> there is some interest to precisely formulate the semantics for
> C. The provenance proposal came out of this.
>
> Second, there is the issue of safety problems caused by
> uninitialized reads, together with compiler support for zero
> initialization etc. So there are various people who want to
> change the semantics for uninitialized variables completely
> in the interest of safety.
>
> So far, there was no consensus in WG14 that the rules should
> be changed or what the new rules should be.
I have a second reply here, which I hope will come closer to
being relevant to the issues of interest.
What I think is being looked for is a way to describe the
language semantics in areas such as cross-type interference and
what is meant when an uninitialized object is read. I thought
about this question both while I was writing the longer earlier
reply and then more deeply afterwards.
What I think is most important is that these areas in particular
are not about language semantics in the same way as, for example,
array indexing. Rather they are about what transformations a
compiler is allowed to do in the presence of various combinations
of program constructs. That difference means the C standard
should express the rules in a way that more directly reflects
what's going on. More specifically, the standard should say or
explain what can be done, not by describing language semantics
(which is indirect), but explicitly in terms of what compiler
transformations are allowed (which is direct). Note that there
is precedent for this idea, in how the C standard talks about
looping constructs and when they may be assumed to terminate.
To give an example, take uninitialized objects, either automatic
variables without an initializer, or memory allocated by malloc or
added by realloc. The most natural semantics for such situations
is to say that newly "created" memory gets an unspecified object
representation at the start of its lifetime. (Yes I know that C
in its current form lets automatic objects be "uninitialized"
whenever their declaration points are reached, but let's ignore
that for now.) Now suppose a program has a read access where it
is easy to deduce that the object being read is still in the
"unspecified object representation" initial state. To simplify
the discussion, suppose the type of the access is a pointer type,
and so is known to have trap representations (the name is changed
in the C23 draft, but the idea is what's important).
What is a compiler allowed to do in such circumstances? One thing
it might reasonably be allowed to do is to cause the program to be
terminated if it ever reaches such an access. Or there might be
an option to initialize the pointer to NULL. Or, if a suitable
compiler option were invoked, the construct might be flagged with
a fatal error (or of course a warning). There are all sorts of
actions a developer might want the compiler to take, and a
compiler could offer many of those options, as choices selected
under control of command line switches (or equivalent). I think a
few points are worth making.
One, there must be some sort of default action that all compilers
have to support. The default action in this case might be to
issue a non-fatal diagnostic.
Two, there must be a way for the developer to tell the compiler to
"proceed blindly" - saying, in effect, I accept that the compiled
code might misbehave, but let me take that risk, and generate code
like it's going to work. (In other words, for the read access, go
ahead and load whatever unspecified object representation happens
to be there.) A "proceed blindly" choice probably shouldn't be
the default, but it must be available.
Three, the consequence must never be "undefined behavior", unless
there is an explicit stipulation to that effect. The stipulation
might take the form of a #pragma, or a compiler option, or a code
decoration using "attribute" (whatever the syntax for such things
is).
I know my comments here are somewhat sketchy, but hopefully a
general sense of the ideas gets across. The suggestions should at
least serve to stimulate further discussion.
| From | Jakob Bohm <jb-usenet@wisemo.com.invalid> |
|---|---|
| Date | 2023-09-07 17:09 +0200 |
| Message-ID | <b4qdnRse5OVYemT5nZ2dnZfqn_idnZ2d@giganews.com> |
| In reply to | #6565 |
On 2023-09-06 02:03, Tim Rentsch wrote:
> Martin Uecker <ma.uecker@gmail.com> writes:
>
> [...]
>
>> There are essentially two main interests driving this. First,
>> there is some interest to precisely formulate the semantics for
>> C. The provenance proposal came out of this.
>>
>> Second, there is the issue of safety problems caused by
>> uninitialized reads, together with compiler support for zero
>> initialization etc. So there are various people who want to
>> change the semantics for uninitialized variables completely
>> in the interest of safety.
>>
>> So far, there was no consensus in WG14 that the rules should
>> be changed or what the new rules should be.
>
> I have a second reply here, which I hope will come closer to
> being relevant to the issues of interest.
>
> What I think is being looked for is a way to describe the
> language semantics in areas such as cross-type interference and
> what is meant when an uninitialized object is read. I thought
> about this question both while I was writing the longer earlier
> reply and then more deeply afterwards.
>
> What I think is most important is that these areas in particular
> are not about language semantics in the same way as, for example,
> array indexing. Rather they are about what transformations a
> compiler is allowed to do in the presence of various combinations
> of program constructs. That difference means the C standard
> should express the rules in a way that more directly reflects
> what's going on. More specifically, the standard should say or
> explain what can be done, not by describing language semantics
> (which is indirect), but explicitly in terms of what compiler
> transformations are allowed (which is direct). Note that there
> is precedent for this idea, in how the C standard talks about
> looping constructs and when they may be assumed to terminate.
>
> To give an example, take uninitialized objects, either automatic
> variables without an initializer, or memory allocated by malloc or
> added by realloc. The most natural semantics for such situations
> is to say that newly "created" memory gets an unspecified object
> representation at the start of its lifetime. (Yes I know that C
> in its current form lets automatic objects be "uninitialized"
> whenever their declaration points are reached, but let's ignore
> that for now.) Now suppose a program has a read access where it
> is easy to deduce that the object being read is still in the
> "unspecified object representation" initial state. To simplify
> the discussion, suppose the type of the access is a pointer type,
> and so is known to have trap representations (the name is changed
> in the C23 draft, but the idea is what's important).
>
> What is a compiler allowed to do in such circumstances? One thing
> it might reasonably be allowed to do is to cause the program to be
> terminated if it ever reaches such an access. Or there might be
> an option to initialize the pointer to NULL. Or, if a suitable
> compiler option were invoked, the construct might be flagged with
> a fatal error (or of course a warning). There are all sorts of
> actions a developer might want the compiler to take, and a
> compiler could offer many of those options, as choices selected
> under control of command line switches (or equivalent). I think a
> few points are worth making.
>
> One, there must be some sort of default action that all compilers
> have to support. The default action in this case might be to
> issue a non-fatal diagnostic.
>
> Two, there must be a way for the developer to tell the compiler to
> "proceed blindly" - saying, in effect, I accept that the compiled
> code might misbehave, but let me take that risk, and generate code
> like it's going to work. (In other words, for the read access, go
> ahead and load whatever unspecified object representation happens
> to be there.) A "proceed blindly" choice probably shouldn't be
> the default, but it must be available.
>
> Three, the consequence must never be "undefined behavior", unless
> there is an explicit stipulation to that effect. The stipulation
> might take the form of a #pragma, or a compiler option, or a code
> decoration using "attribute" (whatever the syntax for such things
> is).
>
Agreed so far!
As a developer of programs in C with practical but not infinite
portability, I very much abhor the mad optimizations that use language
lawyering to state that any code path that might, hypothetically,
exceed the boundaries of standard-enforced behavior is allowed to be
arbitrarily mangled to get a faster bad result.
For example, I have one function which intentionally reads an
uninitialized variable to get a somewhat arbitrary value of a type with
no known trap representation.
I have a number of other programs which extensively process a block of
data before deciding in some other way if the data is garbage or
useful. This is done for sound technical reasons but requires that the
compiler doesn't plant landmines all over virgin land.
As another example, I have speed critical code that relies on running
on 2s complement machines with wraparound on signed integer overflow, and
that code is being very clear and explicit in doing so, but there
is no C90 notation to tell all ISO-C implementations that this is the
intention, thus it is explicit only in comments, not in the tokens
passed to the C compiler.
> I know my comments here are somewhat sketchy, but hopefully a
> general sense of the ideas gets across. The suggestions should at
> least serve to stimulate further discussion.
>
I am writing from a similar perspective.
Enjoy
Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-09-07 17:19 +0100 |
| Message-ID | <87sf7qnefn.fsf@bsb.me.uk> |
| In reply to | #6566 |
Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
> As another example, I have speed critical code that relies on running
> on 2s complement machines with wraparound on signed integer overflow, and
> that code is being very clear and explicit in doing so, but there
> is no C90 notation to tell all ISO-C implementation that this is the
> intention, thus it is explicit only in comments, not in the tokens
> passed to the C compiler.
You can tell the compiler you want 2s complement by using the intN_t
types if you can find one that suits your portability requirements.
And can you not use unsigned arithmetic, re-interpreting as signed for
those places where it matters? The "overflow" can only happen in
the arithmetic, not in the re-interpretation.
I know this is a deviation from the topic, so feel free to ignore if you
don't want to get into it.
--
Ben.
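Ben's suggestion can be sketched as follows (the name `wrap_add32` is hypothetical, not from the thread): do the arithmetic in uint32_t, where wraparound is defined modulo 2^32, and reinterpret the result as int32_t only at the end. Because int32_t is required to be two's complement with no padding bits, copying the bit pattern with memcpy yields the wrapped value without relying on the implementation-defined conversion of an out-of-range unsigned value to a signed type. This assumes <stdint.h> provides int32_t, which is exactly the portability condition Ben mentions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(int32_t) == sizeof(uint32_t), "exact-width types match");

/* Two's-complement wrapping addition with no signed overflow anywhere:
   converting to uint32_t and adding are both defined modulo 2^32. */
int32_t wrap_add32(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t result;
    /* int32_t is two's complement with no padding bits, so copying the
       bit pattern gives the wrapped signed value. */
    memcpy(&result, &sum, sizeof result);
    return result;
}
```

For example, `wrap_add32(INT32_MAX, 1)` yields `INT32_MIN`, the result a wrapping signed add would produce.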
| From | Jakob Bohm <jb-usenet@wisemo.com.invalid> |
|---|---|
| Date | 2023-09-08 23:12 +0200 |
| Message-ID | <p5KdnX4UMaaDE2b5nZ2dnZeNn_pj4p2d@giganews.com> |
| In reply to | #6567 |
On 2023-09-07 18:19, Ben Bacarisse wrote:
> Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
>
>> As another example, I have speed critical code that relies on running
>> on 2s complement machines with wraparound on signed integer overflow, and
>> that code is being very clear and explicit in doing so, but there
>> is no C90 notation to tell all ISO-C implementation that this is the
>> intention, thus it is explicit only in comments, not in the tokens
>> passed to the C compiler.
>
> You can tell the compiler you want 2s complement by using the intN_t
> types if you can find one that suits your portability requirements.
>
> And can you not use unsigned arithmetic, re-interpreting as signed for
> those places where it matters?  The "overflow" can only happen in
> the arithmetic, not in the re-interpretation.
>
> I know this is a deviation from the topic, so feel free to ignore if you
> don't want to get into it.
>

The code in question has as an explicit design condition that the compiler implements signed versions with wraparound for each unsigned int type.

The code cannot rely on the intN_t types because they were not part of C90 and thus do not exist as separate types in some targeted compilers. In the world of C90 compilers, stdint.h was a non-standard system header that provided convenience names for the most closely matching C90 types on the platform, and some platforms simply didn't provide that header, instead documenting how each C90 type mapped to data sizes.

Excessive casting where directly using the desired type seems possible is highly counter-intuitive and thus it is inherently wrong for an optimizer to presume the right to mangle code using types such as "int", "short int", "long int" and "signed char".

Once again this comes down to a language drift from "undefined" meaning "not defined by this standard" to "an extremely toxic trap condition".

Enjoy

Jakob
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-09-08 22:31 +0100 |
| Message-ID | <87fs3omjxj.fsf@bsb.me.uk> |
| In reply to | #6568 |
Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

> On 2023-09-07 18:19, Ben Bacarisse wrote:
>> Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
>>
>>> As another example, I have speed critical code that relies on running
>>> on 2s complement machines with wraparound on signed integer overflow, and
>>> that code is being very clear and explicit in doing so, but there
>>> is no C90 notation to tell all ISO-C implementation that this is the
>>> intention, thus it is explicit only in comments, not in the tokens
>>> passed to the C compiler.
>> You can tell the compiler you want 2s complement by using the intN_t
>> types if you can find one that suits your portability requirements.
>> And can you not use unsigned arithmetic, re-interpreting as signed for
>> those places where it matters?  The "overflow" can only happen in
>> the arithmetic, not in the re-interpretation.
>> I know this is a deviation from the topic, so feel free to ignore if you
>> don't want to get into it.
>
> The code in question has as an explicit design condition that the compiler
> implements signed versions with wraparound for each unsigned int type.
>
> The code cannot rely on the intN_t types because they were not part of
> C90 and thus do not exist as separate types in some targeted
> compilers.

Ah, I didn't know targeting C90 was still a thing.  I've been out of the business for many years.

> Excessive casting where directly using the desired type seems possible
> is highly counter-intuitive and thus it is inherently wrong for an
> optimizer to presume the right to mangle code using types such as "int",
> "short int", "long int" and "signed char".

I wasn't suggesting casts, as they don't remove the undefined behaviour.  But you have a design that suits your needs, so it's all good.

-- 
Ben.
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-07-22 06:40 +0000 |
| Message-ID | <20230721233227.651@kylheku.com> |
| In reply to | #6507 |
On 2023-07-21, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> 6.3.2.1 p2:
>
>   "[...] If the lvalue designates an object of automatic storage
>   duration that could have been declared with the register storage class
>   (never had its address taken), and that object is uninitialized (not
>   declared with an initializer and no assignment to it has been
>   performed prior to use), the behavior is undefined."
>
> seems to cover it.  The restriction on not having its address taken
> seems odd.

Wording like that looks like someone's solo documentation effort, not peer-reviewed by an expert committee.

That looks as if the intent is to allow some diagnoses of uses of uninitialized variables, while discouraging others.

However, it doesn't seem a good idea to be constraining implementations in how clever they can be in identifying an erroneous situation.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
| From | Martin Uecker <ma.uecker@gmail.com> |
|---|---|
| Date | 2023-07-22 06:03 -0700 |
| Message-ID | <21265efa-1bfe-4049-950f-45b75f0b4f71n@googlegroups.com> |
| In reply to | #6514 |
On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:
> On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
> > 6.3.2.1 p2:
> >
> > "[...] If the lvalue designates an object of automatic storage
> > duration that could have been declared with the register storage class
> > (never had its address taken), and that object is uninitialized (not
> > declared with an initializer and no assignment to it has been
> > performed prior to use), the behavior is undefined."
> >
> > seems to cover it. The restriction on not having its address taken
> > seems odd.
> Wording like that looks like someone's solo documentation effort,
> not peer-reviewed by an expert committee.
>
> That looks as if the intent is to allow some diagnoses of uses of
> uninitialized variables, while discouraging others.
>
> However, it doesn't seem a good idea to be constraining
> implementations in how clever they can be in identifying
> an erroneous situation.

I personally like this rule (but I am speaking about me. there is no full consensus about the exact interpretation of the standard nor about what it should say). I will try to explain why.

In C, we can also access objects using character pointers. This should work in all cases, even for non-value (trap) representations, and is also used a lot in practice to copy uninitialized or partially initialized objects.

If one makes all reads of objects with indeterminate representation have undefined behavior, then this would not work anymore. If one wants to allow this (and a lot of real-world programs rely on it), then one has to invent rules for how this works with an abstract (provenance-based) notion of indeterminate values. This turns out to be difficult.

But if we keep this rule, it becomes very simple:

On the one hand, all reads of uninitialized automatic variables whose address is not taken are undefined behavior. This is the most useful behavior for detecting bugs and/or optimization.

On the other hand, taking an address and working with a character pointer to copy or manipulate an object is always defined: one simply gets unspecified representation bytes (which may be a non-value representation for some type, and it is UB to read them using an lvalue of that type). So low-level operations with partially initialized objects work as expected without having to introduce complicated rules.

It will cost a few optimization opportunities, but it avoids a lot of trouble.

Martin
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-07-25 21:53 -0700 |
| Message-ID | <864jlrs28d.fsf@linuxsc.com> |
| In reply to | #6515 |
Martin Uecker <ma.uecker@gmail.com> writes:

> On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:
>
>> On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
>>
>>> 6.3.2.1 p2:
>>>
>>>   "[...] If the lvalue designates an object of automatic storage
>>>   duration that could have been declared with the register storage
>>>   class (never had its address taken), and that object is
>>>   uninitialized (not declared with an initializer and no
>>>   assignment to it has been performed prior to use), the behavior
>>>   is undefined."
>>>
>>> seems to cover it.  The restriction on not having its address
>>> taken seems odd.
>>
>> [...]
>
> I personally like this rule (but I am speaking about me. there is
> no full consensus about the exact interpretation of the standard
> nor about what it should say).  I will try to explain why.  [...]

It's a good rule.  I agree with your comments.

I guess it's possible the wording could be improved, but compared to other parts of the C standard the clarity of this passage is closer to the top than it is to the bottom.
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-16 11:11 -0700 |
| Message-ID | <86r0o26en6.fsf@linuxsc.com> |
| In reply to | #6514 |
Kaz Kylheku <864-117-4973@kylheku.com> writes:

> On 2023-07-21, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>
>> 6.3.2.1 p2:
>>
>>   "[...] If the lvalue designates an object of automatic storage
>>   duration that could have been declared with the register storage class
>>   (never had its address taken), and that object is uninitialized (not
>>   declared with an initializer and no assignment to it has been
>>   performed prior to use), the behavior is undefined."
>>
>> seems to cover it.  The restriction on not having its address taken
>> seems odd.
>
> Wording like that looks like someone's solo documentation effort,
> not peer-reviewed by an expert committee.
>
> That looks as if the intent is to allow some diagnoses of uses of
> uninitialized variables, while discouraging others.

That isn't at all what this passage is about.
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-07-21 17:42 +0000 |
| Message-ID | <20230721002225.404@kylheku.com> |
| In reply to | #6505 |
On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> N3096 is the last public draft of the upcoming C23 standard.
>
> N3096 J.2 says:
>
>     The behavior is undefined in the following circumstances:
>     [...]
>     (11) The value of an object with automatic storage duration is
>     used while the object has an indeterminate representation
>     (6.2.4, 6.7.10, 6.8).

Personally, I think that the root cause of this whole issue is the defective definition of indeterminate value.

Indeterminacy must be an abstract concept that is not encoded in the bits of the object; it is a matter of provenance.

An indeterminate integer could have a valid bit pattern, such as all zero, yet the implementation should be free to terminate with a diagnostic (or behave in other ways) when it is accessed.

It should not be possible to tell whether an object is indeterminate by looking at its bits.

An implementation can track this with metadata. Translation-time flow-analysis data can catch some uses of uninitialized objects; that's how we get classic uninitialized variable warnings.

An implementation can track uninitialized bits at run time with hidden metadata. The Valgrind debugging tool does this: for every bit, whose value is necessarily always 0 or 1, it tracks whether the bit is initialized.

That poor definition of indeterminate value should go. Otherwise the standard is contradicting itself and doing silly things like asserting that using an indeterminate value is undefined behavior if it is a local variable with automatic storage.

A reasonable definition of indeterminate might be:

  indeterminate
    an abstract status indicating that a value is invalid, irrespective
    of the content of the bits which constitute that value. An
    improperly obtained value is indeterminate.(1) A previously valid
    value may lapse into indeterminate status.(2) Any use of an
    indeterminate value is undefined behavior.

    --
    (1) For example, a value obtained by accessing an uninitialized
    object defined in automatic storage, or in an uninitialized region
    of memory obtained from malloc.
    (2) For example, a pointer to an object becomes indeterminate if
    that object is deallocated.
| From | Jakob Bohm <jb-usenet@wisemo.com.invalid> |
|---|---|
| Date | 2023-07-24 07:53 +0200 |
| Message-ID | <hV2dneL6E4fNjSP5nZ2dnZeNn_pg4p2d@giganews.com> |
| In reply to | #6508 |
On 2023-07-21 19:42, Kaz Kylheku wrote:
> On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> N3096 is the last public draft of the upcoming C23 standard.
>>
>> N3096 J.2 says:
>>
>>     The behavior is undefined in the following circumstances:
>>     [...]
>>     (11) The value of an object with automatic storage duration is
>>     used while the object has an indeterminate representation
>>     (6.2.4, 6.7.10, 6.8).
>
> Personally, I think that the root cause of this whole issue is
> the defective definition of indeterminate value.
>

The problem is much deeper than that.  It all boils down to the obsession in the official C community to abuse the concept of "undefined" to cover everything from "arbitrary natural semantics of the hardware" to "optimizing away code unexpectedly".

It would be highly beneficial for a cleanup in C30, or even a corrective TR, to split up the concept into explicit cases that vary for each situation.  For example, runtime error reporting should be very different from optimizing away code that may encounter runtime errors on different hardware than the one it is actually run on.

From a simplified conceptual machine model that resembles a modern von Neumann architecture with only floating point types having actual trap representations, a lot of rules that have at various times been rephrased using the word "undefined" seem utterly absurd, and applying the current meaning of "undefined" back to the actual machines that inspired them will tend to cause even more absurdities.

For example, the ability of the IA64 CPUs to raise an actual trap exception in response to reading an uninitialized register is very different from aggressively optimizing away code that might use an unknown stray value, especially with the aggressive optimization settings required by the IA64 Explicitly Parallel design.

Some of the things that "undefined" in the current text could map to:

- anyof(A,B,C) = An implementation specific and possibly uncontrolled choice between A, B and C (with no others permitted).
- Continuing as if nothing happened
- Aborting execution, possibly with an error indication.
- raise(X) where X is specified in the standard.
- An implementation specific value to be listed in the implementation documentation.
- A standard specified value.
- Executing machine code at a specified memory address in accordance with the actual machine behavior (this is common for calling a function pointer that isn't set to a C function of proper type).
- Causing the code to be eliminated (think assume(0);)
- Reserved for future standardization in future editions.
- Reserved for standardization in other ISO documents (such as POSIX or C++).
- Reserved for implementation specific behavior to be listed in the implementation documentation.

For example, the effect of calling assert() with a false value is "anyof(continuing as if nothing happened, abort with error)", with the choice forced by whether the NDEBUG macro is defined at the point where <assert.h> is included.

There should also be a way for limits.h (one of the few headers required in freestanding implementations) to specify via new standard defines if the implementation conforms to common sets of implementation specific behaviors such as "twos complement int with wraparound", "ones complement int with wraparound", "sign and magnitude int with wraparound", "unsigned with wraparound", "IEEE nnnn floating point with/without overflow exceptions", "negative int division by positive int rounds towards zero" (and the other possibilities for division special cases) etc. etc.

Enjoy

Jakob
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-07-25 21:57 -0700 |
| Message-ID | <86zg3jqngv.fsf@linuxsc.com> |
| In reply to | #6516 |
Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

> On 2023-07-21 19:42, Kaz Kylheku wrote:
>
>> On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>
>>> N3096 is the last public draft of the upcoming C23 standard.
>>>
>>> N3096 J.2 says:
>>>
>>>     The behavior is undefined in the following circumstances:
>>>     [...]
>>>     (11) The value of an object with automatic storage duration is
>>>     used while the object has an indeterminate representation
>>>     (6.2.4, 6.7.10, 6.8).
>>
>> Personally, I think that the root cause of this whole issue is
>> the defective definition of indeterminate value.
>
> The problem is much deeper than that.  It all boils down to the
> obsession in the official C community to abuse the concept of
> "undefined" to cover everything from "arbitrary natural semantics
> of the hardware" to "optimizing away code unexpectedly".  [...]

This discussion looks interesting but it seems better that there be a separate thread to take it up.