From: Tim Rentsch
Newsgroups: comp.std.c
Subject: Arrays and pointer arithmetic
Date: Mon, 28 Feb 2022 12:07:33 -0800
Organization: A noiseless patient Spider
Message-ID: <867d9ehi0q.fsf@linuxsc.com>

This posting is prompted by a discussion in comp.lang.c++ about
arrays and pointer arithmetic.  An excerpt from that discussion is
given below, to provide context for anyone who did not see the
recent comp.lang.c++ discussion.

Consider three aspects of behavior in C:

  * reading a union member after storing into a different member
  * sequencing rules for expressions
  * implications for code reordering around a volatile access

Here are some notes about these areas.

For reading a union member after a different member has been
written, C90 says the value is implementation defined.  C99 does
not say that, and explains what appears to be a different rule in
a non-normative footnote.  Yet meeting notes from the ISO C website
indicate that the C99 description is meant to convey the same
semantics as the C90 description (or vice versa).

For sequencing within a single expression, there was a famous
debate about whether (for C90 and C99) an assignment such as
'a[a[0]] = 4;', where a[0] initially has the value 0, has defined
behavior or undefined behavior.  A straightforward reading of the
C90/C99 text suggests it was undefined.
In C11, the description of sequencing rules was revised, and under
the C11 description the behavior is, pretty unambiguously, well
defined.  Yet there is no mention of the C11 sequencing rules
constituting a change from the C90/C99 rules;  apparently the C11
description was meant to be, at best, a clarification, but without
any change to what the semantics are.

For code reordering around a volatile access, it's easy to draw
the conclusion that the C standard allows no movement (i.e., for
purposes of optimization) of any earlier or later reads or writes
across the volatile access expression.  Yet discussion with
committee members definitely indicates that some such code movement
is allowed, despite what the C standard text would plainly
indicate.

Another example has to do with type rules for printf() arguments.
If there is a printf() call such as

    printf( "%u", 7 );

is the behavior defined or undefined?  There are reasonable
positions both pro and con.  How are we to understand which view
better represents the judgment of the committee members?  (I take
it as given that a judgment from the ISO C committee constitutes
the ultimate authority as to what the C standard either requires
or allows.)  Incidentally, in the recent draft N2731, there is new
wording that answers this question in favor of the behavior being
defined, not undefined.

How are we to make sense of these apparent incongruities?  All of
these cases can be understood using a single explanation:  members
of the ISO C committee have a mental model for how the language is
supposed to behave in each case, and what is written in the ISO C
standard is meant to reflect those models, but sometimes the
writing falls short.  When it does, the model prevails, because as
far as the members' view is concerned, "the truth" is what the
model says, not what the words say.

The description of semantic rules for pointer arithmetic talks
about situations where "the expression P points [...]
to an element of an array object [...]", but it isn't always clear
what "array object" is being referenced (in particular in the
presence of allocated memory).

My understanding of what C allows for pointer arithmetic is as
follows.  What matters is where the pointer value in question
originally came from.  If the original pointer value pointed to an
element of an array (with suitable language to handle the case of
pointing one past the last element of the array), further use of
that pointer value (e.g., by means of casting) is allowed to access
all the memory occupied by the array containing the element that
the original pointer value pointed to.  Thus in the example below
the address &foo points to a single-element array that coincides
with all of the memory occupied by foo, and thus it may access
(after the casting) all of the int elements of the two-dimensional
array.

Evidence for this mental model, and for committee members holding
it, can be seen in various official ISO C writings on their
website, when the "provenance" of pointer values is discussed.  My
understanding of what C allows here is based partly or perhaps
mostly on those written discussions.

When I say below "an argument could be made...", it doesn't mean
that I feel unsure about my own understanding.  What it does mean
is that someone reading just the text in the C standard, and
nothing else, might very well reach a different conclusion.  My
comment is meant to acknowledge that such positions may exist, even
though I myself don't find them persuasive.

I hope this explanation clarifies both what I meant and why I have
reached the conclusions that I have.

Tim Rentsch writes:

> Ben Bacarisse writes:
>
>> Tim Rentsch writes:
>
> [edited for brevity]
>
>>> If we have this code fragment
>>>
>>>     int foo[10][20];
>>>     extern void set_elements( int *, size_t, int );
>>>
>>>     set_elements( (int*) &foo, 10*20, -1 );
>>>
>>> an argument could be made that set_elements() cannot use pointer
>>> arithmetic (including that implied by use of []) on its first
>>> argument other than to access between foo[0][0] and foo[0][19] (or
>>> to construct a pointer to foo[0][20]).  [...]
>>
>> [...]  It's what direct additions and subtractions are permitted
>> for any given pointer that I no longer feel sure about.  Your "a
>> case could be made" suggests you are not entirely sure either,
>> though it does suggest you consider that case is a stretch.
>
> The implied question here has a somewhat longish answer.  I'll
> get to it when I can.  Also, as it seems we have drifted rather
> far from C++, comp.std.c is I think a better place to continue.

(This concludes the quoted excerpt.)
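
As a rough sketch only, and not part of the original exchange, the
quoted fragment can be filled out into compilable C.  The body of
set_elements() and the helper fill_and_count() are assumptions made
for illustration;  only the declarations and the call come from the
quoted fragment.  Whether indexing beyond p[19] is permitted is
exactly the point at issue, so the comments describe the reading
given in this posting rather than settled fact.

```c
#include <stddef.h>

/* Assumed body (the fragment only declares this function):
   store 'value' into each of the 'n' ints starting at 'p'.
   The indexing p[i] is the pointer arithmetic under discussion. */
void set_elements( int *p, size_t n, int value ){
    for( size_t i = 0;  i < n;  i++ ){
        p[i] = value;
    }
}

/* Hypothetical helper, not in the original fragment:  fills foo
   through a pointer derived from &foo, then returns how many of
   the 10*20 elements hold -1 afterwards. */
size_t fill_and_count( void ){
    int foo[10][20] = {{0}};

    /* &foo has type int (*)[10][20].  On the reading described in
       the posting, it points to the sole element of a one-element
       array occupying all of foo, so the converted pointer may
       range over all 200 ints. */
    set_elements( (int*) &foo, 10*20, -1 );

    size_t count = 0;
    for( size_t r = 0;  r < 10;  r++ ){
        for( size_t c = 0;  c < 20;  c++ ){
            if( foo[r][c] == -1 )  count++;
        }
    }
    return count;
}
```

On the reading given in the posting, fill_and_count() returns 200.
On the narrower reading mentioned in the quoted excerpt, the
accesses beyond p[19] would be undefined behavior, and the standard
would say nothing about the result.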