From: James Kuyper
Newsgroups: comp.std.c
Subject: Re: May a string span multiple, independent objects?
Date: Fri, 5 Jul 2024 01:37:35 -0400

On 7/4/24 09:22, Vincent Lefevre wrote:
> In article <87zfqy6v54.fsf@bsb.me.uk>,
>   Ben Bacarisse wrote:
>
>> James Kuyper writes:
>
>>> On 7/3/24 10:31, Vincent Lefevre wrote:
>>>> ISO C17 (and C23 draft) 7.1.1 defines a string as follows: "A string
>>>> is a contiguous sequence of characters terminated by and including
>>>> the first null character."
>>>>
>>>> But may a string span multiple, independent objects that happens
>>>> to be contiguous in memory?
>> ...
>>>> For instance, is the following program valid and what does the ISO C
>>>> standard say about that?
>>>>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>>
>>>> typedef char *volatile vp;
>>>>
>>>> int main (void)
>>>> {
>>>>   char a = '\0', b = '\0';
>>>
>>> a and b are not guaranteed to be contiguous.
>>>
>>>>   vp p = &a, q = &b;
>>>>
>>>>   printf ("%p\n", (void *) p);
>>>>   printf ("%p\n", (void *) q);
>>>>   if (p + 1 == q)
>>>>     {
>>>
>>> That comparison is legal, and has well-defined behavior. It will be true
>>> only if they are in fact contiguous.
>>>
>>>>       a = 'x';
>>>>       printf ("%zd\n", strlen (p));
>>>
>>> Because strlen() must take a pointer to 'a' (which is treated, for these
>>> purposes, as a array of char of length 1), and increment it one past the
>>> end of that array, and then dereference that pointer to check whether it
>>> points as a null character, the behavior is undefined.
>
>> I think this is slightly misleading. It suggests that the UB comes from
>> something strlen /must/ do, but strlen must be thought of as a black
>> box. We can't base anyhting on a assumed implementation.
>
> I agree (and note that strlen is not necessarily written in C).
>
>> But our conclusion is correct because there is explicit wording covering
>> this case. The section on "String function conventions" (7.24.1)
>> states:
>
>> "If an array is accessed beyond the end of an object, the behavior is
>> undefined."
>
> Arguments of these functions are either arrays and strings, where a
> string is not defined as being an array (or a part of an array). So
> I don't see why this text, as written, would apply to strings.
>
> BTW, the definition of an object is rather vague: "region of data
> storage in the execution environment, the contents of which can
> represent values". But it is not excluded that contiguous areas
> can form an object.

Not everything you need to know about a term defined in the C standard
is included in its definition. Other parts of the standard tell you that
objects with static, thread, or automatic storage duration are created
by declarations of identifiers for those objects. Other parts tell you
that anonymous objects can be created by the presence of string or
compound literals. The description of the standard library tells you
that objects with allocated storage duration are created by calling
memory allocation functions. Nowhere does it say that a larger C object
can be created simply by having two C objects that happen to be adjacent
to each other.
The basic rule, even though it is not explicitly part of the definition
of "object", is that you don't have a C object unless some clause of the
C standard tells you that it is an object, and the clauses I've
summarized above are the only ones that do so.

Note: if they don't just "happen" to be adjacent - if the C standard
guarantees that two objects are adjacent to each other by reason of
being sub-objects of some larger object - then the existence of that
larger object is what makes the behavior defined when incrementing a
pointer into the first object through the second.

> Similarly, malloc() is specified as allocating space for an object,
> but this does not mean that one initially has an object in the

Actually, it does. "The lifetime of an allocated object extends from the
allocation until the deallocation." (7.24.3p1). It becomes an object as
soon as allocated.

"The effective type of an object for an access to its stored value is
the declared type of the object, if any." (6.5p6). Note that allocated
memory is the only kind that doesn't start out with a declared type.
That paragraph goes on to say that

"If a value is stored into an object having no declared type through an
lvalue having a type that is not a non-atomic character type, then the
type of the lvalue becomes the effective type of the object for that
access and for subsequent accesses that do not modify the stored value."

Note that this wording describes it as already being an object before
any value has been written into the allocated memory. The second way to
give allocated memory an effective type uses wording with that same
implication:

"If a value is copied into an object having no declared type using
memcpy or memmove, or is copied as an array of character type, then the
effective type of the modified object for that access and for subsequent
accesses that do not modify the value is the effective type of the
object from which the value is copied, if it has one."