From: Ben Bacarisse
Newsgroups: comp.std.c
Subject: Re: May a string span multiple, independent objects?
Date: Fri, 05 Jul 2024 05:14:52 +0100
Message-ID: <87v81kv5j7.fsf@bsb.me.uk>
References: <20240703141500$00ed@vinc17.org> <87zfqy6v54.fsf@bsb.me.uk> <20240704130236$a100@vinc17.org>

Vincent Lefevre writes:

> In article <87zfqy6v54.fsf@bsb.me.uk>,
>   Ben Bacarisse wrote:
>
>> James Kuyper writes:
>
>> > On 7/3/24 10:31, Vincent Lefevre wrote:
>> >> ISO C17 (and C23 draft) 7.1.1 defines a string as follows: "A string
>> >> is a contiguous sequence of characters terminated by and including
>> >> the first null character."
>> >>
>> >> But may a string span multiple, independent objects that happen
>> >> to be contiguous in memory?
>> ...
>> >> For instance, is the following program valid and what does the ISO C
>> >> standard say about that?
>> >>
>> >> #include <stdio.h>
>> >> #include <string.h>
>> >>
>> >> typedef char *volatile vp;
>> >>
>> >> int main (void)
>> >> {
>> >>   char a = '\0', b = '\0';
>> >
>> > a and b are not guaranteed to be contiguous.
>> >
>> >>   vp p = &a, q = &b;
>> >>
>> >>   printf ("%p\n", (void *) p);
>> >>   printf ("%p\n", (void *) q);
>> >>   if (p + 1 == q)
>> >>     {
>> >
>> > That comparison is legal, and has well-defined behavior. It will be true
>> > only if they are in fact contiguous.
>> >
>> >>       a = 'x';
>> >>       printf ("%zd\n", strlen (p));
>> >
>> > Because strlen() must take a pointer to 'a' (which is treated, for these
>> > purposes, as an array of char of length 1), and increment it one past the
>> > end of that array, and then dereference that pointer to check whether it
>> > points at a null character, the behavior is undefined.
>
>> I think this is slightly misleading.  It suggests that the UB comes from
>> something strlen /must/ do, but strlen must be thought of as a black
>> box.  We can't base anything on an assumed implementation.
>
> I agree (and note that strlen is not necessarily written in C).
>
>> But our conclusion is correct because there is explicit wording covering
>> this case.  The section on "String function conventions" (7.24.1)
>> states:
>
>> "If an array is accessed beyond the end of an object, the behavior is
>> undefined."
>
> Arguments of these functions are either arrays or strings, where a
> string is not defined as being an array (or a part of an array). So
> I don't see why this text, as written, would apply to strings.

I'd say because an object like a (or b) is considered to be an array of
length one.

> BTW, the definition of an object is rather vague: "region of data
> storage in the execution environment, the contents of which can
> represent values". But it is not excluded that contiguous areas
> can form an object.

Indeed.  In fact an array is an object made up of contiguous objects,
but &a points to an array of length one.

> Similarly, malloc() is specified as allocating space for an object,
> but this does not mean that one initially has an object in the
> allocated space, though with the above restriction, this would
> be important to be able to use memset() on this storage area.

I think you have an object as soon as all the storage is allocated.

-- 
Ben.
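
A minimal sketch of the cases discussed above, assuming the reading that
a lone char counts as an array of length one for the string functions.
The helper names (len_within_one_array, bounded_len, len_of_single_char,
zero_filled_allocation) are hypothetical and do not come from the thread;
the comments restate the thread's reading of 7.24.1 and are not an
authoritative interpretation of the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Both characters belong to ONE array object, so strlen never has to
   look outside the object it was handed. */
size_t len_within_one_array(void)
{
    char ab[2] = { 'x', '\0' };
    return strlen(ab);
}

/* Hypothetical helper: scan at most n bytes starting at p.
   Returns the offset of the first null character, or n if none is
   found within the n bytes. */
size_t bounded_len(const char *p, size_t n)
{
    assert(p != NULL);
    const char *end = memchr(p, '\0', n);
    return end ? (size_t)(end - p) : n;
}

/* A single char treated as an array of length one.  Calling
   strlen(&a) when a is not '\0' would require looking beyond a, which
   is the access the thread argues is undefined, so the scan here is
   limited to the one byte that belongs to a. */
size_t len_of_single_char(char c)
{
    char a = c;
    return bounded_len(&a, 1);
}

/* malloc followed by memset over the whole allocation, on the
   assumption (as suggested in the reply) that the allocated storage
   can be used as an object once it has been allocated.
   Returns 1 if every byte reads back as zero, -1 if malloc fails. */
int zero_filled_allocation(size_t n)
{
    unsigned char *buf = malloc(n);
    if (buf == NULL)
        return -1;
    memset(buf, 0, n);
    int ok = 1;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            ok = 0;
    free(buf);
    return ok;
}
```

In this sketch a result of 1 from len_of_single_char means "no
terminator inside the object", which is exactly the situation in which
the original program passes p to strlen.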