From: James Kuyper
Newsgroups: comp.std.c
Subject: Re: May a string span multiple, independent objects?
Date: Wed, 3 Jul 2024 11:59:06 -0400

On 7/3/24 10:31, Vincent Lefevre wrote:
> ISO C17 (and C23 draft) 7.1.1 defines a string as follows: "A string
> is a contiguous sequence of characters terminated by and including
> the first null character."
>
> But may a string span multiple, independent objects that happens
> to be contiguous in memory?

If they're truly independent, you cannot portably guarantee that they
are contiguous, but they might happen to be contiguous. If they happen
to be contiguous, they can together qualify as a string, but there's
very little that can usefully be done with such a string.

That's because if you start with a pointer to one array, and increment
it until it points one past the end of that array, it is permitted for
that pointer to be compared for equality to a pointer to the start of
another array, and it will compare true if and only if they are
contiguous. However, it is undefined behavior to dereference such a
pointer, or to increment it even one step further.
Therefore, any code that tries to do anything useful with such an
accidental string will generally have undefined behavior.

While, in principle, undefined behavior could be arbitrarily bad, in
many cases this will not cause problems except on an implementation
that does run-time bounds checking of pointers, for instance raising a
signal if the behavior is undefined. Run-time bounds checking would be
very slow, so it would probably only be turned on in a debugging mode.

Far more likely is a much more subtle possibility. Any time that code
has undefined behavior, an implementation might perform optimizations
based upon the assumption that you will not write such code.
Specifically, consider two pointers, one of which started out pointing
into one array, but was incremented to the point where the behavior was
undefined, and ended up pointing into a second array. The other pointer
started out pointing into that second array, and still does. They
currently both point at the same location. Because the behavior of such
code is undefined, an implementation is not obliged to make sure that
reads and writes through the two pointers are synchronized. If you have
*p = 'z', there's no guarantee that subsequently *q == 'z', even though
p and q both currently point at the same location. The 'z' might, for
instance, still be stored in a register waiting to be written to the
actual memory location at some later time.

> For instance, is the following program valid and what does the ISO C
> standard say about that?
>
> #include <stdio.h>
> #include <string.h>
>
> typedef char *volatile vp;
>
> int main (void)
> {
>   char a = '\0', b = '\0';

a and b are not guaranteed to be contiguous.

>   vp p = &a, q = &b;
>
>   printf ("%p\n", (void *) p);
>   printf ("%p\n", (void *) q);
>   if (p + 1 == q)
>     {

That comparison is legal, and has well-defined behavior. It will be true
only if they are in fact contiguous.

>       a = 'x';
>       printf ("%zd\n", strlen (p));

Because strlen() must take a pointer to 'a' (which is treated, for these
purposes, as an array of char of length 1), increment it one past the
end of that array, and then dereference that pointer to check whether it
points at a null character, the behavior is undefined.

...
> If such a program is valid, would there be issues by working with
> pointers on such a string, say, dereferencing p[1] in the first "if"
> (which is normally UB)?

Yes.