From: Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups: comp.std.c
Subject: Re: May a string span multiple, independent objects?
Date: Fri, 5 Jul 2024 07:14:43 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <20240705000419.170@kylheku.com>
References: <20240703141500$00ed@vinc17.org>

On 2024-07-03, Vincent Lefevre wrote:
> ISO C17 (and C23 draft) 7.1.1 defines a string as follows: "A string
> is a contiguous sequence of characters terminated by and including
> the first null character."
>
> But may a string span multiple, independent objects that happens
> to be contiguous in memory?

It is undefined behavior. Implementations are allowed to track the
provenance of a displaced pointer, and diagnose when it is out of
bounds even if the displaced value points into a valid object, and
even if the program validates that via a well-defined equality test.

> For instance, is the following program valid and what does the ISO C
> standard say about that?
>
> #include <stdio.h>
> #include <string.h>
>
> typedef char *volatile vp;
>
> int main (void)
> {
>   char a = '\0', b = '\0';
>   vp p = &a, q = &b;
>
>   printf ("%p\n", (void *) p);
>   printf ("%p\n", (void *) q);
>   if (p + 1 == q)
>     {
>       a = 'x';
>       printf ("%zd\n", strlen (p));
>     }

In this situation, the p + 1 expression is well-defined, as is the
p + 1 == q test. However, while *q is a valid expression that
evaluates to zero, *(p + 1) isn't valid.
The one-past-the-object pointer value may not be dereferenced. The
equivalence p + 1 == q doesn't save it; p + 1 is displaced from p,
unrelated to q.

>   if (q + 1 == p)
>     {
>       b = 'x';
>       printf ("%zd\n", strlen (q));
>     }
>   return 0;
> }
>
> If such a program is valid, would there be issues by working with
> pointers on such a string, say, dereferencing p[1] in the first "if"
> (which is normally UB)?

An issue could be that the implementation's optimizer assumes that
p + 1 and q are pointers to distinct objects, even in the middle of a
block of code that is conditional on p + 1 == q. If the code executes
*(p + 1) = 'a', a subsequent evaluation of *q or b can still produce 0.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
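
For contrast with the two-object program quoted above, here is a
minimal sketch of how the same accesses can stay within defined
behavior. It assumes the two characters can be made elements of a
single array, which the original program does not do. The names
len_within_one_object and next_is_terminator are hypothetical helpers
invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: both characters are elements of ONE array object,
   so strlen never reads outside the object that p points into. */
size_t len_within_one_object(void)
{
    char ab[2] = { '\0', '\0' };
    char *p = &ab[0];
    char *q = &ab[1];

    assert(p + 1 == q);   /* guaranteed: adjacent elements of one array */
    ab[0] = 'x';
    return strlen(p);     /* reads ab[0], then ab[1]; both lie inside ab */
}

/* Hypothetical helper: when the byte after *p must be inspected, read it
   through q, which really points at that object, never through p + 1.
   Assumes p and q each point to a live char object. */
int next_is_terminator(const char *p, const char *q)
{
    return p + 1 == q && *q == '\0';
}
```

Called from main, len_within_one_object() returns 1. With two
separately declared chars, next_is_terminator only reports whether the
implementation happened to place them adjacently; its single
dereference goes through q, so it stays valid either way.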