| Started by | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| First post | 2023-07-20 22:16 -0700 |
| Last post | 2023-08-16 21:08 +0000 |
| Articles | 20 on this page of 48 — 8 participants |
Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-20 22:16 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 16:33 +0100
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-21 11:56 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 20:54 +0100
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-21 14:26 -0700
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 23:39 +0100
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-12 17:00 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-13 23:41 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-15 21:06 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-15 22:40 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 23:13 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-17 07:08 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-18 12:44 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-19 05:04 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-19 01:36 -0700
Re: Does reading an uninitialized object have undefined behavior? Richard Damon <Richard@Damon-Family.org> - 2023-08-19 09:18 -0400
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-19 11:12 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-18 20:20 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-19 05:23 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-18 22:56 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-18 12:52 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-26 19:25 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-27 08:31 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 04:35 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-30 19:53 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 17:40 -0700
Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-31 18:18 +0000
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-05 05:39 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-05 17:03 -0700
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-09-07 17:09 +0200
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-07 17:19 +0100
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-09-08 23:12 +0200
Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-08 22:31 +0100
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-22 06:40 +0000
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-07-22 06:03 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-07-25 21:53 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 11:11 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-21 17:42 +0000
Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-07-24 07:53 +0200
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-07-25 21:57 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-03 13:13 -0700
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-03 15:20 -0700
Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-05 01:15 -0700
Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 09:19 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 19:51 +0000
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 20:03 +0000
Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 13:43 -0700
Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 21:08 +0000
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2023-07-20 22:16 -0700 |
| Subject | Does reading an uninitialized object have undefined behavior? |
| Message-ID | <87zg3pq1ym.fsf@nosuchdomain.example.com> |
N3096 is the last public draft of the upcoming C23 standard.
N3096 J.2 says:
The behavior is undefined in the following circumstances:
[...]
(11) The value of an object with automatic storage duration is
used while the object has an indeterminate representation
(6.2.4, 6.7.10, 6.8).
I'll use an `int` object in my example.
Reading an object that holds a non-value representation has undefined
behavior, but not all integer types have non-value representations
-- and if an implementation has certain characteristics, we can
reliably infer that int has no non-value representations (called
"trap representations" in C99, C11, and C17).
Consider this program:
```
#include <limits.h>
int main(void) {
    int foo;
    if (sizeof (int) == 4 &&
        CHAR_BIT == 8 &&
        INT_MAX == 2147483647 &&
        INT_MIN == -INT_MAX-1)
    {
        int bar = foo;
    }
}
```
If the condition is true (as it is for many real-world
implementations), then int has no padding bits and no trap
representations. The object `foo` has an indeterminate representation
when it's used to initialize `bar`. Since it cannot have a non-value
representation, it has an unspecified value.
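The padding-bit part of that inference can be checked mechanically. The following is a minimal sketch, not part of the original post; the helper names `value_bits` and `int_padding_bits` are hypothetical. It counts the value bits implied by INT_MAX, adds one sign bit, and compares the total with the object size in bits. With the values in the condition above, 32 bits minus 31 value bits minus 1 sign bit leaves 0 padding bits. The INT_MIN test in the post is a separate check that this sketch does not cover.

```c
#include <limits.h>

/* Hypothetical helper: number of value bits needed to hold max,
   e.g. 31 for 2147483647. */
int value_bits(unsigned long long max) {
    int bits = 0;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Hypothetical helper: padding bits in int, computed as
   total bits - value bits - 1 sign bit. */
int int_padding_bits(void) {
    int total = (int)(sizeof(int) * CHAR_BIT);
    return total - value_bits((unsigned long long)INT_MAX) - 1;
}
```

A result of 0 from `int_padding_bits()` corresponds to the "no padding bits" case the post relies on.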
If J.2(11) is correct, then the use of the value results in undefined
behavior.
But Annex J is non-normative, and as far as I can tell there is no
normative text in the standard that says the behavior is undefined.
6.2.4 discusses storage duration.
6.7.10 discusses initialization; p11 implies that the representation of
`foo` is indeterminate. It does not say
6.8 discusses statements and blocks, and repeats that "the
representation of objects without an initializer becomes
indeterminate".
None of these discuss what happens when the value of an object with
an indeterminate representation is used -- nor does any other text
I found by searching the standard for "indeterminate representation".
I see no relevant changes between C11 and C23 (except that C23 changes
the term "trap representation" to "non-value representation").
I suggest there are three possible resolutions:
1. J.2(11) is correct and I've missed something (always a possibility,
but so far nobody in comp.lang.c has come up with anything).
2. J.2(11) reflects the intent, and normative text somewhere else
in the standard needs to be updated or added to make it clear
that using the value of an object with automatic storage duration
while the object has an indeterminate representation has undefined
behavior.
3. J.2(11) is incorrect and needs to be modified or deleted.
(This would also imply that compilers may not perform certain
optimizations. I have no idea whether any compilers would actually
be affected.)
I'm going to post this to comp.std.c and email it to the C23 editors.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-07-21 16:33 +0100 |
| Message-ID | <87zg3pnuse.fsf@bsb.me.uk> |
| In reply to | #6505 |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> N3096 is the last public draft of the upcoming C23 standard.
>
> N3096 J.2 says:
>
> The behavior is undefined in the following circumstances:
> [...]
> (11) The value of an object with automatic storage duration is
> used while the object has an indeterminate representation
> (6.2.4, 6.7.10, 6.8).
>
> I'll use an `int` object in my example.
>
> Reading an object that holds a non-value representation has undefined
> behavior, but not all integer types have non-value representations
> -- and if an implementation has certain characteristics, we can
> reliably infer that int has no non-value representations (called
> "trap representations" in C99, C11, and C17).
>
> Consider this program:
> ```
> #include <limits.h>
> int main(void) {
>     int foo;
>     if (sizeof (int) == 4 &&
>         CHAR_BIT == 8 &&
>         INT_MAX == 2147483647 &&
>         INT_MIN == -INT_MAX-1)
>     {
>         int bar = foo;
>     }
> }
> ```
>
> If the condition is true (as it is for many real-world
> implementations), then int has no padding bits and no trap
> representations. The object `foo` has an indeterminate representation
> when it's used to initialize `bar`. Since it cannot have a non-value
> representation, it has an unspecified value.
>
> If J.2(11) is correct, then the use of the value results in undefined
> behavior.
>
> But Annex J is non-normative, and as far as I can tell there is no
> normative text in the standard that says the behavior is undefined.
6.3.2.1 p2:
"[...] If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage class
(never had its address taken), and that object is uninitialized (not
declared with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined."
seems to cover it. The restriction on not having its address taken
seems odd.
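To make the two cases in that sentence concrete, here is a minimal sketch (function names are hypothetical). Both functions assign before reading, so both are well defined; the comments mark where the quoted sentence would and would not apply if the assignment were missing.

```c
/* Case 1: x never has its address taken, so it "could have been
   declared with the register storage class". Reading x before any
   assignment would fall under the quoted sentence of 6.3.2.1p2.
   This sketch assigns first, so the read is fine. */
int read_register_eligible(void) {
    int x;
    x = 10;
    return x;
}

/* Case 2: the address of y is taken, so the quoted sentence does not
   apply to y. Whether an unassigned read would be undefined is the
   question the thread is about. Here y is written through p first,
   so the read is fine. */
int read_address_taken(void) {
    int y;
    int *p = &y;
    *p = 20;
    return y;
}
```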
--
Ben.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2023-07-21 11:56 -0700 |
| Message-ID | <874jlxozzz.fsf@nosuchdomain.example.com> |
| In reply to | #6507 |
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>> N3096 is the last public draft of the upcoming C23 standard.
>>
>> N3096 J.2 says:
>>
>> The behavior is undefined in the following circumstances:
>> [...]
>> (11) The value of an object with automatic storage duration is
>> used while the object has an indeterminate representation
>> (6.2.4, 6.7.10, 6.8).
>>
>> I'll use an `int` object in my example.
>>
>> Reading an object that holds a non-value representation has undefined
>> behavior, but not all integer types have non-value representations
>> -- and if an implementation has certain characteristics, we can
>> reliably infer that int has no non-value representations (called
>> "trap representations" in C99, C11, and C17).
>>
>> Consider this program:
>> ```
>> #include <limits.h>
>> int main(void) {
>>     int foo;
>>     if (sizeof (int) == 4 &&
>>         CHAR_BIT == 8 &&
>>         INT_MAX == 2147483647 &&
>>         INT_MIN == -INT_MAX-1)
>>     {
>>         int bar = foo;
>>     }
>> }
>> ```
>>
>> If the condition is true (as it is for many real-world
>> implementations), then int has no padding bits and no trap
>> representations. The object `foo` has an indeterminate representation
>> when it's used to initialize `bar`. Since it cannot have a non-value
>> representation, it has an unspecified value.
>>
>> If J.2(11) is correct, then the use of the value results in undefined
>> behavior.
>>
>> But Annex J is non-normative, and as far as I can tell there is no
>> normative text in the standard that says the behavior is undefined.
>
> 6.3.2.1 p2:
>
> "[...] If the lvalue designates an object of automatic storage
> duration that could have been declared with the register storage class
> (never had its address taken), and that object is uninitialized (not
> declared with an initializer and no assignment to it has been
> performed prior to use), the behavior is undefined."
>
> seems to cover it. The restriction on not having its address taken
> seems odd.
Good find.
That sentence was added in C11 (it doesn't appear in C99 or in
N1256, which consists of C99 plus the three Technical Corrigenda)
in response to DR #338. Since the wording in Annex J goes back to
C99 in its current form, and to C90 in a slightly different form,
that can't be what Annex J is referring to. And the statement
in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
retroactive justification.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
Yes, that restriction does seem strange. It was inspired by the
IA64 (Itanium) architecture, which has an extra trap bit in each
CPU register (NaT, "not a thing"). The "could have been declared
with the register storage class" wording is there because the IA64
NaT bit exists only in CPU registers, not in memory.
An object with automatic storage duration might be stored in an IA64
CPU register. If the object is not initialized, the register's
NaT bit would be set. Any attempt to read it would cause a trap.
Writing it would clear the NaT bit.
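The behavior described in this paragraph can be mimicked with a toy software model. This is only a sketch of the description above, not an IA64 API; the type `toy_reg` and its functions are hypothetical. The flag lives next to the 64 data bits rather than inside them, a read fails while the flag is set, and a write clears it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a register with an out-of-band "not a thing" flag. */
typedef struct {
    uint64_t bits; /* the 64 data bits */
    bool nat;      /* extra flag, not one of the data bits */
} toy_reg;

/* A fresh register models an uninitialized object: NaT is set. */
toy_reg toy_reg_fresh(void) {
    toy_reg r = { 0, true };
    return r;
}

/* Writing stores the data and clears NaT. */
void toy_reg_write(toy_reg *r, uint64_t v) {
    r->bits = v;
    r->nat = false;
}

/* Reading "traps" while NaT is set; the trap is reported as false. */
bool toy_reg_read(const toy_reg *r, uint64_t *out) {
    if (r->nat) {
        return false;
    }
    *out = r->bits;
    return true;
}
```

In this model `toy_reg_read` fails on a fresh register and succeeds after `toy_reg_write`, matching the read-traps, write-clears description.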
Which means that a hypothetical CPU with something like a NaT bit
on each word of memory (iAPX 432? i960?) might cause a trap in
circumstances not covered by that wording -- but it *is* covered
by the wording in Annex J.
(Normally, an object whose address is taken can still be stored in
a CPU register for part of its lifetime. The effect is to forbid
certain optimizations on IA64-like systems.)
It's tempting to conclude that reading an uninitialized automatic
object whose address is taken is *not* undefined behavior
(https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
but the standard doesn't say so.
C90's Annex G (renamed to Annex J in later editions) says:
The behavior in the following circumstances is undefined:
[...]
- The value of an uninitialized object that has automatic storage
duration is used before a value is assigned (6.5.7).
6.5.7 discusses initialization, but doesn't say that reading an
uninitialized object has undefined behavior, so the issue is an old one.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-07-21 20:54 +0100 |
| Message-ID | <87fs5hnipv.fsf@bsb.me.uk> |
| In reply to | #6509 |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>> N3096 is the last public draft of the upcoming C23 standard.
>>>
>>> N3096 J.2 says:
>>>
>>> The behavior is undefined in the following circumstances:
>>> [...]
>>> (11) The value of an object with automatic storage duration is
>>> used while the object has an indeterminate representation
>>> (6.2.4, 6.7.10, 6.8).
>>>
>>> I'll use an `int` object in my example.
>>>
>>> Reading an object that holds a non-value representation has undefined
>>> behavior, but not all integer types have non-value representations
>>> -- and if an implementation has certain characteristics, we can
>>> reliably infer that int has no non-value representations (called
>>> "trap representations" in C99, C11, and C17).
>>>
>>> Consider this program:
>>> ```
>>> #include <limits.h>
>>> int main(void) {
>>>     int foo;
>>>     if (sizeof (int) == 4 &&
>>>         CHAR_BIT == 8 &&
>>>         INT_MAX == 2147483647 &&
>>>         INT_MIN == -INT_MAX-1)
>>>     {
>>>         int bar = foo;
>>>     }
>>> }
>>> ```
>>>
>>> If the condition is true (as it is for many real-world
>>> implementations), then int has no padding bits and no trap
>>> representations. The object `foo` has an indeterminate representation
>>> when it's used to initialize `bar`. Since it cannot have a non-value
>>> representation, it has an unspecified value.
>>>
>>> If J.2(11) is correct, then the use of the value results in undefined
>>> behavior.
>>>
>>> But Annex J is non-normative, and as far as I can tell there is no
>>> normative text in the standard that says the behavior is undefined.
>>
>> 6.3.2.1 p2:
>>
>> "[...] If the lvalue designates an object of automatic storage
>> duration that could have been declared with the register storage class
>> (never had its address taken), and that object is uninitialized (not
>> declared with an initializer and no assignment to it has been
>> performed prior to use), the behavior is undefined."
>>
>> seems to cover it. The restriction on not having its address taken
>> seems odd.
>
> Good find.
>
> That sentence was added in C11 (it doesn't appear in C99 or in
> N1256, which consists of C99 plus the three Technical Corrigenda)
> in response to DR #338. Since the wording in Annex J goes back to
> C99 in its current form, and to C90 in a slightly different form,
> that can't be what Annex J is referring to. And the statement
> in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
> retroactive justification.
Thanks for looking into the history. I was going to do that when I had
some time.
There are three relevant clauses in Annex J, and I think we should keep
them all in mind. Sadly, they are not numbered (until C23) so I've
given them 'UB' numbers taken from the similar wording in C23.
— The value of an object with automatic storage duration is used while
it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]
— A trap representation is read by an lvalue expression that does not
have character type (6.2.6.1). [UB-12]
— An lvalue designating an object of automatic storage duration that
could have been declared with the register storage class is used in
a context that requires the value of the designated object, but the
object is uninitialized. (6.3.2.1). [UB-20]
Clearly, UB-20 is explained by the quote I posted, but UB-11 (the one we
are talking about) is there as well and, as you say, can't be fully
explained by that normative quote.
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
>
> Yes, that restriction does seem strange. It was inspired by the
> IA64 (Itanium) architecture, which has an extra trap bit in each
> CPU register (NaT, "not a thing"). The "could have been declared
> with the register storage class" wording is there because the IA64
> NaT bit exists only in CPU registers, not in memory.
Thanks. I wondered if it might have been some hardware consideration...
> An object with automatic storage duration might be stored in an IA64
> CPU register. If the object is not initialized, the register's
> NaT bit would be set. Any attempt to read it would cause a trap.
> Writing it would clear the NaT bit.
>
> Which means that a hypothetical CPU with something like a NaT bit
> on each word of memory (iAPX 432? i960?) might cause a trap in
> circumstances not covered by that wording -- but it *is* covered
> by the wording in Annex J.
It's covered by UB-12 and that's backed up by normative text,
specifically paragraph 5 of the section cited in UB-12.
> (Normally, an object whose address is taken can still be stored in
> a CPU register for part of its lifetime. The effect is to forbid
> certain optimizations on IA64-like systems.)
>
> It's tempting to conclude that reading an uninitialized automatic
> object whose address is taken is *not* undefined behavior
> (https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
> but the standard doesn't say so.
But it doesn't say that it is UB either, does it? That case is excluded
in 6.3.2.1 p2, but there's nothing else covering it but the non-normative
Annex J.
> C90's Annex G (renamed to Annex J in later editions) says:
>
> The behavior in the following circumstances is undefined:
> [...]
> - The value of an uninitialized object that has automatic storage
> duration is used before a value is assigned (6.5.7).
>
> 6.5.7 discusses initialization, but doesn't say that reading an
> uninitialized object has undefined behavior, so the issue is an old one.
--
Ben.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2023-07-21 14:26 -0700 |
| Message-ID | <87a5vpnegz.fsf@nosuchdomain.example.com> |
| In reply to | #6510 |
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
> There are three relevant clauses in Annex J, and I think we should keep
> them all in mind. Sadly, they are not numbered (until C23) so I've
> given them 'UB' numbers taken from the similar wording in C23.
>
> — The value of an object with automatic storage duration is used while
> it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]
>
> — A trap representation is read by an lvalue expression that does not
> have character type (6.2.6.1). [UB-12]
>
> — An lvalue designating an object of automatic storage duration that
> could have been declared with the register storage class is used in
> a context that requires the value of the designated object, but the
> object is uninitialized. (6.3.2.1). [UB-20]
[...]
>> An object with automatic storage duration might be stored in an IA64
>> CPU register. If the object is not initialized, the register's
>> NaT bit would be set. Any attempt to read it would cause a trap.
>> Writing it would clear the NaT bit.
>>
>> Which means that a hypothetical CPU with something like a NaT bit
>> on each word of memory (iAPX 432? i960?) might cause a trap in
>> circumstances not covered by that wording -- but it *is* covered
>> by the wording in Annex J.
>
> It's covered by UB-12 and that's backed up by normative text,
> specifically paragraph 5 of the section cited in UB-12.
I don't think so. A "non-value representation" (formerly a "trap
representation") is determined by the bits making up the representation
of an object. For an integer type, such a representation can occur only
if the type has padding bits. The IA64 NaT bit is not part of the
representation; it's neither a value bit nor a padding bit.
For a 64-bit integer type, given CHAR_BIT==8, its *representation* is
defined as a set of 8 bytes that can be copied into an object of type
`unsigned char[8]`. The NaT bit does not contribute to the size of the
object.
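As a minimal sketch of what "representation" means in this paragraph (the function name `roundtrip_u64` is hypothetical, and `uint64_t` is assumed to be provided by the implementation): the bytes copied out with `memcpy` are the whole representation, and copying them back reproduces the value. Nothing outside those `sizeof` bytes, such as a NaT-like flag, takes part in the copy.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a 64-bit integer into an
   unsigned char array and rebuild the value from those bytes. */
uint64_t roundtrip_u64(uint64_t v) {
    unsigned char bytes[sizeof v]; /* 8 bytes when CHAR_BIT == 8 */
    uint64_t copy;

    memcpy(bytes, &v, sizeof v);       /* representation out */
    memcpy(&copy, bytes, sizeof copy); /* representation back in */
    return copy;
}
```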
I think the right way for C to permit NaT-like bits is, as Kaz
suggested, to define "indeterminate value" in terms of provenance,
not just the bits that make up its current representation.
An automatic object with no initialization, or a malloc()ed object,
starts with an indeterminate value, and accessing that value
(other than as an array of characters) has undefined behavior.
(This is a proposal, not what the standard currently says.)
IA64 happens to have a way of (partially) representing that
provenance in hardware, outside the object in question. Other or
future architectures might do a more complete job.
[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2023-07-21 23:39 +0100 |
| Message-ID | <874jlwopn5.fsf@bsb.me.uk> |
| In reply to | #6511 |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> [...]
>> There are three relevant clauses in Annex J, and I think we should keep
>> them all in mind. Sadly, they are not numbered (until C23) so I've
>> given them 'UB' numbers taken from the similar wording in C23.
>>
>> — The value of an object with automatic storage duration is used while
>> it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]
>>
>> — A trap representation is read by an lvalue expression that does not
>> have character type (6.2.6.1). [UB-12]
>>
>> — An lvalue designating an object of automatic storage duration that
>> could have been declared with the register storage class is used in
>> a context that requires the value of the designated object, but the
>> object is uninitialized. (6.3.2.1). [UB-20]
> [...]
>>> An object with automatic storage duration might be stored in an IA64
>>> CPU register. If the object is not initialized, the register's
>>> NaT bit would be set. Any attempt to read it would cause a trap.
>>> Writing it would clear the NaT bit.
>>>
>>> Which means that a hypothetical CPU with something like a NaT bit
>>> on each word of memory (iAPX 432? i960?) might cause a trap in
>>> circumstances not covered by that wording -- but it *is* covered
>>> by the wording in Annex J.
>>
>> It's covered by UB-12 and that's backed up by normative text,
>> specifically paragraph 5 of the section cited in UB-12.
>
> I don't think so. A "non-value representation" (formerly a "trap
> representation") is determined by the bits making up the representation
> of an object. For an integer type, such a representation can occur only
> if the type has padding bits. The IA64 NaT bit is not part of the
> representation; it's neither a value bit nor a padding bit.
>
> For a 64-bit integer type, given CHAR_BIT==8, its *representation* is
> defined as a set of 8 bytes that can be copied into an object of type
> `unsigned char[8]`. The NaT bit does not contribute to the size of the
> object.
Ah, right. I thought you were including it as a padding bit.
> I think the right way for C to permit NaT-like bits is, as Kaz
> suggested, to define "indeterminate value" in terms of provenance,
> not just the bits that make up its current representation.
> An automatic object with no initialization, or a malloc()ed object,
> starts with an indeterminate value, and accessing that value
> (other than as an array of characters) has undefined behavior.
> (This is a proposal, not what the standard currently says.)
> IA64 happens to have a way of (partially) representing that
> provenance in hardware, outside the object in question. Other or
> future architectures might do a more complete job.
>
> [...]
That would work.
--
Ben.
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-12 17:00 -0700 |
| Message-ID | <86a5uv95g7.fsf@linuxsc.com> |
| In reply to | #6511 |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> I think the right way for C to permit NaT-like bits is, as Kaz
> suggested, to define "indeterminate value" in terms of provenance,
> not just the bits that make up its current representation. [...]
This idea is fundamentally wrong. NaT bits are associated with
particular areas of memory, which is to say objects. The point
of provenance is that non-viability is associated with /values/,
not with objects. Once an area of memory acquires an object
representation, the NaT bit or NaT bits for that memory are set
to zero, end of story. Note also that NaT bits are independent
of what type is used to access an object - if the NaT bit is set
then any access is illegal, no matter what type is used to do the
access. By contrast, provenance is used in situations where
non-viability is associated with values, not with objects. But
values are always type dependent; a pointer object that holds
a value that has been passed to free() is "indeterminate" when
accessed as a pointer type, but perfectly okay to access as an
unsigned char type. The two kinds of situations are essentially
different, and the theoretical models used to characterize the
rules in the two kinds of situations should therefore be
correspondingly essentially different.
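The free() example in this post can be sketched as follows. This is an illustration only, with a hypothetical function name: after `free(p)` the code never reads `p` as a pointer, and only copies the bytes of the pointer object itself, which is the character-type access the post describes as perfectly okay.

```c
#include <stdlib.h>
#include <string.h>

/* Copies the bytes of a pointer object after its value has been passed
   to free(). Returns the number of bytes copied, or 0 if the allocation
   failed or the output buffer is too small. */
size_t copy_freed_pointer_bytes(unsigned char *out, size_t out_size) {
    int *p = malloc(sizeof *p);

    if (p == NULL || out_size < sizeof p) {
        free(p); /* free(NULL) is a no-op */
        return 0;
    }
    free(p);
    /* The value of p is now indeterminate when read as int *.
       The object p still has sizeof p bytes, and they are read
       here only as unsigned char. */
    memcpy(out, &p, sizeof p);
    return sizeof p;
}
```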
| From | Martin Uecker <ma.uecker@gmail.com> |
|---|---|
| Date | 2023-08-13 23:41 -0700 |
| Message-ID | <fcb2be8f-b346-421f-9804-5f94c93266b0n@googlegroups.com> |
| In reply to | #6529 |
On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
> Keith Thompson <Keith.S.T...@gmail.com> writes:
>
> > I think the right way for C to permit NaT-like bits is, as Kaz
> > suggested, to define "indeterminate value" in terms of provenance,
> > not just the bits that make up its current representation. [...]
>
> This idea is fundamentally wrong. NaT bits are associated with
> particular areas of memory, which is to say objects. The point
> of provenance is that non-viability is associated with /values/,
> not with objects. Once an area of memory acquires an object
> representation, the NaT bit or NaT bits for that memory are set
> to zero, end of story. Note also that NaT bits are independent
> of what type is used to access an object - if the NaT bit is set
> then any access is illegal, no matter what type is used to do the
> access. By contrast, provenance is used in situations where
> non-viability is associated with values, not with objects. But
> values are always type dependent; a pointer object that holds
> a value that has been passed to free() is "indeterminate" when
> accessed as a pointer type, but perfectly okay to access as an
> unsigned char type. The two kinds of situations are essentially
> different, and the theoretical models used to characterize the
> rules in the two kinds of situations should therefore be
> correspondingly essentially different.
One could still consider the idea that "indeterminate" is an
abstract property that yields UB during read even for types
that do not have trap representations. There is no wording
in the C standard to support this, but I would not call this
idea "fundamentally wrong". You are right that this is different
to provenance which is about values. What it would
have in common with pointer provenance is that there is hidden
state in the abstract machine associated with memory that
is not part of the representation. With effective types there
is another example of this.
Martin
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-15 21:06 -0700 |
| Message-ID | <864jkz7hrm.fsf@linuxsc.com> |
| In reply to | #6532 |
Martin Uecker <ma.uecker@gmail.com> writes: > On Sunday, August 13, 2023 at 2:00:45?AM UTC+2, Tim Rentsch wrote: > >> Keith Thompson <Keith.S.T...@gmail.com> writes: >> >>> I think the right way for C to permit NaT-like bits is, as Kaz >>> suggested, to define "indeterminate value" in terms of provenance, >>> not just the bits that make up its current representation. [...] >> >> This idea is fundamentally wrong. NaT bits are associated with >> particular areas of memory, which is to say objects. The point >> of provenance is that non-viability is associated with /values/, >> not with objects. Once an area of memory acquires an object >> representation, the NaT bit or NaT bits for that memory are set >> to zero, end of story. Note also that NaT bits are independent >> of what type is used to access an object - if the NaT bit is set >> then any access is illegal, no matter what type is used to do the >> access. By contrast, provenance is used in situations where >> non-viability is associated with values, not with objects. But >> values are always type dependent; a pointer object that holds >> a value that has been passed to free() is "indeterminate" when >> accessed as a pointer type, but perfectly okay to access as an >> unsigned char type. The two kinds of situations are essentially >> different, and the theoretical models used to characterize the >> rules in the two kinds of situations should therefore be >> correspondingly essentially different. > > One could still consider the idea that "indeterminate" is an > abstract property that yields UB during read even for types > that do not have trap representations. There is no wording > in the C standard to support this, but I would not call this > idea "fundamentally wrong". You are right that this is different > to provenance provenance which is about values. 
> What it would have in common with pointer provenance is that there is hidden state in the abstract machine associated with memory that is not part of the representation. With effective types there is another example of this.

My preceding comments were meant to be only about NaT bits (or NaT-like bits) and provenance. There is an inherent mismatch between the two, as I have tried to explain. It is only the idea that provenance would provide a good foundation for defining the semantics of "NaT everywhere" that I am saying is fundamentally wrong.

I understand that you want to consider a broader topic, and that, in the realm of that broader topic, something like provenance could have a role to play. I think it is worth responding to that thesis, and am expecting to do so in a separate reply (or new thread?) although probably not right away.
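The type dependence described in this post (a freed pointer is "indeterminate" when accessed as a pointer, yet fine to access as unsigned char) can be sketched in C. This is a minimal illustration and not code from the thread: `snapshot_pointer_bytes` and `inspect_after_free` are hypothetical names, and the claim that the byte-wise read is well defined assumes the reading given above, namely that unsigned char has no trap representations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of the pointer
   object *slot into out, which must have room for sizeof(void *) bytes.
   Only the bytes of the pointer object are read; the pointer value
   itself is never used as a pointer. */
size_t snapshot_pointer_bytes(void *const *slot, unsigned char *out)
{
    assert(slot != NULL && out != NULL);
    memcpy(out, slot, sizeof *slot);
    return sizeof *slot;
}

/* Frees a block, then inspects the stale pointer object as bytes. */
size_t inspect_after_free(unsigned char *out)
{
    void *p = malloc(16);
    free(p);                     /* the value of p is now indeterminate */
    /* Using p as a pointer here (comparing it, dereferencing it) is the
       problematic case; reading the bytes of the object p is the case
       the post calls "perfectly okay". */
    return snapshot_pointer_bytes(&p, out);
}
```

The bytes written to `out` are unspecified, so a caller can look at them but should not attach meaning to them.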
| From | Martin Uecker <ma.uecker@gmail.com> |
|---|---|
| Date | 2023-08-15 22:40 -0700 |
| Message-ID | <e043af84-3153-4097-9505-666869fcf727n@googlegroups.com> |
| In reply to | #6533 |
On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote: > Martin Uecker <ma.u...@gmail.com> writes: > > On Sunday, August 13, 2023 at 2:00:45?AM UTC+2, Tim Rentsch wrote: > > > >> Keith Thompson <Keith.S.T...@gmail.com> writes: > >> > >>> I think the right way for C to permit NaT-like bits is, as Kaz > >>> suggested, to define "indeterminate value" in terms of provenance, > >>> not just the bits that make up its current representation. [...] > >> > >> This idea is fundamentally wrong. NaT bits are associated with > >> particular areas of memory, which is to say objects. The point > >> of provenance is that non-viability is associated with /values/, > >> not with objects. Once an area of memory acquires an object > >> representation, the NaT bit or NaT bits for that memory are set > >> to zero, end of story. Note also that NaT bits are independent > >> of what type is used to access an object - if the NaT bit is set > >> then any access is illegal, no matter what type is used to do the > >> access. By contrast, provenance is used in situations where > >> non-viability is associated with values, not with objects. But > >> values are always type dependent; a pointer object that holds > >> a value that has been passed to free() is "indeterminate" when > >> accessed as a pointer type, but perfectly okay to access as an > >> unsigned char type. The two kinds of situations are essentially > >> different, and the theoretical models used to characterize the > >> rules in the two kinds of situations should therefore be > >> correspondingly essentially different. > > > > One could still consider the idea that "indeterminate" is an > > abstract property that yields UB during read even for types > > that do not have trap representations. There is no wording > > in the C standard to support this, but I would not call this > > idea "fundamentally wrong". You are right that this is different > > to provenance provenance which is about values. 
> > What it would have in common with pointer provenance is that there is hidden state in the abstract machine associated with memory that is not part of the representation. With effective types there is another example of this.
> My preceding comments were meant to be only about NaT bits (or NaT-like bits) and provenance. There is an inherent mismatch between the two, as I have tried to explain. It is only the idea that provenence would provide a good foundation for defining the semantics of "NaT everywhere" that I am saying is fundamentally wrong.
>
> I understand that you want to consider a broader topic, and that, in the realm of that broader topic, something like provenance could have a role to play. I think it is worth responding to that thesis, and am expecting to do so in a separate reply (or new thread?) although probably not right away.

I would love to hear your comments, because some people want to have such an abstract notion of "indeterminate" and some already believe that this is how the standard should be understood today.

Martin
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-16 23:13 -0700 |
| Message-ID | <867cpu5h8w.fsf@linuxsc.com> |
| In reply to | #6534 |
Martin Uecker <ma.uecker@gmail.com> writes:

[some unrelated passages removed]

> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
>
>> Martin Uecker <ma.u...@gmail.com> writes:

[...]

>>> One could still consider the idea that "indeterminate" is an abstract property that yields UB during read even for types that do not have trap representations. There is no wording in the C standard to support this, but I would not call this idea "fundamentally wrong". You are right that this is different to provenance provenance which is about values. What it would have in common with pointer provenance is that there is hidden state in the abstract machine associated with memory that is not part of the representation. With effective types there is another example of this.
>>
>> I understand that you want to consider a broader topic, and that, in the realm of that broader topic, something like provenance could have a role to play. I think it is worth responding to that thesis, and am expecting to do so in a separate reply (or new thread?) although probably not right away.
>
> I would love to hear your comments, because some people want to have such an abstract of "indeterminate" and some already believe that this is how the standard should be understood already today.

I've been thinking about this, and am close (I think) to having something to say in response. Before I do that, though, let me ask this: what problem or problems are motivating the question? What problems do you (or "some people") want to solve? I don't want just examples here; I'm hoping to get a full list.
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-08-17 07:08 +0000 |
| Message-ID | <20230816235712.844@kylheku.com> |
| In reply to | #6541 |
On 2023-08-17, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote: > Martin Uecker <ma.uecker@gmail.com> writes: > > [some unrelated passages removed] > >> On Wednesday, August 16, 2023 at 6:06:43?AM UTC+2, Tim Rentsch wrote: >> >>> Martin Uecker <ma.u...@gmail.com> writes: > > [...] > >>>> One could still consider the idea that "indeterminate" is an >>>> abstract property that yields UB during read even for types >>>> that do not have trap representations. There is no wording >>>> in the C standard to support this, but I would not call this >>>> idea "fundamentally wrong". You are right that this is different >>>> to provenance provenance which is about values. What it would >>>> have in common with pointer provenance is that there is hidden >>>> state in the abstract machine associated with memory that >>>> is not part of the representation. With effective types there >>>> is another example of this. >>> >>> I understand that you want to consider a broader topic, and that, >>> in the realm of that broader topic, something like provenance >>> could have a role to play. I think it is worth responding to >>> that thesis, and am expecting to do so in a separate reply (or >>> new thread?) although probably not right away. >> >> I would love to hear your comments, because some people >> want to have such an abstract of "indeterminate" and >> some already believe that this is how the standard should >> be understood already today. > > I've been thinking about this, and am close (I think) to having > something to say in response. Before I do that, thought, let me > ask this: what problem or problems are motivating the question? > What problems do you (or "some people") want to solve? I don't > want just examples here; I'm hoping to get a full list. I'm all about the diagnosis. 
Even on machines in which all representations are values, and therefore safe, a program whose external effect or output depends on uninitialized data, and is therefore nondeterministic (a bad form of nondeterministic), is a repugnant program.

I'd like to have clear rules which allow an implementation to go to great depths to diagnose all such situations, while remaining conforming. (The language agrees that those situations are erroneous, granting the tools license to diagnose.)

At the same time, certain situations in which uninitialized data are used in ways that don't have a visible effect would be a nuisance if they generated diagnostics, the primary example being the copying of objects. I would like it so that memcpy isn't magic. I want it so that the programmer can write a bytewise memcpy which doesn't violate the rules even if it moves uninitialized data.

I would like a model of uninitialized data which usefully lends itself to different depths with different trade-offs, like complexity of analysis and use of run-time resources. Limits should be imposed by implementations (what cases they want to diagnose) rather than by the model.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
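The "bytewise memcpy" mentioned above might look like the following sketch. The names `bytewise_copy` and `struct record` are hypothetical and do not come from the thread. Whether moving uninitialized bytes this way stays within the rules is the question under discussion, so the code shows only the shape of such a function, assuming that access through unsigned char is the permitted route.

```c
#include <assert.h>
#include <stddef.h>

/* Example type; an implementation typically places padding after tag. */
struct record {
    char tag;
    int value;
};

/* Hypothetical user-written copy: moves n bytes one at a time through
   unsigned char lvalues, without interpreting them as any other type.
   The two regions must not overlap. */
void *bytewise_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];             /* padding bytes are copied along too */
    return dst;
}
```

Copying a `struct record` with `bytewise_copy(&b, &a, sizeof a)` also moves the padding bytes of `a`, whose contents the program never set. That is the case a diagnostic tool would need to leave alone.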
| From | Martin Uecker <ma.uecker@gmail.com> |
|---|---|
| Date | 2023-08-18 12:44 -0700 |
| Message-ID | <d6d5f930-1943-424f-a572-7d62cfd2bda0n@googlegroups.com> |
| In reply to | #6542 |
On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote: > On 2023-08-17, Tim Rentsch <tr.1...@z991.linuxsc.com> wrote: > > Martin Uecker <ma.u...@gmail.com> writes: > > > > [some unrelated passages removed] > > > >> On Wednesday, August 16, 2023 at 6:06:43?AM UTC+2, Tim Rentsch wrote: > >> > >>> Martin Uecker <ma.u...@gmail.com> writes: > > > > [...] > > > >>>> One could still consider the idea that "indeterminate" is an > >>>> abstract property that yields UB during read even for types > >>>> that do not have trap representations. There is no wording > >>>> in the C standard to support this, but I would not call this > >>>> idea "fundamentally wrong". You are right that this is different > >>>> to provenance provenance which is about values. What it would > >>>> have in common with pointer provenance is that there is hidden > >>>> state in the abstract machine associated with memory that > >>>> is not part of the representation. With effective types there > >>>> is another example of this. > >>> > >>> I understand that you want to consider a broader topic, and that, > >>> in the realm of that broader topic, something like provenance > >>> could have a role to play. I think it is worth responding to > >>> that thesis, and am expecting to do so in a separate reply (or > >>> new thread?) although probably not right away. > >> > >> I would love to hear your comments, because some people > >> want to have such an abstract of "indeterminate" and > >> some already believe that this is how the standard should > >> be understood already today. > > > > I've been thinking about this, and am close (I think) to having > > something to say in response. Before I do that, thought, let me > > ask this: what problem or problems are motivating the question? > > What problems do you (or "some people") want to solve? I don't > > want just examples here; I'm hoping to get a full list. > I'm all about the diagnosis. 
> Even on machines in which all representations are values, and therefore safe,

I do not agree with the idea that "absence of UB = safe".

> a program whose external effect or output depends on unintialized data, and is therefore nondeterministic (a bad form of nondeterministic), is a repugnant program.

I would expect a debugger to output the memory as it is seen by the CPU. But yes, it would not be a strictly conforming program.

> I'd like to have clear rules which allow an implementation to to go great depths to diagnose all such situations, while remaining conforming. (The language agrees that those situations are erroneous, granting the tools license to diagnose.)

An implementation does not need a license from the standard to diagnose anything. It can already diagnose whatever seems useful and this does not affect conformance at all.

But it becomes easier to usefully diagnose behavior which is undefined, because then one can expect that in portable C it is not used intentionally.

> At the same time, certain situations in which uninitialized data are used in ways that don't have a visible effect, would be nuisance if they generated diagnostics, the primary example being the copying of objects. I would like it so that memcpy isn't magic. I want it so that the programmer can write a bytewise memcpy which doesn't violate the rules even if it moves uninitialized data.

Yes, I think for C this is rather important.

> I would like a model of uninitialized data which usefully lends itself to different depths with different trade-offs, like complexity of analysis and use of run-time resources. Limits should be imposed by implementations (what cases they want to diagnose) rather than by the model.

Tools can already do complex analysis and track down use of uninitialized variables. But with respect to conformance, I think the current standard has very good rules: memcpy/memcmp and similar code works as expected. Locally, where a compiler can be expected to give good diagnostics via static analysis, the use of uninitialized variables is UB. But this does not spread via pointers elsewhere, where useful diagnostics are unlikely and optimizer-induced problems based on UB might be far more difficult to debug.

Martin
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-08-19 05:04 +0000 |
| Message-ID | <20230818215322.47@kylheku.com> |
| In reply to | #6546 |
On 2023-08-18, Martin Uecker <ma.uecker@gmail.com> wrote:
> On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
> An implementation does not need a license from the standard to diagnose anything. I can already diagnose whatever seems useful and this does not affect conformance at all.

That's true about diagnostics at translation time. It's not clear about those that happen at run time and are indistinguishable from the program's output on stdout or stderr.

Also, it might be desirable for it to be conforming to terminate the program if it has run afoul of the rules.

>> I would like a model of uninitialized data which usefully lends itself to different depths with different trade-offs, like complexity of analysis and use of run-time resources. Limits should be imposed by implementations (what cases they want to diagnose) rather than by the model.
>
> Tools can already do complex analysis and track down use of uninitialized variables. But with respect to conformance, I think the current standard has very good rules: memcpy/memcmp and similar code works as expected. Locally, where a compiler can be expected to give good diagnostics via static analysis the use of uninitialized variables is UB. But this does not spread via pointers elsewhere, where useful diagnostics are unlikely and optimizer induced problems based on UB might be far more difficult to debug.

Dynamic instrumentation and tracking makes it possible for that information to follow pointer data flows, globally in the program.

E.g. under the Valgrind tool, if one module passes an uninitialized object into another, and that other one relies on it to make a conditional branch, it will be diagnosed. You can get the backtrace of where that object was created as well as where the use took place.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
| From | Martin Uecker <ma.uecker@gmail.com> |
|---|---|
| Date | 2023-08-19 01:36 -0700 |
| Message-ID | <137bed86-8fd4-42d5-aaf0-96ccce615376n@googlegroups.com> |
| In reply to | #6549 |
On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote:
> On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote:
> > On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
> > An implementation does not need a license from the standard to diagnose anything. I can already diagnose whatever seems useful and this does not affect conformance at all.
> That's true about diagnostics at translation time. It's not clear about that happen at run time and indistinguishable from the program's output on stdout or stderr.

The observable behavior has to stay the same, so yes, it could not output to stdout or stderr. But there is nothing stopping it from logging debugging information somewhere else, where it could be accessed.

> Also, it might be desirable for it to be conforming to terminate the program if it has run afoul of the rules.

Yes, this is one main reason to make certain things UB. But then it can have false positives and needs to be backward compatible, which limits what is possible.

> >> I would like a model of uninitialized data which usefully lends itself to different depths with different trade-offs, like complexity of analysis and use of run-time resources. Limits should be imposed by implementations (what cases they want to diagnose) rather than by the model.
> >
> > Tools can already do complex analysis and track down use of uninitialized variables. But with respect to conformance, I think the current standard has very good rules: memcpy/memcmp and similar code works as expected. Locally, where a compiler can be expected to give good diagnostics via static analysis the use of uninitialized variables is UB. But this does not spread via pointers elsewhere, where useful diagnostics are unlikely and optimizer induced problems based on UB might be far more difficult to debug.
> Dynamic instrumentation and tracking makes that possible for that information to follow pointer data flows, globally in the program.
>
> E.g. under the Valgrind tool, if one module passes an unitialized object into another, and that other one relies on it to make a conditional branch, it will be diagnosed. You can get the backtrace of where that object was created as well as where the use took place.

And valgrind exists and is a useful tool (I use it myself) even though not everything it diagnoses is UB. But it also has false positives, so using the same rules for deciding what should be UB in the standard as valgrind uses seems difficult.

Also note that if the output of a program relies on unspecified values, then it is already not strictly conforming even when the behavior itself is not undefined. So if an implementation is smart enough to see this, it could already reject the program.

Making the use of unspecified values in conditional branches UB seems problematic. E.g. you could not compute a hash over data structures with padding and then compare it later to see whether something has changed (taking into account false positives). This seems similar to memcpy / memcmp but involves conditions, and such techniques would become non-conforming.

Martin
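The hashing technique mentioned in this post can be sketched as follows. The choice of 32-bit FNV-1a and the names `hash_bytes`, `struct sample` and `maybe_changed` are assumptions made for illustration; the post does not name a hash function or a data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the object representation, padding included. */
uint32_t hash_bytes(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    uint32_t h = 2166136261u;    /* FNV offset basis */

    assert(obj != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;          /* FNV prime; arithmetic wraps mod 2^32 */
    }
    return h;
}

/* Example type; an implementation typically places padding after tag. */
struct sample {
    char tag;
    int value;
};

/* Returns nonzero if the bytes of *s no longer hash to `before`.
   The branch on this result depends on padding bytes, i.e. on
   unspecified values. */
int maybe_changed(const struct sample *s, uint32_t before)
{
    return hash_bytes(s, sizeof *s) != before;
}
```

A nonzero result from `maybe_changed` means only that some byte of the object differs, which may be a padding byte. That is the false positive the post refers to. A zero result is reliable only up to hash collisions.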
| From | Richard Damon <Richard@Damon-Family.org> |
|---|---|
| Date | 2023-08-19 09:18 -0400 |
| Message-ID | <ui3EM.518408$TCKc.407024@fx13.iad> |
| In reply to | #6552 |
On 8/19/23 4:36 AM, Martin Uecker wrote: > On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote: >> On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote: >>> On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote: >>> An implementation does not need a license from the standard >>> to diagnose anything. I can already diagnose whatever seems >>> useful and this does not affect conformance at all. >> That's true about diagnostics at translation time. It's not clear >> about that happen at run time and indistinguishable from the >> program's output on stdout or stderr. > > The observable behavior has to stay the same, so yes, it could > not output to stdout or stderr. But there is nothing stopping it > to log debugging information somewhere else, where it could > be accessed. > >> Also, it might be desirable for it to be conforming to terminate the >> program if it has run afoul of the rules. > > Yes, this is one main reason to make certain things UB. But > then it can have false positives and needs to be backward > compatible, which limits what is possible. > >>>> I would like a model of uninitialized data which usefully lends itself >>>> to different depths with different trade-offs, like complexity of >>>> analysis and use of run-time resources. Limits should be imposed by >>>> implementations (what cases they want to diagnose) rather than by the >>>> model. >>> >>> Tools can already do complex analysis and track down use of >>> uninitialized variables. But with respect to conformance, I think >>> the current standard has very good rules: memcpy/memcmp >>> and similar code works as expected. Locally, where a compiler >>> can be expected to give good diagnostics via static analysis >>> the use of uninitialized variables is UB. But this does not >>> spread via pointers elsewhere, where useful diagnostics >>> are unlikely and optimizer induced problems based on UB >>> might be far more difficult to debug. 
>> Dynamic instrumentation and tracking makes that possible for that information to follow pointer data flows, globally in the program.
>>
>> E.g. under the Valgrind tool, if one module passes an unitialized object into another, and that other one relies on it to make a conditional branch, it will be diagnosed. You can get the backtrace of where that object was created as well as where the use took place.
>
> And valgrind exists and is a useful tool (I use it myself) despite not everything it diagnoses is UB. But it also has false positives, so using the same rules for deciding what should be UB in the standard as valgrind uses seems difficult.
>
> Also note that of the output of a program relies on unspecified values, then it is already not strictly conforming even when the behavior itself is not undefined. So if an implementation is smart enough to see this, it could already reject the program.
>
> Making already the use of unspecified values in conditional branches be UB seems problematic. E.g. you could not compute a hash over data structures with padding and then compare it later to see whether something has changed (taking into account false positives). This seems similar to memcpy / memcmp but involved conditions, and such techniques would become non-conforming.
>
> Martin

My understanding is that there is no requirement that the values of the padding bytes remain constant over time. I can't imagine a case where they will just change at an arbitrary time, but setting a member of the structure to a value (even if it is the same value it had) might easily affect the value of the padding bytes, so the hash changes.
| From | Martin Uecker <ma.uecker@gmail.com> |
|---|---|
| Date | 2023-08-19 11:12 -0700 |
| Message-ID | <4a526949-06dd-404d-a299-cb30953e7a5fn@googlegroups.com> |
| In reply to | #6553 |
On Saturday, August 19, 2023 at 3:18:22 PM UTC+2, Richard Damon wrote: > On 8/19/23 4:36 AM, Martin Uecker wrote: > > On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote: > >> On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote: > >>> On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote: > >>> An implementation does not need a license from the standard > >>> to diagnose anything. I can already diagnose whatever seems > >>> useful and this does not affect conformance at all. > >> That's true about diagnostics at translation time. It's not clear > >> about that happen at run time and indistinguishable from the > >> program's output on stdout or stderr. > > > > The observable behavior has to stay the same, so yes, it could > > not output to stdout or stderr. But there is nothing stopping it > > to log debugging information somewhere else, where it could > > be accessed. > > > >> Also, it might be desirable for it to be conforming to terminate the > >> program if it has run afoul of the rules. > > > > Yes, this is one main reason to make certain things UB. But > > then it can have false positives and needs to be backward > > compatible, which limits what is possible. > > > >>>> I would like a model of uninitialized data which usefully lends itself > >>>> to different depths with different trade-offs, like complexity of > >>>> analysis and use of run-time resources. Limits should be imposed by > >>>> implementations (what cases they want to diagnose) rather than by the > >>>> model. > >>> > >>> Tools can already do complex analysis and track down use of > >>> uninitialized variables. But with respect to conformance, I think > >>> the current standard has very good rules: memcpy/memcmp > >>> and similar code works as expected. Locally, where a compiler > >>> can be expected to give good diagnostics via static analysis > >>> the use of uninitialized variables is UB. 
But this does not > >>> spread via pointers elsewhere, where useful diagnostics > >>> are unlikely and optimizer induced problems based on UB > >>> might be far more difficult to debug. > >> Dynamic instrumentation and tracking makes that possible > >> for that information to follow pointer data flows, globally > >> in the program. > >> > >> E.g. under the Valgrind tool, if one module passes an unitialized > >> object into another, and that other one relies on it to make > >> a conditional branch, it will be diagnosed. You can get the > >> backtrace of where that object was created as well as where > >> the use took place. > > > > And valgrind exists and is a useful tool (I use it myself) > > despite not everything it diagnoses is UB. But it also has > > false positives, so using the same rules for deciding what > > should be UB in the standard as valgrind uses seems difficult. > > > > Also note that of the output of a program relies on > > unspecified values, then it is already not strictly conforming > > even when the behavior itself is not undefined. So if an > > implementation is smart enough to see this, it could already > > reject the program. > > > > Making already the use of unspecified values in conditional > > branches be UB seems problematic. E.g. you could not > > compute a hash over data structures with padding and > > then compare it later to see whether something has > > changed (taking into account false positives). This seems > > similar to memcpy / memcmp but involved conditions, > > and such techniques would become non-conforming. > > > > Martin > My understanding is that there is no requirement that the values of the > padding bytes remains constant over time. 
The C standard specifies when they can change: "When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values."

> I can't imagine a case where they will just change at an arbitrary time, but setting a member of the structure to a value (even if it is the same value it had) might easily affect the value of the padding bytes, so the hash changes.

Sure, writing to the object may change the padding and then the hash changes. This is why I mentioned false positives.

Martin
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-18 20:20 -0700 |
| Message-ID | <86wmxr4t22.fsf@linuxsc.com> |
| In reply to | #6542 |
Kaz Kylheku <864-117-4973@kylheku.com> writes: > I'm all about the diagnosis. Even on machines in which all > representations are values, and therefore safe, a program whose > external effect or output depends on unintialized data, and is > therefore nondeterministic (a bad form of nondeterministic), is a > repugnant program. > > I'd like to have clear rules which allow an implementation to to > go great depths to diagnose all such situations, while remaining > conforming. (The language agrees that those situations are > erroneous, granting the tools license to diagnose.) The C standard allows compilers to do whatever analysis they want and to issue diagnostics for whatever conditions or circumstances they choose. What you want is orthogonal to what is being discussed.
| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Date | 2023-08-19 05:23 +0000 |
| Message-ID | <20230818220442.950@kylheku.com> |
| In reply to | #6548 |
On 2023-08-19, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Kaz Kylheku <864-117-4973@kylheku.com> writes:
>
>> I'm all about the diagnosis. Even on machines in which all representations are values, and therefore safe, a program whose external effect or output depends on unintialized data, and is therefore nondeterministic (a bad form of nondeterministic), is a repugnant program.
>>
>> I'd like to have clear rules which allow an implementation to to go great depths to diagnose all such situations, while remaining conforming. (The language agrees that those situations are erroneous, granting the tools license to diagnose.)
>
> The C standard allows compilers to do whatever analysis they want and to issue diagnostics for whatever conditions or circumstances they choose.

And stop translating? If some use of an uninitialized object isn't undefined, and you make the diagnostic a fatal error, then you don't have a conforming compiler at that point.

> What you want is orthogonal to what is being discussed.

I'm mainly concerned about run-time. If the program hasn't invoked undefined behavior, I don't think it's conforming to inject gratuitous diagnostics into the program's run-time, such that they appear as if they were its output on stderr or stdout. Those diagnostics have to go to some special debug port.

Also, it is not conforming to arbitrarily terminate the program. (Other than in some weaselly language-lawyering way, by declaring that it has exceeded an implementation limit or something.)

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2023-08-18 22:56 -0700 |
| Message-ID | <86sf8f4lt9.fsf@linuxsc.com> |
| In reply to | #6550 |
Kaz Kylheku <864-117-4973@kylheku.com> writes: > On 2023-08-19, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote: [...] >> The C standard allows compilers to do whatever analysis they >> want and to issue diagnostics for whatever conditions or >> circumstances they choose. > > And stop translating? If some use of an uninitialized object > isn't undefined, and you make the diagnostic a fatal error, > then you don't have a conforming compiler at that point. > > [also] > > If the program hasn't invoked undefined behavior, I don't thinkk > it's conforming to inject gratuitous diagnostics [..or..] > to arbitrarily terminate the program. [...] You need to learn how to say what you mean. Your earlier posting didn't say anything about failing to compile or altering program behavior. If you can't learn how to say what you mean then there is roughly a 1e-29 percent chance that you'll get what you want.