Does reading an uninitialized object have undefined behavior?

Started by: Keith Thompson <Keith.S.Thompson+u@gmail.com>
First post: 2023-07-20 22:16 -0700
Last post: 2023-08-16 21:08 +0000
Articles: 20 on this page of 48, 8 participants



Contents

  Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-20 22:16 -0700
    Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 16:33 +0100
      Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-21 11:56 -0700
        Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 20:54 +0100
          Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-07-21 14:26 -0700
            Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-07-21 23:39 +0100
            Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-12 17:00 -0700
              Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-13 23:41 -0700
                Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-15 21:06 -0700
                  Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-15 22:40 -0700
                    Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 23:13 -0700
                      Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-17 07:08 +0000
                        Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-18 12:44 -0700
                          Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-19 05:04 +0000
                            Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-19 01:36 -0700
                              Re: Does reading an uninitialized object have undefined behavior? Richard Damon <Richard@Damon-Family.org> - 2023-08-19 09:18 -0400
                                Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-19 11:12 -0700
                        Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-18 20:20 -0700
                          Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-19 05:23 +0000
                            Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-18 22:56 -0700
                      Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-18 12:52 -0700
                        Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-26 19:25 -0700
                          Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-27 08:31 +0000
                            Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-29 04:35 -0700
                              Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-30 19:53 +0000
                                Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-30 17:40 -0700
                                  Re: Does reading an uninitialized object have undefined behavior? Spiros Bousbouras <spibou@gmail.com> - 2023-08-31 18:18 +0000
                                    Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-05 05:39 -0700
                        Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-09-05 17:03 -0700
                          Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-09-07 17:09 +0200
                            Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-07 17:19 +0100
                              Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-09-08 23:12 +0200
                                Re: Does reading an uninitialized object have undefined behavior? Ben Bacarisse <ben.usenet@bsb.me.uk> - 2023-09-08 22:31 +0100
      Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-22 06:40 +0000
        Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-07-22 06:03 -0700
          Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-07-25 21:53 -0700
        Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 11:11 -0700
    Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-07-21 17:42 +0000
      Re: Does reading an uninitialized object have undefined behavior? Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2023-07-24 07:53 +0200
        Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-07-25 21:57 -0700
    Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-03 13:13 -0700
      Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-03 15:20 -0700
        Re: Does reading an uninitialized object have undefined behavior? Martin Uecker <ma.uecker@gmail.com> - 2023-08-05 01:15 -0700
        Re: Does reading an uninitialized object have undefined behavior? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-08-16 09:19 -0700
        Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 19:51 +0000
        Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 20:03 +0000
          Re: Does reading an uninitialized object have undefined behavior? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2023-08-16 13:43 -0700
            Re: Does reading an uninitialized object have undefined behavior? Kaz Kylheku <864-117-4973@kylheku.com> - 2023-08-16 21:08 +0000



#6505 — Does reading an uninitialized object have undefined behavior?

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2023-07-20 22:16 -0700
Subject: Does reading an uninitialized object have undefined behavior?
Message-ID: <87zg3pq1ym.fsf@nosuchdomain.example.com>

N3096 is the last public draft of the upcoming C23 standard.

N3096 J.2 says:

    The behavior is undefined in the following circumstances:
    [...]
    (11) The value of an object with automatic storage duration is
         used while the object has an indeterminate representation
         (6.2.4, 6.7.10, 6.8).

I'll use an `int` object in my example.

Reading an object that holds a non-value representation has undefined
behavior, but not all integer types have non-value representations
-- and if an implementation has certain characteristics, we can
reliably infer that int has no non-value representations (called
"trap representations" in C99, C11, and C17).

Consider this program:
```
#include <limits.h>
int main(void) {
    int foo;
    if (sizeof (int) == 4 &&
        CHAR_BIT == 8 &&
        INT_MAX == 2147483647 &&
        INT_MIN == -INT_MAX-1)
    {
        int bar = foo;
    }
}
```

If the condition is true (as it is for many real-world
implementations), then int has no padding bits and no trap
representations.  The object `foo` has an indeterminate representation
when it's used to initialize `bar`.  Since it cannot have a non-value
representation, it has an unspecified value.
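
The inference from those conditions can also be checked mechanically. The following is a minimal sketch, not part of the original post; the helper names `int_value_bits`, `int_has_no_padding` and `int_has_no_trap_representations` are made up for illustration. It counts the value bits implied by INT_MAX, adds the sign bit, and compares the total with the size of the object representation. With no padding bits, the only remaining candidate is the pattern with the sign bit set and all value bits clear, and INT_MIN == -INT_MAX-1 says that pattern is a value.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of value bits in int, derived from INT_MAX. */
int int_value_bits(void)
{
    int bits = 0;
    for (unsigned int max = INT_MAX; max != 0; max >>= 1)
        bits++;
    return bits;
}

/* Hypothetical helper: 1 if the value bits plus the sign bit fill the whole
   object representation, i.e. int has no padding bits; 0 otherwise. */
int int_has_no_padding(void)
{
    return (size_t)int_value_bits() + 1 == sizeof(int) * CHAR_BIT;
}

/* Hypothetical helper: 1 if int has no padding bits and the pattern
   "sign bit set, all value bits clear" represents the value INT_MIN. */
int int_has_no_trap_representations(void)
{
    return int_has_no_padding() && INT_MIN == -INT_MAX - 1;
}
```

On an implementation where the condition in the program above is true, `int_has_no_trap_representations()` should return 1; on others it may return 0.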

If J.2(11) is correct, then the use of the value results in undefined
behavior.

But Annex J is non-normative, and as far as I can tell there is no
normative text in the standard that says the behavior is undefined.

6.2.4 discusses storage duration.

6.7.10 discusses initialization; p11 implies that the representation of
`foo` is indeterminate.  It does not say 

6.8 discusses statements and blocks,  and repeats that "the
representation of objects without an initializer becomes
indeterminate".

None of these discuss what happens when the value of an object with
an indeterminate representation is used -- nor does any other text
I found by searching the standard for "indeterminate representation".

I see no relevant changes between C11 and C23 (except that C23 changes
the term "trap representation" to "non-value representation").

I suggest there are three possible resolutions:

1. J.2(11) is correct and I've missed something (always a possibility,
   but so far nobody in comp.lang.c has come up with anything).

2. J.2(11) reflects the intent, and normative text somewhere else
   in the standard needs to be updated or added to make it clear
   that using the value of an object with automatic storage duration
   while the object has an indeterminate representation has undefined
   behavior.

3. J.2(11) is incorrect and needs to be modified or deleted.
   (This would also imply that compilers may not perform certain
   optimizations.  I have no idea whether any compilers would actually
   be affected.)

I'm going to post this to comp.std.c and email it to the C23 editors.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */



#6507

From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Date: 2023-07-21 16:33 +0100
Message-ID: <87zg3pnuse.fsf@bsb.me.uk>
In reply to: #6505

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> N3096 is the last public draft of the upcoming C23 standard.
>
> N3096 J.2 says:
>
>     The behavior is undefined in the following circumstances:
>     [...]
>     (11) The value of an object with automatic storage duration is
>          used while the object has an indeterminate representation
>          (6.2.4, 6.7.10, 6.8).
>
> I'll use an `int` object in my example.
>
> Reading an object that holds a non-value representation has undefined
> behavior, but not all integer types have non-value representations
> -- and if an implementation has certain characteristics, we can
> reliably infer that int has no non-value representations (called
> "trap representations" in C99, C11, and C17).
>
> Consider this program:
> ```
> #include <limits.h>
> int main(void) {
>     int foo;
>     if (sizeof (int) == 4 &&
>         CHAR_BIT == 8 &&
>         INT_MAX == 2147483647 &&
>         INT_MIN == -INT_MAX-1)
>     {
>         int bar = foo;
>     }
> }
> ```
>
> If the condition is true (as it is for many real-world
> implementations), then int has no padding bits and no trap
> representations.  The object `foo` has an indeterminate representation
> when it's used to initialize `bar`.  Since it cannot have a non-value
> representation, it has an unspecified value.
>
> If J.2(11) is correct, then the use of the value results in undefined
> behavior.
>
> But Annex J is non-normative, and as far as I can tell there is no
> normative text in the standard that says the behavior is undefined.

6.3.2.1 p2:

  "[...] If the lvalue designates an object of automatic storage
  duration that could have been declared with the register storage class
  (never had its address taken), and that object is uninitialized (not
  declared with an initializer and no assignment to it has been
  performed prior to use), the behavior is undefined."

seems to cover it.  The restriction on not having its address taken
seems odd.

-- 
Ben.



#6509

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2023-07-21 11:56 -0700
Message-ID: <874jlxozzz.fsf@nosuchdomain.example.com>
In reply to: #6507

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>> N3096 is the last public draft of the upcoming C23 standard.
>>
>> N3096 J.2 says:
>>
>>     The behavior is undefined in the following circumstances:
>>     [...]
>>     (11) The value of an object with automatic storage duration is
>>          used while the object has an indeterminate representation
>>          (6.2.4, 6.7.10, 6.8).
>>
>> I'll use an `int` object in my example.
>>
>> Reading an object that holds a non-value representation has undefined
>> behavior, but not all integer types have non-value representations
>> -- and if an implementation has certain characteristics, we can
>> reliably infer that int has no non-value representations (called
>> "trap representations" in C99, C11, and C17).
>>
>> Consider this program:
>> ```
>> #include <limits.h>
>> int main(void) {
>>     int foo;
>>     if (sizeof (int) == 4 &&
>>         CHAR_BIT == 8 &&
>>         INT_MAX == 2147483647 &&
>>         INT_MIN == -INT_MAX-1)
>>     {
>>         int bar = foo;
>>     }
>> }
>> ```
>>
>> If the condition is true (as it is for many real-world
>> implementations), then int has no padding bits and no trap
>> representations.  The object `foo` has an indeterminate representation
>> when it's used to initialize `bar`.  Since it cannot have a non-value
>> representation, it has an unspecified value.
>>
>> If J.2(11) is correct, then the use of the value results in undefined
>> behavior.
>>
>> But Annex J is non-normative, and as far as I can tell there is no
>> normative text in the standard that says the behavior is undefined.
>
> 6.3.2.1 p2:
>
>   "[...] If the lvalue designates an object of automatic storage
>   duration that could have been declared with the register storage class
>   (never had its address taken), and that object is uninitialized (not
>   declared with an initializer and no assignment to it has been
>   performed prior to use), the behavior is undefined."
>
> seems to cover it.  The restriction on not having its address taken
> seems odd.

Good find.

That sentence was added in C11 (it doesn't appear in C99 or in
N1256, which consists of C99 plus the three Technical Corrigenda)
in response to DR #338.  Since the wording in Annex J goes back to
C99 in its current form, and to C90 in a slightly different form,
that can't be what Annex J is referring to.  And the statement
in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
retroactive justification.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm

Yes, that restriction does seem strange.  It was inspired by the
IA64 (Itanium) architecture, which has an extra trap bit in each
CPU register (NaT, "not a thing").  The "could have been declared
with the register storage class" wording is there because the IA64
NaT bit exists only in CPU registers, not in memory.

An object with automatic storage duration might be stored in an IA64
CPU register.  If the object is not initialized, the register's
NaT bit would be set.  Any attempt to read it would cause a trap.
Writing it would clear the NaT bit.
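
The read/write behavior described above can be sketched as a toy software model. This is only an illustration under stated assumptions: the type `struct toy_reg` and the `toy_reg_*` functions are hypothetical names, and the model follows the description in this post rather than the actual IA64 instruction set. The point it shows is that the flag lives next to the value, not inside its bits.

```c
#include <stdbool.h>

/* Toy model of a register with a NaT-like flag held outside the value bits.
   Hypothetical names; not a description of real IA64 hardware. */
struct toy_reg {
    long value;   /* the bits a C program could observe */
    bool nat;     /* hidden flag, not part of the representation */
};

/* A register that has never been written starts with the flag set. */
struct toy_reg toy_reg_uninitialized(void)
{
    struct toy_reg r = { 0, true };
    return r;
}

/* Writing stores the value and clears the flag. */
void toy_reg_write(struct toy_reg *r, long v)
{
    r->value = v;
    r->nat = false;
}

/* Reading succeeds only when the flag is clear; a false return stands in
   for the hardware trap, and *out is left untouched in that case. */
bool toy_reg_read(const struct toy_reg *r, long *out)
{
    if (r->nat)
        return false;
    *out = r->value;
    return true;
}
```

In this model a read of a fresh register fails, and the same read succeeds after any write, regardless of which value was written.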

Which means that a hypothetical CPU with something like a NaT bit
on each word of memory (iAPX 432? i960?) might cause a trap in
circumstances not covered by that wording -- but it *is* covered
by the wording in Annex J.

(Normally, an object whose address is taken can still be stored in
a CPU register for part of its lifetime.  The effect is to forbid
certain optimizations on IA64-like systems.)

It's tempting to conclude that reading an uninitialized automatic
object whose address is taken is *not* undefined behavior
(https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
but the standard doesn't say so.

C90's Annex G (renamed to Annex J in later editions) says:

    The behavior in the following circumstances is undefined:
    [...]
    - The value of an uninitialized object that has automatic storage
      duration is used before a value is assigned (6.5.7).

6.5.7 discusses initialization, but doesn't say that reading an
uninitialized object has undefined behavior, so the issue is an old one.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */



#6510

From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Date: 2023-07-21 20:54 +0100
Message-ID: <87fs5hnipv.fsf@bsb.me.uk>
In reply to: #6509

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>> N3096 is the last public draft of the upcoming C23 standard.
>>>
>>> N3096 J.2 says:
>>>
>>>     The behavior is undefined in the following circumstances:
>>>     [...]
>>>     (11) The value of an object with automatic storage duration is
>>>          used while the object has an indeterminate representation
>>>          (6.2.4, 6.7.10, 6.8).
>>>
>>> I'll use an `int` object in my example.
>>>
>>> Reading an object that holds a non-value representation has undefined
>>> behavior, but not all integer types have non-value representations
>>> -- and if an implementation has certain characteristics, we can
>>> reliably infer that int has no non-value representations (called
>>> "trap representations" in C99, C11, and C17).
>>>
>>> Consider this program:
>>> ```
>>> #include <limits.h>
>>> int main(void) {
>>>     int foo;
>>>     if (sizeof (int) == 4 &&
>>>         CHAR_BIT == 8 &&
>>>         INT_MAX == 2147483647 &&
>>>         INT_MIN == -INT_MAX-1)
>>>     {
>>>         int bar = foo;
>>>     }
>>> }
>>> ```
>>>
>>> If the condition is true (as it is for many real-world
>>> implementations), then int has no padding bits and no trap
>>> representations.  The object `foo` has an indeterminate representation
>>> when it's used to initialize `bar`.  Since it cannot have a non-value
>>> representation, it has an unspecified value.
>>>
>>> If J.2(11) is correct, then the use of the value results in undefined
>>> behavior.
>>>
>>> But Annex J is non-normative, and as far as I can tell there is no
>>> normative text in the standard that says the behavior is undefined.
>>
>> 6.3.2.1 p2:
>>
>>   "[...] If the lvalue designates an object of automatic storage
>>   duration that could have been declared with the register storage class
>>   (never had its address taken), and that object is uninitialized (not
>>   declared with an initializer and no assignment to it has been
>>   performed prior to use), the behavior is undefined."
>>
>> seems to cover it.  The restriction on not having its address taken
>> seems odd.
>
> Good find.
>
> That sentence was added in C11 (it doesn't appear in C99 or in
> N1256, which consists of C99 plus the three Technical Corrigenda)
> in response to DR #338.  Since the wording in Annex J goes back to
> C99 in its current form, and to C90 in a slightly different form,
> that can't be what Annex J is referring to.  And the statement
> in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
> retroactive justification.

Thanks for looking into the history.  I was going to do that when I had
some time.

There are three relevant clauses in Annex J, and I think we should keep
them all in mind.  Sadly, they are not numbered (until C23) so I've
given them 'UB' numbers taken from the similar wording in C23.

  — The value of an object with automatic storage duration is used while
    it is indeterminate (6.2.4, 6.7.9, 6.8).  [UB-11]

  — A trap representation is read by an lvalue expression that does not
    have character type (6.2.6.1).  [UB-12]

  — An lvalue designating an object of automatic storage duration that
    could have been declared with the register storage class is used in
    a context that requires the value of the designated object, but the
    object is uninitialized. (6.3.2.1).  [UB-20]

Clearly, UB-20 is explained by the quote I posted, but UB-11 (the one we
are talking about) is there as well and, as you say, can't be fully
explained by that normative quote.

> https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
>
> Yes, that restriction does seem strange.  It was inspired by the
> IA64 (Itanium) architecture, which has an extra trap bit in each
> CPU register (NaT, "not a thing").  The "could have been declared
> with the register storage class" wording is there because the IA64
> NaT bit exists only in CPU registers, not in memory.

Thanks.  I wondered if there might have been some hardware consideration...

> An object with automatic storage duration might be stored in an IA64
> CPU register.  If the object is not initialized, the register's
> NaT bit would be set.  Any attempt to read it would cause a trap.
> Writing it would clear the NaT bit.
>
> Which means that a hypothetical CPU with something like a NaT bit
> on each word of memory (iAPX 432? i960?) might cause a trap in
> circumstances not covered by that wording -- but it *is* covered
> by the wording in Annex J.

It's covered by UB-12 and that's backed up by normative text,
specifically paragraph 5 of the section cited in UB-12.

> (Normally, an object whose address is taken can still be stored in
> a CPU register for part of its lifetime.  The effect is to forbid
> certain optimizations on IA64-like systems.)
>
> It's tempting to conclude that reading an uninitialized automatic
> object whose address is taken is *not* undefined behavior
> (https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
> but the standard doesn't say so.

But it doesn't say that it is UB either, does it?  That case is excluded
in 6.3.2.1 p2, but there's nothing else covering it but the non-normative
Annex J.

> C90's Annex G (renamed to Annex J in later editions) says:
>
>     The behavior in the following circumstances is undefined:
>     [...]
>     - The value of an uninitialized object that has automatic storage
>       duration is used before a value is assigned (6.5.7).
>
> 6.5.7 discusses initialization, but doesn't say that reading an
> uninitialized object has undefined behavior, so the issue is an old one.

-- 
Ben.



#6511

From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date: 2023-07-21 14:26 -0700
Message-ID: <87a5vpnegz.fsf@nosuchdomain.example.com>
In reply to: #6510

Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
> There are three relevant clauses in Annex J, and I think we should keep
> them all in mind.  Sadly, they are not numbered (until C23) so I've
> given them 'UB' numbers taken from the similar wording in C23.
>
>   — The value of an object with automatic storage duration is used while
>     it is indeterminate (6.2.4, 6.7.9, 6.8).  [UB-11]
>
>   — A trap representation is read by an lvalue expression that does not
>     have character type (6.2.6.1).  [UB-12]
>
>   — An lvalue designating an object of automatic storage duration that
>     could have been declared with the register storage class is used in
>     a context that requires the value of the designated object, but the
>     object is uninitialized. (6.3.2.1).  [UB-20]
[...]
>> An object with automatic storage duration might be stored in an IA64
>> CPU register.  If the object is not initialized, the register's
>> NaT bit would be set.  Any attempt to read it would cause a trap.
>> Writing it would clear the NaT bit.
>>
>> Which means that a hypothetical CPU with something like a NaT bit
>> on each word of memory (iAPX 432? i960?) might cause a trap in
>> circumstances not covered by that wording -- but it *is* covered
>> by the wording in Annex J.
>
> It's covered by UB-12 and that's backed up by normative text,
> specifically paragraph 5 of the section cited in UB-12.

I don't think so.  A "non-value representation" (formerly a "trap
representation") is determined by the bits making up the representation
of an object.  For an integer type, such a representation can occur only
if the type has padding bits.  The IA64 NaT bit is not part of the
representation; it's neither a value bit nor a padding bit.

For a 64-bit integer type, given CHAR_BIT==8, its *representation* is
defined as a set of 8 bytes that can be copied into an object of type
`unsigned char[8]`.  The NaT bit does not contribute to the size of the
object.
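
The notion of representation used here can be shown with a small sketch. It is an illustration only; the function name `round_trip_i64` is hypothetical. The bytes of an initialized `int64_t` are copied into an `unsigned char` array and back, and the array is exactly `sizeof(int64_t)` bytes, so there is no room in it for any extra per-register state.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of *src into a
   byte array and from there into *dst.  Returns the number of bytes in
   the representation. */
size_t round_trip_i64(const int64_t *src, int64_t *dst)
{
    unsigned char bytes[sizeof *src];   /* holds the whole representation */
    memcpy(bytes, src, sizeof bytes);   /* object -> bytes */
    memcpy(dst, bytes, sizeof bytes);   /* bytes -> object */
    return sizeof bytes;
}
```

After the call, `*dst` compares equal to `*src`, and the returned size times CHAR_BIT is 64, since `int64_t` has no padding bits.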

I think the right way for C to permit NaT-like bits is, as Kaz
suggested, to define "indeterminate value" in terms of provenance,
not just the bits that make up its current representation.
An automatic object with no initialization, or a malloc()ed object,
starts with an indeterminate value, and accessing that value
(other than as an array of characters) has undefined behavior.
(This is a proposal, not what the standard currently says.)
IA64 happens to have a way of (partially) representing that
provenance in hardware, outside the object in question.  Other or
future architectures might do a more complete job.

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */



#6512

From: Ben Bacarisse <ben.usenet@bsb.me.uk>
Date: 2023-07-21 23:39 +0100
Message-ID: <874jlwopn5.fsf@bsb.me.uk>
In reply to: #6511

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> [...]
>> There are three relevant clauses in Annex J, and I think we should keep
>> them all in mind.  Sadly, they are not numbered (until C23) so I've
>> given them 'UB' numbers taken from the similar wording in C23.
>>
>>   — The value of an object with automatic storage duration is used while
>>     it is indeterminate (6.2.4, 6.7.9, 6.8).  [UB-11]
>>
>>   — A trap representation is read by an lvalue expression that does not
>>     have character type (6.2.6.1).  [UB-12]
>>
>>   — An lvalue designating an object of automatic storage duration that
>>     could have been declared with the register storage class is used in
>>     a context that requires the value of the designated object, but the
>>     object is uninitialized. (6.3.2.1).  [UB-20]
> [...]
>>> An object with automatic storage duration might be stored in an IA64
>>> CPU register.  If the object is not initialized, the register's
>>> NaT bit would be set.  Any attempt to read it would cause a trap.
>>> Writing it would clear the NaT bit.
>>>
>>> Which means that a hypothetical CPU with something like a NaT bit
>>> on each word of memory (iAPX 432? i960?) might cause a trap in
>>> circumstances not covered by that wording -- but it *is* covered
>>> by the wording in Annex J.
>>
>> It's covered by UB-12 and that's backed up by normative text,
>> specifically paragraph 5 of the section cited in UB-12.
>
> I don't think so.  A "non-value representation" (formerly a "trap
> representation") is determined by the bits making up the representation
> of an object.  For an integer type, such a representation can occur only
> if the type has padding bits.  The IA64 NaT bit is not part of the
> representation; it's neither a value bit nor a padding bit.
>
> For a 64-bit integer type, given CHAR_BIT==8, its *representation* is
> defined as a set of 8 bytes that can be copied into an object of type
> `unsigned char[8]`.  The NaT bit does not contribute to the size of the
> object.

Ah, right.  I thought you were including it as a padding bit.

> I think the right way for C to permit NaT-like bits is, as Kaz
> suggested, to define "indeterminate value" in terms of provenance,
> not just the bits that make up its current representation.
> An automatic object with no initialization, or a malloc()ed object,
> starts with an indeterminate value, and accessing that value
> (other than as an array of characters) has undefined behavior.
> (This is a proposal, not what the standard currently says.)
> IA64 happens to have a way of (partially) representing that
> provenance in hardware, outside the object in question.  Other or
> future architectures might do a more complete job.
>
> [...]

That would work.

-- 
Ben.



#6529

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2023-08-12 17:00 -0700
Message-ID: <86a5uv95g7.fsf@linuxsc.com>
In reply to: #6511

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> I think the right way for C to permit NaT-like bits is, as Kaz
> suggested, to define "indeterminate value" in terms of provenance,
> not just the bits that make up its current representation.  [...]

This idea is fundamentally wrong.  NaT bits are associated with
particular areas of memory, which is to say objects.  The point
of provenance is that non-viability is associated with /values/,
not with objects.  Once an area of memory acquires an object
representation, the NaT bit or NaT bits for that memory are set
to zero, end of story.  Note also that NaT bits are independent
of what type is used to access an object - if the NaT bit is set
then any access is illegal, no matter what type is used to do the
access.  By contrast, provenance is used in situations where
non-viability is associated with values, not with objects.  But
values are always type dependent;  a pointer object that holds
a value that has been passed to free() is "indeterminate" when
accessed as a pointer type, but perfectly okay to access as an
unsigned char type.  The two kinds of situations are essentially
different, and the theoretical models used to characterize the
rules in the two kinds of situations should therefore be
correspondingly essentially different.
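
The type dependence in the free() case can be sketched in code. This is a hedged illustration, not something from the original post; the function name `stale_pointer_bytes` is hypothetical. After free(), the sketch never uses the pointer's value as a pointer; it only copies the bytes of the pointer object, which is the access the paragraph above describes as okay. The copied bytes themselves are not specified.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates and frees a block, then copies the bytes
   of the now-stale pointer object into out, which must have room for at
   least sizeof(void *) bytes.  Returns the number of bytes copied, or 0
   if the allocation failed. */
size_t stale_pointer_bytes(unsigned char *out)
{
    void *p = malloc(16);
    if (p == NULL)
        return 0;
    free(p);
    /* From here on, using p as a pointer (comparing it, dereferencing it)
       would use an indeterminate value.  The line below does not do that:
       it passes the address of the object p, not p's value, and memcpy
       reads the object as an array of unsigned char. */
    memcpy(out, &p, sizeof p);
    return sizeof p;
}
```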



#6532

From: Martin Uecker <ma.uecker@gmail.com>
Date: 2023-08-13 23:41 -0700
Message-ID: <fcb2be8f-b346-421f-9804-5f94c93266b0n@googlegroups.com>
In reply to: #6529

On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
> Keith Thompson <Keith.S.T...@gmail.com> writes: 
> 
> > I think the right way for C to permit NaT-like bits is, as Kaz 
> > suggested, to define "indeterminate value" in terms of provenance,
> > not just the bits that make up its current representation. [...] 
> 
> This idea is fundamentally wrong. NaT bits are associated with 
> particular areas of memory, which is to say objects. The point 
> of provenance is that non-viability is associated with /values/, 
> not with objects. Once an area of memory acquires an object 
> representation, the NaT bit or NaT bits for that memory are set 
> to zero, end of story. Note also that NaT bits are independent 
> of what type is used to access an object - if the NaT bit is set 
> then any access is illegal, no matter what type is used to do the 
> access. By contrast, provenance is used in situations where 
> non-viability is associated with values, not with objects. But 
> values are always type dependent; a pointer object that holds 
> a value that has been passed to free() is "indeterminate" when 
> accessed as a pointer type, but perfectly okay to access as an 
> unsigned char type. The two kinds of situations are essentially 
> different, and the theoretical models used to characterize the 
> rules in the two kinds of situations should therefore be 
> correspondingly essentially different.

One could still consider the idea that "indeterminate" is an
abstract property that yields UB during read even for types
that do not have trap representations.   There is no wording
in the C standard to support this, but I would not call this
idea "fundamentally wrong".  You are right that this is different
to provenance, which is about values.  What it would
have in common with pointer provenance is that there is hidden
state in the abstract machine associated with memory that
is not part of the representation.  With effective types there
is another example of this.

Martin




#6533

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2023-08-15 21:06 -0700
Message-ID: <864jkz7hrm.fsf@linuxsc.com>
In reply to: #6532
Martin Uecker <ma.uecker@gmail.com> writes:

> On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
>
>> Keith Thompson <Keith.S.T...@gmail.com> writes:
>>
>>> I think the right way for C to permit NaT-like bits is, as Kaz
>>> suggested, to define "indeterminate value" in terms of provenance,
>>> not just the bits that make up its current representation.  [...]
>>
>> This idea is fundamentally wrong.  NaT bits are associated with
>> particular areas of memory, which is to say objects.  The point
>> of provenance is that non-viability is associated with /values/,
>> not with objects.  Once an area of memory acquires an object
>> representation, the NaT bit or NaT bits for that memory are set
>> to zero, end of story.  Note also that NaT bits are independent
>> of what type is used to access an object - if the NaT bit is set
>> then any access is illegal, no matter what type is used to do the
>> access.  By contrast, provenance is used in situations where
>> non-viability is associated with values, not with objects.  But
>> values are always type dependent;  a pointer object that holds
>> a value that has been passed to free() is "indeterminate" when
>> accessed as a pointer type, but perfectly okay to access as an
>> unsigned char type.  The two kinds of situations are essentially
>> different, and the theoretical models used to characterize the
>> rules in the two kinds of situations should therefore be
>> correspondingly essentially different.
>
> One could still consider the idea that "indeterminate" is an
> abstract property that yields UB during read even for types
> that do not have trap representations.   There is no wording
> in the C standard to support this, but I would not call this
> idea "fundamentally wrong".  You are right that this is different
> to provenance provenance which is about values.  What it would
> have in common with pointer provenance is that there is hidden
> state in the abstract machine associated with memory that
> is not part of the representation.  With effective types there
> is another example of this.

My preceding comments were meant to be only about NaT bits (or
NaT-like bits) and provenance.  There is an inherent mismatch
between the two, as I have tried to explain.  It is only the idea
that provenance would provide a good foundation for defining the
semantics of "NaT everywhere" that I am saying is fundamentally
wrong.

I understand that you want to consider a broader topic, and that,
in the realm of that broader topic, something like provenance
could have a role to play.  I think it is worth responding to
that thesis, and am expecting to do so in a separate reply (or
new thread?) although probably not right away.



#6534

From: Martin Uecker <ma.uecker@gmail.com>
Date: 2023-08-15 22:40 -0700
Message-ID: <e043af84-3153-4097-9505-666869fcf727n@googlegroups.com>
In reply to: #6533
On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
> Martin Uecker <ma.u...@gmail.com> writes:
> > On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
> > 
> >> Keith Thompson <Keith.S.T...@gmail.com> writes: 
> >> 
> >>> I think the right way for C to permit NaT-like bits is, as Kaz 
> >>> suggested, to define "indeterminate value" in terms of provenance, 
> >>> not just the bits that make up its current representation. [...] 
> >> 
> >> This idea is fundamentally wrong. NaT bits are associated with 
> >> particular areas of memory, which is to say objects. The point 
> >> of provenance is that non-viability is associated with /values/, 
> >> not with objects. Once an area of memory acquires an object 
> >> representation, the NaT bit or NaT bits for that memory are set 
> >> to zero, end of story. Note also that NaT bits are independent 
> >> of what type is used to access an object - if the NaT bit is set 
> >> then any access is illegal, no matter what type is used to do the 
> >> access. By contrast, provenance is used in situations where 
> >> non-viability is associated with values, not with objects. But 
> >> values are always type dependent; a pointer object that holds 
> >> a value that has been passed to free() is "indeterminate" when 
> >> accessed as a pointer type, but perfectly okay to access as an 
> >> unsigned char type. The two kinds of situations are essentially 
> >> different, and the theoretical models used to characterize the 
> >> rules in the two kinds of situations should therefore be 
> >> correspondingly essentially different. 
> > 
> > One could still consider the idea that "indeterminate" is an 
> > abstract property that yields UB during read even for types 
> > that do not have trap representations. There is no wording 
> > in the C standard to support this, but I would not call this 
> > idea "fundamentally wrong". You are right that this is different 
> > to provenance provenance which is about values. What it would 
> > have in common with pointer provenance is that there is hidden 
> > state in the abstract machine associated with memory that 
> > is not part of the representation. With effective types there 
> > is another example of this.
> My preceding comments were meant to be only about NaT bits (or 
> NaT-like bits) and provenance. There is an inherent mismatch 
> between the two, as I have tried to explain. It is only the idea 
> that provenence would provide a good foundation for defining the 
> semantics of "NaT everywhere" that I am saying is fundamentally 
> wrong. 
> 
> I understand that you want to consider a broader topic, and that, 
> in the realm of that broader topic, something like provenance 
> could have a role to play. I think it is worth responding to 
> that thesis, and am expecting to do so in a separate reply (or 
> new thread?) although probably not right away.

I would love to hear your comments, because some people
want to have such an abstract notion of "indeterminate" and
some believe that this is how the standard should
be understood already today.

Martin




#6541

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2023-08-16 23:13 -0700
Message-ID: <867cpu5h8w.fsf@linuxsc.com>
In reply to: #6534
Martin Uecker <ma.uecker@gmail.com> writes:

[some unrelated passages removed]

> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
>
>> Martin Uecker <ma.u...@gmail.com> writes:

[...]

>>> One could still consider the idea that "indeterminate" is an
>>> abstract property that yields UB during read even for types
>>> that do not have trap representations.  There is no wording
>>> in the C standard to support this, but I would not call this
>>> idea "fundamentally wrong".  You are right that this is different
>>> to provenance provenance which is about values.  What it would
>>> have in common with pointer provenance is that there is hidden
>>> state in the abstract machine associated with memory that
>>> is not part of the representation.  With effective types there
>>> is another example of this.
>>
>> I understand that you want to consider a broader topic, and that,
>> in the realm of that broader topic, something like provenance
>> could have a role to play.  I think it is worth responding to
>> that thesis, and am expecting to do so in a separate reply (or
>> new thread?) although probably not right away.
>
> I would love to hear your comments, because some people
> want to have such an abstract of "indeterminate" and
> some already believe that this is how the standard should
> be understood already today.

I've been thinking about this, and am close (I think) to having
something to say in response.  Before I do that, though, let me
ask this:  what problem or problems are motivating the question?
What problems do you (or "some people") want to solve?  I don't
want just examples here;  I'm hoping to get a full list.



#6542

From: Kaz Kylheku <864-117-4973@kylheku.com>
Date: 2023-08-17 07:08 +0000
Message-ID: <20230816235712.844@kylheku.com>
In reply to: #6541
On 2023-08-17, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Martin Uecker <ma.uecker@gmail.com> writes:
>
> [some unrelated passages removed]
>
>> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
>>
>>> Martin Uecker <ma.u...@gmail.com> writes:
>
> [...]
>
>>>> One could still consider the idea that "indeterminate" is an
>>>> abstract property that yields UB during read even for types
>>>> that do not have trap representations.  There is no wording
>>>> in the C standard to support this, but I would not call this
>>>> idea "fundamentally wrong".  You are right that this is different
>>>> to provenance provenance which is about values.  What it would
>>>> have in common with pointer provenance is that there is hidden
>>>> state in the abstract machine associated with memory that
>>>> is not part of the representation.  With effective types there
>>>> is another example of this.
>>>
>>> I understand that you want to consider a broader topic, and that,
>>> in the realm of that broader topic, something like provenance
>>> could have a role to play.  I think it is worth responding to
>>> that thesis, and am expecting to do so in a separate reply (or
>>> new thread?) although probably not right away.
>>
>> I would love to hear your comments, because some people
>> want to have such an abstract of "indeterminate" and
>> some already believe that this is how the standard should
>> be understood already today.
>
> I've been thinking about this, and am close (I think) to having
> something to say in response.  Before I do that, thought, let me
> ask this:  what problem or problems are motivating the question?
> What problems do you (or "some people") want to solve?  I don't
> want just examples here;  I'm hoping to get a full list.

I'm all about the diagnosis. Even on machines in which all
representations are values, and therefore safe, a program whose external
effect or output depends on uninitialized data, and is therefore
nondeterministic (a bad form of nondeterministic), is a repugnant
program.

I'd like to have clear rules which allow an implementation to
go to great depths to diagnose all such situations, while
remaining conforming. (The language agrees that those situations
are erroneous, granting the tools license to diagnose.)

At the same time, certain situations in which uninitialized data are
used in ways that don't have a visible effect would be a nuisance if they
generated diagnostics, the primary example being the copying of objects.
I would like it so that memcpy isn't magic. I want it so that the
programmer can write a bytewise memcpy which doesn't violate the
rules even if it moves uninitialized data.
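
For concreteness, the kind of user-written copy I have in mind looks roughly like the sketch below. The name bytewise_copy is made up for illustration. Whether such a loop stays within the rules when the source bytes are uninitialized is exactly what is under discussion, so this shows the code in question rather than settling the matter.

```c
#include <stddef.h>

/* Hypothetical user-written replacement for memcpy: moves n bytes
   from src to dst one unsigned char at a time.  The regions must not
   overlap (hence restrict).  The bytes are only copied; no decision
   is ever made based on their values. */
void *bytewise_copy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

Copying a struct with it drags along the padding bytes, whatever they hold, and the members that were set come out the same in the copy.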

I would like a model of uninitialized data which usefully lends itself
to different depths with different trade-offs, like complexity of
analysis and use of run-time resources. Limits should be imposed by
implementations (what cases they want to diagnose) rather than by the
model.


-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#6546

From: Martin Uecker <ma.uecker@gmail.com>
Date: 2023-08-18 12:44 -0700
Message-ID: <d6d5f930-1943-424f-a572-7d62cfd2bda0n@googlegroups.com>
In reply to: #6542
On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
> On 2023-08-17, Tim Rentsch <tr.1...@z991.linuxsc.com> wrote: 
> > Martin Uecker <ma.u...@gmail.com> writes: 
> > 
> > [some unrelated passages removed] 
> > 
> >> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
> >> 
> >>> Martin Uecker <ma.u...@gmail.com> writes: 
> > 
> > [...] 
> > 
> >>>> One could still consider the idea that "indeterminate" is an 
> >>>> abstract property that yields UB during read even for types 
> >>>> that do not have trap representations. There is no wording 
> >>>> in the C standard to support this, but I would not call this 
> >>>> idea "fundamentally wrong". You are right that this is different 
> >>>> to provenance provenance which is about values. What it would 
> >>>> have in common with pointer provenance is that there is hidden 
> >>>> state in the abstract machine associated with memory that 
> >>>> is not part of the representation. With effective types there 
> >>>> is another example of this. 
> >>> 
> >>> I understand that you want to consider a broader topic, and that, 
> >>> in the realm of that broader topic, something like provenance 
> >>> could have a role to play. I think it is worth responding to 
> >>> that thesis, and am expecting to do so in a separate reply (or 
> >>> new thread?) although probably not right away. 
> >> 
> >> I would love to hear your comments, because some people 
> >> want to have such an abstract of "indeterminate" and 
> >> some already believe that this is how the standard should 
> >> be understood already today. 
> > 
> > I've been thinking about this, and am close (I think) to having 
> > something to say in response. Before I do that, thought, let me 
> > ask this: what problem or problems are motivating the question? 
> > What problems do you (or "some people") want to solve? I don't 
> > want just examples here; I'm hoping to get a full list.
> I'm all about the diagnosis. Even on machines in which all 
> representations are values, and therefore safe, 

I do not agree with the idea that "absence of UB = safe".

> a program whose external 
> effect or output depends on unintialized data, and is therefore 
> nondeterministic (a bad form of nondeterministic), is a repugnant 
> program. 

I would expect a debugger to output the memory as it is seen
by the CPU. But yes, it would not be a strictly conforming program.

> I'd like to have clear rules which allow an implementation to 
> to go great depths to diagnose all such situations, while 
> remaining conforming. (The language agrees that those situations 
> are erroneous, granting the tools license to diagnose.) 

An implementation does not need a license from the standard
to diagnose anything. It can already diagnose whatever seems
useful and this does not affect conformance at all. 

But it becomes easier to usefully diagnose behavior which is
undefined, because then one can expect that in portable C it
is not used intentionally.

> At the same time, certain situations in which uninitialized data are 
> used in ways that don't have a visible effect, would be nuisance if they 
> generated diagnostics, the primary example being the copying of objects. 
> I would like it so that memcpy isn't magic. I want it so that the 
> programmer can write a bytewise memcpy which doesn't violate the 
> rules even if it moves uninitialized data. 

Yes, I think for C this is rather important.

> I would like a model of uninitialized data which usefully lends itself 
> to different depths with different trade-offs, like complexity of 
> analysis and use of run-time resources. Limits should be imposed by 
> implementations (what cases they want to diagnose) rather than by the 
> model.

Tools can already do complex analysis and track down use of
uninitialized variables. But with respect to conformance, I think 
the current standard has very good rules:   memcpy/memcmp
and similar code works as expected. Locally,  where a compiler
can be expected to give good diagnostics via static analysis
the use of uninitialized variables is UB. But this does not
spread via pointers elsewhere, where useful diagnostics
are unlikely and optimizer induced problems based on UB
might be far more difficult to debug.

Martin





#6549

From: Kaz Kylheku <864-117-4973@kylheku.com>
Date: 2023-08-19 05:04 +0000
Message-ID: <20230818215322.47@kylheku.com>
In reply to: #6546
On 2023-08-18, Martin Uecker <ma.uecker@gmail.com> wrote:
> On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
> An implementation does not need a license from the standard
> to diagnose anything. I can already diagnose whatever seems
> useful and this does not affect conformance at all. 

That's true about diagnostics at translation time. It's not clear
for diagnostics that happen at run time and are indistinguishable
from the program's output on stdout or stderr.

Also, it might be desirable for it to be conforming to terminate the
program if it has run afoul of the rules.

>> I would like a model of uninitialized data which usefully lends itself 
>> to different depths with different trade-offs, like complexity of 
>> analysis and use of run-time resources. Limits should be imposed by 
>> implementations (what cases they want to diagnose) rather than by the 
>> model.
>
> Tools can already do complex analysis and track down use of
> uninitialized variables. But with respect to conformance, I think 
> the current standard has very good rules:   memcpy/memcmp
> and similar code works as expected. Locally,  where a compiler
> can be expected to give good diagnostics via static analysis
> the use of uninitialized variables is UB. But this does not
> spread via pointers elsewhere, where useful diagnostics
> are unlikely and optimizer induced problems based on UB
> might be far more difficult to debug.

Dynamic instrumentation and tracking make it possible
for that information to follow pointer data flows, globally
in the program.

E.g. under the Valgrind tool, if one module passes an uninitialized
object into another, and that other one relies on it to make
a conditional branch, it will be diagnosed. You can get the
backtrace of where that object was created as well as where
the use took place.
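
A rough sketch of that pattern, with both "modules" shown as functions in one file for brevity. The function names are made up for illustration. The claim about what gets reported assumes Valgrind's Memcheck tool, where origin tracking is enabled with --track-origins=yes; I'm not reproducing any report text here.

```c
#include <stdlib.h>

/* "Module B": makes a conditional decision based on whatever value
   it is handed; it has no idea where the object came from. */
int is_positive(const int *p)
{
    return *p > 0;
}

/* "Module A": creates an int object without ever storing to it and
   passes it along.  Returns -1 if allocation fails, otherwise the
   (nondeterministic) answer from module B. */
int check_fresh_allocation(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;

    int r = is_positive(p);   /* result depends on uninitialized data */
    free(p);
    return r;
}
```

A compiler looking only at is_positive has nothing to warn about, which is why this takes run-time tracking.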

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#6552

From: Martin Uecker <ma.uecker@gmail.com>
Date: 2023-08-19 01:36 -0700
Message-ID: <137bed86-8fd4-42d5-aaf0-96ccce615376n@googlegroups.com>
In reply to: #6549
On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote:
> On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote: 
> > On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote: 
> > An implementation does not need a license from the standard 
> > to diagnose anything. I can already diagnose whatever seems 
> > useful and this does not affect conformance at all.
> That's true about diagnostics at translation time. It's not clear 
> about that happen at run time and indistinguishable from the 
> program's output on stdout or stderr. 

The observable behavior has to stay the same, so yes, it could
not output to stdout or stderr.  But there is nothing stopping it
from logging debugging information somewhere else, where it could
be accessed. 

> Also, it might be desirable for it to be conforming to terminate the 
> program if it has run afoul of the rules.

Yes, this is one main reason to make certain things UB. But
then it can have false positives and needs to be backward
compatible, which limits what is possible.

> >> I would like a model of uninitialized data which usefully lends itself 
> >> to different depths with different trade-offs, like complexity of 
> >> analysis and use of run-time resources. Limits should be imposed by 
> >> implementations (what cases they want to diagnose) rather than by the 
> >> model. 
> > 
> > Tools can already do complex analysis and track down use of 
> > uninitialized variables. But with respect to conformance, I think 
> > the current standard has very good rules: memcpy/memcmp 
> > and similar code works as expected. Locally, where a compiler 
> > can be expected to give good diagnostics via static analysis 
> > the use of uninitialized variables is UB. But this does not 
> > spread via pointers elsewhere, where useful diagnostics 
> > are unlikely and optimizer induced problems based on UB 
> > might be far more difficult to debug.
> Dynamic instrumentation and tracking makes that possible 
> for that information to follow pointer data flows, globally 
> in the program. 
> 
> E.g. under the Valgrind tool, if one module passes an unitialized 
> object into another, and that other one relies on it to make 
> a conditional branch, it will be diagnosed. You can get the 
> backtrace of where that object was created as well as where 
> the use took place.

And valgrind exists and is a useful tool (I use it myself)
even though not everything it diagnoses is UB.  But it also has
false positives, so using the same rules for deciding what
should be UB in the standard as valgrind uses seems difficult.

Also note that if the output of a program relies on
unspecified values, then it is already not strictly conforming
even when the behavior itself is not undefined.  So if an
implementation is smart enough to see this, it could already
reject the program. 

Making the mere use of unspecified values in conditional
branches UB seems problematic. E.g. you could not
compute a hash over data structures with padding and
then compare it later to see whether something has
changed (taking into account  false positives). This seems
similar to memcpy / memcmp but involves conditions,
and such techniques would become non-conforming.
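
A minimal sketch of the technique, assuming a 32-bit FNV-1a hash as the (arbitrary) hash function. The names hash_bytes, struct record and record_maybe_changed are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over the object representation, read as unsigned
   char.  Padding bytes are included, whatever values they hold. */
uint32_t hash_bytes(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* On most ABIs there is padding between tag and value. */
struct record {
    char tag;
    int value;
};

/* Nonzero if the record may have changed since old_hash was taken.
   This is a conditional on bytes that may be unspecified padding. */
int record_maybe_changed(const struct record *r, uint32_t old_hash)
{
    return hash_bytes(r, sizeof *r) != old_hash;
}
```

A store to a member may also alter the padding, so a "changed" answer can be a false positive; as long as nothing is stored in between, the two hashes agree.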

Martin



#6553

From: Richard Damon <Richard@Damon-Family.org>
Date: 2023-08-19 09:18 -0400
Message-ID: <ui3EM.518408$TCKc.407024@fx13.iad>
In reply to: #6552
On 8/19/23 4:36 AM, Martin Uecker wrote:
> On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote:
>> On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote:
>>> On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
>>> An implementation does not need a license from the standard
>>> to diagnose anything. I can already diagnose whatever seems
>>> useful and this does not affect conformance at all.
>> That's true about diagnostics at translation time. It's not clear
>> about that happen at run time and indistinguishable from the
>> program's output on stdout or stderr.
> 
> The observable behavior has to stay the same, so yes, it could
> not output to stdout or stderr.  But there is nothing stopping it
> to log debugging information somewhere else, where it could
> be accessed.
> 
>> Also, it might be desirable for it to be conforming to terminate the
>> program if it has run afoul of the rules.
> 
> Yes, this is one main reason to make certain things UB. But
> then it can have false positives and needs to be backward
> compatible, which limits what is possible.
> 
>>>> I would like a model of uninitialized data which usefully lends itself
>>>> to different depths with different trade-offs, like complexity of
>>>> analysis and use of run-time resources. Limits should be imposed by
>>>> implementations (what cases they want to diagnose) rather than by the
>>>> model.
>>>
>>> Tools can already do complex analysis and track down use of
>>> uninitialized variables. But with respect to conformance, I think
>>> the current standard has very good rules: memcpy/memcmp
>>> and similar code works as expected. Locally, where a compiler
>>> can be expected to give good diagnostics via static analysis
>>> the use of uninitialized variables is UB. But this does not
>>> spread via pointers elsewhere, where useful diagnostics
>>> are unlikely and optimizer induced problems based on UB
>>> might be far more difficult to debug.
>> Dynamic instrumentation and tracking makes that possible
>> for that information to follow pointer data flows, globally
>> in the program.
>>
>> E.g. under the Valgrind tool, if one module passes an unitialized
>> object into another, and that other one relies on it to make
>> a conditional branch, it will be diagnosed. You can get the
>> backtrace of where that object was created as well as where
>> the use took place.
> 
> And valgrind exists and is a useful tool (I use it myself)
> despite not  everything it diagnoses is UB.  But it also has
> false positives, so using the same rules for deciding what
> should be UB in the standard as valgrind uses seems difficult.
> 
> Also note that of the output of a program relies on
> unspecified values, then it is already not strictly conforming
> even when the behavior itself is not undefined.  So if an
> implementation is smart enough to see this, it could already
> reject the program.
> 
> Making already the use of unspecified values in conditional
> branches be UB seems problematic. E.g. you could not
> compute a hash over data structures with padding and
> then compare it later to see whether something has
> changed (taking into account  false positives). This seems
> similar to memcpy / memcmp but involved conditions,
> and such techniques would become non-conforming.
> 
> Martin

My understanding is that there is no requirement that the values of the 
padding bytes remain constant over time. I can't imagine a case where 
they will just change at an arbitrary time, but setting a member of the 
structure to a value (even if it is the same value it had) might easily 
affect the value of the padding bytes, so the hash changes.



#6554

From: Martin Uecker <ma.uecker@gmail.com>
Date: 2023-08-19 11:12 -0700
Message-ID: <4a526949-06dd-404d-a299-cb30953e7a5fn@googlegroups.com>
In reply to: #6553
On Saturday, August 19, 2023 at 3:18:22 PM UTC+2, Richard Damon wrote:
> On 8/19/23 4:36 AM, Martin Uecker wrote: 
> > On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote: 
> >> On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote: 
> >>> On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote: 
> >>> An implementation does not need a license from the standard 
> >>> to diagnose anything. I can already diagnose whatever seems 
> >>> useful and this does not affect conformance at all. 
> >> That's true about diagnostics at translation time. It's not clear 
> >> about that happen at run time and indistinguishable from the 
> >> program's output on stdout or stderr. 
> > 
> > The observable behavior has to stay the same, so yes, it could 
> > not output to stdout or stderr. But there is nothing stopping it 
> > to log debugging information somewhere else, where it could 
> > be accessed. 
> > 
> >> Also, it might be desirable for it to be conforming to terminate the 
> >> program if it has run afoul of the rules. 
> > 
> > Yes, this is one main reason to make certain things UB. But 
> > then it can have false positives and needs to be backward 
> > compatible, which limits what is possible. 
> > 
> >>>> I would like a model of uninitialized data which usefully lends itself 
> >>>> to different depths with different trade-offs, like complexity of 
> >>>> analysis and use of run-time resources. Limits should be imposed by 
> >>>> implementations (what cases they want to diagnose) rather than by the 
> >>>> model. 
> >>> 
> >>> Tools can already do complex analysis and track down use of 
> >>> uninitialized variables. But with respect to conformance, I think 
> >>> the current standard has very good rules: memcpy/memcmp 
> >>> and similar code works as expected. Locally, where a compiler 
> >>> can be expected to give good diagnostics via static analysis 
> >>> the use of uninitialized variables is UB. But this does not 
> >>> spread via pointers elsewhere, where useful diagnostics 
> >>> are unlikely and optimizer induced problems based on UB 
> >>> might be far more difficult to debug. 
> >> Dynamic instrumentation and tracking makes that possible 
> >> for that information to follow pointer data flows, globally 
> >> in the program. 
> >> 
> >> E.g. under the Valgrind tool, if one module passes an unitialized 
> >> object into another, and that other one relies on it to make 
> >> a conditional branch, it will be diagnosed. You can get the 
> >> backtrace of where that object was created as well as where 
> >> the use took place. 
> > 
> > And valgrind exists and is a useful tool (I use it myself) 
> > despite not everything it diagnoses is UB. But it also has 
> > false positives, so using the same rules for deciding what 
> > should be UB in the standard as valgrind uses seems difficult. 
> > 
> > Also note that of the output of a program relies on 
> > unspecified values, then it is already not strictly conforming 
> > even when the behavior itself is not undefined. So if an 
> > implementation is smart enough to see this, it could already 
> > reject the program. 
> > 
> > Making already the use of unspecified values in conditional 
> > branches be UB seems problematic. E.g. you could not 
> > compute a hash over data structures with padding and 
> > then compare it later to see whether something has 
> > changed (taking into account false positives). This seems 
> > similar to memcpy / memcmp but involved conditions, 
> > and such techniques would become non-conforming. 
> > 
> > Martin
> My understanding is that there is no requirement that the values of the 
> padding bytes remains constant over time. 

The C standard specifies  when they can change:

"When a value is stored in an object of structure or union type,
including in a member object, the bytes of the object representation
that correspond to any padding bytes take unspecified values"

> I can't imagine a case where 
> they will just change at an arbitrary time, but setting a member of the 
> structure to a value (even if it is the same value it had) might easily 
> affect the value of the padding bytes, so the hash changes.

Sure, writing to an object may change the padding and then the
hash changes. This is why I mentioned false positives.

Martin




#6548

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2023-08-18 20:20 -0700
Message-ID: <86wmxr4t22.fsf@linuxsc.com>
In reply to: #6542
Kaz Kylheku <864-117-4973@kylheku.com> writes:

> I'm all about the diagnosis.  Even on machines in which all
> representations are values, and therefore safe, a program whose
> external effect or output depends on unintialized data, and is
> therefore nondeterministic (a bad form of nondeterministic), is a
> repugnant program.
>
> I'd like to have clear rules which allow an implementation to to
> go great depths to diagnose all such situations, while remaining
> conforming.  (The language agrees that those situations are
> erroneous, granting the tools license to diagnose.)

The C standard allows compilers to do whatever analysis they
want and to issue diagnostics for whatever conditions or
circumstances they choose.  What you want is orthogonal to
what is being discussed.



#6550

From: Kaz Kylheku <864-117-4973@kylheku.com>
Date: 2023-08-19 05:23 +0000
Message-ID: <20230818220442.950@kylheku.com>
In reply to: #6548
On 2023-08-19, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> Kaz Kylheku <864-117-4973@kylheku.com> writes:
>
>> I'm all about the diagnosis.  Even on machines in which all
>> representations are values, and therefore safe, a program whose
>> external effect or output depends on uninitialized data, and is
>> therefore nondeterministic (a bad form of nondeterministic), is a
>> repugnant program.
>>
>> I'd like to have clear rules which allow an implementation to
>> go to great depths to diagnose all such situations, while remaining
>> conforming.  (The language agrees that those situations are
>> erroneous, granting the tools license to diagnose.)
>
> The C standard allows compilers to do whatever analysis they
> want and to issue diagnostics for whatever conditions or
> circumstances they choose.

And stop translating? If some use of an uninitialized object
isn't undefined, and you make the diagnostic a fatal error,
then you don't have a conforming compiler at that point.

> What you want is orthogonal to what is being discussed.

I'm mainly concerned about run-time.

If the program hasn't invoked undefined behavior, I don't think it's
conforming to inject gratuitous diagnostics into the program's run-time,
such that they appear as if they were its output on stderr or stdout.
Those diagnostics have to go to some special debug port.

Also, it's not conforming to arbitrarily terminate the program. (Other
than in some weaselly language-lawyering way, by declaring that it
has exceeded an implementation limit or something.)

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca



#6551

From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Date: 2023-08-18 22:56 -0700
Message-ID: <86sf8f4lt9.fsf@linuxsc.com>
In reply to: #6550

Kaz Kylheku <864-117-4973@kylheku.com> writes:

> On 2023-08-19, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

>> The C standard allows compilers to do whatever analysis they
>> want and to issue diagnostics for whatever conditions or
>> circumstances they choose.
>
> And stop translating?  If some use of an uninitialized object
> isn't undefined, and you make the diagnostic a fatal error,
> then you don't have a conforming compiler at that point.
>
> [also]
>
> If the program hasn't invoked undefined behavior, I don't think
> it's conforming to inject gratuitous diagnostics [..or..]
> to arbitrarily terminate the program.  [...]

You need to learn how to say what you mean.  Your earlier
posting didn't say anything about failing to compile
or altering program behavior.  If you can't learn how
to say what you mean then there is roughly a 1e-29 percent
chance that you'll get what you want.
