From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Re: Representation of _Bool
Date: Fri, 17 Jan 2025 13:34:53 -0800
Message-ID: <87ed116s5e.fsf@nosuchdomain.example.com>
References: <87tums515a.fsf@nosuchdomain.example.com> <42fcea7270de500367eceea7ad5530fd@www.novabbs.com>

learningcpp1@gmail.com (m137) writes:
> Hi Keith,
>
> Thank you for posting this.

The message being referred to is one I posted Sun 2021-05-23, with
Message-ID <87tums515a.fsf@nosuchdomain.example.com>. It's visible on
Google Groups at .

As others have suggested, please include attribution information when
posting a followup. You don't need to quote the entire message, but
provide at least some context, particularly when the parent message is
old.

This is an update to that message.

> I noticed that the newer drafts of C23
> (N2912 onwards, I think) have replaced the term "trap representation"
> with "non-value representation":
> - **Trap representation** was last defined in [N2731
> 3.19.4(1)](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf#page=)
> as "an object representation that need not represent a value of the
> object type."
> - **Non-value representation** is most recently defined in [N3435
> 3.26(1)](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3435.pdf#page=23)
> as "an object representation that does not represent a value of the
> object type."
>
> The definition of non-value representation rules out object
> representations that represent a value of the object type from being
> non-value representations. So it seems to be stricter than the
> definition of trap representation, which does not seem to rule out such
> object representations from being trap representations. Is this
> interpretation correct?

I don't believe so. As far as I can tell, a "non-value representation"
(C23 and later) is exactly the same thing as a "trap representation"
(C17 and earlier). The older term was probably considered unclear,
since it could imply that a trap is required. In fact, reading an
object with a trap/non-value representation has undefined behavior,
which can include yielding the value you might have expected.

> If so, what happens to the 254 trap representations that GCC and Clang
> reserve for `_Bool`?

I see no evidence in gcc's documentation that gcc treats
representations other than 0 or 1 as trap/non-value representations.
I see only two references to "trap representation", one for signed
integer types (saying that there are no trap representations) and one
regarding type-punning via unions. There are no relevant references to
"padding bits".

I'm less familiar with clang's documentation, but I see no reference to
"trap representation" or "non-value representation".

We can get some information about this by running a test program.
See below.

> Assuming a width of 1, each of those 254 object
> representations represents a value in `_Bool`'s domain (the half whose
> value bit is 1 represents the value `true`, while the other half whose
> value bit is 0 represents the value `false`), so they cannot be thought
> of as non-value representations (since a non-value representation must
> be an object representation that **does not** represent a value of the
> object type).

Reading an object with a non-value representation has undefined behavior.
If the observed value happens to be a valid value of the object's type,
that's still consistent with undefined behavior. *Everything* is
consistent with undefined behavior.

> I've been stuck on this for quite some time, so would be grateful for
> any guidance you could provide.

Editions of the C standard earlier than C23 were not entirely clear
about the representation of _Bool.

(C90 does not have _Bool or bool. C99 through C17 have _Bool as a
keyword, with bool as a macro defined in <stdbool.h>. C23 has bool as a
keyword, with _Bool as an alternate spelling.)

In C99 and later, _Bool/bool is required to be an unsigned integer type
large enough to hold the values 0 and 1. Its size must be at least
CHAR_BIT bits (which is at least 8). The *rank* of _Bool is less than
the rank of all other standard integer types. The rank implies that the
range of values is a subset of the range of values of any other
unsigned integer type. The rank does *not* imply anything about
relative sizes. unsigned char has a higher rank than bool, but bool
could have additional padding bits making sizeof(bool)>1. (Probably no
implementation does this.) unsigned char has no padding bits.

C11 implies that _Bool can have more than one value bit, which means it
could represent values greater than 1 (but no more than 0..UCHAR_MAX).

C23 (I'm using the N3096 draft) tightens the requirements, saying that
bool has exactly one value bit and (sizeof(bool)*CHAR_BIT)-1 padding
bits -- again implying that sizeof(bool) might be greater than 1, but
forbidding values greater than 1.

Typically in C17 and earlier, and always in C23, _Bool/bool will have
exactly 1 value bit and CHAR_BIT-1 padding bits. Padding bits do not
contribute to the value of an object (so 0 and 1 are the only possible
values), but non-zero padding bits *may or may not* create
trap/non-value representations.
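
As a small aside, the bit counts described above can be sketched in
code. This is only an illustration under the C23 rule of exactly one
value bit; the helper names bool_padding_bits and bool_from_zero_bytes
are hypothetical and appear nowhere in the standard or in the test
program. The second helper reads only an all-bits-zero object, which
the standard guarantees is a representation of zero for any integer
type, so it involves no undefined behavior.

```c
#include <assert.h>   /* static_assert (a macro in C11/C17, a keyword in C23) */
#include <limits.h>   /* CHAR_BIT */
#include <stdbool.h>  /* bool, true, false before C23; harmless in C23 */
#include <stddef.h>   /* size_t */
#include <string.h>   /* memset */

/* The standard requires a byte to have at least 8 bits. */
static_assert(CHAR_BIT >= 8, "CHAR_BIT must be at least 8");

/* Hypothetical helper: padding bits in bool, ASSUMING exactly one value
   bit (required by C23, typical but not required in C17 and earlier). */
static size_t bool_padding_bits(void)
{
    return sizeof(bool) * CHAR_BIT - 1;
}

/* Hypothetical helper: builds a bool whose object representation is all
   bits zero and reads it back.  All-bits-zero is guaranteed to represent
   0 for any integer type, so this read is well defined. */
static bool bool_from_zero_bytes(void)
{
    bool b;
    memset(&b, 0, sizeof b);
    return b;
}
```

On an implementation with sizeof(bool) == 1 and CHAR_BIT == 8,
bool_padding_bits() yields 7. Nothing here touches a representation
with non-zero padding bits, which is where the open question lies.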
(A gratuitously exotic implementation might use a representation other
than 00000001 for true, but 00000000 is guaranteed to be a
representation for 0/false.)

As far as I can tell, the standard is silent on whether a bool object
with non-zero padding bits is a trap/non-value representation or not.

I wrote a test program to explore how bool is treated. It uses memcpy
to set the representation of a bool object and then prints the value of
that object. Source is at the bottom of this message.

If bool has no non-value representations, then the values of the
CHAR_BIT-1 padding bits must be ignored when reading a bool object, and
the value of such an object is determined only by its single value bit,
0 or 1. If it does have non-value representations, then reading such an
object has undefined behavior.

With gcc 14.2.0, with "-std=c23", all-zeros is treated as false when
used in a condition and all other representations are treated as true.
Converting the value of a bool object to another integer type yields
the value of its full 8-bit representation. If a bool object holds a
representation other than 00000000 or 00000001, it compares equal to
both `true` and `false`. This implies that bool has 1 value bit and 7
padding bits (as required by C23) and that it has 2 value
representations and 254 trap representations. The observed behavior for
the non-value representations is the result of undefined behavior.
(gcc -std=c23 sets __STDC_VERSION__ to 202000L, not 202311L. The
documentation acknowledges that support for C23 is experimental and
incomplete.)

With clang 19.1.4, with "-std=c23", the behavior is consistent with
bool having no non-value representations. The 7 padding bits do not
contribute to the value of a bool object. Any bool object with 0 as the
low-order bit is treated as false in a condition and yields 0 when
converted to another integer type. Any bool object with 1 as the
low-order bit is treated as true, and yields 1 when converted to
another integer type.
I presume the intent is for bool to have 256 value representations and
no non-value representations (with the padding bits ignored as
required), but it's also consistent with bool having non-value
representations and the observed behavior being undefined.

It's not possible to determine with a test program whether the output
is the result of undefined behavior or not. As far as I can tell, the
question of whether bool has non-value representations is unspecified
but not implementation-defined, meaning that an implementation is not
required to document its choice.

#include <stdio.h>
#include <string.h>
#include <limits.h>
#if __STDC_VERSION__ < 202311L
#include <stdbool.h>
#endif

int main(void) {
    printf("__STDC_VERSION__ = %ldL\n", __STDC_VERSION__);
#if __STDC_VERSION__ < 202311L
    puts("Older than C23, using <stdbool.h>");
#else
    puts("C23 or later, using bool directly");
#endif
    printf("sizeof (unsigned char) = %zu, sizeof (bool) = %zu\n",
           sizeof (unsigned char), sizeof (bool));

    /* Show how the two valid values are represented. */
    const bool no = false;
    const bool yes = true;
    unsigned char uc;
    memcpy(&uc, &no, 1);
    printf("false is represented as %d\n", (int)uc);
    memcpy(&uc, &yes, 1);
    printf("true is represented as %d\n", (int)uc);

    /* Store every possible byte value into a bool and read it back.
       Reading b may have undefined behavior for bytes other than 0 and 1. */
    for (int i = 0; i <= UCHAR_MAX; i ++) {
        const unsigned char uc = i;
        bool b;
        memcpy(&b, &uc, 1);
        const unsigned char value = b;
        printf("uc = 0x%02x b = 0x%02x b is %s, b%sfalse, b%strue\n",
               (unsigned)uc,
               (unsigned)value,
               b ? "truthy" : "falsy ",
               b == false ? "==" : "!=",
               b == true ? "==" : "!=");
    }
}

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */