Union C++ standard

Started by	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
First post	2021-11-25 11:11 +0100
Last post	2021-11-29 14:32 -0800
Articles	14 — 8 participants

  Union C++ standard Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2021-11-25 11:11 +0100
    Re: Union C++ standard Kaz Kylheku <480-992-1380@kylheku.com> - 2021-11-26 18:06 +0000
    Re: Union C++ standard gah4 <gah4@u.washington.edu> - 2021-11-26 12:16 -0800
    Re: Union C++ standard David Brown <david.brown@hesbynett.no> - 2021-11-27 16:59 +0100
      Re: Union C++ standard Derek Jones <derek@NOSPAM-knosof.co.uk> - 2021-11-28 12:51 +0000
        Re: Union C++ standard David Brown <david.brown@hesbynett.no> - 2021-11-28 19:00 +0100
          Re: Union C++ standard Derek Jones <derek@NOSPAM-knosof.co.uk> - 2021-11-29 00:09 +0000
            Re: Union C++ standard David Brown <david.brown@hesbynett.no> - 2021-11-29 21:00 +0100
              Re: Union C++ standard Derek Jones <derek@NOSPAM-knosof.co.uk> - 2021-11-30 00:46 +0000
                Re: Union C++ standard George Neuner <gneuner2@comcast.net> - 2021-11-30 17:18 -0500
                  Re: Union C++ standard terminology Derek Jones <derek@knosof.co.uk> - 2021-12-01 13:35 +0000
                Re: Union C++ standard David Brown <david.brown@hesbynett.no> - 2021-11-30 23:24 +0100
          Re: Union C++ standard Kaz Kylheku <480-992-1380@kylheku.com> - 2021-11-29 16:39 +0000
            Re: Union C++ standard Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-11-29 14:32 -0800

#2751 — Union C++ standard

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2021-11-25 11:11 +0100
Subject	Union C++ standard
Message-ID	<21-11-004@comp.compilers>

Can somebody explain why the access to members of a union is "undefined"
except for the most recently written member?

What can be undefined in a union of data types of the same type size and
alignment? Any member written will result in a unique bit/byte pattern
in memory, whose reading may not make sense in a different type but
undoubtedly is well defined.

DoDi
[I think it's undefined in a standards sense.  In any individual
implementation the result is predictable, but it's not portable. -John]

#2753

From	Kaz Kylheku <480-992-1380@kylheku.com>
Date	2021-11-26 18:06 +0000
Message-ID	<21-11-006@comp.compilers>
In reply to	#2751

On 2021-11-25, Hans-Peter Diettrich <DrDiettrich1@netscape.net> wrote:
> Can somebody explain why the access to members of a union is "undefined"
> except for the most recently written member?

I don't think that is true; if two members of foo, x and y, have
the same type, then it's possible to write to foo.x and then read foo.y.

> What can be undefined in a union of data types of the same type size and
> alignment?

The representation. Same size and alignment are not sufficient
determiners of type.

For instance, you may find that int and float are of the same size on
the compiler you're using.

If the language does not define what it means to access a float object
through an int lvalue, that allows aggressive optimizations based on
the assumption that type aliasing is absent in the program.

For instance suppose that you have

  struct s {
    float *pflo;
    int ival;
  };

and a (nonsensical example) function working with a struct s *ptr
parameter:

  int fun(struct s *ptr)
  {
    ptr->ival++;
    *ptr->pflo = 0;
    return ptr->ival;
  }

Under the assumption that objects of different types are not aliased
by the program, the compiler can emit code which:

1. reads ptr->ival
2. stores the increment value back into ptr->ival
3. stores 0.0 through *ptr->pflo
4. returns the previously incremented value.

Now suppose that aliasing is allowed among any types, like int and
float. The compiler has no idea what ptr->pflo points to. The
caller could easily have set it like this:

   ptr->pflo = (float *) &ptr->ival;

So if that is allowed, we cannot emit the code like above. We
must do this:

1. read ptr->ival
2. store the incremented value back into ptr->ival
3. store 0.0 through *ptr->pflo
4. NEW: re-read ptr->ival in case it was changed by 3.
5. return the re-read value.
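
To make the two code sequences concrete, here is a minimal sketch built
around the same function. The caller `demo` and its values are made up for
illustration. It keeps the float outside the struct, so nothing is aliased
and both sequences return the same result; the cast shown earlier is the
case where they would differ.

```c
struct s {
    float *pflo;
    int ival;
};

/* Same function as in the post: under the no-aliasing assumption the
   compiler may return the incremented value without re-reading ival. */
int fun(struct s *ptr)
{
    ptr->ival++;
    *ptr->pflo = 0;
    return ptr->ival;
}

/* Hypothetical caller: pflo points at a separate float object, so the
   store through it cannot modify ival. */
int demo(void)
{
    float f = 1.5f;
    struct s obj = { &f, 41 };
    return fun(&obj);   /* ival was 41, so fun returns 42 */
}
```

With this caller the re-read in step 4 is redundant, which is exactly the
saving the aliasing rules are meant to allow.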

Now that's just one problem. The other is the problem that writing a
value as one type and reading as another, if required to be defined in
terms of bits or whatever, is going to be entirely nonportable
nonetheless. The language standard cannot define it completely to the
point that you can rely on the value being the same when the program is
ported.  At best the standard could say that it's implementation-defined
behavior to read through a differently-typed union member.
Implementation-defined is basically "almost-undefined, except the
situation must be documented by the implementor and cannot blow up".

If certain behavior of unions is valuable to the users of a compiler,
they can always negotiate that with their compiler vendor; the
standard doesn't have to be involved in everything that is defined
between the implementor and programmer.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

#2754

From	gah4 <gah4@u.washington.edu>
Date	2021-11-26 12:16 -0800
Message-ID	<21-11-007@comp.compilers>
In reply to	#2751

On Friday, November 26, 2021 at 9:32:00 AM UTC-8, Hans-Peter Diettrich wrote:
> Can somebody explain why the access to members of a union is "undefined"
> except for the most recently written member?

> What can be undefined in a union of data types of the same type size and
> alignment? Any member written will result in a unique bit/byte pattern
> in memory, whose reading may not make sense in a different type but
> undoubtedly is well defined.

In addition to the previously mentioned reasons, which I agree with,
there used to be (maybe still are) machines that tag memory with the
type stored.  That makes it very difficult to access memory as bits of
a different type.  Some language standards are written to allow for
those machines.

Many languages originated before IEEE floating point, where there
was no expectation that floating point values would agree between
different machines. Even more, some have "trap" values in floating
point, such that one can't reference some values. (VAX has a trap
value for negative zero.)  Since the language can't control all this,
it is made undefined. (But otherwise, machine dependent.)

JVM, while not using tags, is defined such that programs don't
do that.  The verifier is supposed to catch attempts, even if not
executed, to access memory the wrong way. Among others,
that allows for programs to be endian independent. (long takes
twice as much memory as int, but by refusing such access,
programs can't detect that, and so work on all hardware.)

(Not that it is likely that there will be a C++ compiler for JVM.)
[I think the Unisys Libra series may still run tagged Burroughs
architecture code, but if so I doubt there was ever a C compiler. -John]

#2755

From	David Brown <david.brown@hesbynett.no>
Date	2021-11-27 16:59 +0100
Message-ID	<21-11-008@comp.compilers>
In reply to	#2751

On 25/11/2021 11:11, Hans-Peter Diettrich wrote:
> Can somebody explain why the access to members of a union is "undefined"
> except for the most recently written member?
>
> What can be undefined in a union of data types of the same type size and
> alignment? Any member written will result in a unique bit/byte pattern
> in memory, whose reading may not make sense in a different type but
> undoubtedly is well defined.
>
> DoDi
> [I think it's undefined in a standards sense.  In any individual
> implementation the result is predictable, but it's not portable. -John]
>

In C++, objects of a class typically have some kind of invariant which
is established by the constructor, and kept consistent when accessed via
its public methods.  Messing with the underlying data representation
directly is going to risk losing that - it means you are accessing data
without going through the proper defined interface (the public or
protected methods and members).

In C, type-punning via unions is allowed (i.e., fully defined behaviour
in the standards), but not in C++ where the language is expected to
enforce higher-level aspects of the data.
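
For concreteness, the kind of code in question looks like the sketch
below. The helper name `float_bits` is made up, and the sketch assumes
float and uint32_t have the same size. The value read back depends on the
implementation's float representation; with IEEE 754 binary32, 1.0f would
read as 0x3F800000. In C++ the same read would be of an inactive member.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a float, read the same bytes as uint32_t. */
uint32_t float_bits(float f)
{
    union {
        float    as_float;
        uint32_t as_bits;
    } pun;

    /* The sketch only makes sense if both members have the same size. */
    assert(sizeof(float) == sizeof(uint32_t));

    pun.as_float = f;      /* write one member ...              */
    return pun.as_bits;    /* ... then read the other (C only)  */
}
```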

#2756

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2021-11-28 12:51 +0000
Message-ID	<21-11-009@comp.compilers>
In reply to	#2755

David,

> In C, type-punning via unions is allowed (i.e., fully defined behaviour

That is not true.  Writing into one member and then reading from
another member is undefined behavior.

There is a special dispensation for what is known as a
common initial sequence:
sentence 1029
http://c0x.shape-of-code.com/6.5.2.3.html

> in the standards), but not in C++ where the language is expected to
> enforce higher-level aspects of the data.

This is a meaningless statement.

#2757

From	David Brown <david.brown@hesbynett.no>
Date	2021-11-28 19:00 +0100
Message-ID	<21-11-010@comp.compilers>
In reply to	#2756

On 28/11/2021 13:51, Derek Jones wrote:
> David,
>
>> In C, type-punning via unions is allowed (i.e., fully defined behaviour
>
> That is not true.  Writing into one member and then reading from
> another member is undefined behavior.

No, it is correct.  It would be helpful if you looked at the full
published standards, or (as most people do, since they are free) the
final pre-publishing drafts.  In particular, they contain the footnotes
that appear to be missing in the format you linked here.  Footnotes are
not part of the normative text, but are added for clarification.  (Your
reference also misses the standard paragraph numbering, and it is
outdated - not that this particular issue has changed since C was
standardised.)

So, the relevant paragraph is 6.5.2.3p3:

"""
A postfix expression followed by the . operator and an identifier
designates a member of a structure or union object. The value is that of
the named member, 101) and is an lvalue if the first expression is an
lvalue. If the first expression has qualified type, the result has the
so-qualified version of the type of the designated member.
"""

The footnote (101 in C18 - footnote numbers are not consistent between C
standard versions) is:

"""
If the member used to read the contents of a union object is not the
same as the member last used to store a value in the object, the
appropriate part of the object representation of the value is
reinterpreted as an object representation in the new type as described
in 6.2.6 (a process sometimes called "type punning"). This might be a
trap representation.
"""

These quotations are from C18 (draft N2346), which is the current C
standard (until C23 is finalised).  They have not changed since C99,
when the footnote was added without a change to the normative text.
This means that as far as the C committee was concerned, using unions
for type-punning has always (since standardisation) been valid in C, but
they realised that the text was unclear and thus added the footnote.
(Arguably, since C90 did not clearly state that type-punning was
defined, the behaviour was in fact undefined - though probably all C
compilers allowed the behaviour.)
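
Code that does not want to depend on how the footnote is read can copy
the object representation with memcpy instead, which is well defined in
both C and C++. A minimal sketch, with a made-up helper name, assuming
float and uint32_t are both 32 bits wide (and, for the value in the
comment, that float is IEEE 754 binary32):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bytes of a float into a uint32_t. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits = 0;
    /* Copy no more bytes than either object holds. */
    size_t n = sizeof f < sizeof bits ? sizeof f : sizeof bits;

    memcpy(&bits, &f, n);
    return bits;           /* 0x3F800000 for 1.0f on IEEE 754 targets */
}
```

The result is just as implementation-specific as with a union; only the
question of whether the access itself is permitted goes away.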

> There is a special dispensation for what is known as a
> common initial sequence:
> sentence 1029
> http://c0x.shape-of-code.com/6.5.2.3.html

This is an additional guarantee that has a longer history - it
specifically allows a particular type of access that had been in regular
use from before C was standardised.

>> in the standards), but not in C++ where the language is expected to
>> enforce higher-level aspects of the data.
>
> This is a meaningless statement.

I disagree, but perhaps that is subjective.  In C++, accessing a member
of a union other than the one most recently written (or "active") member
is undefined behaviour, unless it matches the "initial sequence" exception.
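
A sketch of the "initial sequence" case, with made-up type and function
names: both structs start with a member of the same type, so that leading
member may be inspected through either struct while the union type is
visible.

```c
enum shape_kind { CIRCLE, RECT };

/* Both structs begin with an int, which forms their common initial
   sequence. */
struct circle { int kind; double radius; };
struct rect   { int kind; double w, h; };

union shape {
    struct circle c;
    struct rect   r;
};

/* Inspect the shared leading member through c, whichever struct was
   stored last. */
int shape_kind_of(const union shape *s)
{
    return s->c.kind;
}

/* Hypothetical usage: store a rect, then read the tag through circle. */
int demo_kind(void)
{
    union shape s;
    s.r.kind = RECT;
    s.r.w = 2.0;
    s.r.h = 3.0;
    return shape_kind_of(&s);
}
```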

Some useful references are:

<https://en.cppreference.com/w/c/language/union>
<https://en.cppreference.com/w/cpp/language/union>

While that site does not have the weight of the C or C++ standards, it
is supported by and contributed to by the C and C++ standards committees
and their ISO working groups.  The site does not get that kind of thing
wrong.
[I see what the standard says, but I don't see how reinterpreting the
bits from one type to another can be fully defined. I've certainly done
it, but it never seemed very portable. -John]

#2758

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2021-11-29 00:09 +0000
Message-ID	<21-11-011@comp.compilers>
In reply to	#2757

David,

>>> In C, type-punning via unions is allowed (i.e., fully defined behaviour
>>
>> That is not true.  Writing into one member and then reading from
>> another member is undefined behavior.
>
> No, it is correct.  It would be helpful if you looked at the full

You have misunderstood the C conformance model, which revolves around
the use of "shall" and "shall not", and the kind of section in which
they appear (e.g., Constraints).  See:
http://c0x.shape-of-code.com/4..html

For a longer discussion see: http://knosof.co.uk/cbook/

> """
> If the member used to read the contents of a union object is not the
> same as the member last used to store a value in the object, the
> appropriate part of the object representation of the value is
> reinterpreted as an object representation in the new type as described
> in 6.2.6 (a process sometimes called "type punning"). This might be a
> trap representation.
> """
>
> These quotations are from C18 (draft N2346), which is the current C
> standard (until C23 is finalised).  They have not changed since C99,

This footnote was added in response to this DR (so it must have come
after C99):
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm

#2760

From	David Brown <david.brown@hesbynett.no>
Date	2021-11-29 21:00 +0100
Message-ID	<21-11-013@comp.compilers>
In reply to	#2758

On 29/11/2021 01:09, Derek Jones wrote:
> David,
>
>>>> In C, type-punning via unions is allowed (i.e., fully defined behaviour
>>>
>>> That is not true.  Writing into one member and then reading from
>>> another member is undefined behavior.
>>
>> No, it is correct.  It would be helpful if you looked at the full
>
> You have misunderstood the C conformance model, which revolves around
> the use of "shall" and "shall not", and the kind of section in which
> they appear (e.g., Constraints).  See:
> http://c0x.shape-of-code.com/4..html
>
> For a longer discussion see: http://knosof.co.uk/cbook/

I was not aware of your qualifications when I posted earlier - you have
been directly involved in things that I can only infer from reading the
standards and other material.

Let me put it this way.  Those of us who read the C standards, but were
not involved in writing them, do our best to interpret the precise
meaning of the words in the normative text.  Those meanings are not
always clear.  When we see examples or footnotes, we know they were
added by the same people that wrote the standard, and are used as
clarification for the meaning of the normative text.  The footnote
(added, as you say, in a C99 TC in response to a defect report - and
therefore AIUI as much part of C99 standard as the original published
text since the TCs replace previous versions) makes it perfectly clear
that type-punning via unions is defined behaviour in C.  This codifies
the existing practice supported by most (if not all) C compilers, and
relied upon by code.

Since the change was a clarifying footnote, not a change to the
normative text, the implication is that the normative text was always
intended to support these semantics.  The only three alternatives I see
to that are that the footnote was added by some committee members who
disagreed with what other committee members wrote in the normative text,
that the committee changed their minds about union semantics but did not
change the normative text, or that the footnote was deliberately added
to confuse people.  None of these alternatives is appealing.

If my reasoning here is faulty, I'd be grateful if you could point out
the flaw.

David

#2762

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2021-11-30 00:46 +0000
Message-ID	<21-11-015@comp.compilers>
In reply to	#2760

David,

> I was not aware of your qualifications when I posted earlier - you have
> been directly involved in things that I can only infer from reading the
> standards and other material.

You should always infer meaning by reading from the standard, never
defer to anybody arguing from authority.

> Let me put it this way.  Those of us who read the C standards, but were
> not involved in writing them, do our best to interpret the precise
> meaning of the words in the normative text.  Those meanings are not
> always clear.

You have made the mistake of reading the standard as "plain English".
Almost everybody falls into this trap when they start out.
In fact the standard is a stylized version of English, with some phrases
specified to have a given meaning in specific contexts.

As the committee is always saying, the standard is not intended as
a tutorial.  You probably need to read it three or four times to
get an idea of how it fits together (there is a strange logic to it).

Start by understanding how the text is styled.

The Conformance section specifies how "shall" and "shall not" are to be
interpreted.

You also need to understand "unspecified behaviors" and "undefined behaviors".

See Kaz Kylheku's discussion of the status of footnotes.

You need to trace a legalistic top down approach (which takes
practice).

There are people actively discussing standard C on comp.std.c

Footnotes state the obvious when it is not obvious to somebody.
They are also an enormous source of confusion and best ignored.

#2763

From	George Neuner <gneuner2@comcast.net>
Date	2021-11-30 17:18 -0500
Message-ID	<21-11-016@comp.compilers>
In reply to	#2762

On Tue, 30 Nov 2021 00:46:04 +0000, Derek Jones
<derek@NOSPAM-knosof.co.uk> wrote:

>You have made the mistake of reading the standard as "plain English".
>Almost everybody falls into this trap when they start out.
>In fact the standard is a stylized version of English, with some phrases
>specified to have a given meaning in specific contexts.
>
>As the committee is always saying, the standard is not intended as
>a tutorial.  You probably need to read it three or four times to
>get an idea of how it fits together (there is a strange logic to it).
>
>Start by understanding how the text is styled.
>
>The Conformance section specifies how "shall" and "shall not" are to be
>interpreted.

But it does NOT define "will" and "will not", and "must" and "must
not", and "does" and "does not" ... terms which are used liberally in
the documents, apparently without having any normative definition.

Not to mention that the Conformance section generally is not included
in draft documents.  Nor are there easy to find, freely available,
references on how to read various standards documents.

A great many programmers are in work situations which can't support
purchasing every official document that might apply.

>You also need to understand "unspecified behaviors" and "undefined behaviors".
>
>See Kaz Kylheku's discussion of the status of footnotes.
>
>You need to trace a legalistic top down approach (which takes practice).
>
>There are people actively discussing standard C on comp.std.c
>
>Footnotes state the obvious when it is not obvious to somebody.
>They are also an enormous source of confusion and best ignored.

YMMV,
George

#2765 — Re: Union C++ standard terminology

From	Derek Jones <derek@knosof.co.uk>
Date	2021-12-01 13:35 +0000
Subject	Re: Union C++ standard terminology
Message-ID	<21-12-001@comp.compilers>
In reply to	#2763

George,

>> The Conformance section specifies how "shall" and "shall not" are to be
>> interpreted.
>
> But it does NOT define "will" and "will not", and "must" and "must
> not", and "does" and "does not" ... terms which are used liberally in
> the documents, apparently without having any normative definition.

The ISO directives say:
'Do not use "must" as an alternative for "shall".'
https://isotc.iso.org/livelink/livelink?func=ll&objId=4230456&objAction=browse&sort=subtype
Although the IETF treats the terms similarly:
https://www.ietf.org/rfc/rfc2119.txt

My recollection is that the ISO directives used to strongly recommend
against the use of any form of "must".

The get-out-of-jail answer is to point out that
"ISO/IEC 2382−1:1993, Information technology — Vocabulary — Part 1:
Fundamental terms"
appears in the list of Normative references.
Back when libraries used to contain paper documents, I spent an
afternoon rummaging around the various parts of ISO 2382.
I was surprised to find out how few terms are defined and how
vague/general the definitions actually were.

I have been in committee meetings where people said the term was
defined in ISO 2382, we found out that it wasn't, then everybody
switched to saying: "Ok, common usage English applies" (whatever
that is; the "Longman Grammar of Spoken and Written English" is
great, but out of print, see the student edition).

There is one occurrence of the word "must" in the standard, in an
example.

"does not" is common, mostly in examples and footnotes.
The instances I have looked at look reasonable, e.g.,
"Each ? that does not begin one of the trigraphs..."

The three instances of "will not" all appear in footnotes.

> Not to mention that the Conformance section generally is not included
> in draft documents.  Nor are there easy to find, freely available,
> references on how to read various standards documents.

It appears in every copy of the draft standard I have seen.

#2764

From	David Brown <david.brown@hesbynett.no>
Date	2021-11-30 23:24 +0100
Message-ID	<21-11-017@comp.compilers>
In reply to	#2762

On 30/11/2021 01:46, Derek Jones wrote:
> David,
>
>> I was not aware of your qualifications when I posted earlier - you have
>> been directly involved in things that I can only infer from reading the
>> standards and other material.
>
> You should always infer meaning by reading from the standard, never
> defer to anybody arguing from authority.

But it helps to listen to people or read sources that have established a
position of respect and a reputation for reliability.  Of course anyone
can get things wrong.  If several people whom I know to be experts in a
language all agree, but have an interpretation different from what I
read in the standard, then I must suspect my own interpretation - at the
very least, it warrants further investigation and discussion.

In this case, you are - I assume - a person with a high degree of
experience and knowledge of the C standards.  I don't know you well
enough to judge for myself, having only read a few of your posts, but
your qualifications are significant.  Your interpretation of the
standard here differs from mine, and from how I have seen many other
experts and reliable resources interpret it.  So I am not deferring to
anyone - I am reading the standard.  But I am asking to help figure out
if I am reading the standard correctly!

>> Let me put it this way.  Those of us who read the C standards, but were
>> not involved in writing them, do our best to interpret the precise
>> meaning of the words in the normative text.  Those meanings are not
>> always clear.
>
> You have made the mistake of reading the standard as "plain English".
> Almost everybody falls into this trap when they start out.
> In fact the standard is a stylized version of English, with some phrases
> specified to have a given meaning in specific contexts.
>

I am not making that mistake - at least, not as a general point.  I am
well aware of the specialised and stylized language used in the
standard, and how specific terms and phrases can have meanings that are
not "plain English".  It is, however, possible that I am making an error
in the interpretation of this particular issue.  Just as I do not know
you, you do not know me - I've been studying and discussing the C
standards for a great many years, and I am not new to it.  (Again, that
experience does not mean I think I know everything about it or that my
interpretation of it is flawless.)  My programming field is quite
specialised - small-systems embedded programming - and I have not
bothered about parts of the standard that are not relevant there.  But
I've gone through a lot of the "meat" of the documents, many times.

> As the committee is always saying, the standard is not intended as
> a tutorial.  You probably need to read it three or four times to
> get an idea of how it fits together (there is a strange logic to it).
>
> Start by understanding how the text is styled.
>
> The Conformance section specifies how "shall" and "shall not" are to be
> interpreted.
>

Yes.

> You also need to understand "unspecified behaviors" and "undefined
> behaviors".
>

I know.

> See Kaz Kylheku's discussion of the status of footnotes.
>

I did.  I agree that the footnotes are not normative, but I don't agree
with his interpretation of the footnote.  The footnote says very clearly
that type-punning using a union is defined as storing a value of one
type in the representation (given in 6.2.6, along with
implementation-dependent details), then re-interpreting that
representation as the new type when read.

> You need to trace a legalistic top down approach (which takes
> practice).
>
> There are people actively discussing standard C on comp.std.c
>

I follow that group.

> Footnotes state the obvious when it is not obvious to somebody.
> They are also an enormous source of confusion and best ignored.

I'm sorry, but none of what you wrote comes close to answering my
question.  Your response merely says that the standard is written in
"standardese" and must be read appropriately - had I been new to the C
standards, it would have been useful.

The footnote was added specifically as a TC based on a DR that would
have been voted on, and it has been left untouched through three
revisions of the standard despite being arguably a critical part of the
language that many programs rely on.  Either the footnote accurately
describes the "rules" of the language - that type-punning via unions is
defined behaviour that can be relied upon (albeit with some
implementation-specific details, and scope for undefined behaviour from
accessing trap representations - such as storing 2 to a char, then
reading it as a _Bool), or the footnote was added deliberately and
intentionally with the aim of confusing and misleading people.

I find it hard to believe the latter - despite that being your suggestion
here.

Again, if you see a flaw in my reasoning, please say.

David

#2759

From	Kaz Kylheku <480-992-1380@kylheku.com>
Date	2021-11-29 16:39 +0000
Message-ID	<21-11-012@comp.compilers>
In reply to	#2757

On 2021-11-28, David Brown <david.brown@hesbynett.no> wrote:
> On 28/11/2021 13:51, Derek Jones wrote:
>> David,
>>
>>> In C, type-punning via unions is allowed (i.e., fully defined behaviour
>>
>> That is not true.  Writing into one member and then reading from
>> another member is undefined behavior.
>
> No, it is correct.  It would be helpful if you looked at the full
> published standards, or (as most people do, since they are free) the
> final pre-publishing drafts.  In particular, they contain the footnotes
> that appear to be missing in the format you linked here.  Footnotes are
> not part of the normative text, but are added for clarification.

Not being normative means that anything that looks like a requirement
that is in a footnote is not actually a requirement.

Footnotes can only clarify requirements that are not themselves in a
footnote; they can't add new requirements.

If you cannot infer the existence of a requirement while ignoring all
footnotes and examples, it isn't there.

> These quotations are from C18 (draft N2346), which is the current C
> standard (until C23 is finalised).  They have not changed since C99,
> when the footnote was added without a change to the normative text.

I have a copy of C99 (the final thing from ANSI, not a draft); I do not
see any such footnote; the paragraph has no footnotes.

> This means that as far as the C committee was concerned, using unions
> for type-punning has always (since standardisation) been valid in C,

Note that the "trap representation" terminology didn't exist prior to
C99, so any footnote referencing such a thing cannot possibly reflect
any intent about what C was going back to before standardization,
let alone some new footnote since C99.

> they realised that the text was unclear and thus added the footnote.
> (Arguably, since C90 did not clearly state that type-punning was
> defined, the behaviour was in fact undefined - though probably all C
> compilers allowed the behaviour.)

The concept of trap representations adds nuance to the requirements
for accessing objects; it doesn't make everything defined.

The trap concept is used to create a new (in C99) model of why accessing
an object as an array of "unsigned char" is okay: the unsigned char type
has no trap representations; every combination of bits gives rise to a
valid value.

For instance, if we have a union like this:

  union u {
     int x;
     unsigned char y[sizeof (int)];
  };

then certain requirements can be inferred if we store x and access y[0],
based on knowing the implementation's parameters, like size and
representation of the integer and byte order.
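
A minimal sketch of that kind of inference, using the union above with two
made-up helper functions. Which byte holds the low-order bits depends on
the implementation's byte order, so the first function reports it rather
than assuming it.

```c
union u {
    int x;
    unsigned char y[sizeof (int)];
};

/* Returns 1 if the least significant byte of an int is stored first
   (little-endian layout), 0 otherwise. */
int low_byte_first(void)
{
    union u v;
    v.x = 1;
    return v.y[0] == 1;
}

/* Sums the bytes of x as seen through the unsigned char member. */
unsigned byte_sum(int x)
{
    union u v;
    unsigned sum = 0;

    v.x = x;
    for (unsigned i = 0; i < sizeof v.y; i++)
        sum += v.y[i];
    return sum;
}
```

For the value 1, exactly one byte is nonzero wherever it is stored, so
byte_sum(1) is 1 under either byte order.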

I don't believe that the footnote you quoted gives any special blessing
to union-based type punning over pointer-based type punning. It just
clarifies the fact that unions are type punning, subject to the same
requirements as any other type punning. It points the reader to section
6.2.6 where the real requirements are, using which all instances of
type punning are to be interpreted.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

#2761

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2021-11-29 14:32 -0800
Message-ID	<21-11-014@comp.compilers>
In reply to	#2759

Kaz Kylheku <480-992-1380@kylheku.com> writes:
> On 2021-11-28, David Brown <david.brown@hesbynett.no> wrote:
>> On 28/11/2021 13:51, Derek Jones wrote:
>>> David,
>>>
>>>> In C, type-punning via unions is allowed (i.e., fully defined behaviour
>>>
>>> That is not true.  Writing into one member and then reading from
>>> another member is undefined behavior.
>>
>> No, it is correct.  It would be helpful if you looked at the full
>> published standards, or (as most people do, since they are free) the
>> final pre-publishing drafts.  In particular, they contain the footnotes
>> that appear to be missing in the format you linked here.  Footnotes are
>> not part of the normative text, but are added for clarification.
>
> Not being normative means that anything that looks like a requirement
> that is in a footnote is not actually a requirement.
>
> Footnotes can only clarify requirements that are not themselves in a
> footnote; they can't add new requirements.
>
> If you cannot infer the existence of a requirement while ignoring all
> footnotes and examples, it isn't there.
>
>> These quotations are from C18 (draft N2346), which is the current C
>> standard (until C23 is finalised).  They have not changed since C99,
>> when the footnote was added without a change to the normative text.

I don't think N2346 is a draft of "C18".  The page headers say:

    N2346      working draft — March 13, 2019      ISO/IEC 9899:202x (E)

The current C standard is usually referred to as "C17"; it's a minor
update to C11.  Work is in progress on C2X, which will supersede C17.

> I have a copy of C99 (the final thing from ANSI, not a draft); I do not
> see any such footnote; the paragraph has no footnotes.

(It was published by ISO.  ANSI adopted it and sold copies.)

The footnote does not appear in the published 1999 ISO C standard.  It
was added by Technical Corrigendum 3, published in 2007 (and therefore
in the N1256 draft, which includes the 1999 standard with the three
Technical Corrigenda merged into it).  As far as ISO is concerned, it's
part of C99 -- and of later editions of the standard, which have
superseded C99.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
[I think this horse has been beaten about all it needs to.  And
it reminds us that just because C says something is defined does
not mean it's portable or that the results are easily predictable.

This is hardly the only place, e.g., the exciting range of things
that might or might not happen if the result of an integer operation
doesn't fit in its result type.  It might wrap, it might overflow,
it might saturate.  Or it might not. -John]
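
For the integer case, portable code usually avoids the question by
testing whether the result fits before doing the operation. A minimal
sketch for signed addition, with a made-up helper name:

```c
#include <limits.h>

/* Hypothetical helper: stores a + b in *out and returns 1, or returns 0
   and leaves *out alone if the sum would not fit in an int. */
int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;          /* would overflow: refuse instead of guessing */

    *out = a + b;
    return 1;
}
```

The comparisons are arranged so that neither of them can overflow itself.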
