Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From: Kaz Kylheku <480-992-1380@kylheku.com>
Newsgroups: comp.compilers
Subject: Re: Union C++ standard
Date: Fri, 26 Nov 2021 18:06:37 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 78
Sender: news@iecc.com
Approved: comp.compilers@iecc.com
Message-ID: <21-11-006@comp.compilers>
References: <21-11-004@comp.compilers>
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="54566"; mail-complaints-to="abuse@iecc.com"
Keywords: C, standards
Posted-Date: 26 Nov 2021 13:26:59 EST
X-submission-address: compilers@iecc.com
X-moderator-address: compilers-request@iecc.com
X-FAQ-and-archives: http://compilers.iecc.com
Xref: csiph.com comp.compilers:2753

On 2021-11-25, Hans-Peter Diettrich <DrDiettrich1@netscape.net> wrote:
> Can somebody explain why the access to members of a union is "undefined"
> except for the most recently written member?

I don't think that is true in general; if two members of a union
object foo, x and y, have the same type, then it's possible to write
to foo.x and then read foo.y.
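
A minimal sketch of that same-type case, written as C. The union
foo and the helper names are made up for illustration and are not from
the original question:

```c
#include <assert.h>

/* Hypothetical union: both members have exactly the same type. */
union foo {
    int x;
    int y;
};

/* Stores through member x, then reads the value back through member y. */
int write_x_read_y(int value)
{
    union foo u;
    u.x = value;
    return u.y;
}

/* Usage: the value read through y is the value written through x. */
void check_same_type_access(void)
{
    assert(write_x_read_y(42) == 42);
}
```

Since x and y have the same type, there is no question of
representation here; the read through y sees an int that was stored
as an int.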

> What can be undefined in a union of data types of the same typesize and
> alignment?

The representation. Same size and alignment are not sufficient
determiners of type.

For instance, you may find that int and float are of the same size on
the compiler you're using.
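
To make the representation point concrete, here is a small sketch
that copies the bytes of a float into a same-sized integer with memcpy,
so that no object is accessed through an lvalue of the wrong type. The
function names are hypothetical, and the sketch assumes a 32-bit float,
which the language does not guarantee:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f as a 32-bit unsigned integer.
   Assumption: float is 32 bits wide on this implementation. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);   /* byte copy, no type punning */
    return bits;
}

/* Usage: the float 1.0f and the integer 1 have the same size here,
   but their bit patterns differ. */
void check_representation_differs(void)
{
    assert(float_bits(1.0f) != (uint32_t)1);
}
```

On implementations that use the IEEE 754 single-precision format,
float_bits(1.0f) is 0x3f800000, nowhere near the integer 1. Other
formats give other patterns, which is exactly why size and alignment
alone do not pin down the meaning.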

If the language does not define what it means to access a float object
through an int lvalue, that allows aggressive optimizations based on
the assumption that type aliasing is absent in the program.

For instance suppose that you have

  struct s {
    float *pflo;
    int ival;
  };

and a (nonsensical example) function working with a struct s *ptr
parameter:

  int fun(struct s *ptr)
  {
    ptr->ival++;
    *ptr->pflo = 0;
    return ptr->ival;
  }

Under the assumption that objects of different types are not aliased
by the program, the compiler can emit code which:

1. reads ptr->ival
2. stores the incremented value back into ptr->ival
3. stores 0.0 through *ptr->pflo
4. returns the previously incremented value.

Now suppose that aliasing is allowed among any types, like int and
float. The compiler has no idea what ptr->pflo points to. The
caller could easily have set it like this:

   ptr->pflo = (float *) &ptr->ival;

So if that is allowed, we cannot emit code like the above. We
must do this:

1. read ptr->ival
2. store the incremented value back into ptr->ival
3. store 0.0 through *ptr->pflo
4. NEW: re-read ptr->ival in case it was changed by 3.
5. return the re-read value.
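
The two instruction sequences can be sketched as hand-written C
functions. These are stand-ins for what a compiler might generate, not
actual compiler output; fun_no_alias and fun_may_alias are hypothetical
names:

```c
#include <assert.h>
#include <stddef.h>

struct s {
    float *pflo;
    int ival;
};

/* Shape of the code when int and float objects are assumed not to alias:
   the incremented value is kept and returned without another load. */
int fun_no_alias(struct s *ptr)
{
    assert(ptr != NULL && ptr->pflo != NULL);
    int incremented = ptr->ival + 1;  /* 1. read ptr->ival */
    ptr->ival = incremented;          /* 2. store the incremented value */
    *ptr->pflo = 0;                   /* 3. store 0.0 through the pointer */
    return incremented;               /* 4. return the value from step 1 */
}

/* Shape of the code when the float store might modify ptr->ival:
   the member has to be loaded again after the store. */
int fun_may_alias(struct s *ptr)
{
    assert(ptr != NULL && ptr->pflo != NULL);
    int incremented = ptr->ival + 1;  /* 1. read ptr->ival */
    ptr->ival = incremented;          /* 2. store the incremented value */
    *ptr->pflo = 0;                   /* 3. store 0.0 through the pointer */
    return ptr->ival;                 /* 4-5. re-read, return re-read value */
}
```

When pflo points at a genuine float object, both versions return the
same result; the second one just pays for an extra load. They can only
differ if pflo has been made to point at ival, which is the case the
no-aliasing assumption rules out.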

Now that's just one problem. The other is the problem that writing a
value as one type and reading as another, if required to be defined in
terms of bits or whatever, is going to be entirely nonportable
nonetheless. The language standard cannot define it completely to the
point that you can rely on the value being the same when the program is
ported. At best the standard could say that it's implementation-defined
behavior to read through a differently-typed union member.
Implementation-defined is basically "almost-undefined, except the
situation must be documented by the implementor and cannot blow up".

If certain behavior of unions is valuable to the users of a compiler,
they can always negotiate that with their compiler vendor; the
standard doesn't have to be involved in everything that is defined
between the implementor and programmer.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal