Path: csiph.com!eternal-september.org!feeder.eternal-september.org!nntp.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: Safety of casting from 'long' to 'int'
Date: Wed, 06 May 2026 19:43:16 -0700
Organization: A noiseless patient Spider
Lines: 186
Message-ID: <86o6isuegr.fsf@linuxsc.com>
References: <10su8cn$am9i$1@dont-email.me> <10sv4v0$h9mn$1@dont-email.me>
 <84c1c180f4d5b96259a631bdb09b6054b4eb44d2.camel@gmail.com>
 <10svgfv$l2bu$1@dont-email.me> <10t4hse$22u36$1@dont-email.me>
 <97a1c40bf71cfe8edab25d5ac8a1ad435c3995e5.camel@gmail.com>
 <10t4tjd$25vb5$1@dont-email.me> <10t4viv$25van$2@dont-email.me>
 <10t55r6$28suo$1@dont-email.me> <86se89xtb8.fsf@linuxsc.com>
 <20260503082605.000073dc@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Thu, 07 May 2026 02:43:21 +0000 (UTC)
Injection-Info: dont-email.me; logging-data="1778146"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/sbsXqma2Zw6dCsuNvr6HATtSjDAEh+LA="; posting-host="a025a3924a4370d705b785ca24b22822"
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:x0EBAcg7S3bEYpIEKGzRyqQzru0= sha1:EvVUrqB3/nlvdqNEP3SzqX4dZjI= sha256:dam7DIaW6mtfFg6CGHVMa9kYlxsReJTB3aH72xNQxsA= sha1:E5GivWdmNAjJw1/ZRsTxe/r4kqA=
Xref: csiph.com comp.lang.c:398429

Michael S writes:

> On Sat, 02 May 2026 16:52:59 -0700
> Tim Rentsch wrote:
>
>> Bart writes:
>>
>>> I'd like to know why C is OK with modular arithmetic for unsigned
>>> but not signed integers. If the latter depended on the hardware,
>>> then why wasn't it just implementation defined?
>>
>> This sounds like a rhetorical question. Do you know so little
>> about the history of computer hardware?
>
> May be, I know too little about the history of computer hardware.
> But I can not see what exactly is missing in my knowledge that makes
> conversion of out-of-range value from wider-or-unsigned integer type
> to signed integer type implementation-defined and at the same time
> makes signed integer overflow undefined.
> Somehow, I think that this distinction is not related to hardware
> or, at very least, not related to hardware alone.

I have been thinking about your questions. It's been interesting
following the discussions about undefined behavior in contrast to
implementation-defined behavior. Somewhere along the way I started
to appreciate that the issues are more subtle than I realized at
first. The important thing is, I wanted to be sure I understood the
questions before giving an answer.

Related to that matter, here is an excerpt from one of your
down-thread posts:

> Why trap on signed overflow during arithmetic has to be UB while
> trap on signed overflow during narrowing integer conversion is
> implementation defined?

It's important to understand the perspectives of different groups of
participants in the C ecosystem. There are three main groups.

If you're a programmer, you hate undefined behavior, and avoid it
like the plague.

If you're a compiler writer, you love undefined behavior, because it
lets you do whatever you want.

If you're a member of the ISO C standards committee (and I admit
that to a degree I am speculating here), you think of undefined
behavior as a balancing test, of needing to weigh the tensions
inherent in what the first two groups would prefer.

I expect most people can understand the views of the first two
groups. The third group is less obvious. To simplify matters, let's
suppose that for a given language construct there are only two
plausible choices: undefined behavior or implementation-defined
behavior. First, what do the two terms mean?

Undefined behavior means the language doesn't say anything about
what the semantics are for the construct in question.
It also means that each implementor gets a choice about what they
think the semantics should be. Specifying undefined behavior is a
way of letting each implementor decide independently what the
construct should do in the context of their environment.
Importantly, there is no obligation to say or explain what that
decision is.

Implementation-defined behavior means there is a fixed set of
possible behaviors allowed, and each implementation gets to choose
which behavior among the fixed set it will put into effect. Any
behavior from the set may be chosen, but the implementation has an
obligation to document which choice is made for each situation the C
standard identifies as implementation defined. The set of choices
allowed need not be given explicitly, but even when it isn't the
range of choices must be bounded. To say that another way,
implementation-defined behavior is never a license to "do anything,
as long as you write it down". Each choice must be suitable to the
context where it is stated that a particular behavior is
implementation defined (and there will always be an explicit such
statement; there is no such thing as "implicitly
implementation-defined behavior").

For the particular case we are considering -- what happens when
signed integer arithmetic overflows -- for the behavior to be
implementation-defined, the result must be something that can be
defined in terms of what happens in the C abstract machine. For
example, in a two's complement environment where all bit patterns
are legal, the definition could be "xor together the two operands,
rotate the result by 23 bits, and use that as the value[*]." Of
course that's a ridiculous rule, but it would be okay as far as the
C standard is concerned - for that environment. But it would not be
okay for an environment that uses ones' complement and does not
support negative zeros, because there is no telling what would
happen after one of those results; in other words the outcome is not
bounded.
[*] This rule applies only in situations where there is overflow.

In deciding how to define the semantics of a particular operation,
the C standards committee needs to evaluate what is feasible in all
of the potential implementation environments. The word "feasible"
here includes a variety of considerations: what is possible; how
much or how little difficulty is involved; how much or how little
consensus there is about what the result should be; and probably
some others I haven't thought of.

Considering all that, it is easy to see why it was decided that
signed overflow should be undefined behavior. Pretty much every
machine has a signed integer hardware data type, and also has an
instruction (or perhaps more than one) to add signed integers
together. In the early days of computers, and even when the
original C standard was being worked on, there were still machines
that used ones' complement or signed magnitude for signed integers.
There was also, as was pointed out, System/360, which had a way of
trapping on signed overflow, even though two's complement was in
force; that this feature still existed meant there were probably
some people who thought it was useful (and perhaps there still are,
I haven't kept up with IBM hardware). Even two's complement is
allowed to have trap representations. In order not to foreclose C
implementations for such environments, signed overflow had to be
made undefined behavior.

Conversely, there is little incentive to have unsigned-to-signed
conversion be other than implementation-defined behavior. Most
likely there isn't a hardware conversion opcode, but even if/when
there is, because conversion is a single-argument operation it's
easy and cheap to implement a trouble-free result (a comparison and
an addition, even for a non-two's-complement representation).
In environments where signed integers don't support negative zeros
(which is to say, where a trap representation might be produced),
there is still just one specific bit pattern to work around. In
short, there seems to be no reason to want out-of-bounds behavior,
and because it's always easy to effect in-bounds behavior that's the
choice to make (which is to say, implementation-defined behavior).

Let me address an additional question, which may have been touched
on in other postings although I am not sure of that. What should we
do if we want to safely add signed integers, avoid nasal demons no
matter what, and are content with wrap-around semantics for cases
that "overflow"? Here is a function to do that (it needs <limits.h>
for INT_MAX):

signed
safely_add( signed a, signed b ){
    unsigned ua = a, ub = b, u = ua+ub;
    return u <= INT_MAX ? (int)u : -INT_MAX + (int)(u-INT_MAX-1) - 1;
}

No undefined behavior, no circumstances where ID behavior comes into
play, and it gives the desired answer in essentially all
environments (the exceptions are environments where
UINT_MAX != INT_MAX*2+1, which are almost non-existent today).

Now, same question, but for multiplication. The answer is almost
exactly the same:

signed
safely_multiply( signed a, signed b ){
    unsigned ua = a, ub = b, u = ua*ub;
    return u <= INT_MAX ? (int)u : -INT_MAX + (int)(u-INT_MAX-1) - 1;
}

Both gcc and clang compile these functions into one operation each
(along with 'ret').

Here is a little test driver folks may want to try:

#include <stdio.h>

int
main(){
    for( signed i = -10000; i <= 10000; i++ ){
        for( signed j = -10000; j <= 10000; j++ ){
            signed p = i*j;
            signed q = safely_multiply( i, j );
            if( p == q ) continue;
            printf( " %6d * %6d = %12d or %12d\n", i, j, p, q );
        }
    }
    printf( " done.\n" );
    return 0;
}

Compiling this with -S -O2 may give an amusing result, for those who
want to try it.

I hope the foregoing explanation has helped convey my views on your
questions.