Path: csiph.com!eternal-september.org!feeder.eternal-september.org!nntp.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: Safety of casting from 'long' to 'int'
Date: Wed, 06 May 2026 19:43:16 -0700
Organization: A noiseless patient Spider
Lines: 186
Message-ID: <86o6isuegr.fsf@linuxsc.com>
References: <10su8cn$am9i$1@dont-email.me> <10sv4v0$h9mn$1@dont-email.me>
 <84c1c180f4d5b96259a631bdb09b6054b4eb44d2.camel@gmail.com>
 <10svgfv$l2bu$1@dont-email.me> <10t4hse$22u36$1@dont-email.me>
 <97a1c40bf71cfe8edab25d5ac8a1ad435c3995e5.camel@gmail.com>
 <10t4tjd$25vb5$1@dont-email.me> <10t4viv$25van$2@dont-email.me>
 <10t55r6$28suo$1@dont-email.me> <86se89xtb8.fsf@linuxsc.com>
 <20260503082605.000073dc@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Thu, 07 May 2026 02:43:21 +0000 (UTC)
Injection-Info: dont-email.me; logging-data="1778146"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/sbsXqma2Zw6dCsuNvr6HATtSjDAEh+LA="; posting-host="a025a3924a4370d705b785ca24b22822"
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:x0EBAcg7S3bEYpIEKGzRyqQzru0= sha1:EvVUrqB3/nlvdqNEP3SzqX4dZjI= sha256:dam7DIaW6mtfFg6CGHVMa9kYlxsReJTB3aH72xNQxsA= sha1:E5GivWdmNAjJw1/ZRsTxe/r4kqA=
Xref: csiph.com comp.lang.c:398429

Michael S writes:

> On Sat, 02 May 2026 16:52:59 -0700
> Tim Rentsch wrote:
>
>> Bart writes:
>>
>>> I'd like to know why C is OK with modular arithmetic for unsigned
>>> but not signed integers. If the latter depended on the hardware,
>>> then why wasn't it just implementation defined?
>>
>> This sounds like a rhetorical question. Do you know so little
>> about the history of computer hardware?
>
> May be, I know too little about the history of computer hardware.
> But I can not see what exactly is missing in my knowledge that makes
> conversion of out-of-range value from wider-or-unsigned integer type
> to signed integer type implementation-defined and at the same time
> makes signed integer overflow undefined.
> Somehow, I think that this distinction is not related to hardware
> or, at very least, not related to hardware alone.

I have been thinking about your questions. It's been interesting
following the discussions about undefined behavior in contrast to
implementation-defined behavior. Somewhere along the way I started
to appreciate that the issues are more subtle than I realized at
first. The important thing is, I wanted to be sure I understood the
questions before giving an answer.

Related to that matter, here is an excerpt from one of your
down-thread posts:

> Why trap on signed overflow during arithmetic has to be UB while
> trap on signed overflow during narrowing integer conversion is
> implementation defined?

It's important to understand the perspectives of different groups of
participants in the C ecosystem. There are three main groups.

If you're a programmer, you hate undefined behavior, and avoid it
like the plague.

If you're a compiler writer, you love undefined behavior, because it
lets you do whatever you want.

If you're a member of the ISO C standards committee (and I admit
that to a degree I am speculating here), you think of undefined
behavior as a balancing test, of needing to weigh the tensions
inherent in what the first two groups would prefer.

I expect most people can understand the views of the first two
groups. The third group is less obvious. To simplify matters, let's
suppose that for a given language construct there are only two
plausible choices: undefined behavior or implementation-defined
behavior. First, what do the two terms mean?

Undefined behavior means the language doesn't say anything about
what the semantics are for the construct in question.
It also means that each implementor gets a choice about what they
think the semantics should be. Specifying undefined behavior is a
way of letting each implementor decide independently what the
construct should do in the context of their environment.
Importantly, there is no obligation to say or explain what that
decision is.

Implementation-defined behavior means there is a fixed set of
possible behaviors allowed, and each implementation gets to choose
which behavior among the fixed set it will put into effect. Any
behavior from the set may be chosen, but the implementation has an
obligation to document which choice is made for each situation the C
standard identifies as implementation defined. The set of choices
allowed need not be given explicitly, but even when it isn't the
range of choices must be bounded. To say that another way,
implementation-defined behavior is never a license to "do anything,
as long as you write it down". Each choice must be suitable to the
context where it is stated that a particular behavior is
implementation defined (and there will always be an explicit such
statement; there is no such thing as "implicitly
implementation-defined behavior").

For the particular case we are considering -- what happens when
signed integer arithmetic overflows -- for the behavior to be
implementation-defined, the result must be something that can be
defined in terms of what happens in the C abstract machine. For
example, in a two's complement environment where all bit patterns
are legal, the definition could be "xor together the two operands,
rotate the result by 23 bits, and use that as the value[*]." Of
course that's a ridiculous rule, but it would be okay as far as the
C standard is concerned - for that environment. But it would not be
okay for an environment that uses ones' complement and does not
support negative zeros, because there is no telling what would
happen after one of those results; in other words the outcome is not
bounded.
[*] This rule applies only in situations where there is overflow.

In deciding how to define the semantics of a particular operation,
the C standards committee needs to evaluate what is feasible in all
of the potential implementation environments. The word "feasible"
here includes a variety of considerations: what is possible; how
much or how little difficulty is involved; how much or how little
consensus there is about what the result should be; and probably
some others I haven't thought of.

Considering all that, it is easy to see why it was decided that
signed overflow should be undefined behavior. Pretty much every
machine has a signed integer hardware data type, and also has an
instruction (or perhaps more than one) to add signed integers
together. In the early days of computers, and even when the
original C standard was being worked on, there were still machines
that used ones' complement or signed magnitude for signed integers.
There was also, as was pointed out, System/360, which had a way of
trapping on signed overflow, even though two's complement was in
force; that this feature still existed meant there were probably
some people who thought it was useful (and perhaps there still are,
I haven't kept up with IBM hardware). Even two's complement is
allowed to have trap representations. In order not to foreclose C
implementations for such environments, signed overflow had to be
made undefined behavior.

Conversely, there is little incentive to have unsigned-to-signed
conversion be other than implementation-defined behavior. Most
likely there isn't a hardware conversion opcode, but even if/when
there is, because conversion is a single-argument operation it's
easy and cheap to implement a trouble-free result (a comparison and
an addition, even for a non-two's-complement representation).
In environments where signed integers don't support negative zeros
(which is to say, where a trap representation might be produced),
there is still just one specific bit pattern to work around. In
short, there seems to be no reason to want out-of-bounds behavior,
and because it's always easy to effect in-bounds behavior that's the
choice to make (which is to say, implementation-defined behavior).

Let me address an additional question, which may have been touched
on in other postings although I am not sure of that. What should we
do if we want to safely add signed integers, avoid nasal demons no
matter what, and are content with wrap-around semantics for cases
that "overflow"? Here is a function to do that (it needs <limits.h>
for INT_MAX):

signed
safely_add( signed a, signed b ){
    unsigned ua = a, ub = b, u = ua+ub;
    return u <= INT_MAX ? (int)u : -INT_MAX + (int)(u-INT_MAX-1) - 1;
}

No undefined behavior, no circumstances where ID behavior comes into
play, and it gives the desired answer in essentially all
environments (the exceptions are environments where
UINT_MAX != INT_MAX*2+1, which are almost non-existent today).

Now, same question, but for multiplication. The answer is almost
exactly the same:

signed
safely_multiply( signed a, signed b ){
    unsigned ua = a, ub = b, u = ua*ub;
    return u <= INT_MAX ? (int)u : -INT_MAX + (int)(u-INT_MAX-1) - 1;
}

Both gcc and clang compile these functions into one operation each
(along with 'ret').

Here is a little test driver folks may want to try:

#include <stdio.h>

int
main(){
    for( signed i = -10000; i <= 10000; i++ ){
        for( signed j = -10000; j <= 10000; j++ ){
            signed p = i*j;
            signed q = safely_multiply( i, j );
            if( p == q ) continue;
            printf( " %6d * %6d = %12d or %12d\n", i, j, p, q );
        }
    }
    printf( " done.\n" );
    return 0;
}

Compiling this with -S -O2 may give an amusing result, for those who
want to try it.

I hope the foregoing explanation has helped convey my views on your
questions.