Groups > comp.std.c > #6321 > unrolled thread

contradiction about the INFINITY macro

Started by	Vincent Lefevre <vincent-news@vinc17.net>
First post	2021-09-30 01:47 +0000
Last post	2021-09-30 03:20 +0100
Articles	20 on this page of 80 — 10 participants


  contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-09-30 01:47 +0000
    Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-09-29 19:05 -0700
      Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-09-30 11:24 +0000
        Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-09-30 08:38 -0700
          Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-01 09:05 +0000
            Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-01 12:20 -0700
              Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-04 09:26 +0000
                Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-04 10:34 -0700
                  Re: contradiction about the INFINITY macro Geoff Clare <geoff@clare.See-My-Signature.invalid> - 2021-10-05 13:53 +0100
                    Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-06 00:12 +0000
                  Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-10-07 07:05 -0700
                    Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-07 07:51 -0700
                      Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-08 11:41 -0700
                        Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-09 19:49 +0000
                          Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-09 14:28 -0700
            Re: contradiction about the INFINITY macro Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2021-10-01 22:55 +0200
              Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-01 14:26 -0700
            Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-10-08 08:30 -0700
              Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-08 11:40 -0700
                Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-12-17 21:00 -0800
              Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-09 20:05 +0000
                Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-10-11 12:40 -0400
                Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-12-17 21:02 -0800
        Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-10-08 00:02 -0700
          Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-09 20:17 +0000
            Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-10-11 12:40 -0400
              Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-11 12:39 -0700
                Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-10-11 21:04 -0400
                  Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-10-11 18:33 -0700
                    Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2022-01-03 12:03 -0800
                      Re: contradiction about the INFINITY macro Richard Damon <Richard@Damon-Family.org> - 2022-01-03 16:45 -0500
                        Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2022-01-03 14:36 -0800
                        Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2022-01-04 02:10 -0500
                        Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2022-01-17 10:09 -0800
                Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2022-01-03 11:55 -0800
              Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-26 10:01 +0000
                Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-10-26 12:53 -0400
                  Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-28 09:38 +0000
                    Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-10-28 11:23 -0400
                      Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-10-29 12:12 +0000
                        Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-10-30 02:08 -0400
                          Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-08 02:44 +0000
                            Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-08 01:46 -0500
                              Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-08 10:56 +0000
                                Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-08 13:50 -0500
                                  Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-09 02:48 +0000
                                    Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-09 00:50 -0500
                                      Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-09 10:12 +0000
                                        Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-09 12:51 -0500
                                          Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-10 12:48 +0000
                                            Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-10 12:03 -0500
                                              Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-12 23:17 +0000
                                                Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-12 21:03 -0500
                                                  Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-15 09:18 +0000
                                                    Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-15 14:25 -0500
                                                      Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-16 01:17 +0000
                                                        Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-16 10:29 -0500
                                                          Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-12-08 10:09 +0000
                                                      Re: contradiction about the INFINITY macro Derek Jones <derek@NOSPAM-knosof.co.uk> - 2021-11-16 11:32 +0000
                                                        Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-16 10:35 -0500
            Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-11-09 07:13 -0800
              Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-10 13:16 +0000
                Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-11-10 08:02 -0800
                  Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-11-10 15:01 -0800
                    Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-13 00:30 +0000
                    Re: contradiction about the INFINITY macro Thomas Koenig <tkoenig@netcologne.de> - 2021-12-02 22:14 +0000
                    Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2022-01-03 12:48 -0800
                  Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-12 23:55 +0000
                    Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-11-15 07:59 -0800
                      Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-15 23:39 +0000
                        Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-15 20:00 -0500
                          Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-16 01:28 +0000
                            Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-11-16 01:57 +0000
                            Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-11-16 09:52 -0500
                              Re: contradiction about the INFINITY macro Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-11-16 19:00 -0800
                                Re: contradiction about the INFINITY macro Vincent Lefevre <vincent-news@vinc17.net> - 2021-12-08 10:56 +0000
                                Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2022-01-03 12:56 -0800
                                  Re: contradiction about the INFINITY macro James Kuyper <jameskuyper@alumni.caltech.edu> - 2022-01-03 22:45 -0800
                                    Re: contradiction about the INFINITY macro Tim Rentsch <tr.17687@z991.linuxsc.com> - 2022-01-17 05:35 -0800
    Re: contradiction about the INFINITY macro Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-09-30 03:20 +0100


#6363

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-10-30 02:08 -0400
Message-ID	<slingl$56v$1@dont-email.me>
In reply to	#6362

On 10/29/21 8:12 AM, Vincent Lefevre wrote:
> In article <slef9t$98j$2@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> 
>> On 10/28/21 5:38 AM, Vincent Lefevre wrote:
>>> In article <sl9bqb$hf5$2@dont-email.me>,
>>>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
>>>
>>>> On 10/26/21 6:01 AM, Vincent Lefevre wrote:
...
>> 7.12.1p5 describes the math library, not the handling of floating point
>> constants. While the C standard does recommend that "The
>> translation-time conversion of floating constants should match the
>> execution-time conversion of character strings by library functions,
>> such as strtod, given matching inputs suitable for both conversions,
>> the same result format, and default execution-time rounding."
>> (6.4.4.2p11), it does not actually require such a match. Therefore, if
>> there is any inconsistency, it would not be problematic.
> 
> Yes, but this means that any implicit use of overflow is not
> perfectly clear.

What is unclear about it? It very explicitly allows three different
values, deliberately failing to specify only one of them as valid, and
it is perfectly clear what those three values are.

...
>> 7.12.1p5 goes on to say that "If a floating result overflows and default
>> rounding is in effect, then the function returns the value of the macro
>> HUGE_VAL ...".
>> As cited above, the standard recommends, but does not require, the use
>> of default execution-time rounding mode for floating point constants.
>> HUGE_VAL is only required to be positive (7.12p6) - it could be as small
>> as DBL_MIN.
> 
> Note that C2x (in particular, the current draft N2731) requires that
> nextup(HUGE_VAL) be HUGE_VAL, probably assuming that HUGE_VAL is the
> maximum value. I've just sent a mail to the CFP list about that.

I've just downloaded N2731.pdf. Yes, that is an improvement over the
previous specification, and strengthens my argument: the value that is
required by 7.12.1p5 for strtod() in the event of overflow is now always
one of the two or three values permitted by 6.4.4.2p4 for overflowing
floating-point constants, regardless of whether the floating point
format supports infinities or IEEE 754.

...
>> about normalization. Neither 7.12.5p1 nor 7.12p6 say anything to require
>> that the value be normalized. Therefore, as far as I can see, DBL_MAX is
>> the relevant value.
> 
> But DBL_NORM_MAX is the relevant value for the general definition
> of "overflow" (on double). So in 7.12p4, "overflows" is not used
> correctly, at least not with the usual meaning.

What do you consider the "general definition of overflow"? I would have
thought you were referring to 7.12.1p5, but I see no wording there that
distinguishes between normalized and unnormalized values.

> More than that, with the IEEE 754 overflow definition, you have
> numbers larger than DBL_MAX (up to those within 1 ulp) that do not
> overflow.

I don't see how that's a problem.

...
>>>> No definition by the standard is needed; the conventional mathematical
>>>> definitions of "nearest" are sufficient. If infinity is representable,
>>>> DBL_MAX is always nearer to any finite value than infinity is.
>>>> Regardless of whether infinity is representable, any finite value
>>>> greater than DBL_MAX is closer to DBL_MAX than it is to any other
>>>> representable value.
>>>
>>> The issue is that this may easily be confused with the result
>>> obtained in the FE_TONEAREST rounding mode with the IEEE 754 rules
>>> (where, for instance, 2*DBL_MAX rounds to +Inf, not to DBL_MAX,
>>> despite the fact that 2*DBL_MAX is closer to DBL_MAX than to +Inf).
> 
>> Yes, and DBL_MAX and +Inf are two of the three values permitted by
>> 6.4.4.2p4, so I don't see any conflict there.
> 
> My point is that this definition of "nearest" does not match the
> definition of IEEE 754's FE_TONEAREST.

FE_TONEAREST is not "IEEE 754's". It is a macro defined by the C
standard, and in the latest draft it's been changed so it now represents
IEC 60559's "roundTiesToEven" rounding attribute.

The C standard does not define "nearest", it merely uses it in the
phrase "nearest representable value", the same exact phrase used for
exactly the same purpose by IEC 60559 while describing the
roundTiesToEven rounding attribute. Note that I'm not saying that
roundTiesToEven is defined as producing the "nearest representable
value" - only that the specification starts out from that phrase, and
then adds complications to it, such as how ties and overflows are handled.

Section 6.4.4.2p4 uses "nearest representable value" to identify one of
the three permitted values, and uses that value to determine the other
two permitted values. It does not define a rounding mode, and was not
intended to do so. But every IEC 60559 rounding mode selects one of the
three values permitted by 6.4.4.2p4.

> ... I'm not saying that there
> is a conflict, just that the text is ambiguous. If one follows
> the IEEE 754 definition, there are only two possible values
> (DBL_MAX and +Inf, thus excluding nextdown(DBL_MAX)).

Yes, that was deliberate - it was intended to be compatible with IEC
60559, but also to be sufficiently loose to allow use of non-IEC 60559
floating point.


#6364

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-11-08 02:44 +0000
Message-ID	<20211108014459$1725@zira.vinc17.org>
In reply to	#6363

In article <slingl$56v$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> On 10/29/21 8:12 AM, Vincent Lefevre wrote:
> > In article <slef9t$98j$2@dont-email.me>,
> >   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> > 
> >> On 10/28/21 5:38 AM, Vincent Lefevre wrote:
> >>> In article <sl9bqb$hf5$2@dont-email.me>,
> >>>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> >>>
> >>>> On 10/26/21 6:01 AM, Vincent Lefevre wrote:
> ...
> >> 7.12.1p5 describes the math library, not the handling of floating point
> >> constants. While the C standard does recommend that "The
> >> translation-time conversion of floating constants should match the
> >> execution-time conversion of character strings by library functions,
> >> such as strtod, given matching inputs suitable for both conversions,
> >> the same result format, and default execution-time rounding."
> >> (6.4.4.2p11), it does not actually require such a match. Therefore, if
> >> there is any inconsistency, it would not be problematic.
> > 
> > Yes, but this means that any implicit use of overflow is not
> > perfectly clear.

> What is unclear about it? It very explicitly allows three different
> values, deliberately failing to specify only one of them as valid, and
> it is perfectly clear what those three values are.

These rules are not about overflow. They are general rules.

What is not defined is when a value overflows (there are different
definitions). And what is the consequence of the overflow (at runtime,
there may be traps).

> > But DBL_NORM_MAX is the relevant value for the general definition
> > of "overflow" (on double). So in 7.12p4, "overflows" is not used
> > correctly, at least not with the usual meaning.

> What do you consider the "general definition of overflow"?

The one given by the standard in 7.12.1p5.

> I would have thought you were referring to 7.12.1p5, but I see no
> wording there that distinguishes between normalized and unnormalized
> values.

"A floating result overflows if the magnitude of the mathematical
result is finite but so large that the mathematical result cannot
be represented without extraordinary roundoff error in an object
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
of the specified type."

If the exact result is above the maximum normal value, there is
likely to be an extraordinary roundoff error.

> > More than that, with the IEEE 754 overflow definition, you have
> > numbers larger than DBL_MAX (up to those within 1 ulp) that do not
> > overflow.

> I don't see how that's a problem.

Your definition conflicts with IEEE 754.

Note that overflow is also used for any floating-point expression
(not just math functions of the C library). See 7.6.2. And when Annex F
is supported, the IEEE 754 definition necessarily applies to the
associated FP types.

> ...
> >>>> No definition by the standard is needed; the conventional mathematical
> >>>> definitions of "nearest" are sufficient. If infinity is representable,
> >>>> DBL_MAX is always nearer to any finite value than infinity is.
> >>>> Regardless of whether infinity is representable, any finite value
> >>>> greater than DBL_MAX is closer to DBL_MAX than it is to any other
> >>>> representable value.
> >>>
> >>> The issue is that this may easily be confused with the result
> >>> obtained in the FE_TONEAREST rounding mode with the IEEE 754 rules
> >>> (where, for instance, 2*DBL_MAX rounds to +Inf, not to DBL_MAX,
> >>> despite the fact that 2*DBL_MAX is closer to DBL_MAX than to +Inf).
> > 
> >> Yes, and DBL_MAX and +Inf are two of the three values permitted by
> >> 6.4.4.2p4, so I don't see any conflict there.
> > 
> > My point is that this definition of "nearest" does not match the
> > definition of IEEE 754's FE_TONEAREST.

> FE_TONEAREST is not "IEEE 754's". It is a macro defined by the C
> standard, and in the latest draft it's been changed so it now represents
> IEC 60559's "roundTiesToEven" rounding attribute.

If Annex F is supported, FE_TONEAREST corresponds to the IEEE 754-1985
round-to-nearest mode. This is what I mean.
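
A minimal sketch of the distinction being drawn here, assuming an implementation whose double is IEEE 754 binary64 and that runs in the default round-to-nearest mode. The function names are illustrative and not taken from the standard. Mathematically, 2*DBL_MAX is closer to DBL_MAX than to +Inf, yet the run-time product is expected to be an infinity:

```c
#include <float.h>
#include <math.h>

/* Double a value at run time. The volatile objects keep the compiler
   from folding the multiplication at translation time. */
double double_at_runtime(double x)
{
    volatile double v = x;
    volatile double r = v * 2.0;
    return r;
}

/* Returns 1 if 2*DBL_MAX, computed at run time in the current rounding
   mode, is an infinity; returns 0 if it is finite (e.g. DBL_MAX). */
int twice_dbl_max_is_infinite(void)
{
    return isinf(double_at_runtime(DBL_MAX)) ? 1 : 0;
}
```

On a non-IEEE implementation, or under a directed rounding mode such as round-toward-zero, the same call may well report a finite result.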

> > ... I'm not saying that there
> > is a conflict, just that the text is ambiguous. If one follows
> > the IEEE 754 definition, there are only two possible values
> > (DBL_MAX and +Inf, thus excluding nextdown(DBL_MAX)).

> Yes, that was deliberate - it was intended to be compatible with IEC
> 60559, but also to be sufficiently loose to allow use of non-IEC 60559
> floating point.

But what is allowed is not clear for an IEEE 754 format (this does
not affect the INFINITY macro, but users could write exact values
larger than DBL_MAX + 1 ulp, for which nextdown(DBL_MAX) could be
unexpected as the obtained value).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


#6365

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-08 01:46 -0500
Message-ID	<smah3q$a9f$1@dont-email.me>
In reply to	#6364

On 11/7/21 9:44 PM, Vincent Lefevre wrote:
> In article <slingl$56v$1@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> 
>> On 10/29/21 8:12 AM, Vincent Lefevre wrote:
...
>>> Yes, but this means that any implicit use of overflow is not
>>> perfectly clear.
> 
>> What is unclear about it? It very explicitly allows three different
>> values, deliberately failing to specify only one of them as valid, and
>> it is perfectly clear what those three values are.
> 
> These rules are not about overflow. They are general rules.

Yes, and they are sufficiently general that it is perfectly clear how
they apply to the case when there is overflow.

> What is not defined is when a value overflows (there are different
> definitions). And what is the consequence of the overflow (at runtime,
> there may be traps).

We're talking about floating point constants here. The standard clearly
specifies that "Floating constants are converted to internal format as
if at translation-time. The conversion of a floating constant shall not
raise an exceptional condition or a floating-point exception at
execution time." Runtime behavior is not the issue, and traps are not
allowed.

The standard describes two cases: if infinities are supported (as they
necessarily are when IEEE formats are used), INFINITY is required to
expand to a constant expression that represents positive or unsigned
infinity. This is not outside the range of representable values - that
range includes either positive or unsigned infinity, so the constraint
in 6.4.4p2 is not violated.

If infinities are not supported (which is therefore necessarily not an
IEEE format), then INFINITY is required to expand to a constant that
will overflow. This does violate that constraint, which means that a
diagnostic message is required.

That's why it confuses me that you're talking about INFINITY violating
the constraint in 6.4.4p2 and the requirements of IEEE 754 at the same
time. If float uses an IEEE 754 floating point format, the way that
INFINITY is required to be defined doesn't violate that constraint.

It's normally the case that, when a constraint is violated, the behavior
is undefined. However, that's not because of anything the standard says
about constraint violations in general. It's because, in most cases, the
behavior is undefined "by the omission of any explicit definition of
behavior." (4p2). However, this is one of the rare exceptions: there is
no such omission. There is a general definition of the behavior that
continues to apply in a perfectly clear fashion even in the event of
overflow. Therefore, an implementation is required to assign a value to
such a constant that is one of the two identified by that definition,
either FLT_MAX or nextdownf(FLT_MAX).
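
The two cases can be told apart with a short check. This is only a sketch: the enum and function names are made up for illustration, and on an implementation without infinities the use of INFINITY is expected to draw the diagnostic discussed above.

```c
#include <float.h>
#include <math.h>

/* Illustrative classification of what the INFINITY macro evaluates to. */
enum infinity_kind {
    INFINITY_IS_INFINITE,     /* infinities are supported              */
    INFINITY_IS_FLT_MAX,      /* overflowing constant became FLT_MAX   */
    INFINITY_IS_OTHER_FINITE  /* some other finite value was chosen    */
};

enum infinity_kind classify_infinity_macro(void)
{
    /* volatile: inspect the stored float value, not a folded constant */
    volatile float inf = INFINITY;

    if (isinf(inf))
        return INFINITY_IS_INFINITE;
    if (inf == FLT_MAX)
        return INFINITY_IS_FLT_MAX;
    return INFINITY_IS_OTHER_FINITE;
}
```

With an IEEE 754 float, the first case is the one that should apply.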

>>> But DBL_NORM_MAX is the relevant value for the general definition
>>> of "overflow" (on double). So in 7.12p4, "overflows" is not used
>>> correctly, at least not with the usual meaning.
> 
>> What do you consider the "general definition of overflow"?
> 
> The one given by the standard in 7.12.1p5.
> 
>> I would have thought you were referring to 7.12.1p5, but I see no
>> wording there that distinguishes between normalized and unnormalized
>> values.
> 
> "A floating result overflows if the magnitude of the mathematical
> result is finite but so large that the mathematical result cannot
> be represented without extraordinary roundoff error in an object
>                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> of the specified type."
> 
> If the exact result is above the maximum normal value, there is
> likely to be an extraordinary roundoff error.

Your comment made me realize that I had no idea how DBL_NORM_MAX could
possibly be less than DBL_MAX. I did some searching, and discovered
official text of a committee decision indicating that they are normally
the same - the only exception known to the committee was systems that
implemented a long double as the sum of a pair of doubles, for which
LDBL_MAX == 2.0L*DBL_MAX, while LDBL_NORM_MAX is just slightly larger
than DBL_MAX.

However, I'm confused about how this connects to the standard's
definition of normalized floating-point numbers: "f_1 > 0"
(5.2.4.2.2p4). It seems to me that, even for the pair-of-doubles format,
LDBL_MAX is represented by a value with f_1 = 1, and therefore is a
normalized floating point number that is larger than LDBL_NORM_MAX,
which strikes me as a contradiction.

In any event, INFINITY is required to expand into an expression of type
"float", so if the only known exception involves long double, it's not
very relevant.

...
>>> ... I'm not saying that there
>>> is a conflict, just that the text is ambiguous. If one follows
>>> the IEEE 754 definition, there are only two possible values
>>> (DBL_MAX and +Inf, thus excluding nextdown(DBL_MAX)).
> 
>> Yes, that was deliberate - it was intended to be compatible with IEC
>> 60559, but also to be sufficiently loose to allow use of non-IEC 60559
>> floating point.
> 
> But what is allowed is not clear for an IEEE 754 format (this does
> not affect the INFINITY macro, but users could write exact values
> larger than DBL_MAX + 1 ulp, for which nextdown(DBL_MAX) could be
> unexpected as the obtained value).

It's unexpected because that would violate a requirement of IEEE 754,
but the C standard doesn't require violating that requirement. Section
6.4.4.2p4 of the C standard allows such a constant to have any one of
the three values (+infinity, FLT_MAX, or nextdownf(FLT_MAX)).
Therefore, an implementation that wants to conform to both the C
standard and IEEE 754 must select whichever of those values IEEE 754
requires for the rounding in effect (+infinity under round-to-nearest).
What's unclear or ambiguous about that?


#6366

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-11-08 10:56 +0000
Message-ID	<20211108093020$5609@zira.vinc17.org>
In reply to	#6365

In article <smah3q$a9f$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> On 11/7/21 9:44 PM, Vincent Lefevre wrote:
> > In article <slingl$56v$1@dont-email.me>,
> >   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> > 
> >> On 10/29/21 8:12 AM, Vincent Lefevre wrote:
> ...
> >>> Yes, but this means that any implicit use of overflow is not
> >>> perfectly clear.
> > 
> >> What is unclear about it? It very explicitly allows three different
> >> values, deliberately failing to specify only one of them as valid, and
> >> it is perfectly clear what those three values are.
> > 
> > These rules are not about overflow. They are general rules.

> Yes, and they are sufficiently general that it is perfectly clear how
> they apply to the case when there is overflow.

I've done some tests, and it is interesting to see that both GCC and
Clang choose the IEEE 754 definition of overflow on floating-point
constants, not yours (<sl9bqb$hf5$2@dont-email.me>). For instance,
the exact value of 0x1.fffffffffffff7p1023 is larger than DBL_MAX,
but it doesn't trigger an overflow warning with GCC and Clang.

Note that these warnings really are about overflow, and not about the
range of floating-point numbers.
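
A run-time analogue of this test can be sketched with strtod(), whose results 6.4.4.2p11 recommends (but does not require) to match translation-time conversion. The struct and function names below are illustrative, and the sketch assumes an IEEE 754 binary64 double in the default rounding mode:

```c
#include <errno.h>
#include <float.h>
#include <math.h>
#include <stdlib.h>

/* Result of one strtod() call: the value and whether ERANGE was set. */
struct conversion {
    double value;
    int range_error;
};

/* Convert text with strtod() under the current rounding mode. */
struct conversion convert(const char *text)
{
    struct conversion c;

    errno = 0;
    c.value = strtod(text, NULL);
    c.range_error = (errno == ERANGE);
    return c;
}

/* Returns 1 if strtod() reported overflow, i.e. returned HUGE_VAL
   together with ERANGE; returns 0 otherwise. */
int overflows(const char *text)
{
    struct conversion c = convert(text);
    return c.range_error && c.value == HUGE_VAL;
}
```

Under those assumptions one would expect "0x1.fffffffffffff7p1023" to convert to DBL_MAX without being reported as an overflow, although its exact value is larger than DBL_MAX, and "0x1p1024" to be reported as one.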

> > What is not defined is when a value overflows (there are different
> > definitions). And what is the consequence of the overflow (at runtime,
> > there may be traps).

> We're talking about floating point constants here. The standard clearly
> specifies that "Floating constants are converted to internal format as
> if at translation-time. The conversion of a floating constant shall not
> raise an exceptional condition or a floating-point exception at
> execution time." Runtime behavior is not the issue, and traps are not
> allowed.

I agree. But the question is whether the compiler may choose to
stop the compilation.

There is some confusion in the standard, because 6.4.4p2 says
"the value of a constant" while "value" is defined by 3.19
and means the value of the object, while I suspect that
6.4.4p2 intends to mean the *exact* value.

> The standard describes two cases: if infinities are supported (as they
> necessarily are when IEEE formats are used), INFINITY is required to
> expand to a constant expression that represents positive or unsigned
> infinity. This is not outside the range of representable values - that
> range includes either positive or unsigned infinity, so the constraint
> in 6.4.4p2 is not violated.

The range includes all real numbers, but not infinities. No issues
with INFINITY, but my remark was about the case where a user writes
a constant like 0x1.0p1024 (or 1.0e999). Such constants are in the
range of floating-point numbers (which is the set of real numbers in
this case), but this constant overflows with the IEEE 754 meaning,
and both GCC and Clang emit a warning for this reason.

Note that if the intent were "exceeds the range", the C standard
should have said that.

> If infinities are not supported (which is therefore necessarily not an
> IEEE format), then INFINITY is required to expand to a constant that
> will overflow. This does violate that constraint, which means that a
> diagnostic message is required.

This point is not clear and does not match what implementations
consider as overflow.

> It's normally the case that, when a constraint is violated, the behavior
> is undefined. However, that's not because of anything the standard says
> about constraint violations in general. It's because, in most cases, the
> behavior is undefined "by the omission of any explicit definition of
> behavior." (4p2). However, this is one of the rare exceptions: there is
> no such omission. There is a general definition of the behavior that
> continues to apply in a perfectly clear fashion even in the event of
> overflow. Therefore, an implementation is required to assign a value to
> such a constant that is one of the two identified by that definition,
> either FLT_MAX or nextdownf(FLT_MAX).

I think that I was initially confused by the meaning of "value"
in 6.4.4p2, as it seems to imply that a converted value may be
outside the range of representable values. It seems that it was
written mainly with integer constants in mind.

But there's still the fact that "overflow" is not defined (this
term is used only when there are no infinities, though).

> >>> But DBL_NORM_MAX is the relevant value for the general definition
> >>> of "overflow" (on double). So in 7.12p4, "overflows" is not used
> >>> correctly, at least not with the usual meaning.
> > 
> >> What do you consider the "general definition of overflow"?
> > 
> > The one given by the standard in 7.12.1p5.
> > 
> >> I would have thought you were referring to 7.12.1p5, but I see no
> >> wording there that distinguishes between normalized and unnormalized
> >> values.
> > 
> > "A floating result overflows if the magnitude of the mathematical
> > result is finite but so large that the mathematical result cannot
> > be represented without extraordinary roundoff error in an object
> >                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > of the specified type."
> > 
> > If the exact result is above the maximum normal value, there is
> > likely to be an extraordinary roundoff error.

> Your comment made me realize that I had no idea how DBL_NORM_MAX could
> possibly be less than DBL_MAX. I did some searching, and discovered
> official text of a committee decision indicating that they are normally
> the same - the only exception known to the committee was systems that
> implemented a long double as the sum of a pair of doubles, for which
> LDBL_MAX == 2.0L*DBL_MAX, while LDBL_NORM_MAX is just slightly larger
> than DBL_MAX.

The case LDBL_MAX == 2.0L*DBL_MAX is that of a hypothetical system, but
one allowed by the C standard. However, the more general fact that there
may be finite values above the maximum normal floating-point number
justifies the definition of macros like DBL_NORM_MAX. The intent is
to say that if a computed value is larger than DBL_NORM_MAX, then
there may have been a loss of accuracy. In error analysis, this is
what is meant by "overflow". (Note that one may also have an overflow
if one gets DBL_NORM_MAX, e.g. when DBL_NORM_MAX = DBL_MAX and
rounding is toward 0, with an exact value ≥ DBL_MAX + 1 ulp.)
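
As a rough illustration (a sketch, not text from the standard), the
relationship between the two macros can be checked directly. The
*_NORM_MAX macros are C2x additions, so the code guards on their
presence; the function names are made up for this example.

```c
#include <float.h>

/* Hypothetical helpers: return 1 if the largest normalized value equals
   the largest finite value, 0 if it is smaller, and -1 if this <float.h>
   does not provide the C2x *_NORM_MAX macro. */
static int dbl_norm_max_is_max(void)
{
#ifdef DBL_NORM_MAX
    return DBL_NORM_MAX == DBL_MAX;
#else
    return -1;
#endif
}

static int ldbl_norm_max_is_max(void)
{
#ifdef LDBL_NORM_MAX
    return LDBL_NORM_MAX == LDBL_MAX;
#else
    return -1;
#endif
}
```

With an IEEE 754 double one would expect the first function to return 1
(or -1 with an older <float.h>). A result of 0 for long double would
suggest a format such as the pair-of-doubles one mentioned above.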

The point of overflow and underflow exceptions is to signal that a
conventional error analysis may no longer be valid.

> However, I'm confused about how this connects to the standard's
> definition of normalized floating-point numbers: "f_1 > 0"
> (5.2.4.2.2p4). It seems to me that, even for the pair-of-doubles format,
> LDBL_MAX is represented by a value with f_1 = 1, and therefore is a
> normalized floating point number that is larger than LDBL_NORM_MAX,
> which strikes me as a contradiction.

Note that there is a requirement on the exponent: e ≤ e_max.

> In any event, INFINITY is required to expand into an expression of type
> "float", so if the only known exception involves long double, it's not
> very relevant.

One could imagine a non-IEEE 754 system where float would not be
a strict FP format. (I'm wondering whether there are attempts to
replace the FP formats by unums, at least for testing purposes.)

> ...
> >>> ... I'm not saying that there
> >>> is a conflict, just that the text is ambiguous. If one follows
> >>> the IEEE 754 definition, there are only two possible values
> >>> (DBL_MAX and +Inf, thus excluding nextdown(DBL_MAX)).
> > 
> >> Yes, that was deliberate - it was intended to be compatible with IEC
> >> 60559, but also to be sufficiently loose to allow use of non-IEC 60559
> >> floating point.
> > 
> > But what is allowed is not clear for an IEEE 754 format (this does
> > not affect the INFINITY macro, but users could write exact values
> > larger than DBL_MAX + 1 ulp, for which nextdown(DBL_MAX) could be
> > unexpected as the obtained value).

> It's unexpected because that would violate a requirement of IEEE 754,
> but the C standard doesn't require violating that requirement. Section
> 6.4.4.2p4 of the C standard allows such a constant to have any one of
> the three values (+infinity, FLT_MAX, or nextdownf(FLT_MAX)).
> Therefore, an implementation that wants to conform to both the C
> standard and IEEE 754 must select FLT_MAX. What's unclear or ambiguous
> about that?

If Annex F is not claimed to be supported[*], this requirement would
not be violated.

[*] For instance, for systems that almost support this annex, but
not completely.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


#6367

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-08 13:50 -0500
Message-ID	<smbrgo$g4b$1@dont-email.me>
In reply to	#6366

On 11/8/21 5:56 AM, Vincent Lefevre wrote:
> In article <smah3q$a9f$1@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> 
>> On 11/7/21 9:44 PM, Vincent Lefevre wrote:...
>>> These rules are not about overflow. They are general rules.
> 
>> Yes, and they are sufficiently general that it is perfectly clear how
>> they apply to the case when there is overflow.
> 
> I've done some tests, and it is interesting to see that both GCC and
> Clang choose the IEEE 754 definition of overflow on floating-point
> constants, not yours (<sl9bqb$hf5$2@dont-email.me>).

The only definition for overflow that I discussed is not mine, it
belongs to the C standard: "A floating result overflows if the magnitude
(absolute value) of the mathematical result is finite but so large that
the mathematical result cannot be represented without extraordinary
roundoff error in an object of the specified type." (7.12.1p5).

> ... For instance,
> the exact value of 0x1.fffffffffffff7p1023 is larger than DBL_MAX,
> but it doesn't trigger an overflow warning with GCC and Clang.

No warning is mandated for overflows, so that doesn't contradict
anything I said.

I wasn't talking about overflow for its own sake, but only in the
context of what the standard says about the value of floating point
constants. What value does that constant have? Is it one of the three
values permitted by 6.4.4.2p4? Is it, in particular, the value required
by IEEE 754? If the answers to both questions are yes, it's consistent
with everything I said.
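
One way to answer those questions on a given implementation is to
compare the constant against the three candidates. This is only a
sketch: the helper names are invented for the example, and it covers
the double case (DBL_MAX) rather than the float case.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: say which of the three candidate values
   (6.4.4.2p4, as read in this thread) the argument matches. */
static const char *classify_near_max(double x)
{
    if (isinf(x) && x > 0.0)
        return "+infinity";
    if (x == DBL_MAX)
        return "DBL_MAX";
    if (x == nextafter(DBL_MAX, 0.0))
        return "nextdown(DBL_MAX)";
    return "something else";
}

/* The constant from the quoted text; its exact value is above DBL_MAX. */
static double sample_constant(void)
{
    return 0x1.fffffffffffff7p1023;
}
```

Printing classify_near_max(sample_constant()) shows which choice the
compiler made. Under IEEE 754 round-to-nearest one would expect
"DBL_MAX", since the excess over DBL_MAX is less than half an ulp.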

>>> What is not defined is when a value overflows (there are different
>>> definitions). And what is the consequence of the overflow (at runtime,
>>> there may be traps).
> 
>> We're talking about floating point constants here. The standard clearly
>> specifies that "Floating constants are converted to internal format as
>> if at translation-time. The conversion of a floating constant shall not
>> raise an exceptional condition or a floating-point exception at
>> execution time." Runtime behavior is not the issue, and traps are not
>> allowed.
> 
> I agree. But the question is whether the compiler may choose to
> stop the compilation.

I don't remember that issue having previously been raised.

"The implementation shall not successfully translate a preprocessing
translation unit containing a #error preprocessing directive unless it
is part of a group skipped by conditional inclusion." (4p4).

"The implementation shall be able to translate and execute at least one
program that contains at least one instance of every one of the
following limits:" (5.2.4.1p1).

In all other cases, stopping compilation is neither mandatory nor
prohibited.

> There is a confusion in the standard, because 6.4.4p2 says
> "the value of a constant" while "value" is defined by 3.19
> and means the value of the object, while I suspect that
> 6.4.4p2 intends to mean the *exact* value.

The term "representable value" is used in 23 places in the standard,
including 6.4.4.2p4. That term would be redundant if the term "value"
only had meaning when it could be represented. That interpretation would
render all 23 of those clauses meaningless, including 6.4.4.2p4.

The standard frequently uses the term "value" to refer to the
mathematical value of something, which isn't necessarily representable
in any type, and in particular, need not be representable in the
particular type relevant to the discussion. This is usually done in the
context of defining how the requirements imposed by the C standard
depend upon whether or not the mathematical value is representable or in
the range of representable values, as is the case in 6.4.4p2.

I will agree that it would be clearer to either modify the definition of
"value" to include such usage, or to define and consistently use some
other term (such as "mathematical result", which is used for this
purpose only in the discussions of floating point overflow and underflow).

Note that IEEE 754 uses the same idea in its description of overflow:
"...by what would have been the rounded floating point result were the
exponent range unbounded."

...
>> The standard describes two cases: if infinities are supported (as they
>> necessarily are when IEEE formats are used), INFINITY is required to
>> expand to a constant expression that represents positive or unsigned
>> infinity. This is not outside the range of representable values - that
>> range includes either positive or unsigned infinity, so the constraint
>> in 6.4.4p2 is not violated.
> 
> The range includes all real numbers, but not infinities.

For an implementation that supports infinities (in other words, an
implementation where infinities are representable), how do infinities
fail to qualify as being within the range of representable values? Where
is that exclusion specified? Such formats correspond to affinely
extended real number systems, which differ from ordinary real number
systems by including -infinity and +infinity. IEEE 754 specifies that
infinities are to be interpreted in the affine sense.
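
A minimal sketch of that affine ordering, assuming an implementation
that supports infinities (elsewhere INFINITY would be the overflowing
constant discussed above and should draw a diagnostic). The function
name is hypothetical.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if both infinities are representable values that compare
   strictly outside the finite range [-DBL_MAX, DBL_MAX]. */
static int infinities_extend_the_finite_range(void)
{
    double pos = INFINITY;
    double neg = -INFINITY;
    return isinf(pos) && isinf(neg) && pos > DBL_MAX && neg < -DBL_MAX;
}
```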

> ... No issues
> with INFINITY, but my remark was about the case a user would write
> a constant like 0x1.0p1024 (or 1.0e999). Such constants are in the
> range of floating-point numbers (which is the set of real numbers in
> this case), but this constant overflows with the IEEE 754 meaning,

It also overflows with the C standard's meaning.

> and both GCC and Clang emit a warning for this reason.
> 
> Note that if the intent were "exceeds the range", the C standard
> should have said that.

I'm sorry - I seem to have lost the thread of your argument. In which
location in the current standard do you think the current wording would
need to be changed to "exceeds the range", in order to support my argument?
Which current phrase would need to be replaced, and why?

>> If infinities are not supported (which is therefore necessarily not an
>> IEEE format), then INFINITY is required to expand to a constant that
>> will overflow. This does violate that constraint, which means that a
>> diagnostic message is required.
> 
> This point is not clear and does not match what implementations
> consider as overflow.

Which implementations did you test on, which don't support infinities,
in order to justify that conclusion? In my experience, such
implementations are rare. The only systems I've ever used that didn't
support infinities, failed to do so because they didn't support floating
point at all.

...
> I think that I was initially confused by the meaning of "value"
> in 6.4.4p2, as it seems to imply that a converted value may be
> outside the range of representable values.

Correct. This is an example of a context where it is referring to the
mathematical value, rather than a necessarily representable value.

> ... It seems that it was
> written mainly with integer constants in mind.

I think not - constants with a value that cannot be represented occur in
integer constants, floating constants, and character constants, which is
why that paragraph appears in 6.4.4p2. If it were meant only for integer
constants, it would have been under 6.4.4.1.

> But there's still the fact that "overflow" is not defined (this
> term is used only when there are no infinities, though).

7.12.1p5 is not marked as a definition for "overflows", but has the form
of a definition. There is no restriction within 7.12.1 to
implementations that don't support infinities.

>> However, I'm confused about how this connects to the standard's
>> definition of normalized floating-point numbers: "f_1 > 0"
>> (5.2.4.2.2p4). It seems to me that, even for the pair-of-doubles format,
>> LDBL_MAX is represented by a value with f_1 = 1, and therefore is a
>> normalized floating point number that is larger than LDBL_NORM_MAX,
>> which strikes me as a contradiction.
> 
> Note that there is a requirement on the exponent: e ≤ e_max.

Yes, and DBL_MAX has e==e_max.

...
>>> But what is allowed is not clear for an IEEE 754 format (this does
>>> not affect the INFINITY macro, but users could write exact values
>>> larger than DBL_MAX + 1 ulp, for which nextdown(DBL_MAX) could be
>>> unexpected as the obtained value).
> 
>> It's unexpected because that would violate a requirement of IEEE 754,
>> but the C standard doesn't require violating that requirement. Section
>> 6.4.4.2p4 of the C standard allows such a constant to have any one of
>> the three values (+infinity, FLT_MAX, or nextdownf(FLT_MAX)).
>> Therefore, an implementation that wants to conform to both the C
>> standard and IEEE 754 must select FLT_MAX. What's unclear or ambiguous
>> about that?

I had originally intended that paragraph to be about INFINITY, where
FLT_MAX is the relevant limit, but you were explicitly talking about
DBL_MAX+1ulp, so I should have changed all instances of FLT in that
paragraph to DBL.

> If Annex F is not claimed to be supported[*], this requirement would
> not be violated.

And if Annex F were claimed to be supported, this requirement would
still not be violated by giving that constant a value of DBL_MAX. That
value satisfies all applicable requirements of either standard.


#6368

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-11-09 02:48 +0000
Message-ID	<20211109010315$1773@zira.vinc17.org>
In reply to	#6367

In article <smbrgo$g4b$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> On 11/8/21 5:56 AM, Vincent Lefevre wrote:
> > In article <smah3q$a9f$1@dont-email.me>,
> >   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> > 
> >> On 11/7/21 9:44 PM, Vincent Lefevre wrote:...
> >>> These rules are not about overflow. They are general rules.
> > 
> >> Yes, and they are sufficiently general that it is perfectly clear how
> >> they apply to the case when there is overflow.
> > 
> > I've done some tests, and it is interesting to see that both GCC and
> > Clang choose the IEEE 754 definition of overflow on floating-point
> > constants, not yours (<sl9bqb$hf5$2@dont-email.me>).

> The only definition for overflow that I discussed is not mine, it
> belongs to the C standard: "A floating result overflows if the magnitude
> (absolute value) of the mathematical result is finite but so large that
> the mathematical result cannot be represented without extraordinary
> roundoff error in an object of the specified type." (7.12.1p5).

That's in the C standard. But in <sl9bqb$hf5$2@dont-email.me>, you
said: "Overflow occurs when a floating constant is created whose
value is greater than DBL_MAX or less than -DBL_MAX."

So... I don't understand what you consider as an overflow.

> > ... For instance, the exact value of 0x1.fffffffffffff7p1023 is
> > larger than DBL_MAX, but it doesn't trigger an overflow warning
> > with GCC and Clang.

> No warning is mandated for overflows, so that doesn't contradict
> anything I said.

But that's what is implemented in practice, and in GCC, the
condition is the same whether infinity is supported or not (see
the code later).

> I wasn't talking about overflow for its own sake, but only in the
> context of what the standard says about the value of floating point
> constants. What value does that constant have? Is it one of the three
> values permitted by 6.4.4.2p4? Is it, in particular, the value required
> by IEEE 754? If the answers to both questions are yes, it's consistent
> with everything I said.

The second answer is not "yes", in case nextdown(DBL_MAX) would be
returned.

> > I agree. But the question is whether the compiler may choose to
> > stop the compilation.

> I don't remember that issue having previously been raised.

> "The implementation shall not successfully translate a preprocessing
> translation unit containing a #error preprocessing directive unless it
> is part of a group skipped by conditional inclusion." (4p4).

> "The implementation shall be able to translate and execute at least one
> program that contains at least one instance of every one of the
> following limits:" (5.2.4.1p1).

> In all other cases, stopping compilation is neither mandatory nor
> prohibited.

Well, from this point of view, an implementation is free to regard
an overflowing constant as not having a defined behavior and stop
compilation.

> ...
> >> The standard describes two cases: if infinities are supported (as they
> >> necessarily are when IEEE formats are used), INFINITY is required to
> >> expand to a constant expression that represents positive or unsigned
> >> infinity. This is not outside the range of representable values - that
> >> range includes either positive or unsigned infinity, so the constraint
> >> in 6.4.4p2 is not violated.
> > 
> > The range includes all real numbers, but not infinities.

> For an implementation that supports infinities (in other words, an
> implementation where infinities are representable), how do infinities
> fail to qualify as being within the range of representable values? Where
> is that exclusion specified?

5.2.4.2.2p5. Note that it seems that it is intended to exclude
some representable values from the range. Otherwise such a long
specification of the range would not be needed.

That said, either this specification seems incorrect or there are
several meanings of "range". For instance, 5.2.4.2.2p9 says "Except
for assignment and cast (which remove all extra range and precision)",
and here, the intent is to limit the range to the emax exponent of
the considered type.

> Such formats correspond to affinely extended real number systems,
> which differ from ordinary real number systems by including
> -infinity and +infinity. IEEE 754 specifies that infinities are to
> be interpreted in the affine sense.

Yes, but I'm not sure that the exclusion of infinities from the range
has any consequence. For instance, 6.3.1.5 says:

  When a value of real floating type is converted to a real floating type,
  if the value being converted can be represented exactly in the new type,
  it is unchanged. If the value being converted is in the range of values
  that can be represented but cannot be represented exactly, the result is
  either the nearest higher or nearest lower representable value, chosen
  in an implementation-defined manner. If the value being converted is
  outside the range of values that can be represented, the behavior is
  undefined. [...]

So, if infinity is representable in both types, we are in the first
case ("can be represented exactly"), and the range is not used.
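
A small sketch of the first two cases of 6.3.1.5, assuming IEEE 754
float and double with infinities; the function names are invented for
the example, and volatile is used only to keep the conversions from
being folded away.

```c
#include <math.h>

/* First case: infinity is exactly representable in float, so the
   narrowing conversion leaves it unchanged. */
static int infinity_survives_narrowing(void)
{
    volatile double wide = INFINITY;
    float narrow = (float)wide;
    return isinf(narrow) && narrow > 0.0f;
}

/* Second case: 0.1 (as a double) is in range but not exactly
   representable in float, so the result is a nearby float value. */
static int inexact_value_lands_on_a_neighbor(void)
{
    volatile double wide = 0.1;
    float narrow = (float)wide;
    double back = narrow;
    return back != wide && fabs(back - wide) < 1e-8;
}
```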

> > and both GCC and Clang emit a warning for this reason.
> > 
> > Note that if the intent were "exceeds the range", the C standard
> > should have said that.

> I'm sorry - I seem to have lost the thread of your argument. In which
> location in the current standard do you think the current wording would
> need to be changed to "exceeds the range", in order to support my argument?
> Which current phrase would need to be replaced, and why?

I don't remember exactly, but I think that was 7.12p4 to make it
consistent with its footnote (which refers to 6.4.4).

Still, there would be the issue of what 5.2.4.2.2p5 would really
mean.

> >> If infinities are not supported (which is therefore necessarily not an
> >> IEEE format), then INFINITY is required to expand to a constant that
> >> will overflow. This does violate that constraint, which means that a
> >> diagnostic message is required.
> > 
> > This point is not clear and does not match what implementations
> > consider as overflow.

> Which implementations did you test on, which don't support infinities,
> in order to justify that conclusion?

Note that the notion of overflow as defined by 7.12.1p5 (which is
consistent with the particular case of IEEE 754) exists whether
infinities are supported or not.

And for implementations without infinities, see the GCC code:
gcc/c-family/c-lex.c

  if (REAL_VALUE_ISINF (real)
      || (const_type != type && REAL_VALUE_ISINF (real_trunc)))
    {
      *overflow = OT_OVERFLOW;
      if (!(flags & CPP_N_USERDEF))
        {
          if (!MODE_HAS_INFINITIES (TYPE_MODE (type)))
            pedwarn (input_location, 0,
                     "floating constant exceeds range of %qT", type);
          else
            warning (OPT_Woverflow,
                     "floating constant exceeds range of %qT", type);
        }
    }

> > But there's still the fact that "overflow" is not defined (this
> > term is used only when there are no infinities, though).

> 7.12.1p5 is not marked as a definition for "overflows", but has the form
> of a definition. There is no restriction within 7.12.1 to
> implementations that don't support infinities.

I agree. But see the beginning of this message.

> >> However, I'm confused about how this connects to the standard's
> >> definition of normalized floating-point numbers: "f_1 > 0"
> >> (5.2.4.2.2p4). It seems to me that, even for the pair-of-doubles format,
> >> LDBL_MAX is represented by a value with f_1 = 1, and therefore is a
> >> normalized floating point number that is larger than LDBL_NORM_MAX,
> >> which strikes me as a contradiction.
> > 
> > Note that there is a requirement on the exponent: e ≤ e_max.

> Yes, and DBL_MAX has e==e_max.

No, not necessarily. DBL_NORM_MAX has e == e_max. But DBL_MAX may
have a larger exponent. The C2x draft says:

  maximum representable finite floating-point number; if that number
  is normalized, its value is (1 − b^(−p)) b^(e_max).
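
For illustration, the formula can be evaluated and compared with
DBL_MAX. This is a sketch that assumes FLT_RADIX == 2 (ldexp scales by
powers of two); the function name is hypothetical.

```c
#include <float.h>
#include <math.h>

/* (1 - b^(-p)) * b^(e_max) for double, assuming b == 2,
   p == DBL_MANT_DIG and e_max == DBL_MAX_EXP. */
static double dbl_max_from_formula(void)
{
    double fraction = 1.0 - ldexp(1.0, -DBL_MANT_DIG); /* 1 - 2^(-p), exact */
    return ldexp(fraction, DBL_MAX_EXP);               /* times 2^(e_max) */
}
```

If the result equals DBL_MAX, then DBL_MAX is the normalized maximum
with e == e_max; a DBL_MAX larger than this would be the case where it
is not normalized in the sense of the model.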

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


#6369

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-09 00:50 -0500
Message-ID	<smd27p$28v$1@dont-email.me>
In reply to	#6368

On 11/8/21 9:48 PM, Vincent Lefevre wrote:
> In article <smbrgo$g4b$1@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> 
>> On 11/8/21 5:56 AM, Vincent Lefevre wrote:
>>> In article <smah3q$a9f$1@dont-email.me>,
>>>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
...
>> The only definition for overflow that I discussed is not mine, it
>> belongs to the C standard: "A floating result overflows if the magnitude
>> (absolute value) of the mathematical result is finite but so large that
>> the mathematical result cannot be represented without extraordinary
>> roundoff error in an object of the specified type." (7.12.1p5).
> 
> That's in the C standard. But in <sl9bqb$hf5$2@dont-email.me>, you
> said: "Overflow occurs when a floating constant is created whose
> value is greater than DBL_MAX or less than -DBL_MAX."
> 
> So... I don't understand what you consider as an overflow.

At that time, I was unaware of the existence of any floating point
format where DBL_NORM_MAX < DBL_MAX. I've since acknowledged that such
things can occur - but only in such obscure formats.

>> I wasn't talking about overflow for its own sake, but only in the
>> context of what the standard says about the value of floating point
>> constants. What value does that constant have? Is it one of the three
>> values permitted by 6.4.4.2p4? Is it, in particular, the value required
>> by IEEE 754? If the answers to both questions are yes, it's consistent
>> with everything I said.
> 
> The second answer is not "yes", in case nextdown(DBL_MAX) would be
> returned.

I'm asking what value you observed - was it nextdown(DBL_MAX), DBL_MAX,
+infinity, or something else? The first three are permitted by the C
standard, the second one is mandated by IEEE 754, so I would expect an
implementation that claimed conformance to both standards to choose
DBL_MAX, and NOT nextdown(DBL_MAX). So - which value did you see?

>>> I agree. But the question is whether the compiler may choose to
>>> stop the compilation.
> 
>> I don't remember that issue having previously been raised.
> 
>> "The implementation shall not successfully translate a preprocessing
>> translation unit containing a #error preprocessing directive unless it
>> is part of a group skipped by conditional inclusion." (4p4).
> 
>> "The implementation shall be able to translate and execute at least one
>> program that contains at least one instance of every one of the
>> following limits:" (5.2.4.1p1).
> 
>> In all other cases, stopping compilation is neither mandatory nor
>> prohibited.
> 
> Well, from this point of view, an implementation is free to regard
> an overflowing constant as not having a defined behavior and stop
> compilation.

What renders the behavior undefined? On an implementation that doesn't
support infinities, it's a constraint violation - but constraint
violations don't necessarily have undefined behavior. They usually have
undefined behavior due to "omission of any explicit definition of the
behavior", but there is in fact an explicit definition of the behavior
that continues to apply even when that constraint is violated.
And on an implementation that does support infinities, it isn't even a
constraint violation.
Whether or not a constraint is violated, as I said above, stopping
compilation is neither mandatory nor prohibited, just as for most other
programs.

...
>> For an implementation that supports infinities (in other words, an
>> implementation where infinities are representable), how do infinities
>> fail to qualify as being within the range of representable values? Where
>> is that exclusion specified?
> 
> 5.2.4.2.2p5. Note that it seems that it is intended to exclude
> some representable values from the range. Otherwise such a long
> specification of the range would not be needed.

That clause correctly states that infinities do NOT qualify as floating
point numbers. However, it also correctly refers to them as values. The
relevant clauses refer to the range of representable values, not the
range of representable floating point numbers. On such an
implementation, infinities are representable and they are values.

What are you referring to when you say "such a long specification"?

> That said, either this specification seems incorrect or there are
> several meanings of "range". For instance, 5.2.4.2.2p9 says "Except
> for assignment and cast (which remove all extra range and precision)",
> and here, the intent is to limit the range to the emax exponent of
> the considered type.

I had to go back to n1570.pdf to find that wording. It was removed from
n2310.pdf (2018-11-06). In n2596.pdf (2020-12-11), wording about the
extra range was placed in footnote 22, referred to by 5.2.4.2.2p4, and
is still there in the latest draft I have, n2731.pdf (2021-10-18).

I believe that "extra range" refers to extra representable values that
are supported by the evaluation format, but not by the format of the
type itself. The extra range consists entirely of finite values, even if
the full range is infinite for both formats.
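
A sketch of that reading, assuming IEEE 754 float: whether the
intermediate product below overflows depends on the evaluation format
(FLT_EVAL_METHOD), and only the final cast is guaranteed to bring the
result back to the range of float. The function name is made up.

```c
#include <float.h>

/* Doubles x and halves it again in a single expression. If the
   expression is evaluated in float, x * 2.0f overflows for x near
   FLT_MAX; with a wider evaluation format the extra range absorbs it. */
static float double_then_halve(float x)
{
    volatile float big = x; /* volatile: keep the arithmetic at run time */
    return (float)(big * 2.0f / 2.0f);
}
```

For double_then_halve(FLT_MAX) one would expect +infinity when
FLT_EVAL_METHOD is 0 and FLT_MAX when a wider format is used for
evaluation.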

>> Such formats correspond to affinely extended real number systems,
>> which differ from ordinary real number systems by including
>> -infinity and +infinity. IEEE 754 specifies that infinities are to
>> be interpreted in the affine sense.
> 
> Yes, but I'm not sure that the exclusion of infinities from the range
> has any consequence. For instance, 6.3.1.5 says:
> 
>   When a value of real floating type is converted to a real floating type,
>   if the value being converted can be represented exactly in the new type,
>   it is unchanged. If the value being converted is in the range of values
>   that can be represented but cannot be represented exactly, the result is
>   either the nearest higher or nearest lower representable value, chosen
>   in an implementation-defined manner. If the value being converted is
>   outside the range of values that can be represented, the behavior is
>   undefined. [...]
> 
> So, if infinity is representable in both types, we are in the first
> case ("can be represented exactly"), and the range is not used.

I agree - with respect to conversions between floating point types. What
does that have to do with the conversion from a decimal string to a
floating point type, which is described in 6.4.4.2p4? The decimal
strings allowed by that clause cannot represent infinity - they can
acquire an infinite value only by rounding, depending upon the default
rounding mode.

>>> and both GCC and Clang emit a warning for this reason.
>>>
>>> Note that if the intent were "exceeds the range", the C standard
>>> should have said that.
> 
>> I'm sorry - I seem to have lost the thread of your argument. In which
>> location in the current standard do you think the current wording would
>> need to be changed to "exceeds the range", in order to support my argument?
>> Which current phrase would need to be replaced, and why?
> 
> I don't remember exactly, but I think that was 7.12p4 to make it
> consistent with its footnote (which refers to 6.4.4).

In the latest draft standard that I have, that wording is now in 7.12p7.
I've already conceded that "overflows" is not necessarily the same as
"exceeds the range". However, the only known exception is for a long
double type, which can't apply to INFINITY, which is what 7.12p7 describes.

...
>>>> If infinities are not supported (which is therefore necessarily not an
>>>> IEEE format), then INFINITY is required to expand to a constant that
>>>> will overflow. This does violate that constraint, which means that a
>>>> diagnostic message is required.
>>>
>>> This point is not clear and does not match what implementations
>>> consider as overflow.
> 
>> Which implementations did you test on, which don't support infinities,
>> in order to justify that conclusion?
> 
> Note that the notion of overflow as defined by 7.12.1p5 (which is
> consistent with the particular case of IEEE 754) exists whether
> infinities are supported or not.

Yes, but INFINITY is only required to overflow, which is what you were
talking about, on implementations that don't support infinities. So, in
order to justify saying that it "does not match what implementations
consider as overflow", you must necessarily be referring to
implementations that don't support infinities.

> And for implementations without infinities, see the GCC code:
> gcc/c-family/c-lex.c
> 
>   if (REAL_VALUE_ISINF (real)
>       || (const_type != type && REAL_VALUE_ISINF (real_trunc)))
>     {
>       *overflow = OT_OVERFLOW;
>       if (!(flags & CPP_N_USERDEF))
>         {
>           if (!MODE_HAS_INFINITIES (TYPE_MODE (type)))
>             pedwarn (input_location, 0,
>                      "floating constant exceeds range of %qT", type);
>           else
>             warning (OPT_Woverflow,
>                      "floating constant exceeds range of %qT", type);
>         }
>     }

It's actually the behavior of that implementation in modes that do
support infinities that is most relevant to this discussion - it labels
an infinite value as exceeding the type's range, even if it does
support infinities. Apparently they are using "range" to refer to the
range of finite values - but I would consider the wording to be
misleading without the qualifier "finite".

>>>> However, I'm confused about how this connects to the standard's
>>>> definition of normalized floating-point numbers: "f_1 > 0"
>>>> (5.2.4.2.2p4). It seems to me that, even for the pair-of-doubles format,
>>>> LDBL_MAX is represented by a value with f_1 = 1, and therefore is a
>>>> normalized floating point number that is larger than LDBL_NORM_MAX,
>>>> which strikes me as a contradiction.
>>>
>>> Note that there is a requirement on the exponent: e ≤ e_max.
> 
>> Yes, and DBL_MAX has e==e_max.
> 
> No, not necessarily. DBL_NORM_MAX has e == e_max. But DBL_MAX may
> have a larger exponent. The C2x draft says:
> 
>   maximum representable finite floating-point number; if that number
>   is normalized, its value is (1 − b^(−p)) b^(e_max).

So, what is the value of e for LDBL_MAX in the pair-of-doubles format?
What is the value of e_max? If LDBL_MAX does not have e==e_max, what is
the largest representable value in that format that does have e==e_max?

#6370

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-11-09 10:12 +0000
Message-ID	<20211109082543$cbe5@zira.vinc17.org>
In reply to	#6369

In article <smd27p$28v$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> On 11/8/21 9:48 PM, Vincent Lefevre wrote:
> > In article <smbrgo$g4b$1@dont-email.me>,
> >   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> > 
> >> On 11/8/21 5:56 AM, Vincent Lefevre wrote:
> >>> In article <smah3q$a9f$1@dont-email.me>,
> >>>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> ...
> >> The only definition for overflow that I discussed is not mine, it
> >> belongs to the C standard: "A floating result overflows if the magnitude
> >> (absolute value) of the mathematical result is finite but so large that
> >> the mathematical result cannot be represented without extraordinary
> >> roundoff error in an object of the specified type." (7.12.1p5).
> > 
> > That's in the C standard. But in <sl9bqb$hf5$2@dont-email.me>, you
> > said: "Overflow occurs when a floating constant is created whose
> > value is greater than DBL_MAX or less than -DBL_MAX."
> > 
> > So... I don't understand what you consider as an overflow.

> At that time, I was unaware of the existence of any floating point
> format where DBL_NORM_MAX < DBL_MAX. I've since acknowledged that such
> things can occur - but only in such obscure formats.

Even with IEEE 754 formats, values less than DBL_MAX + 1/2 ulp
in magnitude do not yield an overflow in round-to-nearest (the
default rounding mode in IEEE 754).

> >> I wasn't talking about overflow for its own sake, but only in the
> >> context of what the standard says about the value of floating point
> >> constants. What value does that constant have? Is it one of the three
> >> values permitted by 6.4.4.2p4? Is it, in particular, the value required
> >> by IEEE 754? If the answers to both questions are yes, it's consistent
> >> with everything I said.
> > 
> > The second answer is not "yes", in case nextdown(DBL_MAX) would be
> > returned.

> I'm asking what value you observed - was it nextdown(DBL_MAX), DBL_MAX,
> +infinity, or something else? The first three are permitted by the C
> standard, the second one is mandated by IEEE 754, so I would expect an
> implementation that claimed conformance to both standards to choose
> DBL_MAX, and NOT nextdown(DBL_MAX). So - which value did you see?

This issue is not what one can observe on a subset of implementations,
but what is possible. The value nextdown(DBL_MAX) does not make much
sense when the implementation *knows* that the value is larger than
DBL_MAX because it exceeds the range (there is a diagnostic to tell
that to the user because of 6.4.4p2).

[...]
> What renders the behavior undefined? On an implementation that doesn't
> support infinities, it's a constraint violation - but constraint
> violations don't necessarily have undefined behavior. They usually have
> undefined behavior due to "omission of any explicit definition of the
> behavior", but there is in fact an explicit definition of the behavior
> that continues to apply even when that constraint is violated.
> And on an implementation that does support infinities, it isn't even a
> constraint violation.

Actually it is when the mathematical result exceeds the range. 6.5p5
says: "If an /exceptional condition/ occurs during the evaluation of
an expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined." So this appears to be an issue when infinity is not
supported.

I suppose that when the standard defines something, it assumes the
case where such an exceptional condition does not occur, unless
explicitly said otherwise (that's the whole point of 6.5p5). And in
the definitions concerning floating-point expressions, the standard
never distinguishes between an exceptional condition or not. For
instance, for addition, the standard just says "The result of the
binary + operator is the sum of the operands." (on the real numbers,
this operation is always mathematically well-defined, so the only
issue is results that exceed the range, introduced by 6.5p5).

> ...
> >> For an implementation that supports infinities (in other words, an
> >> implementation where infinities are representable), how do infinities
> >> fail to qualify as being within the range of representable values? Where
> >> is that exclusion specified?
> > 
> > 5.2.4.2.2p5. Note that it seems that it is intended to exclude
> > some representable values from the range. Otherwise such a long
> > specification of the range would not be needed.

> That clause correctly states that infinities do NOT qualify as
> floating point numbers.

Note that there are inconsistencies in the standard about what
it means by "floating-point numbers". It is sometimes used to
mean the value of a floating type. For instance, the standard
says for fabs: "The fabs functions compute the absolute value
of a floating-point number x." But I really don't think that
this function is undefined on infinities.

That's probably why it says "*finite* floating-point number"
and not just "floating-point number" (if it were clear that
infinities do not qualify as floating-point numbers, the word
"finite" would not be necessary).

> However, it also correctly refers to them as values. The relevant
> clauses refer to the range of representable values, not the range of
> representable floating point numbers. On such an implementation,
> infinities are representable and they are values.

My point is that it says *real* numbers. And infinities are not
real numbers.

> What are you referring to when you say "such a long specification"?

If I understand what you wish (to include all representable values
in the range), the standard could have said: "The minimum range of
representable values for a floating type is the most negative number
in that type through the most positive number in that type." That's
simpler and shorter than the current text of 5.2.4.2.2p5.

So leaving representable values (which are not FP numbers) outside
the range may be intended.

> > That said, either this specification seems incorrect or there are
> > several meanings of "range". For instance, 5.2.4.2.2p9 says "Except
> > for assignment and cast (which remove all extra range and precision)",
> > and here, the intent is to limit the range to the emax exponent of
> > the considered type.

> I had to go back to n1570.pdf to find that wording. It was removed from
> n2310.pdf (2018-11-06). In n2596.pdf (2020-12-11), wording about the
> extra range was placed in footnote 22, referred to by 5.2.4.2.2p4, and
> is still there in the latest draft I have, n2731.pdf (2021-10-18).

I can still see this text in other similar places of the current
draft N2731. For instance, 6.5.4p6 about cast operators:
"[...] then the cast specifies a conversion even if the type of
the expression is the same as the named type and removes any extra
range and precision."

> I believe that "extra range" refers to extra representable values that
> are supported by the evaluation format, but not by the format of the
> type itself. The extra range consists entirely of finite values, even if
> the full range is infinite for both formats.

This is what I believe too. But instead of "extra range and precision",
the standard should have said values that are not representable exactly
in the target floating type. Something like that.

[...]
> > I don't remember exactly, but I think that was 7.12p4 to make it
> > consistent with its footnote (which refers to 6.4.4).

> In the latest draft standard that I have, that wording is now in 7.12p7.
> I've already conceded that "overflows" is not necessarily the same as
> "exceeds the range". However, the only known exception is for a long
> double type, which can't apply to INFINITY, which is what 7.12p7 describes.

A value may overflow, but still be in the range of representable
values (if infinities are not supported, 5.2.4.2.2p5 just specifies
a minimum range). And conversely, something like DBL_MAX + a tiny
number may not be regarded as an overflow, but be outside the range
(if infinities are not supported).

> > Note that the notion of overflow as defined by 7.12.1p5 (which is
> > consistent with the particular case of IEEE 754) exists whether
> > infinities are supported or not.

> Yes, but INFINITY is only required to overflow, which is what you were
> talking about, on implementations that don't support infinities. So, in
> order to justify saying that it "does not match what implementations
> consider as overflow", you must necessarily be referring to
> implementations that don't support infinities.

I was saying 2 things:
  * What an implementation regards as an overflow, whether infinities
    are supported or not.
  * With GCC, what happens when infinities are not supported,
    according to its code.

> > And for implementations without infinities, see the GCC code:
> > gcc/c-family/c-lex.c
> > 
> >   if (REAL_VALUE_ISINF (real)
> >       || (const_type != type && REAL_VALUE_ISINF (real_trunc)))
> >     {
> >       *overflow = OT_OVERFLOW;
> >       if (!(flags & CPP_N_USERDEF))
> >         {
> >           if (!MODE_HAS_INFINITIES (TYPE_MODE (type)))
> >             pedwarn (input_location, 0,
> >                      "floating constant exceeds range of %qT", type);
> >           else
> >             warning (OPT_Woverflow,
> >                      "floating constant exceeds range of %qT", type);
> >         }
> >     }

> It's actually the behavior of that implementation in modes that do
> support infinities that is most relevant to this discussion - it labels
> an infinite value as exceeding the type's range, even if it does
> support infinities. Apparently they are using "range" to refer to the
> range of finite values - but I would consider the wording to be
> misleading without the qualifier "finite".

Yes, the wording in the "else" case (where infinities are supported)
is incorrect. I had reported a bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103123

> >>>> However, I'm confused about how this connects to the standard's
> >>>> definition of normalized floating-point numbers: "f_1 > 0"
> >>>> (5.2.4.2.2p4). It seems to me that, even for the pair-of-doubles format,
> >>>> LDBL_MAX is represented by a value with f_1 = 1, and therefore is a
> >>>> normalized floating point number that is larger than LDBL_NORM_MAX,
> >>>> which strikes me as a contradiction.
> >>>
> >>> Note that there is a requirement on the exponent: e ≤ e_max.
> > 
> >> Yes, and DBL_MAX has e==e_max.
> > 
> > No, not necessarily. DBL_NORM_MAX has e == e_max. But DBL_MAX may
> > have a larger exponent. The C2x draft says:
> > 
> >   maximum representable finite floating-point number; if that number
> >   is normalized, its value is (1 − b^(−p)) b^(e_max).

> So, what is the value of e for LDBL_MAX in the pair-of-doubles format?

It should be DBL_MAX_EXP. What happens with double-double is that
for the maximum exponent of double, not all precision-p numbers
are representable (here, p = 106 = 2 * 53 historically, though
107 could actually be used thanks to the constraint below and the
limitation on the exponent discussed here).

The reason is that there is a constraint on the format in order
to make the double-double algorithms fast enough: if (x1,x2) is
a valid double-double number, then x1 must be equal to x1 + x2
rounded to nearest. So LDBL_MAX has the form:

  .111...1110111...111 * 2^(DBL_MAX_EXP)

where both sequences 111...111 have 53 bits. Values above this
number would increase the exponent of x1 to DBL_MAX_EXP + 1,
which is above the maximum exponent for double; thus such values
are not representable.

The consequence is that e_max < DBL_MAX_EXP.

> What is the value of e_max?

DBL_MAX_EXP - 1

> If LDBL_MAX does not have e==e_max,

(LDBL_MAX has exponent e = e_max + 1.)

> what is the largest representable value in that format that does
> have e==e_max?

Some value very close to 2^e_max: x1 = 2^e_max and x2 = -DBL_TRUE_MIN.
Note that it does not fit the floating-point model because it is not
representable with a p-bit precision.

And LDBL_NORM_MAX = (1 − 2^(−p)) 2^(e_max) as specified; it is
represented by

  x1 = 2^(e_max) = .1 * 2^DBL_MAX_EXP
  x2 = -2^(e_max-p)

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

#6372

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-09 12:51 -0500
Message-ID	<smecfc$jai$1@dont-email.me>
In reply to	#6370

On 11/9/21 5:12 AM, Vincent Lefevre wrote:
> In article <smd27p$28v$1@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> 
>> On 11/8/21 9:48 PM, Vincent Lefevre wrote:
>>> In article <smbrgo$g4b$1@dont-email.me>,
>>>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
...
>>>> I wasn't talking about overflow for its own sake, but only in the
>>>> context of what the standard says about the value of floating point
>>>> constants. What value does that constant have? Is it one of the three
>>>> values permitted by 6.4.4.2p4? Is it, in particular, the value required
>>>> by IEEE 754? If the answers to both questions are yes, it's consistent
>>>> with everything I said.
>>>
>>> The second answer is not "yes", in case nextdown(DBL_MAX) would be
>>> returned.
> 
>> I'm asking what value you observed - was it nextdown(DBL_MAX), DBL_MAX,
>> +infinity, or something else? The first three are permitted by the C
>> standard, the second one is mandated by IEEE 754, so I would expect an
>> implementation that claimed conformance to both standards to choose
>> DBL_MAX, and NOT nextdown(DBL_MAX). So - which value did you see?
> 
> This issue is not what one can observe on a subset of implementations,
> but what is possible.

Why does it matter to you that such implementations are possible? No
such implementation can qualify as conforming to IEEE 754 - so what? The
C standard very deliberately does NOT require conformance to IEEE 754,
and what it requires in areas that are also covered by IEEE 754 is
deliberately more lenient than what IEEE 754 requires, precisely so C
can be implemented on platforms where floating point hardware that can't
meet IEEE 754's accuracy is installed. That's why the __STDC_IEC_*
macros exist - to allow a program to determine whether an implementation
claims to conform to some or all of the requirements of IEC 60559
(==IEEE 754). That's why those macros are described in the section
titled "Conditional feature macros."

Two standards do not (as you claim in the Subject: header of this
thread) contradict each other just because they say different things
about the same situation. If one standard provides a set containing one
or more options, and the other standard provides a different set of one
or more options, the two standards contradict each other only if there's
no overlap between the two sets of options. So long as there is at least
one option that meets the requirements of both standards, they don't
contradict each other.

People do not create full implementations of C just for the fun of it
(well, most people don't). In particular, they don't create an
implementation that conforms to the C standard but not to IEC 60559 by
accident or laziness. In general, you can safely assume that any such
implementation did so because there was some inconvenience associated
with conforming to IEC 60559 that they wished to avoid. If the C
standard were changed to mandate conformance with IEC 60559, some of
those implementations might change to conform with that standard, but
many (possibly most) such implementations would respond by deciding to
not bother conforming to that version of the C standard, because
conforming would be too inconvenient.

> ... The value nextdown(DBL_MAX) does not make much
> sense when the implementation *knows* that the value is larger than
> DBL_MAX because it exceeds the range (there is a diagnostic to tell
> that to the user because of 6.4.4p2).

You misunderstand the purpose of the specification in 6.4.4.2p4. It was
not intended that a floating point implementation would generate the
nearest representable value, and that the implementation of C would then
arbitrarily choose one of the other two adjacent representable
values. The reason was to accommodate floating point implementations
that couldn't meet the accuracy requirements of IEC 60559. The
implementation asks the floating point hardware to calculate what the
value is, the hardware does its best to accurately calculate the value,
but its best isn't good enough to qualify as conforming to IEC 60559.
It might take some shortcuts or simplifications that make it faster or
simpler than an IEC 60559 implementation, at the cost of being less
accurate. It
returns a value that, incorrectly, is not greater than DBL_MAX, and the
wording in 6.4.4.2p4 gives the implementation permission to use that
incorrect number, so long as it isn't smaller than nextdown(DBL_MAX).

...
> Actually it is when the mathematical result exceeds the range. 6.5p5
> says: "If an /exceptional condition/ occurs during the evaluation of
> an expression (that is, if the result is not mathematically defined or
> not in the range of representable values for its type), the behavior
> is undefined." So this appears to be an issue when infinity is not
> supported.

Conversion of a floating point constant into a floating point value is
not "evaluation of an expression", and therefore is not covered by
6.5p5. Such conversions are required to occur "as-if at translation
time", and exceptional conditions are explicitly prohibited.

> I suppose that when the standard defines something, it assumes the
> case where such an exceptional condition does not occur, unless
> explicitly said otherwise (that's the whole point of 6.5p5). And in
> the definitions concerning floating-point expressions, the standard
> never distinguishes between an exceptional condition or not. For
> instance, for addition, the standard just says "The result of the
> binary + operator is the sum of the operands." (on the real numbers,
> this operation is always mathematically well-defined, so the only
> issue is results that exceed the range, introduced by 6.5p5).

The standard is FAR more lenient with regard to floating point
operations than it is for floating point constants:
"The accuracy of the floating-point operations ( + , - , * , / ) and of
the library functions in <math.h> and <complex.h> that return
floating-point results is implementation-defined, as is the accuracy of
the conversion between floating-point internal representations and
string representations performed by the library functions in <stdio.h> ,
<stdlib.h> , and <wchar.h> . The implementation may state that the
accuracy is unknown." (5.2.4.2.2p8).

That wording allows an implementation to implement floating point
arithmetic so inaccurately that it can conclude that the expression
LDBL_MAX - LDBL_MIN < LDBL_MIN - LDBL_MAX is true. Note: the comparison
operators (== != < > <= >=) are not covered by 5.2.4.2.2p8, but the
subtraction operator is.

I don't approve of this situation; I can't imagine any good reason for
implementing floating point operations as inaccurately as the standard
allows them to be implemented. The standard should provide some more
meaningful requirements. They don't have to be very strong - they could
be weak enough that every known serious floating point implementation
could meet them, and still be immensely stronger than the current
requirements. Any platform where floating point isn't actually needed
should simply be allowed to opt out of supporting floating point
entirely, rather than being required to support it but allowed to
implement it that badly. That would be safer for all concerned.

However, those incredibly loose requirements are what the standard
actually says.

...
>>>> For an implementation that supports infinities (in other words, an
>>>> implementation where infinities are representable), how do infinities
>>>> fail to qualify as being within the range of representable values? Where
>>>> is that exclusion specified?
>>>
>>> 5.2.4.2.2p5. Note that it seems that it is intended to exclude
>>> some representable values from the range. Otherwise such a long
>>> specification of the range would not be needed.
> 
>> That clause correctly states that infinities do NOT qualify as
>> floating point numbers.
> 
> Note that there are inconsistencies in the standard about what
> it means by "floating-point numbers". It is sometimes used to
> mean the value of a floating type. For instance, the standard
> says for fabs: "The fabs functions compute the absolute value
> of a floating-point number x." But I really don't think that
> this function is undefined on infinities.

If __STDC_IEC_60559_BFP__ is predefined by the implementation, F.10.4.3
not only allows fabs (±∞), it explicitly mandates that it return +∞.
(Note: if you see odd symbols on the previous line, they were supposed to
be infinities.)

>> However, it also correctly refers to them as values. The relevant
>> clauses refer to the range of representable values, not the range of
>> representable floating point numbers. On such an implementation,
>> infinities are representable and they are values.
> 
> My point is that it says *real* numbers. And infinities are not
> real numbers.

In n2731.pdf, 5.2.4.2.2p5 says "An implementation may give zero and
values that are not floating-point numbers (such as infinities
and NaNs) a sign or may leave them unsigned. Wherever such values are
unsigned, any requirement in this document to retrieve the sign shall
produce an unspecified sign, and any requirement to set the sign shall
be ignored."
Nowhere in that clause does it use the term "real".
Are you perhaps referring to 5.2.4.2.2p7?

...
>>>>>> However, I'm confused about how this connects to the standard's
>>>>>> definition of normalized floating-point numbers: "f_1 > 0"
>>>>>> (5.2.4.2.2p4). It seems to me that, even for the pair-of-doubles format,
>>>>>> LDBL_MAX is represented by a value with f_1 = 1, and therefore is a
>>>>>> normalized floating point number that is larger than LDBL_NORM_MAX,
>>>>>> which strikes me as a contradiction.
>>>>>
>>>>> Note that there is a requirement on the exponent: e ≤ e_max.
>>>
>>>> Yes, and DBL_MAX has e==e_max.
>>>
>>> No, not necessarily. DBL_NORM_MAX has e == e_max. But DBL_MAX may
>>> have a larger exponent. The C2x draft says:
>>>
>>>   maximum representable finite floating-point number; if that number
>>>   is normalized, its value is (1 − b^(−p)) b^(e_max).
> 
>> So, what is the value of e for LDBL_MAX in the pair-of-doubles format?
> 
> It should be DBL_MAX_EXP. What happens with double-double is that
> for the maximum exponent of double, not all precision-p numbers
> are representable (here, p = 106 = 2 * 53 historically, though
> 107 could actually be used thanks to the constraint below and the
> limitation on the exponent discussed here).
> 
> The reason is that there is a constraint on the format in order
> to make the double-double algorithms fast enough: if (x1,x2) is
> a valid double-double number, then x1 must be equal to x1 + x2
> rounded to nearest. So LDBL_MAX has the form:
> 
>   .111...1110111...111 * 2^(DBL_MAX_EXP)
> 
> where both sequences 111...111 have 53 bits. Values above this
> number would increase the exponent of x1 to DBL_MAX_EXP + 1,
> which is above the maximum exponent for double; thus such values
> are not representable.
> 
> The consequence is that e_max < DBL_MAX_EXP.
> 
>> What is the value of e_max?
> 
> DBL_MAX_EXP - 1
> 
>> If LDBL_MAX does not have e==e_max,
> 
> (LDBL_MAX has exponent e = e_max + 1.)

That doesn't work. 5.2.4.2.2p2 and p3 both specify that floating point
numbers must have e_min <= e && e <= e_max. LDBL_MAX is defined as the
"maximum finite floating point number". A value for which e > e_max
can't qualify as a floating point number, and therefore in particular
can't qualify as the maximum finite floating point number. An
implementation that uses the sum-of-pair-of-doubles floating point
format has two options: increase e_max high enough to include the value
you specify for LDBL_MAX, or decrease LDBL_MAX to a value low enough to
have e<=e_max.

Key point: most items in 5.2.4.2.2 have two parts: a description, and an
expression involving the parameters of the floating point format. For
formats that are a good fit to the C standard's floating point model,
those formulas give the exactly correct result. For other formats, the
description is what specifies what the result must be, the formula
should be treated only as an example that might not apply.

Those formulas were written on an implicit assumption that becomes
obvious only when you try to apply them to a format that violates the
assumption: every base-b digit from f_1 to f_p can freely be set to any
value from 0 to b-1. In particular, the formula for LDBL_MAX was based
upon the assumption that all of those values were set to b-1, and e was
set to e_max.
A pair-of-doubles format could fit that assumption if a restriction were
imposed that says that a pair (x1, x2) is allowed only if x2 == 0 || (
1 ulp on x1 > x2 && x2 >= 0.5 ulp on x1). (That condition needs to be
modified to give the right requirements for negative numbers.) Such an
implementation could, with perfect accuracy, be described using
LDBL_MANT_DIG == 2*DBL_MANT_DIG and LDBL_MAX_EXP == DBL_MAX_EXP.

However, the pair-of-doubles format you've described doesn't impose such
requirements. The value of p must be high enough that, for any pair (x1,
x2) where x1 is finite and x2 is non-zero which is meant to qualify as
representing a floating point number, p covers both the most significant
digit of x1, and the least significant digit of any non-zero x2, no
matter how large the ratio x1/x2 is. Whenever that ratio is high enough,
f_k for most values of k can only be 0. As a result, one of the
assumptions behind the formulas in 5.2.4.2.2 isn't met, so those
formulas aren't always valid for such a format - but the descriptions
still apply.

#6373

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-11-10 12:48 +0000
Message-ID	<20211110113701$8446@zira.vinc17.org>
In reply to	#6372

In article <smecfc$jai$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> Why does it matter to you that such implementations are possible?

When writing a portable program, one wants it to behave correctly
even on untested implementations (which can be known implementations
but without a machine available to test the program, implementations
unknown to the developer, and possible future implementations).

This is also useful for formal proofs that don't stick to a particular
implementation.

> No such implementation can qualify as conforming to IEEE 754 - so
> what? The C standard very deliberately does NOT require conformance
> to IEEE 754,

This is not the point. IEEE 754 had great properties as it more
or less ensures a sane behavior. If the implementation does not
conform to IEEE 754, one should still expect a sane behavior (if
well-defined), and the C standard should ensure that.

For instance, one should expect that HUGE_VALF ≤ INFINITY and
FLT_MAX ≤ INFINITY.

> > ... The value nextdown(DBL_MAX) does not make much
> > sense when the implementation *knows* that the value is larger than
> > DBL_MAX because it exceeds the range (there is a diagnostic to tell
> > that to the user because of 6.4.4p2).

> You misunderstand the purpose of the specification in 6.4.4.2p4. It was
> not intended that a floating point implementation would generate the
> nearest representable value, and that the implementation of C would then
> arbitrarily choose one of the other two adjacent representable
> values. The reason was to accommodate floating point implementations
> that couldn't meet the accuracy requirements of IEC 60559.

You didn't understand. I repeat. The implementation *knows* that the
value is larger than DBL_MAX. This knowledge is *required* by the C
standard so that the required diagnostic can be emitted (due to the
constraint in 6.4.4p2). So there is no reason that the implementation
would assume that the value can be less than DBL_MAX.

This is not an accuracy issue, or if there is one, it occurs at the
level of the 6.4.4p2 constraint.

> ...
> > Actually it is when the mathematical result exceeds the range. 6.5p5
> > says: "If an /exceptional condition/ occurs during the evaluation of
> > an expression (that is, if the result is not mathematically defined or
> > not in the range of representable values for its type), the behavior
> > is undefined." So this appears to be an issue when infinity is not
> > supported.

> Conversion of a floating point constant into a floating point value is
> not "evaluation of an expression", and therefore is not covered by
> 6.5p5. Such conversions are required to occur "as-if at translation
> time", and exceptional conditions are explicitly prohibited.

But what about constant expressions?

For instance, assuming no IEEE 754 support, what is the behavior of
the following code?

  static double x = DBL_MAX + DBL_MAX;

(We are in the case of a result that is mathematically defined, but
not in the range of representable values for its type.)

If one ignores 6.5p5 because this is a translation-time computation,
I find the standard rather ambiguous on what is required.

Note that there is a constraint 6.6p4 "Each constant expression shall
evaluate to a constant that is in the range of representable values
for its type." but this is of the same kind as 6.4.4p2 for constants.

And what about the following?

  static int i = 2 || 1 / 0;

Here, 1 / 0 is a constant expression that doesn't meet constraint
6.6p4. So a diagnostic would be required (even though the behavior
is well-defined)?

Note: There was a DR resolution to justify that there should not
be a diagnostic. This DR said that since 1 / 0 was not meeting the
constraint, it was not regarded as a constant expression. But if
one applies this interpretation on constants, this means that if
the value is not in the range of representable values, then it is
not regarded as a constant, thus making the behavior undefined in
this case.

> > Note that there are inconsistencies in the standard about what
> > it means by "floating-point numbers". It is sometimes used to
> > mean the value of a floating type. For instance, the standard
> > says for fabs: "The fabs functions compute the absolute value
> > of a floating-point number x." But I really don't think that
> > this function is undefined on infinities.

> If __STDC_IEC_60559_BFP__ is predefined by the implementation, F.10.4.3
> not only allows fabs (±∞), it explicitly mandates that it return +∞.

The issue is when __STDC_IEC_60559_BFP__ is not defined but infinities
are supported (as allowed by the standard).

> >> However, it also correctly refers to them as values. The relevant
> >> clauses refer to the range of representable values, not the range of
> >> representable floating point numbers. On such an implementation,
> >> infinities are representable and they are values.
> > 
> > My point is that it says *real* numbers. And infinities are not
> > real numbers.

> In n2731.pdf, 5.2.4.2.2p5 says "An implementation may give zero and
> values that are not floating-point numbers (such as infinities
> and NaNs) a sign or may leave them unsigned. Wherever such values are
> unsigned, any requirement in this document to retrieve the sign shall
> produce an unspecified sign, and any requirement to set the sign shall
> be ignored."
> Nowhere in that clause does it use the term "real".
> Are you perhaps referring to 5.2.4.2.2p7?

Sorry for the confusion, I should have said that this came from C17.
Indeed, this was renumbered to 5.2.4.2.2p7 in N2731.

(I'm considering both C17 and C2x N2731, but perhaps one should
consider only C2x N2731 since it fixes FP issues from the past
standards.)

> ...
> >>>>>> However, I'm confused about how this connects to the standard's
> >>>>>> definition of normalized floating-point numbers: "f_1 > 0"
> >>>>>> (5.2.4.2.2p4). It seems to me that, even for the pair-of-doubles format,
> >>>>>> LDBL_MAX is represented by a value with f_1 = 1, and therefore is a
> >>>>>> normalized floating point number that is larger than LDBL_NORM_MAX,
> >>>>>> which strikes me as a contradiction.
> >>>>>
> >>>>> Note that there is a requirement on the exponent: e ≤ e_max.
> >>>
> >>>> Yes, and DBL_MAX has e==e_max.
> >>>
> >>> No, not necessarily. DBL_NORM_MAX has e == e_max. But DBL_MAX may
> >>> have a larger exponent. The C2x draft says:
> >>>
> >>>   maximum representable finite floating-point number; if that number
> >>>   is normalized, its value is (1 − b^(−p)) b^(e_max).
> > 
> >> So, what is the value of e for LDBL_MAX in the pair-of-doubles format?
> > 
> > It should be DBL_MAX_EXP. What happens with double-double is that
> > for the maximum exponent of double, not all precision-p numbers
> > are representable (here, p = 106 = 2 * 53 historically, though
> > 107 could actually be used thanks to the constraint below and the
> > limitation on the exponent discussed here).
> > 
> > The reason is that there is a constraint on the format in order
> > to make the double-double algorithms fast enough: if (x1,x2) is
> > a valid double-double number, then x1 must be equal to x1 + x2
> > rounded to nearest. So LDBL_MAX has the form:
> > 
> >   .111...1110111...111 * 2^(DBL_MAX_EXP)
> > 
> > where both sequences 111...111 have 53 bits. Values above this
> > number would increase the exponent of x1 to DBL_MAX_EXP + 1,
> > which is above the maximum exponent for double; thus such values
> > are not representable.
> > 
> > The consequence is that e_max < DBL_MAX_EXP.
> > 
> >> What is the value of e_max?
> > 
> > DBL_MAX_EXP - 1
> > 
> >> If LDBL_MAX does not have e==e_max,
> > 
> > (LDBL_MAX has exponent e = e_max + 1.)

> That doesn't work. 5.2.4.2.2p2 and p3 both specify that floating point
> numbers must have e_min <= e && e <= e_max.

Yes, *floating-point numbers*.

> LDBL_MAX is defined as the "maximum finite floating point number".

I'd see this as a defect in N2731. As I was saying earlier, the
standard does not use "floating-point number" in a consistent way.
This was discussed, but it seems that not everything was fixed.
As an attempt to clarify this point, "normalized" was added, but
this may not have been the right thing.

The purpose of LDBL_MAX is to be able to be a finite value larger
than LDBL_NORM_MAX, which is the maximum floating-point number
following the 5.2.4.2.2p3 definition. LDBL_NORM_MAX was introduced
precisely because LDBL_MAX does not necessarily follow the model
of 5.2.4.2.2p3 (i.e. LDBL_MAX isn't necessarily a floating-point
number).
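
A small sketch of the double-double constraint quoted above (x1 must equal x1 + x2 rounded to nearest), assuming `double` is IEEE 754 binary64 and the rounding mode is round-to-nearest. The names `is_valid_double_double` and `max_pair_low_part` are hypothetical; real double-double implementations may check more than this.

```c
#include <float.h>

/* Hypothetical check, assuming IEEE 754 binary64 `double` and
   round-to-nearest. Returns 1 if (hi, lo) satisfies the constraint
   hi == round(hi + lo), 0 otherwise. */
int is_valid_double_double(double hi, double lo)
{
    volatile double sum = hi + lo;   /* volatile: force rounding to double */
    return sum == hi;
}

/* Low part of the largest valid pair whose high part is DBL_MAX:
   53 one bits starting just below the half-ulp position of DBL_MAX
   (the multiplication by a power of two is exact). */
double max_pair_low_part(void)
{
    return DBL_MAX * 0x1p-54;
}
```

Under those assumptions, `(DBL_MAX, max_pair_low_part())` passes the check, while any larger low part would round the sum up past DBL_MAX, which is why the pair cannot grow further.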

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


#6376

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-10 12:03 -0500
Message-ID	<smgu08$3r1$1@dont-email.me>
In reply to	#6373

On 11/10/21 7:48 AM, Vincent Lefevre wrote:
> In article <smecfc$jai$1@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> 
>> Why does it matter to you that such implementations are possible?
> 
> When writing a portable program, one wants it to behave correctly
> even on untested implementations (which can be known implementations
> but without a machine available to test the program, implementations
> unknown to the developer, and possible future implementations).

Yes, but it's also necessary to understand as "correct" any behavior
that conforms to the relevant requirements, which may include
conformance with relevant standards. Such implementations produce
behavior that is correct according to the C standard. Such
implementations should make no claim to conform to IEEE 754. If so, the
fact that they don't conform to it doesn't render their result incorrect.

...
>>> ... The value nextdown(DBL_MAX) does not make much
>>> sense when the implementation *knows* that the value is larger than
>>> DBL_MAX because it exceeds the range (there is a diagnostic to tell
>>> that to the user because of 6.4.4p2).
> 
>> You misunderstand the purpose of the specification in 6.4.4.2p4. It was
>> not intended that a floating point implementation would generate the
>> nearest representable value, and that the implementation of C would then
>> arbitrarily choose to pick one of the other two adjacent representable
>> values. The reason was to accommodate floating point implementations
>> that couldn't meet the accuracy requirements of IEC 60559.
> 
> You didn't understand. I repeat. The implementation *knows* that the
> value is larger than DBL_MAX. This knowledge is *required* by the C
> standard so that the required diagnostic can be emitted (due to the
> constraint in 6.4.4p2). So there is no reason that the implementation
> would assume that the value can be less than DBL_MAX.
> 
> This is not an accuracy issue, or if there is one, it occurs at the
> level of the 6.4.4p2 constraint.

I assume we've been talking about implementations that conform to the C
standard, right? Otherwise there's nothing meaningful that can be said.

6.4.4.2p4 describes accuracy requirements that allow the result you find
objectionable. I've been talking about the fact that those requirements
are a little bit more lenient than those imposed by IEEE 754, because
those looser requirements allow a slightly simpler implementation, one
which might use up less code space or execute somewhat faster, at the
cost of lower accuracy.

However, it's not a lot less accuracy. Please be specific in your
answers to the following questions. Identify a specific IEEE 754 format
and the actual numerical values for that format, printed with enough
digits to see the differences between them:

* How big is the largest value described by a floating point constant,
that must be rounded down to DBL_MAX?
* What is the value of DBL_MAX?
* What is the value of nextdown(DBL_MAX)?
* How big is the difference between the first two values?
* How big is the difference between the second and third values?

As you should see, the maximum error allowed by the C standard is not
enormously larger than the maximum error allowed by IEEE 754.
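
For anyone who wants to look at the actual numbers, here is a minimal sketch that measures the spacing just below DBL_MAX. It uses C99 `nextafter` in place of `nextdown`; the helper name `gap_below_dbl_max` is made up for illustration, and the figures in the note below assume IEEE 754 binary64.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: distance between DBL_MAX and the next smaller
   representable double, i.e. DBL_MAX - nextdown(DBL_MAX). */
double gap_below_dbl_max(void)
{
    double below = nextafter(DBL_MAX, 0.0);  /* step one value toward zero */
    return DBL_MAX - below;                  /* exact: both are neighbours */
}
```

Printing the result with `%a` or `%.17g` should, on binary64, show a gap of 2^971 (roughly 2.0e292), which is about 2^-53 relative to DBL_MAX. Under round-to-nearest, a constant has to exceed DBL_MAX by half that gap before it stops rounding to DBL_MAX.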

You're worried about the possibility of an implementation conforming to
the C standard by returning nextdown(DBL_MAX), despite the fact that, in
order to conform, the implementation would also have to generate that
diagnostic message? This means that there must be a block of code in the
compiler somewhere, which issues that diagnostic, and which only gets
executed when that constraint is violated, but for some reason the
implementor chose not to add code to that block to set the value to
DBL_MAX. If you're worried about that possibility, that implies that you
can imagine a reason why someone might do that. What might that reason be?

For the sake of argument, let's postulate that a given implementor does
in fact have some reason to do that. If that's the case, there's
something I can guarantee to you: that implementor considers such an
error to be acceptably small, and believes that a sufficiently large
fraction of the users of his implementation will agree. If the
implementor is wrong about that second point, people will eventually
stop using his implementation. If he's right about that point - if both
he and the users of his implementation consider such inaccuracy
acceptable - why should he change his implementation just because you
consider it unacceptable? You wouldn't be a user of such an
implementation anyway, right?

...
>>> Actually it is when the mathematical result exceeds the range. 6.5p5
>>> says: "If an /exceptional condition/ occurs during the evaluation of
>>> an expression (that is, if the result is not mathematically defined or
>>> not in the range of representable values for its type), the behavior
>>> is undefined." So this appears to be an issue when infinity is not
>>> supported.
> 
>> Conversion of a floating point constant into a floating point value is
>> not "evaluation of an expression", and therefore is not covered by
>> 6.5p5. Such conversions are required to occur "as-if at translation
>> time", and exceptional conditions are explicitly prohibited.
> 
> But what about constant expressions?

They are expressions, not constants, so I agree that they are covered by
6.5p5. That doesn't quite lead you to exactly the result you want.
See below.

> For instance, assuming no IEEE 754 support, what is the behavior of
> the following code?
> 
>   static double x = DBL_MAX + DBL_MAX;

That involves addition, and is therefore covered by 5.2.4.2.2p8, which I
quoted in my previous message.

> If one ignores 6.5p5 because this is a translation-time computation,
> I find the standard rather ambiguous on what is required.

Floating point constants are required to be evaluated as-if at
translation-time.
Constant expressions are permitted to be evaluated at translation-time,
but it is not required. If it is performed at translation time, the
recommended practice when __STDC_IEC_60559_BFP__ is pre#defined is: "The
implementation should produce a diagnostic message for each
translation-time floating-point exception, other than "inexact";
the implementation should then proceed with the translation of the
program." (F.8.2p2). I would presume that this is also allowed, but not
required, even if __STDC_IEC_60559_BFP__ is not pre#defined.

> Note that there is a constraint 6.6p4 "Each constant expression shall
> evaluate to a constant that is in the range of representable values
> for its type." but this is of the same kind as 6.4.4p2 for constants.

Because of 5.2.4.2.2p8, it's implementation-defined whether or not the
addition is carried out with sufficient inaccuracy to produce a result
that is within the range of representable values. I would not recommend
having any specific expectations, good or bad, about the behavior of
such code.

> And what about the following?
> 
>   static int i = 2 || 1 / 0;

Integer division is far more tightly constrained by the C standard than
floating point division (it would be really difficult, bordering on
impossible, for something to constrain floating point division more
loosely than the C standard does).

>>> Note that there are inconsistencies in the standard about what
>>> it means by "floating-point numbers". It is sometimes used to
>>> mean the value of a floating type. For instance, the standard
>>> says for fabs: "The fabs functions compute the absolute value
>>> of a floating-point number x." But I really don't think that
>>> this function is undefined on infinities.
> 
>> If __STDC_IEC_60559_BFP__ is pre#defined by the implementation, F10.4.3
>> not only allows fabs (±∞), it explicitly mandates that it return +∞.
> 
> The issue is when __STDC_IEC_60559_BFP__ is not defined but infinities
> are supported (as allowed by the standard).

True, but the fact that F10.4.3 is there implies that the behavior it
specifies is not considered to violate the clause that you referred to,
so it would not be prohibited for an implementation to provide such
behavior even if __STDC_IEC_60559_BFP__ were not pre#defined.

...
>>> (LDBL_MAX has exponent e = e_max + 1.)
> 
>> That doesn't work. 5.2.4.2.2p2 and p3 both specify that floating point
>> numbers must have e_min <= e && e <= e_max.
> 
> Yes, *floating-point numbers*.
> 
>> LDBL_MAX is defined as the "maximum finite floating point number".
> 
> I'd see this as a defect in N2731. As I was saying earlier, the
> standard does not use "floating-point number" in a consistent way.
> This was discussed, but it seems that not everything was fixed.
> As an attempt to clarify this point, "normalized" was added, but
> this may not have been the right thing.
> 
> The purpose of LDBL_MAX is to be able to be a finite value larger
> than LDBL_NORM_MAX,

No, LDBL_MAX is allowed to be larger than LDBL_NORM_MAX, but the
committee made it clear that they expected LDBL_MAX and LDBL_NORM_MAX to
have the same value on virtually all real-world implementations.

> ... which is the maximum floating-point number
> following the 5.2.4.2.2p3 definition. LDBL_NORM_MAX was introduced
> precisely because LDBL_MAX does not necessarily follow the model
> of 5.2.4.2.2p3 (i.e. LDBL_MAX isn't necessarily a floating-point
> number).

I don't believe that was the intent. I believe that the standard was
saying precisely what it meant to say when describing LDBL_MAX as the
largest finite floating point number, while describing LDBL_NORM_MAX as
the largest finite normalized floating point number. What precisely are
the definitions for those two macros that you think the committee
intended to describe?


#6378

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-11-12 23:17 +0000
Message-ID	<20211112221035$2893@zira.vinc17.org>
In reply to	#6376

In article <smgu08$3r1$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> On 11/10/21 7:48 AM, Vincent Lefevre wrote:
> > In article <smecfc$jai$1@dont-email.me>,
> ...
> >>> ... The value nextdown(DBL_MAX) does not make much
> >>> sense when the implementation *knows* that the value is larger than
> >>> DBL_MAX because it exceeds the range (there is a diagnostic to tell
> >>> that to the user because of 6.4.4p2).
> > 
> >> You misunderstand the purpose of the specification in 6.4.4.2p4. It was
> >> not intended that a floating point implementation would generate the
> >> nearest representable value, and that the implementation of C would then
> >> arbitrarily choose to pick one of the other two adjacent representable
> >> values. The reason was to accommodate floating point implementations
> >> that couldn't meet the accuracy requirements of IEC 60559.
> > 
> > You didn't understand. I repeat. The implementation *knows* that the
> > value is larger than DBL_MAX. This knowledge is *required* by the C
> > standard so that the required diagnostic can be emitted (due to the
> > constraint in 6.4.4p2). So there is no reason that the implementation
> > would assume that the value can be less than DBL_MAX.
> > 
> > This is not an accuracy issue, or if there is one, it occurs at the
> > level of the 6.4.4p2 constraint.

> I assume we've been talking about implementations that conform to the C
> standard, right? Otherwise there's nothing meaningful that can be said.

The issue is more related to (strictly) conforming programs.

> 6.4.4.2p4 describes accuracy requirements that allow the result you find
> objectionable. I've been talking about the fact that those requirements
> are a little bit more lenient than those imposed by IEEE 754, because
> those looser requirements allow a slightly simpler implementation, one
> which might use up less code space or execute somewhat faster, at the
> cost of lower accuracy.

Except that with what you assume, it does not make the implementation
simpler: If an implementation has determined that a value is
larger than DBL_MAX (from 6.4.4p2), I don't see why allowing the
implementation to return a value less than DBL_MAX would make it
simpler.

[...]
> As you should see, the maximum error allowed by the C standard is not
> enormously larger than the maximum error allowed by IEEE 754.

I know, but the accuracy is not the main issue here. The main issue
is *consistency*. If an implementation says at the same time that a
value is considered being strictly larger than DBL_MAX and strictly
smaller than DBL_MAX, then something is wrong! Note: by "considered",
I mean that the implementation may be inaccurate when evaluating the
value (but there's only one evaluation attempt).

> You're worried about the possibility of an implementation conforming to
> the C standard by returning nextdown(DBL_MAX), despite the fact that, in
> order to conform, the implementation would also have to generate that
> diagnostic message?

Yes.

> This means that there must be a block of code in the compiler
> somewhere, which issues that diagnostic, and which only gets
> executed when that constraint is violated, but for some reason the
> implementor chose not to add code to that block to set the value to
> DBL_MAX. If you're worried about that possibility, that implies that
> you can imagine a reason why someone might do that. What might that
> reason be?

I don't see why the C standard would allow an implementation to make
results inconsistent... unless the diagnostic in 6.4.4p2 is regarded
as undefined behavior.

> For the sake of argument, let's postulate that a given implementor does
> in fact have some reason to do that. If that's the case, there's
> something I can guarantee to you: that implementor considers such an
> error to be acceptably small, and believes that a sufficiently large
> fraction of the users of his implementation will agree. If the
> implementor is wrong about that second point, people will eventually
> stop using his implementation. If he's right about that point - if both
> he and the users of his implementation consider such inaccuracy
> acceptable - why should he change his implementation just because you
> consider it unacceptable? You wouldn't be a user of such an
> implementation anyway, right?

Wrong reasoning.

1. The implementor doesn't necessarily know all the possible issues
with his choices. That's why standards should give restrictions and
do that when needed, in many cases.

2. A bit related to (1), the implementor doesn't know all programs.

3. When there are issues in implementations, people don't stop using
such implementations. See the number of GCC bugs... for most of them,
much worse than the above issue.

A bit similar to the above issue, if x and y have the same value,
sin(x) and sin(y) may give different results due to different
contexts (which is initially just an accuracy issue, and ditto
with other math functions), and because of that, with GCC, one can
get an integer variable that appears to have two different values
at the same time:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102930

What I mean here is that in practice, inaccuracy can yield more
serious issues than just inaccurate results.

> > For instance, assuming no IEEE 754 support, what is the behavior of
> > the following code?
> > 
> >   static double x = DBL_MAX + DBL_MAX;

> That involves addition, and is therefore covered by 5.2.4.2.2p8, which I
> quoted in my previous message.

My point was not about the accuracy, but the fact that the result
is out of range. Assume that the accuracy is large enough so that
the result is actually out of range (this is the case everywhere,
I suppose).

> > If one ignores 6.5p5 because this is a translation-time computation,
> > I find the standard rather ambiguous on what is required.

> Floating point constants are required to be evaluated as-if at
> translation-time.
> Constant expressions are permitted to be evaluated at translation-time,
> but it is not required.

Even with "static"??? If evaluation is not done at translation-time,
how can the implementation know whether to generate a diagnostic
due to 6.6p4 ("Each constant expression shall evaluate to a constant
that is in the range of representable values for its type.")?

> > And what about the following?
> > 
> >   static int i = 2 || 1 / 0;

> Integer division is far more tightly constrained by the C standard than
> floating point division (it would be really difficult, bordering on
> impossible, for something to constrain floating point division more
> loosely than the C standard does).

Without Annex F, it isn't. But this wasn't the point. The point is
that "1 / 0" is not regarded as a constant expression, just because
the constraint 6.6p4 is not satisfied. And with the same argument,
one may consider that 1.0e99999 (out of range in practice) is not
regarded as a constant, so that the rules associated with constants
will not apply, thus implying undefined behavior.

> ...
> >>> (LDBL_MAX has exponent e = e_max + 1.)
> > 
> >> That doesn't work. 5.2.4.2.2p2 and p3 both specify that floating point
> >> numbers must have e_min <= e && e <= e_max.
> > 
> > Yes, *floating-point numbers*.
> > 
> >> LDBL_MAX is defined as the "maximum finite floating point number".
> > 
> > I'd see this as a defect in N2731. As I was saying earlier, the
> > standard does not use "floating-point number" in a consistent way.
> > This was discussed, but it seems that not everything was fixed.
> > As an attempt to clarify this point, "normalized" was added, but
> > this may not have been the right thing.
> > 
> > The purpose of LDBL_MAX is to be able to be a finite value larger
> > than LDBL_NORM_MAX,

> No, LDBL_MAX is allowed to be larger than LDBL_NORM_MAX,

This is not possible, because LDBL_NORM_MAX is the maximum value of
the set of all (normalized) floating-point numbers (i.e. the numbers
that satisfy the model 5.2.4.2.2p3), LDBL_MAX is the maximum value
of the set of all finite numbers, and the former set is a subset of
the latter set.

> but the committee made it clear that they expected LDBL_MAX and
> LDBL_NORM_MAX to have the same value on virtually all real-world
> implementations.

This is not true for double-double, which exists in practice.

> > ... which is the maximum floating-point number
> > following the 5.2.4.2.2p3 definition. LDBL_NORM_MAX was introduced
> > precisely because LDBL_MAX does not necessarily follow the model
> > of 5.2.4.2.2p3 (i.e. LDBL_MAX isn't necessarily a floating-point
> > number).

> I don't believe that was the intent.

It was. See N2092[*], in particular:

  [*_NORM_MAX macros]

  Existing practice

  For most implementations, these three macros will be the same as the
  corresponding *_MAX macros. The only known case where that is not
  true is those where long double is implemented as a pair of doubles
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  (and then only LDBL_MAX will differ from LDBL_NORM_MAX).

[*] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2092.htm

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


#6381

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-12 21:03 -0500
Message-ID	<smn6d8$c92$1@dont-email.me>
In reply to	#6378

On 11/12/21 6:17 PM, Vincent Lefevre wrote:
> In article <smgu08$3r1$1@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
...
>> I assume we've been talking about implementations that conform to the C
>> standard, right? Otherwise there's nothing meaningful that can be said.
> 
> The issue is more related to (strictly) conforming programs.

It can't be. Strictly conforming programs are prohibited from having
output that depends upon behavior that the standard leaves unspecified.
6.4.4.2p4 identifies what is usually three different possible values for
each floating point constant (four if the constant describes a value
exactly half-way between two consecutive representable values, but only
two if it describes a value larger than DBL_MAX or smaller than -DBL_MAX
on a platform that doesn't support infinities), and leaves it
unspecified which one of those values is chosen. Since that is precisely
the freedom of choice that you're complaining about, we can't be
discussing strictly conforming programs - if your program's output
didn't depend upon which choice was made, you'd have no reason to worry
about which choice was made.

And at this point, I've officially grown weary of this discussion, and
am bowing out.


#6382

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-11-15 09:18 +0000
Message-ID	<20211115085704$f7e0@zira.vinc17.org>
In reply to	#6381

In article <smn6d8$c92$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> On 11/12/21 6:17 PM, Vincent Lefevre wrote:
> > In article <smgu08$3r1$1@dont-email.me>,
> >   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> ...
> >> I assume we've been talking about implementations that conform to the C
> >> standard, right? Otherwise there's nothing meaningful that can be said.
> > 
> > The issue is more related to (strictly) conforming programs.

> It can't be. Strictly conforming programs are prohibited from having
> output that depends upon behavior that the standard leaves unspecified.

This is not how I interpret the standard. Otherwise there would be
an obvious contradiction with note 3, which uses

#ifdef __STDC_IEC_559__

while the value of __STDC_IEC_559__ is not specified in the standard.

What matters is that the program needs to take every possibility into
account and make sure that the (visible) behavior is the same in each
case. So...

> 6.4.4.2p4 identifies what is usually three different possible values for
> each floating point constant (four if the constant describes a value
> exactly half-way between two consecutive representable values, but only
> two if it describes a value larger than DBL_MAX or smaller than -DBL_MAX
> on a platform that doesn't support infinities), and leaves it
> unspecified which one of those values is chosen.

The program can deal with that in order to get the same behavior in
each case, so that it could be strictly conforming. However, if the
behavior is undefined (assumed as a consequence of the failed
constraint), there is *nothing* that one can do.

That said, since the floating-point accuracy is not specified, can be
extremely low and is not even checkable by the program (so that there
is no possible fallback in case of low accuracy), there is not much
one can do with floating point.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


#6384

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-15 14:25 -0500
Message-ID	<smuc7i$6hq$1@dont-email.me>
In reply to	#6382

On 11/15/21 4:18 AM, Vincent Lefevre wrote:
> In article <smn6d8$c92$1@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> 
>> On 11/12/21 6:17 PM, Vincent Lefevre wrote:
...
>>> The issue is more related to (strictly) conforming programs.
> 
>> It can't be. Strictly conforming programs are prohibited from having
>> output that depends upon behavior that the standard leaves unspecified.
> 
> This is not how I interpret the standard.

I don't see how there's room for interpretation: "A strictly conforming
program ... shall not produce output dependent on any unspecified ...
behavior, ..." (4p6).

> Otherwise there would be
> an obvious contradiction with note 3, which uses

> 
> #ifdef __STDC_IEC_559__
> 
> while the value of __STDC_IEC_559__ is not specified in the standard.
> 
> What matters is that the program needs to take every possibility into
> account and make sure that the (visible) behavior is the same in each
> case. So...

The example in that footnote is based upon the fact that it's
unspecified whether the macro FE_UPWARD is #defined in <fenv.h>. The
call to fesetround(FE_UPWARD) would refer to an undeclared identifier if
it wasn't. The technique shown in Footnote 3 ensures that fesetround()
doesn't even get called unless __STDC_IEC_60559_BFP__ is already
#defined, thereby ensuring that FE_UPWARD is #defined, and as a result
the output doesn't change just because that call is made. Note: it would
have been better to write

#ifdef FE_UPWARD
    fesetround(FE_UPWARD);
#endif

Implementations that don't fully support IEC 60559 might still support
FE_UPWARD.

The example code in that footnote is, however, rather badly chosen,
because it's pretty nearly impossible to make any meaningful use of
floating point operations without producing output that depends upon
things that are unspecified. While the technique shown in Footnote 3
does prevent the call to fesetround() from being problematic in itself,
any situation where the developer cares about the rounding direction
implies that the output from the program will depend upon how rounding
is performed. If that weren't the case, why bother calling it?

That's even more true of __STDC_IEC_60559_BFP__. Any program that does
anything with floating point values other than comparing floating point
constants for relative order might have greatly different output
depending upon whether an implementation conforms to IEC 60559, or takes
maximal advantage of the freedom the standard gives them when
__STDC_IEC_60559_BFP__ is not pre#defined. You can write code that
doesn't care whether LDBL_MAX - LDBL_MIN > LDBL_MIN - LDBL_MAX is true
or false, but only by, for all practical purposes, making no meaningful
use of floating point operations.

>> 6.4.4.2p4 identifies what is usually three different possible values for
>> each floating point constant (four if the constant describes a value
>> exactly half-way between two consecutive representable values, but only
>> two if it describes a value larger than DBL_MAX or smaller than -DBL_MAX
>> on a platform that doesn't support infinities), and leaves it
>> unspecified which one of those values is chosen.
> 
> The program can deal with that in order to get the same behavior in
> each case, so that it could be strictly conforming.

Agreed - and if your program were so written, you'd have no cause to
complain about which of the three was chosen. But you are complaining
about the possibility that a different one might be chosen than the one
you think should be.

> ... However, if the
> behavior is undefined (assumed as a consequence of the failed
> constraint), there is *nothing* that one can do.

Yes, but nowhere does the standard specify that violating a constraint
does, in itself, render the behavior undefined. Most constraint
violations do render the behavior undefined "by omission of any explicit
definition of the behavior", but not this one. You might not like the
definition that 6.4.4.2p4 provides, but it does provide one.

> That said, since the floating-point accuracy is not specified, can be
> extremely low and is not even checkable by the program (so that there
> is no possible fallback in case of low accuracy), there is not much
> one can do with floating point.

??? You can check for __STDC_IEC_60559_BFP__; if it's defined, then
pretty much the highest possible accuracy is required.
Are you worried about __STDC_IEC_60559_BFP__ being falsely pre#defined?
Accuracy lower than that required by IEC 60559 is pretty easily detected,
unless an implementation takes truly heroic efforts to cover it up. To
render the inaccuracy uncheckable would require almost as much hard work
and ingenuity as producing the right result.
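
A minimal sketch of the check described above. The function name `claims_iec_60559` is hypothetical; the macro test also accepts the older `__STDC_IEC_559__` spelling, which is an assumption about which standard versions the reader targets.

```c
/* Hypothetical helper: reports whether the implementation claims
   IEC 60559 binary floating-point conformance (Annex F).
   __STDC_IEC_60559_BFP__ is the C2x spelling; __STDC_IEC_559__ is the
   spelling used since C99. */
int claims_iec_60559(void)
{
#if defined(__STDC_IEC_60559_BFP__) || defined(__STDC_IEC_559__)
    return 1;   /* implementation claims Annex F conformance */
#else
    return 0;   /* no claim: accuracy is implementation-defined */
#endif
}
```

A program could use the result to choose between a code path that relies on Annex F guarantees and a more defensive fallback.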


#6387

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-11-16 01:17 +0000
Message-ID	<20211115234025$15a2@zira.vinc17.org>
In reply to	#6384

In article <smuc7i$6hq$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

> On 11/15/21 4:18 AM, Vincent Lefevre wrote:
> > In article <smn6d8$c92$1@dont-email.me>,
> >   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> > 
> >> On 11/12/21 6:17 PM, Vincent Lefevre wrote:
> ...
> >>> The issue is more related to (strictly) conforming programs.
> > 
> >> It can't be. Strictly conforming programs are prohibited from having
> >> output that depends upon behavior that the standard leaves unspecified.
> > 
> > This is not how I interpret the standard.

> I don't see how there's room for interpretation: "A strictly conforming
> program ... shall not produce output dependent on any unspecified ...
> behavior, ..." (4p6).

I'm not sure what you intended to mean, but IMHO, the "It can't be."
is wrong based on the unsatisfied constraint and definition 3.8 of
"constraint" (but this should really be clarified).

[...]
> > ... However, if the
> > behavior is undefined (assumed as a consequence of the failed
> > constraint), there is *nothing* that one can do.

> Yes, but nowhere does the standard specify that violating a constraint
> does, in itself, render the behavior undefined. Most constraint
> violations do render the behavior undefined "by omission of any explicit
> definition of the behavior", but not this one. You might not like the
> definition that 6.4.4.2p4 provides, but it does provide one.

But the fact that a restriction is not fulfilled (definition 3.8)
is what matters.

Another example:

    6.5.2.2 Function calls

    Constraints
[...]
  2 If the expression that denotes the called function has a type that
    includes a prototype, the number of arguments shall agree with the
    number of parameters. [...]

IMHO, if one provides an additional argument, this is undefined
behavior, even though the semantics describe the behavior in this
case.

Another one:

    6.5.3.3 Unary arithmetic operators

    Constraints
[...]
  1 The operand of the unary + or - operator shall have arithmetic type
    [...]

Even though the semantics of +X would still make sense for any object type,
IMHO this is undefined behavior if X does not have an arithmetic type.

It happens that the compilers reject such code. But what if they
chose not to reject it? Would they be forced to use the defined
semantics or be allowed to have some other behavior as an extension?
I would say the latter.

The example in 6.5.16.1p6 treats a constraint violation as invalid code.

> > That said, since the floating-point accuracy is not specified, can be
> > extremely low and is not even checkable by the program (so that there
> > is no possible fallback in case of low accuracy), there is not much
> > one can do with floating point.

> ??? You can check for __STDC_IEC_60559_BFP__; if it's defined, then
> pretty much the highest possible accuracy is required.

Indeed, well, almost I think. One should also check that
FLT_EVAL_METHOD is either 0 or 1. Otherwise the accuracy
becomes unknown.
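
The combined check can be sketched as follows. This is only an illustration: the function name is hypothetical, and the test for __STDC_IEC_559__ (the older spelling of the macro, which appears elsewhere in this thread) is an assumption added so that the sketch also works on pre-C23 implementations.

```c
#include <float.h>

/* Hypothetical helper: returns 1 when the implementation claims IEC 60559
   binary arithmetic AND evaluates expressions in a format no wider than
   double (FLT_EVAL_METHOD 0 or 1); returns 0 otherwise, including when
   FLT_EVAL_METHOD is 2, -1, or not defined at all. */
int fp_accuracy_is_known(void)
{
#if (defined(__STDC_IEC_60559_BFP__) || defined(__STDC_IEC_559__)) \
    && defined(FLT_EVAL_METHOD)
    return FLT_EVAL_METHOD == 0 || FLT_EVAL_METHOD == 1;
#else
    return 0;
#endif
}
```

Doing the test in the preprocessor means the decision is made at translation time, so a program can select a fallback without evaluating any floating-point expression.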

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

#6392

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-16 10:29 -0500
Message-ID	<sn0iog$gna$1@dont-email.me>
In reply to	#6387

On 11/15/21 8:17 PM, Vincent Lefevre wrote:
> In article <smuc7i$6hq$1@dont-email.me>,
>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
> 
>> On 11/15/21 4:18 AM, Vincent Lefevre wrote:
>>> In article <smn6d8$c92$1@dont-email.me>,
>>>   James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
>>>
>>>> On 11/12/21 6:17 PM, Vincent Lefevre wrote:
>> ...
>>>>> The issue is more related to (strictly) conforming programs.
>>>
>>>> It can't be. Strictly conforming programs are prohibited from having
>>>> output that depends upon behavior that the standard leaves unspecified.
>>>
>>> This is not how I interpret the standard.
> 
>> I don't see how there's room for interpretation: "A strictly conforming
>> program ... shall not produce output dependent on any unspecified ...
>> behavior, ..." (4p6).
> 
> I'm not sure what you intended to mean, but IMHO, the "It can't be."
> is wrong based on the unsatisfied constraint and definition 3.8 of
> "constraint" (but this should really be clarified).

I said that this issue cannot be "related to (strictly) conforming
programs". This issue can't come up in strictly conforming programs, nor
does the way in which this issue might be resolved have any effect on
whether a program qualifies as strictly conforming. Not only is there
inherently a constraint violation, but the value of such a constant
would be unspecified even if there were no constraint, and the only
reason to care about which value is selected by the implementation would
be if the value affects the observable behavior of your program, which
would mean that it's not strictly conforming.

...
>>> That said, since the floating-point accuracy is not specified, can be
>>> extremely low and is not even checkable by the program (so that there
>>> is no possible fallback in case of low accuracy), there is not much
>>> one can do with floating point.
> 
>> ??? You can check for __STDC_IEC_60559_BFP__; if it's defined, then
>> pretty much the highest possible accuracy is required.
> 
> Indeed, well, almost I think. One should also check that
> FLT_EVAL_METHOD is either 0 or 1. Otherwise the accuracy
> becomes unknown.

A value of 2 tells you that the implementation will evaluate "all
operations and constants to the range and precision of the long double
type", which is pretty specific about what the accuracy is. It has
precisely the same accuracy that it would have had on an otherwise
identical implementation where FLT_EVAL_METHOD == 0, if you explicitly
converted all double operands to long double, and then converted the
final result back to double. Would you consider the accuracy of such
code to be unknown?
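
As a sketch of the explicit conversions described above (the function name and the particular expression are chosen only for illustration), this is what FLT_EVAL_METHOD == 2 does implicitly, written out by hand for an implementation where FLT_EVAL_METHOD == 0:

```c
/* Widen every double operand to long double, do all the arithmetic in
   long double, and narrow only the final result back to double. */
double sum_of_products_widened(double a, double b, double c, double d)
{
    long double r = (long double)a * (long double)b
                  + (long double)c * (long double)d;
    return (double)r;
}
```

For example, sum_of_products_widened(1.0, 2.0, 3.0, 4.0) computes 1*2 + 3*4 with the intermediate products and the sum held in long double.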

A value of -1 leaves some uncertainty about the accuracy. However, the
evaluation format is allowed to have range or precision that is greater
than that of the expression's type. The accuracy of such a type might be
greater than that of the expression's type, but it's not allowed to be
worse. That's far less uncertainty than what is allowed if
__STDC_IEC_60559_BFP__ is NOT pre#defined by the implementation.

#6396

From	Vincent Lefevre <vincent-news@vinc17.net>
Date	2021-12-08 10:09 +0000
Message-ID	<20211208095434$6b03@zira.vinc17.org>
In reply to	#6392

Sorry for the late reply (not much time ATM).

In article <sn0iog$gna$1@dont-email.me>,
  James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

[...]
> >> ??? You can check for __STDC_IEC_60559_BFP__; if it's defined, then
> >> pretty much the highest possible accuracy is required.
> > 
> > Indeed, well, almost I think. One should also check that
> > FLT_EVAL_METHOD is either 0 or 1. Otherwise the accuracy
> > becomes unknown.

> A value of 2 tells you that the implementation will evaluate "all
> operations and constants to the range and precision of the long double
> type", which is pretty specific about what the accuracy is. It has
> precisely the same accuracy that it would have had on an otherwise
> identical implementation where FLT_EVAL_METHOD == 0, if you explicitly
> converted all double operands to long double, and then converted the
> final result back to double. Would you consider the accuracy of such
> code to be unknown?

Yes, simply because the accuracy of long double is unknown and may be
lower than that of float. Annex F says for long double:

    F.2 Types
    [...]
    The long double type matches an IEC 60559 extended format,363) else
    a non-IEC 60559 extended format, else the IEC 60559 double format.

    Any non-IEC 60559 extended format used for the long double type
    shall have more precision than IEC 60559 double and at least the
    range of IEC 60559 double.364) The value of FLT_ROUNDS applies to
    all IEC 60559 types supported by the implementation, but need not
    apply to non-IEC 60559 types.

Just consider a non-IEC 60559 extended format. Note that the standard
says that it shall have more *precision* than IEC 60559 double, but
does not say anything about accuracy.

> A value of -1 leaves some uncertainty about the accuracy. However, the
> evaluation format is allowed to have range or precision that is greater
> than that of the expression's type. The accuracy of such a type might be
> greater than that of the expression's type, but it's not allowed to be
> worse.

I don't see where the standard says that it's not allowed to be worse.
One just has:

    5.2.4.2.2 Characteristics of floating types <float.h>
    [...]
  6 The accuracy of the floating-point operations (+, -, *, /) and
    of the library functions in <math.h> and <complex.h> that return
    floating-point results is implementation-defined, as is the
    accuracy of the conversion between floating-point internal
    representations and string representations performed by the
    library functions in <stdio.h>, <stdlib.h>, and <wchar.h>. The
    implementation may state that the accuracy is unknown.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

#6390

From	Derek Jones <derek@NOSPAM-knosof.co.uk>
Date	2021-11-16 11:32 +0000
Message-ID	<bMmdnUhrv496Cw78nZ2dnUU78K_NnZ2d@brightview.co.uk>
In reply to	#6384

All,

> I don't see how there's room for interpretation: "A strictly conforming
> program ... shall not produce output dependent on any unspecified ...
> behavior, ..." (4p6).

Indeed.

Now the order of evaluation of binary operators is
unspecified.  But this does not mean that all programs containing
at least one binary operator are not strictly conforming.

For instance, the order of evaluation of the two
operands in the following expression-statement is unspecified.
But unless they are volatile qualified the output does
not depend on the unspecified behavior:

x+y;

But in:

a[printf("Hello")]+a[printf(" World")];

the output does depend on the order of evaluation,
and a program containing this code is not strictly conforming.
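
For anyone who wants to try this, here is a compilable sketch of the two cases. The array size and the function names are assumptions made for the illustration; printf returns the number of characters written, so the indices are 5 and 6.

```c
#include <stdio.h>

/* Array large enough to be indexed by either printf return value
   (5 for "Hello", 6 for " World"). All elements are zero. */
static int a[8];

/* No output depends on the evaluation order here: x and y are plain
   (non-volatile) objects, so reading them has no side effects. */
int order_independent(int x, int y)
{
    return x + y;
}

/* The returned value is the same either way, but the text written to
   stdout is "Hello World" or " WorldHello" depending on which operand
   the implementation evaluates first. */
int order_dependent(void)
{
    return a[printf("Hello")] + a[printf(" World")];
}
```

The two printf calls are indeterminately sequenced rather than unsequenced, so the behavior is unspecified, not undefined; only the order of the output is in question.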

>> #ifdef __STDC_IEC_559__
>>
>> while the value of __STDC_IEC_559__ is not specified in the standard.

The output of a strictly conforming program does not depend on the
implementation used.

Since the value of __STDC_IEC_559__ depends on the implementation,
its use can produce a program that is not strictly conforming.

ps.  This whole discussion has been very interesting.

#6393

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-11-16 10:35 -0500
Message-ID	<sn0j44$jcv$1@dont-email.me>
In reply to	#6390

On 11/16/21 6:32 AM, Derek Jones wrote:
> All,
> 
>> I don't see how there's room for interpretation: "A strictly conforming
>> program ... shall not produce output dependent on any unspecified ...
>> behavior, ..." (4p6).
> 
> Indeed.
> 
> Now the order of evaluation of binary operators is
> unspecified.  But this does not mean that all programs containing
> at least one binary operator are not strictly conforming.
> 
> For instance, the order of evaluation of the two
> operands in the following expression-statement is unspecified.
> But unless they are volatile qualified the output does
> not depend on the unspecified behavior:
> 
> x+y;
> 
> But in:
> 
> a[printf("Hello")]+a[printf(" World")];
> 
> the output does depend on the order of evaluation,
> and a program containing this code is not strictly conforming.
> 
>>> #ifdef __STDC_IEC_559__
>>>
>>> while the value of __STDC_IEC_559__ is not specified in the standard.
> 
> The output of a strictly conforming program does not depend on the
> implementation used.
> 
> Since the value of __STDC_IEC_559__ depends on the implementation,
> its use can produce a program that is not strictly conforming.

Agreed. But it can also produce a program that is strictly conforming,
just like your example of x+y above.

However, given how horrible the accuracy requirements are when
__STDC_IEC_60559_BFP__ is not pre#defined, the only way that a program
could make any meaningful use of floating point and still be strictly
conforming is if it limits such use to comparing floating point
constants for relative order - and even then, that's only true if the
constants are sufficiently far apart in value to guarantee the result of
that comparison.
