Re: longer 'char literals' meaning in c

From Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups comp.lang.c
Subject Re: longer 'char literals' meaning in c
Date 2020-05-30 10:53 -0700
Organization A noiseless patient Spider
Message-ID <86mu5pqpg6.fsf@linuxsc.com>
References (11 earlier) <878si5225f.fsf@nosuchdomain.example.com> <r8v1a3$b86$1@z-news.wcss.wroc.pl> <87lfm4ztok.fsf@nosuchdomain.example.com> <MIb7Z5jPf5oxcfbROg@bongo-ra.co> <87mu6jtf2s.fsf@nosuchdomain.example.com>

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Spiros Bousbouras <spibou@gmail.com> writes:
>
>> On Wed, 06 May 2020 13:37:15 -0700
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>
>>> As far as multicharacter constants are concerned, I disagree with
>>> Tim Rentsch's argument, though the standard is not entirely clear
>>> (perhaps because it's an edge case that the authors didn't
>>> carefully consider).  In my opinion, since the standard says that
>>> the value of a multicharacter constant is of type int and is
>>> implementation-defined, it's reasonable to think that the intent
>>> is to require that the value must be representable as an int.
>>> Otherwise it's not of type int, which violates the requirement.
>>
>> Yes, but is it the responsibility of the application or the
>> implementation to make sure it is representable as an int?  When
>> 6.4.4 paragraph 2 says
>>
>>     Each constant shall have a type and the value of a constant
>>     shall be in the range of representable values for its type.
>>
>> whose responsibility is it to satisfy this constraint?
>
> That constraint can be violated by a program that includes an
> integer constant outside the range of any normal or extended
> integer type,

That's right, because such a constant has no type, and thus it
violates the first part of 6.4.4 p2.

> or a floating constant outside the range of its type (the type
> of a floating constant is determined by its syntax, not by its
> value),

This is wrong.  The value of a floating constant is always a
representable value of the constant's type, even if the mathematical
value of the constant lies outside the range of what the type can
represent.  See 6.4.4.2, paragraph 3.  There is always a nearest
representable value, no matter what the mathematical value is.

> or by a character constant with an octal or hexadecimal escape
> sequence outside the range of its type (unsigned char for a
> character constant with no prefix).

This too is wrong.  Such an escape sequence already violates a
different constraint, the one given in 6.4.4.4 paragraph 9.  Once
that constraint is violated, the program has undefined behavior,
and it's meaningless to talk about whether 6.4.4 p2 is violated.

> It's not clear to me whether an overly long multi-character
> constant can violate it.  It certainly can't *if* the
> implementation-defined mapping maps all multi-character constants
> to values within the range of int.
>
> My argument is basically that an implementation-defined mapping that
> maps 'ABCDE' to a value outside the range of int violates the
> requirement (not a constraint) that "An integer character constant
> has type int."  A value exceeding INT_MAX does not have type int.

It appears you are seeing something that isn't there.  The
Standard says that an integer character constant has type int.
It does not say the /value/ has type int.  A value is just a
value.  It is because the (typeless) value may lie outside
the range of what int can represent that 6.4.4 p2 is there.
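The distinction between the type of the constant and the type of
an object holding the same character can be checked directly.  The
following is a minimal sketch, assuming a C11 compiler; the macro
IS_INT and both function names are made up for illustration.  It
uses a single-character constant only to avoid multi-character
warnings; the rule that the constant has type int is the same for
both kinds.

```c
#include <assert.h>

/* Hypothetical helper: 1 when the expression has type int, else 0. */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

/* In C (unlike C++), an integer character constant has type int. */
static_assert(sizeof 'a' == sizeof(int), "character constant has type int");

/* The constant itself: type int. */
int char_constant_is_int(void)
{
    return IS_INT('a');
}

/* An object declared char holding the same value: type char, not int. */
int char_object_is_int(void)
{
    char c = 'a';
    return IS_INT(c);
}
```

Here char_constant_is_int() returns 1 and char_object_is_int()
returns 0.  Nothing in this sketch says anything about which values
the constant may have; it only shows what type the constant has.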

Look now at the situation we're in.  Integer constants cannot
violate the second part of 6.4.4 p2.  Floating constants cannot
violate 6.4.4 p2.  Enumeration constants cannot violate 6.4.4 p2
(because of the constraint in 6.7.2.2 p2, we can be sure that
their values are representable as int).  All this leaves, for
the second part of 6.4.4 p2, is character constants.  No valid
single-byte character constant can violate 6.4.4 p2.  The only
case left is multi-character character constants.  So it must be
possible for the value of a multi-character character constant
to be one that is not representable in its type; otherwise the
second part of 6.4.4 p2 is pointless.
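To make the idea of a value that does not fit its type concrete,
here is a small sketch of one possible mapping.  It is purely
hypothetical: the Standard leaves the mapping implementation-defined,
and the names mcc_value and representable_as_int are invented here.
The mapping treats the characters as base-(UCHAR_MAX+1) digits and
computes the result in unsigned long long, so the value can be
examined before asking whether int can hold it.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical mapping for the characters of a multi-character
   constant: first character most significant.  This is NOT what any
   particular compiler is claimed to do. */
unsigned long long mcc_value(const char *chars)
{
    size_t n = strlen(chars);
    unsigned long long v = 0;

    assert(n <= sizeof v);  /* keep the sketch free of wraparound */
    for (size_t i = 0; i < n; i++)
        v = v * (UCHAR_MAX + 1ULL) + (unsigned char)chars[i];
    return v;
}

/* Nonzero when the computed value lies in the range of int. */
int representable_as_int(unsigned long long v)
{
    return v <= (unsigned long long)INT_MAX;
}
```

Under this mapping, with 8-bit characters and a 32-bit int,
representable_as_int(mcc_value("ABCD")) is nonzero in an ASCII
execution character set, while mcc_value("ABCDE") exceeds INT_MAX.
An implementation that documented such a mapping would be in exactly
the situation the second part of 6.4.4 p2 addresses.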

I should put in a note here that there is a funny corner case for
single-byte character constants.  In an implementation where
UCHAR_MAX > INT_MAX, a hexadecimal escape sequence (but not an
octal escape sequence) may satisfy the constraint in 6.4.4.4p9
yet still be larger than INT_MAX.  Under C90 rules, there is no
problem:  the value is converted to int, the result of which
conversion is always a representable value of int.  In C99 and
later, converting a value that is outside the range of int may
generate an implementation-defined signal, which presumably means
undefined behavior if it occurs at compile time.  If there were
such a case it would be reasonable to interpret it as violating
the second part of 6.4.4p2.  At the same time, it seems very
unlikely that the second part of 6.4.4p2 was left in solely to
cover cases that are (a) very unusual, and (b) in some sense
generate no value at all.  What seems more likely is that such
unusual cases simply were not thought of, and the second part of
6.4.4 p2 was left in for multi-character character constants,
which are a lot more obvious and also a lot more common.
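For the ordinary case, where UCHAR_MAX <= INT_MAX, the conversion
described above can be observed with a short sketch.  It assumes
CHAR_BIT == 8 and an implementation where copying the byte 0xFF into
a char object yields a valid value; the function names are made up
for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Value of the constant as the compiler computes it. */
int constant_value(void)
{
    return '\xFF';
}

/* Build a char object whose bits are 0xFF, then convert it to int. */
int via_char_object(void)
{
    unsigned char byte = 0xFF;
    char c;

    memcpy(&c, &byte, 1);
    return c;  /* char converted to int; always representable here */
}
```

On the implementations I would expect most readers to have, the two
functions return the same value: 255 where plain char is unsigned,
and -1 where plain char is signed and uses two's complement.  The
corner case discussed above arises only when int cannot represent
every value of unsigned char, which this sketch does not cover.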

<snip>

>> It's the responsibility of the programmer to make sure that for
>> each MT1CC appearing in his code, the resulting value is in the
>> range of int; if he fails to do so, it is a constraint violation
>> with all that entails.
>>
>> 2.  The documentation of the implementation must describe an
>> algorithm which for every MT1CC returns an integer value in the
>> range of int.
>>
>> I am undecided which one I prefer.
>>
>> I would also be interested to know how people think that an
>> implementation with a 16-bit int should handle '\U000FFFFF' (Tim's
>> example).  6.4.4.4 paragraph 10 says
>>
>>     If an integer character constant contains a single character or
>>     escape sequence, its value is the one that results when an
>>     object with type char whose value is that of the single
>>     character or escape sequence is converted to type int.
>>
>> Does this mean that its value should fit in a char?
>
> Interesting case.  I'll have to think about it.

The element \U000FFFFF is not an escape sequence but a universal
character name.  As such either it maps to a single-byte member
of the execution character set, in which case its value fits in
a char, or it maps to a multibyte character having more than one
character, in which case it is not a single character and the
quoted sentence does not apply.
