Re: longer 'char literals' meaning in c

From Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups comp.lang.c
Subject Re: longer 'char literals' meaning in c
Date 2020-05-24 18:19 -0700
Organization A noiseless patient Spider
Message-ID <867dx0u7zq.fsf@linuxsc.com> (permalink)
References (10 earlier) <r8t7m9$8o1$1@z-news.wcss.wroc.pl> <878si5225f.fsf@nosuchdomain.example.com> <r8v1a3$b86$1@z-news.wcss.wroc.pl> <87lfm4ztok.fsf@nosuchdomain.example.com> <MIb7Z5jPf5oxcfbROg@bongo-ra.co>

Spiros Bousbouras <spibou@gmail.com> writes:

> On Wed, 06 May 2020 13:37:15 -0700
> Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>
>> As far as multicharacter constants are concerned, I disagree with Tim
>> Rentsch's argument, though the standard is not entirely clear (perhaps
>> because it's an edge case that the authors didn't carefully consider).
>> In my opinion, since the standard says that the value of a
>> multicharacter constant is of type int and is implementation-defined,
>> it's reasonable to think that the intent is to require that the value
>> must be representable as an int.  Otherwise it's not of type int, which
>> violates the requirement.
>
> Yes but is it the responsibility of the application or the implementation
> to make sure it is representable as an int ? When 6.4.4 paragraph 2 says
>
>     Each constant shall have a type and the value of a constant shall be
>     in the range of representable values for its type.
>
> whose responsibility is to satisfy this constraint ?

Surely every statement given in a Constraints section is meant as
an obligation on programs, not on implementations.  It would be
silly for the Standard to say that violating a constraint must
result in a diagnostic, and then have a constraint apply to the
implementation itself.  What would the diagnostic message be?
"Oops, I'm doing something I'm not allowed to do"?  Seems
somewhat reminiscent of Russell's paradox.

> So a summary of the 2 interpretations which have been presented , as I
> understand them , is as follows :
>
> 1. The documentation of the implementation must describe an algorithm [*]
> which for every "integer character constant containing more than one
> character" (MT1CC for short) returns an integer value.  It's the
> responsibility of the programmer to make sure that for each MT1CC
> appearing in his code , the resulting value is in the range of  int ;
> if he fails to do so , it is a constraint violation with all that entails.
>
> 2. The documentation of the implementation must describe an algorithm which
> for every MT1CC returns an integer value in the range of  int .
>
> I am undecided which one I prefer.

To me it seems clear that (1) is what was meant.  Otherwise the
constraint in 6.4.4 p2 that "the value of a constant shall be in
the range of representable values for its type" is pointless.  (I
should add here that technically it is (possibly) not completely
pointless since there is a strange corner case where it might
otherwise be relevant;  it seems unlikely though that this part
of 6.4.4 p2 was left in just to handle this one unusual corner
case.)
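
To make interpretation (1) concrete, here is a minimal sketch.  The
mapping is purely hypothetical (characters taken as base-256 digits,
8-bit bytes assumed); it is not the documented rule of any particular
compiler, and the names mt1cc_value and fits_in_int are made up for
illustration.  The point is only that the documented algorithm may
produce a value wider than int, and the range check of 6.4.4 p2 is then
applied to the program's constant.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical documented mapping (an assumption, not a real compiler's
   rule): the characters are base-256 digits, first character most
   significant.  Pass at most 7 characters so the result cannot
   overflow long long. */
long long mt1cc_value(const char *chars, size_t n)
{
    long long value = 0;
    for (size_t i = 0; i < n; i++)
        value = value * 256 + (unsigned char)chars[i];
    return value;
}

/* Under interpretation (1) the program is responsible for keeping the
   value in the range of int; a constant failing this check would be a
   constraint violation and so require a diagnostic. */
int fits_in_int(long long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

With this mapping, fits_in_int(mt1cc_value("ab", 2)) is true on common
ASCII implementations, while a seven-character constant is out of range
wherever int is 32 bits wide.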

> I would also be interested to know how people think that an implementation
> with a 16-bit  int  should handle '\U000FFFFF' (Tim's example).  6.4.4.4
> paragraph 10 says
>
>     If an integer character constant contains a single character or escape
>     sequence, its value is the one that results when an object with type char
>     whose value is that of the single character or escape sequence is
>     converted to type int.
>
> Does this mean that its value should fit in a  char ?

The \U000FFFFF is not an escape sequence but a universal character
name.  There is no requirement that a UCN maps to a single-byte
execution character.  If it doesn't then this example is just like
'ab'.  If it does then the result is either a perfectly good 'int'
value, or an implementation-defined signal, which would let the
implementation do whatever it wants (presumably, although that
might depend on the implementation's documentation).
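
For the single-character case, the rule quoted from 6.4.4.4 p10 can be
sketched as below.  The function names are hypothetical, and the sketch
covers only characters that do map to a single char; it says nothing
about how a given implementation treats a UCN that does not.

```c
#include <limits.h>

/* The value 6.4.4.4 p10 gives a single-character constant: start from
   an object of type char holding the character, then convert that
   object to int. */
int char_constant_value(char c)
{
    return c;   /* char -> int conversion is always value-preserving */
}

/* Nonzero if v lies in the range of plain char on this implementation. */
int in_char_range(int v)
{
    return v >= CHAR_MIN && v <= CHAR_MAX;
}
```

Because the value passes through a char object, a constant such as
'\xFF' comes out negative where plain char is signed and positive where
it is unsigned, but in both cases it lies within the range of char.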

>> If a future edition of the standard were to say that a multicharacter
>> constant may be rejected if it's too long (worded more clearly than
>> I just did), I would have no problem with that.  For that matter,
>> if an official response to a DR said the same thing, I would accept
>> it.  I just don't see any such permission in the standard as it's
>> currently written.  If the authors had intended to allow overly
>> wide character constants to be treated as constraint violations,
>> they could have said so explicitly.
>
> It's not the length of the MT1CC which is at issue.  The documentation may
> specify for example that  'ab'  gets the value  INT_MAX+1  and every other
> MT1CC gets the value 0.  Would this be conforming ?  According to
> interpretation 1 it would be and it is the responsibility of the
> programmer not to use  'ab'  in his code.  According to interpretation 2 ,
> it's not conforming.

It seems ridiculous to say that the value of a character constant
with more than one character is in effect the result of a hash
function that always ends up in the range of int.  It would be
more sensible that distinct character constants would always have
distinct values, and those whose values are outside the range of
int should be flagged.  In my experience the people who write the
Standard almost always expect a common sense interpretation of
what text appears in the Standard.  Because interpretation (2)
would violate that principle it seems unlikely to be what the
authors meant.
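
The contrast can be sketched with two hypothetical mappings.  Neither
is taken from the Standard or from any real compiler; the names, the
base-256 encoding, and the 15-bit mask (standing in for a 16-bit int)
are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical mapping A: base-256 positional value with no reduction.
   Distinct constants of up to 7 characters (nonzero first character,
   8-bit bytes) get distinct values, but the value may exceed INT_MAX
   and would then have to be flagged. */
long long positional_value(const char *chars, size_t n)
{
    long long value = 0;
    for (size_t i = 0; i < n; i++)
        value = value * 256 + (unsigned char)chars[i];
    return value;
}

/* Hypothetical mapping B: the same digits, reduced to the low 15 bits
   so the result fits even a 16-bit int.  Always in range, but it acts
   like a hash: different constants can share a value. */
int truncated_value(const char *chars, size_t n)
{
    return (int)(positional_value(chars, n) & 0x7FFF);
}
```

Under mapping B the constants 'abc' and 'xbc' get the same value, since
only the last two characters survive the mask; under mapping A they
stay distinct, at the cost of a value that may not fit in int.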
