| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Newsgroups | comp.lang.c |
| Subject | Re: longer 'char literals' meaning in c |
| Date | 2020-05-30 10:53 -0700 |
| Organization | A noiseless patient Spider |
| Message-ID | <86mu5pqpg6.fsf@linuxsc.com> |
| References | (11 earlier) <878si5225f.fsf@nosuchdomain.example.com> <r8v1a3$b86$1@z-news.wcss.wroc.pl> <87lfm4ztok.fsf@nosuchdomain.example.com> <MIb7Z5jPf5oxcfbROg@bongo-ra.co> <87mu6jtf2s.fsf@nosuchdomain.example.com> |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Spiros Bousbouras <spibou@gmail.com> writes:
>
>> On Wed, 06 May 2020 13:37:15 -0700
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>>
>>> As far as multicharacter constants are concerned, I disagree with
>>> Tim Rentsch's argument, though the standard is not entirely clear
>>> (perhaps because it's an edge case that the authors didn't
>>> carefully consider). In my opinion, since the standard says that
>>> the value of a multicharacter constant is of type int and is
>>> implementation-defined, it's reasonable to think that the intent
>>> is to require that the value must be representable as an int.
>>> Otherwise it's not of type int, which violates the requirement.
>>
>> Yes but is it the responsibility of the application or the
>> implementation to make sure it is representable as an int ? When
>> 6.4.4 paragraph 2 says
>>
>>     Each constant shall have a type and the value of a constant
>>     shall be in the range of representable values for its type.
>>
>> whose responsibility is to satisfy this constraint ?
>
> That constraint can be violated by a program that includes an
> integer constant outside the range of any normal or extended
> integer type,

That's right, because such a constant has no type, and thus it
violates the first part of 6.4.4 p2.

> or a floating constant outside the range of its type (the type
> of a floating constant is determined by its syntax, not by its
> value),

This is wrong. The value of a floating constant is always a
representable value of the constant's type, even if the mathematical
value of the constant lies outside the range of what the type can
represent. See 6.4.4.2, paragraph 3. There is always a nearest
representable value, no matter what the mathematical value is.

> or by a character constant with an octal or hexadecimal escape
> sequence outside the range of its type (unsigned char for a
> character constant with no prefix).

This too is wrong. Such an escape sequence already violates a
different constraint, the one given in 6.4.4.4 paragraph 9. Once
that constraint is violated, the program has undefined behavior, and
it's meaningless to talk about whether 6.4.4 p2 is violated.

> It's not clear to me whether an overly long multi-character
> constant can violate it. It certainly can't *if* the
> implementation-defined mapping maps all multi-character constants
> to values within the range of int.
>
> My argument is basically that an implementation-defined mapping that
> maps 'ABCDE' to a value outside the range of int violates the
> requirement (not a constraint) that "An integer character constant
> has type int." A value exceeding INT_MAX does not have type int.

It appears you are seeing something that isn't there. The Standard
says that an integer character constant has type int. It does not
say the /value/ has type int. A value is just a value. It is
because the (typeless) value may lie outside the range of what int
can represent that 6.4.4 p2 is there.

Look now at the situation we're in. Integer constants cannot
violate the second part of 6.4.4p2. Floating constants cannot
violate 6.4.4p2. Enumeration constants cannot violate 6.4.4p2
(because of the constraint in 6.7.2.2 p2, we can be sure that their
values are representable as int). All this leaves, for the second
part of 6.4.4p2, is character constants. Any valid single-byte
character constant cannot violate 6.4.4p2. The only case left is
multi-character character constants. So it must be possible for the
value of a multi-character character constant to be one that is not
representable in its type, otherwise the second part of 6.4.4p2 is
pointless.

I should put in a note here that there is a funny corner case for
single-byte character constants. In an implementation where
UCHAR_MAX > INT_MAX, a hexadecimal escape sequence (but not an octal
escape sequence) may satisfy the constraint in 6.4.4.4p9 yet still
be larger than INT_MAX. Under C90 rules, there is no problem: the
value is converted to int, the result of which conversion is always
a representable value of int. In C99 and later, converting a value
that is outside the range of int may generate an
implementation-defined signal, which presumably means undefined
behavior if it occurs at compile time. If there were such a case it
would be reasonable to interpret it as violating the second part of
6.4.4p2.

At the same time, it seems very unlikely that the second part of
6.4.4p2 was left in solely to cover cases that are (a) very unusual,
and (b) in some sense generate no value at all. What seems more
likely is that such unusual cases simply were not thought of, and
the second part of 6.4.4 p2 was left in for multi-character
character constants, which are a lot more obvious and also a lot
more common.

<snip>

>> It's the responsibility of the programmer to make sure that for
>> each MT1CC appearing in his code , the resulting value is in the
>> range of int ; if he fails to do so , it is a constraint violation
>> with all that entails.
>>
>> 2. The documentation of the implementation must describe an
>> algorithm which for every MT1CC returns an integer value in the
>> range of int .
>>
>> I am undecided which one I prefer.
>>
>> I would also be interested to know how people think that an
>> implementation with a 16-bit int should handle '\U000FFFFF' (Tim's
>> example). 6.4.4.4 paragraph 10 says
>>
>>     If an integer character constant contains a single character or
>>     escape sequence, its value is the one that results when an
>>     object with type char whose value is that of the single
>>     character or escape sequence is converted to type int.
>>
>> Does this mean that its value should fit in a char ?
>
> Interesting case. I'll have to think about it.

The element \U000FFFFF is not an escape sequence but a universal
character name. As such either it maps to a single-byte member of
the execution character set, in which case its value fits in a char,
or it maps to a multibyte character having more than one character,
in which case it is not a single character and the quoted sentence
does not apply.
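
A minimal C11 sketch of the distinction drawn above between the type of a character constant and its value. The macro and function names are hypothetical, chosen only for illustration. The value printed for 'AB' is whatever the implementation defines, so no particular output is assumed, and many compilers warn about multi-character constants.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* INT_MAX */
#include <stdio.h>

/* Hypothetical helper macro: evaluates to 1 if the expression has
   type int, and to 0 for any other type (uses C11 _Generic). */
#define HAS_TYPE_INT(e) _Generic((e), int: 1, default: 0)

/* In C an integer character constant has type int, so a
   single-character constant has the size of an int, not of a char. */
static_assert(sizeof 'A' == sizeof(int),
              "single-character constant has type int");

/* The type of a multi-character constant is also int, regardless of
   the implementation-defined value the constant maps to. */
int multichar_has_type_int(void)
{
    return HAS_TYPE_INT('AB');
}

/* Prints the implementation-defined value next to INT_MAX. Nothing
   here assumes a particular mapping from 'AB' to a value. */
void report_multichar(void)
{
    printf("'AB' = %d, INT_MAX = %d\n", 'AB', INT_MAX);
}
```

Calling report_multichar() from main shows the mapping a given compiler documents. The type check succeeds on any conforming C11 implementation because it depends only on the type of the constant, never on its value.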