comp.std.c

Add @ to basic character set?

Started by	Philipp Klaus Krause <pkk@spth.de>
First post	2020-12-05 08:58 +0100
Last post	2021-07-10 08:46 -0700
Articles	31 in total, 20 on this page; 9 participants

  Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-05 08:58 +0100
    Re: Add @ to basic character set? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-12-05 10:53 -0500
      Re: Add @ to basic character set? David Brown <david.brown@hesbynett.no> - 2020-12-05 17:15 +0100
        Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-05 20:55 +0100
      Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-05 14:17 -0800
        Re: Add @ to basic character set? Francis Glassborow <francis.glassborow@btinternet.com> - 2020-12-06 12:25 +0000
          Re: Add @ to basic character set? David Brown <david.brown@hesbynett.no> - 2020-12-06 13:47 +0100
            Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-06 08:42 -0500
              Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-06 14:07 -0800
                Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-06 17:44 -0500
                  Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-06 15:49 -0800
                    Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:31 +0100
                    Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 07:24 -0500
                      Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 12:16 -0800
                        Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 15:51 -0500
                          Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 13:10 -0800
                            Re: Add @ to basic character set? Andreas Schwab <schwab@linux-m68k.org> - 2020-12-07 23:52 +0100
                              Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 15:27 -0800
                                Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 18:54 -0500
                                  Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 16:10 -0800
                              Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 18:31 -0500
                        Re: Add @ to basic character set? Andreas Schwab <schwab@linux-m68k.org> - 2020-12-07 23:08 +0100
                Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:30 +0100
        Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:17 +0100
    Re: Add @ to basic character set? Thomas David Rivers <rivers@dignus.com> - 2020-12-06 16:11 -0500
      Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 12:19 -0800
        Re: Add @ to basic character set? Thomas David Rivers <rivers@dignus.com> - 2020-12-07 17:02 -0500
    Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2021-03-11 22:50 +0100
      Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-03-11 15:40 -0800
        Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2021-03-12 15:25 +0100
      Re: Add @ to basic character set? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-10 08:46 -0700

#6165 — Add @ to basic character set?

From	Philipp Klaus Krause <pkk@spth.de>
Date	2020-12-05 08:58 +0100
Subject	Add @ to basic character set?
Message-ID	<rqfeip$lrr$1@solani.org>

I wonder if it would make sense to add @ to the basic character set.
Virtually everyone is using it in comments and strings already anyway
(for email addresses), and I don't see anything preventing
implementations from supporting it, as it is available in both ASCII and
common EBCDIC code pages:

http://www.colecovision.eu/stuff/proposal-basic-@.html

#6167

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2020-12-05 10:53 -0500
Message-ID	<rqgae4$1u1$1@dont-email.me>
In reply to	#6165

On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
> I wonder if it would make sense to add @ to the basic character set.
> Virtually everyone is using it in comments and strings already anyway
> (for email addresses), and I don't see anything preventing
> implementations from supporting it, as it is available in both ASCII and
> common EBCDIC code pages:
> 
> http://www.colecovision.eu/stuff/proposal-basic-@.html

'@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
national variants, that code point is assigned to some other character.
With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
becoming so commonplace, that is less of a concern than it used to be,
but it is still something the committee is likely to pay attention to.
There are other characters that already are part of the C basic
character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
However, all of those characters played an important role in C syntax
long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
invented to allow those characters to be used on systems that didn't
support them natively.
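
To make the invariant-subset point concrete: ISO/IEC 646 leaves twelve code positions open to national variation, which US-ASCII fills with # $ @ [ \ ] ^ ` { | } ~. The following is a minimal sketch, assuming an ASCII-compatible encoding, that reports the first character of a string falling outside the printable invariant subset. The function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* The twelve ISO/IEC 646 code positions that national variants may
   redefine, written as their US-ASCII glyphs.  Assumes an ASCII-compatible
   encoding for both the source and the strings being checked. */
static const char iso646_variant[] = "#$@[\\]^`{|}~";

/* Nonzero if c is a printable character (0x20..0x7E) that belongs to the
   ISO/IEC 646 invariant subset. */
static int is_iso646_invariant(unsigned char c)
{
    if (c < 0x20 || c > 0x7E)
        return 0;
    return strchr(iso646_variant, c) == NULL;
}

/* Offset of the first character of s outside the invariant subset,
   or (size_t)-1 if there is none. */
static size_t first_non_invariant(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (!is_iso646_invariant((unsigned char)s[i]))
            return i;
    }
    return (size_t)-1;
}
```

For example, first_non_invariant("user@example.com") returns 4, which is the position of the '@'.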

#6168

From	David Brown <david.brown@hesbynett.no>
Date	2020-12-05 17:15 +0100
Message-ID	<rqgbmt$bkv$1@dont-email.me>
In reply to	#6167

On 05/12/2020 16:53, James Kuyper wrote:
> On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
>> I wonder if it would make sense to add @ to the basic character set.
>> Virtually everyone is using it in comments and strings already anyway
>> (for email addresses), and I don't see anything preventing
>> implementations from supporting it, as it is available in both ASCII and
>> common EBCDIC code pages:
>>
>> http://www.colecovision.eu/stuff/proposal-basic-@.html
> 
> '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
> French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
> national variants, that code point is assigned to some other character.
> With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
> becoming so commonplace, that is less of a concern than it used to be,
> but it is still something the committee is likely to pay attention to.
> There are other characters that already are part of the C basic
> character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
> However, all of those characters played an important role in C syntax
> long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
> invented to allow those characters to be used on systems that didn't
> support them natively.
> 

@ is used in existing C implementations as an extension feature.  In
particular, a number of embedded C compilers allow a syntax like
"uint8_t reg @ 0x1234;" to mean "reg is a uint8_t object located at
absolute address 0x1234".  If people are consistent about using spaces,
this could easily be solved if @ is made a letter by simply making @
alone into a keyword.  But if those compilers accept "uint8_t
reg@0x1234", then that fails.

Another cause for concern is if the symbol is used in identifiers, then
these could cause trouble for assemblers and/or linkers on some systems.
 (This applies to the common extension of allowing $ as a "letter" in C
- the gcc manual notes that this is not supported on some targets due to
the meaning of $ in assembly on those targets.)

The standards committee are always reluctant to make changes that could
interfere with known implementations and existing code, even if the
conflict is with an implementation-specific extension to the language.

I also think it makes sense to reserve such symbols for future purposes
in C or C++ - good punctuation symbols are too useful to waste as
letters.  For example, the proposal for "metaclasses" in C++ suggests using
$ as part of the syntax, which I think is a very good idea.  It is
perhaps more likely that @ would find similar use in future C++ features
than future C features, but no one would benefit from adding new
conflicts between those languages.
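
The "reg@0x1234" concern can be illustrated with a toy scanner that measures the identifier at the start of a string, using a flag to decide whether '@' counts as a letter. The scanner is hypothetical and is not the lexer of any real compiler. It assumes the default "C" locale, in which isalpha accepts only A-Z and a-z.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rule: may c appear in an identifier (first = leading char)? */
static bool ident_char(unsigned char c, bool first, bool at_is_letter)
{
    if (c == '@')
        return at_is_letter;          /* the proposal under discussion */
    if (c == '_' || isalpha(c))
        return true;
    return !first && isdigit(c);      /* digits allowed after the first char */
}

/* Length of the identifier at the start of s, or 0 if there is none. */
static size_t scan_identifier(const char *s, bool at_is_letter)
{
    size_t n = 0;
    while (s[n] != '\0' && ident_char((unsigned char)s[n], n == 0, at_is_letter))
        n++;
    return n;
}
```

With the flag off, "reg@0x1234" yields the 3-character identifier reg. With the flag on, the whole 10-character string becomes a single identifier. Written with spaces, "reg @ 0x1234" stops at reg either way, which is why treating a lone @ as a keyword would only rescue code that uses spaces consistently.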

#6169

From	Philipp Klaus Krause <pkk@spth.de>
Date	2020-12-05 20:55 +0100
Message-ID	<rqgojc$snq$1@solani.org>
In reply to	#6168

Am 05.12.20 um 17:15 schrieb David Brown:
> 
> @ is used in existing C implementations as an extension feature.  In
> particular, a number of embedded C compilers allow a syntax like
> "uint8_t reg @ 0x1234;" to mean "reg is a uint8_t object located at
> absolute address 0x1234".  If people are consistent about using spaces,
> this could easily be solved if @ is made a letter by simply making @
> alone into a keyword.  But if those compilers accept "uint8_t
> reg@0x1234", then that fails.
> 
> Another cause for concern is if the symbol is used in identifiers, then
> these could cause trouble for assemblers and/or linkers on some systems.
>  (This applies to the common extension of allowing $ as a "letter" in C
> - the gcc manual notes that this is not supported on some targets due to
> the meaning of $ in assembly on those targets.)
> 
> The standards committee are always reluctant to make changes that could
> interfere with known implementations and existing code, even if the
> conflict is with an implementation-specific extension to the language.

None of these would be a problem for adding it to the basic source
character set.
The allowed characters in identifiers are different from the basic
source character set. E.g. ] is in the basic source character set, but
not allowed in identifiers.
By adding it to the basic source character set, we can portably use it
in comments and string and character literals.

Philipp
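
The use case Philipp describes, with '@' appearing in character constants and literals rather than in identifiers, looks like the sketch below. The function name and address are illustrative. Under C17, the '@' in the character constant depends on the implementation providing it as an extended character. Under the proposal, it would depend only on the basic source character set.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the part of addr after the first '@' (for
   "user@example.com", that is "example.com"), or NULL if addr has no '@'.
   The character constant '@' below is the portability question at issue. */
static const char *email_domain(const char *addr)
{
    const char *at = strchr(addr, '@');
    return at != NULL ? at + 1 : NULL;
}
```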

#6170

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-12-05 14:17 -0800
Message-ID	<87pn3ngao5.fsf@nosuchdomain.example.com>
In reply to	#6167

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
>> I wonder if it would make sense to add @ to the basic character set.
>> Virtually everyone is using it in comments and strings already anyway
>> (for email addresses), and I don't see anything preventing
>> implementations from supporting it, as it is available in both ASCII and
>> common EBCDIC code pages:
>> 
>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>
> '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
> French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
> national variants, that code point is assigned to some other character.
> With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
> becoming so commonplace, that is less of a concern than it used to be,
> but it is still something the committee is likely to pay attention to.
> There are other characters that already are part of the C basic
> character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
> However, all of those characters played an important role in C syntax
> long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
> invented to allow those characters to be used on systems that didn't
> support them natively.

Apparently the C++ committee felt that it was of so little concern that
they removed trigraphs in C++17.  I don't know of any plans to do the
same in C.

There are three printable ASCII characters that aren't in C's basic
character set: '$', '`', and '@'.  A guarantee that all three can be
used in string literals, character constants, and comments could be
useful.  (Most programmers probably already assume they can be.)

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
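
Keith's count of three can be checked mechanically. The sketch below assumes an ASCII execution character set. It walks the printable range 0x21 to 0x7E and keeps each character that is not among the letters, digits, and 29 graphic characters listed for the basic character set in C17 5.2.1. The function name is illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Letters, digits, and the 29 graphic characters of C17's basic
   character set (5.2.1); space and control characters are omitted. */
static const char basic_printable[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* Stores in out (NUL-terminated, at most cap - 1 characters) each printable
   ASCII character not found in basic_printable and returns how many were
   stored.  Assumes the execution character set is ASCII. */
static size_t non_basic_ascii(char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    for (int c = 0x21; c <= 0x7E; c++) {
        if (strchr(basic_printable, c) == NULL && n + 1 < cap)
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Run against that list, the function collects exactly "$@`".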

#6171

From	Francis Glassborow <francis.glassborow@btinternet.com>
Date	2020-12-06 12:25 +0000
Message-ID	<rqiik6$58p$1@dont-email.me>
In reply to	#6170

On 05/12/2020 22:17, Keith Thompson wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>> On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
>>> I wonder if it would make sense to add @ to the basic character set.
>>> Virtually everyone is using it in comments and strings already anyway
>>> (for email addresses), and I don't see anything preventing
>>> implementations from supporting it, as it is available in both ASCII and
>>> common EBCDIC code pages:
>>>
>>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>>
>> '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
>> French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
>> national variants, that code point is assigned to some other character.
>> With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
>> becoming so commonplace, that is less of a concern than it used to be,
>> but it is still something the committee is likely to pay attention to.
>> There are other characters that already are part of the C basic
>> character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
>> However, all of those characters played an important role in C syntax
>> long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
>> invented to allow those characters to be used on systems that didn't
>> support them natively.
> 
> Apparently the C++ committee felt that it was of so little concern that
> they removed trigraphs in C++17.  I don't know of any plans to do the
> same in C.
> 
> There are three printable ASCII characters that aren't in C's basic
> character set: '$', '`', and '@'.  A guarantee that all three can be
> used in string literals, character constants, and comments could be
> useful.  (Most programmers probably already assume they can be.)
> 

1) Trigraphs were proving to be a road-block for C++. In addition they 
are so rarely used (certainly in C++) that many (probably most) 
programmers fail to recognise them. WG14 appears reluctant to remove 
things even when they have no practical use in modern code. The argument 
that they are needed for legacy systems is, I think, very weak; 
compilers will continue to support them where necessary by providing 
legacy code switches.

2) As one design feature of C is portability, it is time that the three
characters you mention were added to the basic character set. I do
not see how that would have a negative effect on implementations that 
already use them for extensions. Those uses do not (or should not) rely 
on them not being part of the basic character set.

3) Instead of speculating that their inclusion would cause problems to 
some programmers we need evidence that that is the case. Considering 
that it would be hard to use a modern computer system without having 
both @ and $ available (think mobile and portable computer technology) I 
would be surprised if it would be a serious problem for anyone.

Just my 2c/p/d

Francis
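
Part of why trigraphs proved to be a road-block is that translation phase 1 replaces the nine ??x sequences everywhere, including inside string literals. The following hypothetical helper is a sketch, not any compiler's actual code, and performs the same replacement on a runtime string. Any literal passed to it should spell its question marks as ?\? so that a compiler with trigraphs enabled does not apply the replacement first.

```c
#include <stddef.h>
#include <string.h>

/* Each trigraph is two question marks followed by one character of
   tri_from; tri_to holds the replacement at the same index. */
static const char tri_from[] = "=(/)'<!>-";
static const char tri_to[]   = "#[\\]^{|}~";

/* Copies in to out, replacing trigraphs left to right as phase 1 would.
   out needs room for strlen(in) + 1 chars (the result is never longer). */
static void replace_trigraphs(const char *in, char *out)
{
    while (*in != '\0') {
        const char *p;
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0'
            && (p = strchr(tri_from, in[2])) != NULL) {
            *out++ = tri_to[p - tri_from];   /* emit the replacement */
            in += 3;                         /* skip the whole trigraph */
        } else {
            *out++ = *in++;                  /* copy ordinary characters */
        }
    }
    *out = '\0';
}
```

For instance, a string written in source as "What??!" reaches the program as "What|" when trigraphs are enabled. This kind of surprise is easy to miss when reading the code.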

#6172

From	David Brown <david.brown@hesbynett.no>
Date	2020-12-06 13:47 +0100
Message-ID	<rqijtd$dbr$1@dont-email.me>
In reply to	#6171

On 06/12/2020 13:25, Francis Glassborow wrote:
> On 05/12/2020 22:17, Keith Thompson wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>> On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
>>>> I wonder if it would make sense to add @ to the basic character set.
>>>> Virtually everyone is using it in comments and strings already anyway
>>>> (for email addresses), and I don't see anything preventing
>>>> implementations from supporting it, as it is available in both ASCII
>>>> and
>>>> common EBCDIC code pages:
>>>>
>>>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>>>
>>> '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
>>> French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
>>> national variants, that code point is assigned to some other character.
>>> With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
>>> becoming so commonplace, that is less of a concern than it used to be,
>>> but it is still something the committee is likely to pay attention to.
>>> There are other characters that already are part of the C basic
>>> character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
>>> However, all of those characters played an important role in C syntax
>>> long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
>>> invented to allow those characters to be used on systems that didn't
>>> support them natively.
>>
>> Apparently the C++ committee felt that it was of so little concern that
>> they removed trigraphs in C++17.  I don't know of any plans to do the
>> same in C.
>>
>> There are three printable ASCII characters that aren't in C's basic
>> character set: '$', '`', and '@'.  A guarantee that all three can be
>> used in string literals, character constants, and comments could be
>> useful.  (Most programmers probably already assume they can be.)
>>

Agreed.

> 
> 1) Trigraphs were proving to be a road-block for C++. In addition they
> are so rarely used (certainly in C++) that many (probably most)
> programmers fail to recognise them. WG14 appears reluctant to remove
> things even when they have no practical use in modern code. The argument
> that they are needed for legacy systems is, I think, very weak;
> compilers will continue to support them where necessary by providing
> legacy code switches.
> 

There is also the difference that C is used on a much wider range of
systems than C++, especially older systems.  C++ is able to drop support
for odder systems (such as those with more limited character sets, or
stranger integer representation) simply because it has not been used on
such systems.

> 2) As one design feature of C is portability, it is time that the three
> characters you mention were added to the basic character set. I do
> not see how that would have a negative effect on implementations that
> already use them for extensions. Those uses do not (or should not) rely
> on them not being part of the basic character set.
> 

As long as they are only available (by standard) for use in strings
and comments, not identifiers, there should be no conflict unless they
can't be represented in the source (for comments) or execution (for
string literals) character set of the system.  But if these characters
are supported by the relevant character sets, then in any real-world
compiler (such as any that support ASCII), they will already be
supported as extended characters.

In other words, there is not actually anything significantly useful to be
gained by putting these characters in the basic character set.  Equally,
there is no real risk in doing so.  It is purely a hypothetical issue,
AFAICS.  And the C standards committee are not known for spending extra
effort on something that makes no difference in reality.

> 3) Instead of speculating that their inclusion would cause problems to
> some programmers we need evidence that that is the case. Considering
> that it would be hard to use a modern computer system without having
> both @ and $ available (think mobile and portable computer technology) I
> would be surprised if it would be a serious problem for anyone.
> 
> Just my 2c/p/d
> 
> Francis


#6173

From	Richard Damon <Richard@Damon-Family.org>
Date	2020-12-06 08:42 -0500
Message-ID	<ud5zH.388173$gR8.7685@fx45.iad>
In reply to	#6172

On 12/6/20 7:47 AM, David Brown wrote:
> On 06/12/2020 13:25, Francis Glassborow wrote:
>> On 05/12/2020 22:17, Keith Thompson wrote:
>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>>> On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
>>>>> I wonder if it would make sense to add @ to the basic character set.
>>>>> Virtually everyone is using it in comments and strings already anyway
>>>>> (for email addresses), and I don't see anything preventing
>>>>> implementations from supporting it, as it is available in both ASCII
>>>>> and
>>>>> common EBCDIC code pages:
>>>>>
>>>>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>>>>
>>>> '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
>>>> French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
>>>> national variants, that code point is assigned to some other character.
>>>> With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
>>>> becoming so commonplace, that is less of a concern than it used to be,
>>>> but it is still something the committee is likely to pay attention to.
>>>> There are other characters that already are part of the C basic
>>>> character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
>>>> However, all of those characters played an important role in C syntax
>>>> long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
>>>> invented to allow those characters to be used on systems that didn't
>>>> support them natively.
>>>
>>> Apparently the C++ committee felt that it was of so little concern that
>>> they removed trigraphs in C++17.  I don't know of any plans to do the
>>> same in C.
>>>
>>> There are three printable ASCII characters that aren't in C's basic
>>> character set: '$', '`', and '@'.  A guarantee that all three can be
>>> used in string literals, character constants, and comments could be
>>> useful.  (Most programmers probably already assume they can be.)
>>>
> 
> Agreed.
> 
>>
>> 1) Trigraphs were proving to be a road-block for C++. In addition they
>> are so rarely used (certainly in C++) that many (probably most)
>> programmers fail to recognise them. WG14 appears reluctant to remove
>> things even when they have no practical use in modern code. The argument
>> that they are needed for legacy systems is, I think, very weak;
>> compilers will continue to support them where necessary by providing
>> legacy code switches.
>>
> 
> There is also the difference that C is used on a much wider range of
> systems than C++, especially older systems.  C++ is able to drop support
> for odder systems (such as those with more limited character sets, or
> stranger integer representation) simply because it has not been used on
> such systems.
> 
>> 2) As one design feature of C is portability, it is time that the three
>> characters you mention were added to the basic character set. I do
>> not see how that would have a negative effect on implementations that
>> already use them for extensions. Those uses do not (or should not) rely
>> on them not being part of the basic character set.
>>
> 
> As long as they are only available (by standard) for use in strings
> and comments, not identifiers, there should be no conflict unless they
> can't be represented in the source (for comments) or execution (for
> string literals) character set of the system.  But if these characters
> are supported by the relevant character sets, then in any real-world
> compiler (such as any that support ASCII), they will already be
> supported as extended characters.
> 
> In other words, there is not actually anything significantly useful to be
> gained by putting these characters in the basic character set.  Equally,
> there is no real risk in doing so.  It is purely a hypothetical issue,
> AFAICS.  And the C standards committee are not known for spending extra
> effort on something that makes no difference in reality.
> 

The issue with making them part of the basic character set is that it
makes any system that can't do this, because it uses a strange character
set, non-conforming. Since systems ARE allowed to add any characters
they want to the source or execution character set, those that currently
support them can do so. Forcing them to be included drops some system
from being able to have a conforming implementation, and the committee
has traditionally avoided gratuitously making systems non-conforming.

The only case that can be made for adding them is that programs
that use those characters might be able to become strictly conforming
programs instead of just being conforming programs, but strict
conformance isn't really that big of a deal in practicality, as
virtually all real programs are going to fail strict conformance because
they are going to depend on some aspect of the environment (Like how I/O
actually works)

#6174

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-12-06 14:07 -0800
Message-ID	<87lfeafv1r.fsf@nosuchdomain.example.com>
In reply to	#6173

Richard Damon <Richard@Damon-Family.org> writes:
[...]
> The issue with making them part of the basic character set is that it
> makes any system that can't do this, because it uses a strange character
> set, non-conforming. Since systems ARE allowed to add any characters
> they want to the source or execution character set, those that currently
> support them can do so. Forcing them to be included drops some system
> from being able to have a conforming implementation, and the committee
> has traditionally avoided gratuitously making systems non-conforming.

(Context: The ASCII characters '@', '$', and '`'.)

I'd be interested in seeing an implementation for which this would
be relevant.  Such an implementation (a) would be unable to (easily)
represent those three characters in source code and/or during
execution *and* (b) would otherwise conform to the hypothetical
edition of the C standard that would add them to the basic character
set if it were not for this change.

Implementations that can't support those characters are likely to be
for tiny exotic target systems, and very likely won't be conforming
anyway, and so could simply ignore the addition of those characters
to the basic character set.

> The only case that can be made for adding them is that programs
> that use those characters might be able to become strictly conforming
> programs instead of just being conforming programs, but strict
> conformance isn't really that big of a deal in practicality, as
> virtually all real programs are going to fail strict conformance because
> they are going to depend on some aspect of the environment (Like how I/O
> actually works)

I suppose I agree that it's not that big a deal.  Code that uses
those characters is *practically* 100% portable already, and I haven't
found a way to coax either gcc or clang to warn about puts("$@`").
The benefit would be minor, and the cost would be very close to zero
(unless an implementation as I've described above actually exists).
It would be one less thing to think about when writing code that's
intended to be as portable as possible.

#6175

From	Richard Damon <Richard@Damon-Family.org>
Date	2020-12-06 17:44 -0500
Message-ID	<D9dzH.185712$ql4.125078@fx39.iad>
In reply to	#6174

On 12/6/20 5:07 PM, Keith Thompson wrote:
> Richard Damon <Richard@Damon-Family.org> writes:
> [...]
>> The issue with making them part of the basic character set is that it
>> makes any system that can't do this, because it uses a strange character
>> set, non-conforming. Since systems ARE allowed to add any characters
>> they want to the source or execution character set, those that currently
>> support them can do so. Forcing them to be included drops some system
>> from being able to have a conforming implementation, and the committee
>> has traditionally avoided gratuitously making systems non-conforming.
> 
> (Context: The ASCII characters '@', '$', and '`'.)
> 
> I'd be interested in seeing an implementation for which this would
> be relevant.  Such an implementation (a) would be unable to (easily)
> represent those three characters in source code and/or during
> execution *and* (b) would otherwise conform to the hypothetical
> edition of the C standard that would add them to the basic character
> set if it were not for this change.

As was mentioned, all that you need is to want to support ISO/IEC 646
for a national character set that doesn't define code point 64 as @.

This includes Canadian, French, German, Irish, and a number of others.

See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

> 
> Implementations that can't support those characters are likely to be
> for tiny exotic target systems, and very likely won't be conforming
> anyway, and so could simply ignore the addition of those characters
> to the basic character set.
> 
>> The only case that can be made for adding them is that programs
>> that use those characters might be able to become strictly conforming
>> programs instead of just being conforming programs, but strict
>> conformance isn't really that big of a deal in practicality, as
>> virtually all real programs are going to fail strict conformance because
>> they are going to depend on some aspect of the environment (Like how I/O
>> actually works)
> 
> I suppose I agree that it's not that big a deal.  Code that uses
> those characters is *practically* 100% portable already, and I haven't
> found a way to coax either gcc or clang to warn about puts("$@`").
> The benefit would be minor, and the cost would be very close to zero
> (unless an implementation as I've described above actually exists).
> It would be one less thing to think about when writing code that's
> intended to be as portable as possible.
>

#6176

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-12-06 15:49 -0800
Message-ID	<87czzmfqb3.fsf@nosuchdomain.example.com>
In reply to	#6175

Richard Damon <Richard@Damon-Family.org> writes:
> On 12/6/20 5:07 PM, Keith Thompson wrote:
>> Richard Damon <Richard@Damon-Family.org> writes:
>> [...]
>>> The issue with making them part of the basic character set is that it
>>> makes any system that can't do this, because it uses a strange character
>>> set, non-conforming. Since systems ARE allowed to add any characters
>>> they want to the source or execution character set, those that currently
>>> support them can do so. Forcing them to be included drops some system
>>> from being able to have a conforming implementation, and the committee
>>> has traditionally avoided gratuitously making systems non-conforming.
>> 
>> (Context: The ASCII characters '@', '$', and '`'.)
>> 
>> I'd be interested in seeing an implementation for which this would
>> be relevant.  Such an implementation (a) would be unable to (easily)
>> represent those three characters in source code and/or during
>> execution *and* (b) would otherwise conform to the hypothetical
>> edition of the C standard that would add them to the basic character
>> set if it were not for this change.
>
> As was mentioned, all that you need is to want to support ISO/IEC 646
> for a national character set that doesn't define code point 64 as @.
>
> This includes Canadian, French, German, Irish, and a number of others.
>
> See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

What C implementations support those character sets (and are likely to
attempt to conform to a future C standard that adds '@' to the basic
character set)?

The following characters are also not part of the invariant character
set: # [ \ ] ^ { | } ~  (We have trigraphs for those.  I *do not*
suggest adding trigraphs for @ $ `.)
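For reference, here is a minimal C sketch of the translation-phase-1 replacement that the nine trigraphs perform. The mapping follows the standard's trigraph table; the function names are hypothetical, and, as noted above, no trigraph exists for @, $ or `. The sketch never puts two question marks next to each other in its own source, because a compiler with trigraphs enabled (gcc with -std=c11, for example) would rewrite them; a literal question-mark pair inside a string can be written with the \? escape.

```c
#include <stddef.h>

/* Replacement for the third character of a trigraph, or 0 if the
   three characters do not form one. */
static char trigraph_replacement(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Replace every trigraph in the NUL-terminated string s, in place,
   and return the new length. Working in place is safe because each
   trigraph (three characters) becomes a single character, so the
   write position never passes the read position. */
size_t replace_trigraphs(char *s)
{
    const char *r = s;   /* read position */
    char *w = s;         /* write position */

    while (*r != '\0') {
        char rep;
        if (r[0] == '?' && r[1] == '?' &&
            (rep = trigraph_replacement(r[2])) != 0) {
            *w++ = rep;
            r += 3;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return (size_t)(w - s);
}
```

With char s[] = "?\?=define N 1", a call to replace_trigraphs(s) leaves "#define N 1" in s and returns 11. A run of three question marks followed by "=" becomes "?#", as the standard's left-to-right scan requires.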

C++ has already dropped trigraphs because support for the old 7-bit
national character sets was considered unimportant.  (But C++17
did not add @ $ ` to its basic character set.)  I understand that
C has different issues than C++, but in my opinion adding @ $
` to C's basic character set would cause no actual harm.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

[toc] | [prev] | [next] | [standalone]

#6179

From	Philipp Klaus Krause <pkk@spth.de>
Date	2020-12-07 09:31 +0100
Message-ID	<rqkp9a$qhb$2@solani.org>
In reply to	#6176

I think that at this point, WG14 mostly thinks that adding trigraphs was
a mistake. But a mistake that can't be undone.

[toc] | [prev] | [next] | [standalone]

#6180

From	Richard Damon <Richard@Damon-Family.org>
Date	2020-12-07 07:24 -0500
Message-ID	<4apzH.236649$xe4.230701@fx41.iad>
In reply to	#6176

On 12/6/20 6:49 PM, Keith Thompson wrote:
> Richard Damon <Richard@Damon-Family.org> writes:
>> On 12/6/20 5:07 PM, Keith Thompson wrote:
>>> Richard Damon <Richard@Damon-Family.org> writes:
>>> [...]
>>>> The issue with making them part of the basic character set is that it
>>>> makes any system that can't do this, because it uses a strange character
>>>> set, non-conforming. Since systems ARE allowed to add any characters
>>>> they want to the source or execution character set, those that currently
>>>> support them can do so. Forcing them to be included drops some system
>>>> from being able to have a conforming implementation, and the committee
>>>> has traditionally avoided gratuitously making systems non-conforming.
>>>
>>> (Context: The ASCII characters '@', '$', and '`'.)
>>>
>>> I'd be interested in seeing an implementation for which this would
>>> be relevant.  Such an implementation (a) would be unable to (easily)
>>> represent those three characters in source code and/or during
>>> execution *and* (b) would otherwise conform to the hypothetical
>>> edition of the C standard that would add them to the basic character
>>> set if it were not for this change.
>>
>> As was mentioned, all that you need is to want to support ISO/IEC 646
>> for a national character set that doesn't define code point 64 as @
>>
>> This includes Canadian, French, German, Irish, and a number of others.
>>
>> See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.
> 
> What C implementations support those character sets (and are likely to
> attempt to conform to a future C standard that adds '@' to the basic
> character set)?

gcc (and many others) with the right choice of file encoding options.
The key point here is that this change would be telling a number of
national bodies that their whole national character set (and thus in
some respects their language) will no longer be supported.

> 
> The following characters are also not part of the invariant character
> set: # [ \ ] ^ { | } ~  (We have trigraphs for those.  I *do not*
> suggest adding trigraphs for @ $ `.)

The issue is that trigraphs were created to solve the issue of the
character set, and were adopted precisely so that those national bodies
would allow the C language to become a standard. They were done a bit
hastily, and it shows, but did provide a solution that satisfied the
national bodies requesting a solution.

> 
> C++ has already dropped trigraphs because support for the old 7-bit
> national character sets was considered unimportant.  (But C++17
> did not add @ $ ` to its basic character set.)  I understand that
> C has different issues than C++, but in my opinion adding @ $
> ` to C's basic character set would cause no actual harm.
> 

C++ has historically been much less concerned about backwards
compatibility issues.

I will add that while you think it causes little harm (besides making
programs stored in these national character sets perhaps no longer
conforming), it also adds little benefit. As has been pointed out,
basically all existing implementations include them in the extended
character set (when the encoding has those characters), so there is no
change to the programmer as far as allowing them in comments or strings,
which would be their only universal usage.

[toc] | [prev] | [next] | [standalone]

#6182

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-12-07 12:16 -0800
Message-ID	<874kkxfk35.fsf@nosuchdomain.example.com>
In reply to	#6180

Richard Damon <Richard@Damon-Family.org> writes:
> On 12/6/20 6:49 PM, Keith Thompson wrote:
>> Richard Damon <Richard@Damon-Family.org> writes:
>>> On 12/6/20 5:07 PM, Keith Thompson wrote:
>>>> Richard Damon <Richard@Damon-Family.org> writes:
>>>> [...]
>>>>> The issue with making them part of the basic character set is that it
>>>>> makes any system that can't do this, because it uses a strange character
>>>>> set, non-conforming. Since systems ARE allowed to add any characters
>>>>> they want to the source or execution character set, those that currently
>>>>> support them can do so. Forcing them to be included drops some system
>>>>> from being able to have a conforming implementation, and the committee
>>>>> has traditionally avoided gratuitously making systems non-conforming.
>>>>
>>>> (Context: The ASCII characters '@', '$', and '`'.)
>>>>
>>>> I'd be interested in seeing an implementation for which this would
>>>> be relevant.  Such an implementation (a) would be unable to (easily)
>>>> represent those three characters in source code and/or during
>>>> execution *and* (b) would otherwise conform to the hypothetical
>>>> edition of the C standard that would add them to the basic character
>>>> set if it were not for this change.
>>>
>>> As was mentioned, all that you need is to want to support ISO/IEC 646
>>> for a national character set that doesn't define code point 64 as @
>>>
>>> This includes Canadian, French, German, Irish, and a number of others.
>>>
>>> See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.
>> 
>> What C implementations support those character sets (and are likely to
>> attempt to conform to a future C standard that adds '@' to the basic
>> character set)?
>
> gcc (and many others) with the right choice of file encoding options.
> The key point here is that this change would be telling a number of
> national bodies that their whole national character set (and thus in
> some respects their language) will no longer be supported.

OK.  Can you explain precisely how to invoke gcc with the right choice
of file encoding options?  I've found this option in the gcc manual:

'-finput-charset=CHARSET'
     Set the input character set, used for translation from the
     character set of the input file to the source character set used by
     GCC.  If the locale does not specify, or GCC cannot get this
     information from the locale, the default is UTF-8.  This can be
     overridden by either the locale or this command-line option.
     Currently the command-line option takes precedence if there's a
     conflict.  CHARSET can be any encoding supported by the system's
     'iconv' library routine.

but I had never used it.

I just used "iconv -l" to get what I presume is a list of valid CHARSET
values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
       18 | #ifndef _STDC_PREDEF_H
          | ^

It looks like something translated the # character to \302 (0xc2).
I have no idea why.  (And it didn't complain about "$@`".)

If there's a way to invoke gcc telling it to use a character set that
doesn't include those characters, that would be a good refutation
to my point.  If doing so is actually useful in some contexts,
it would be an even better refutation.  So far I'm not convinced,
but I'm prepared to be.

My impression is that the old 7-bit national character sets are
no longer relevant, and that dropping support for them in the
C standard (more precisely, updating the C standard in a manner
that's inconsistent with those character sets) would be very nearly
harmless.  I'm looking for evidence that that's not the case.

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

[toc] | [prev] | [next] | [standalone]

#6184

From	Richard Damon <Richard@Damon-Family.org>
Date	2020-12-07 15:51 -0500
Message-ID	<rBwzH.225$y74.202@fx36.iad>
In reply to	#6182

On 12/7/20 3:16 PM, Keith Thompson wrote:
> Richard Damon <Richard@Damon-Family.org> writes:
>> On 12/6/20 6:49 PM, Keith Thompson wrote:
>>> Richard Damon <Richard@Damon-Family.org> writes:
>>>> On 12/6/20 5:07 PM, Keith Thompson wrote:
>>>>> Richard Damon <Richard@Damon-Family.org> writes:
>>>>> [...]
>>>>>> The issue with making them part of the basic character set is that it
>>>>>> makes any system that can't do this, because it uses a strange character
>>>>>> set, non-conforming. Since systems ARE allowed to add any characters
>>>>>> they want to the source or execution character set, those that currently
>>>>>> support them can do so. Forcing them to be included drops some system
>>>>>> from being able to have a conforming implementation, and the committee
>>>>>> has traditionally avoided gratuitously making systems non-conforming.
>>>>>
>>>>> (Context: The ASCII characters '@', '$', and '`'.)
>>>>>
>>>>> I'd be interested in seeing an implementation for which this would
>>>>> be relevant.  Such an implementation (a) would be unable to (easily)
>>>>> represent those three character in source code and/or during
>>>>> execution *and* (b) would otherwise conform to the hypothetical
>>>>> edition of the C standard that would add them to the basic character
>>>>> set if it were not for this change.
>>>>
>>>> As was mentioned, all that you need is to want to support ISO/IEC 646
>>>> for a national character set that doesn't define code point 64 as @
>>>>
>>>> This includes Canadian, French, German, Irish, and a number of others.
>>>>
>>>> See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.
>>>
>>> What C implementations support those character sets (and are likely to
>>> attempt to conform to a future C standard that adds '@' to the basic
>>> character set)?
>>
>> gcc (and many others) with the right choice of file encoding options.
>> The key point here is that this change would be telling a number of
>> national bodies that their whole national character set (and thus in
>> some respects their language) will no longer be supported.
> 
> OK.  Can you explain precisely how to invoke gcc with the right choice
> of file encoding options?  I've found this option in the gcc manual:
> 
> '-finput-charset=CHARSET'
>      Set the input character set, used for translation from the
>      character set of the input file to the source character set used by
>      GCC.  If the locale does not specify, or GCC cannot get this
>      information from the locale, the default is UTF-8.  This can be
>      overridden by either the locale or this command-line option.
>      Currently the command-line option takes precedence if there's a
>      conflict.  CHARSET can be any encoding supported by the system's
>      'iconv' library routine.
> 
> but I had never used it.
> 
> I just used "iconv -l" to get what I presume is a list of valid CHARSET
> values (there are over 1000 of them), which led me to this:
> 
>     gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c
> 
> With this source file:
> 
>     #include <stdio.h>
>     int main(void) {
>         puts("$@`");
>     }
> 
> it produced a cascade of errors, starting with:
> 
>     In file included from <command-line>:31:
>     /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
>        18 | #ifndef _STDC_PREDEF_H
>           | ^
> 
> It looks like something translated the # character to \302 (0xc2).
> I have no idea why.  (And it didn't complain about "$@`".)
> 
> If there's a way to invoke gcc telling it to use a character set that
> doesn't include those characters, that would be a good refutation
> to my point.  If doing so is actually useful in some contexts,
> it would be an even better refutation.  So far I'm not convinced,
> but I'm prepared to be.
> 
> My impression is that the old 7-bit national character sets are
> no longer relevant, and that dropping support for them in the
> C standard (more precisely, updating the C standard in a manner
> that's inconsistent with those character sets) would be very nearly
> harmless.  I'm looking for evidence that that's not the case.
> 
> [...]
> 

One problem is that file is NOT compatible with ISO646-FR as the '#'
character in it would not be a HashTag (or Pound Sign), but would be the
character £ which is illegal in C. It is one of the encodings that NEEDS
the trigraphs or digraphs in the files to use C.
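Richard's point also accounts for the \302 in Keith's error message. In the French variant of ISO/IEC 646, byte 0x23 stands for £ (U+00A3); once gcc converts the input to UTF-8 that character becomes the two bytes 0xC2 0xA3, and 0xC2 is octal 302. Likewise '@' becomes à, which is likely why the string literal drew no complaint: inside a literal it is just an extended character. Below is a minimal C sketch of that conversion. The function names are hypothetical, and the table is deliberately partial, assumed from the ISO/IEC 646 chart linked earlier in the thread; check it against your system's iconv ISO646-FR before relying on it.

```c
#include <stddef.h>

/* Partial ISO646-FR (NF Z 62-010) to Unicode table: only the code
   points that differ from ASCII and matter for C source are listed,
   and the values are assumed from the published ISO/IEC 646 chart. */
static unsigned long iso646fr_code_point(unsigned char b)
{
    switch (b) {
    case 0x23: return 0x00A3; /* POUND SIGN in place of the number sign */
    case 0x40: return 0x00E0; /* a with grave in place of commercial at */
    case 0x5B: return 0x00B0; /* DEGREE SIGN in place of left bracket */
    case 0x5C: return 0x00E7; /* c with cedilla in place of backslash */
    case 0x5D: return 0x00A7; /* SECTION SIGN in place of right bracket */
    case 0x7B: return 0x00E9; /* e with acute in place of left brace */
    case 0x7C: return 0x00F9; /* u with grave in place of vertical bar */
    case 0x7D: return 0x00E8; /* e with grave in place of right brace */
    default:   return b;      /* other bytes passed through (7-bit input assumed) */
    }
}

/* Convert len bytes of ISO646-FR text to UTF-8 in out (which needs
   room for 2 * len bytes) and return the number of bytes written. */
size_t iso646fr_to_utf8(const unsigned char *in, size_t len,
                        unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        unsigned long cp = iso646fr_code_point(in[i]);
        if (cp < 0x80) {
            out[n++] = (unsigned char)cp;              /* one byte */
        } else {
            /* Two-byte form, valid for U+0080..U+07FF. */
            out[n++] = (unsigned char)(0xC0 | (cp >> 6));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return n;
}
```

Feeding it the bytes of "#ifndef" yields 0xC2 0xA3 followed by "ifndef", which matches the "stray ‘\302’" that gcc reported at the start of stdc-predef.h.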

[toc] | [prev] | [next] | [standalone]

#6185

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-12-07 13:10 -0800
Message-ID	<87r1o1e2zg.fsf@nosuchdomain.example.com>
In reply to	#6184

Richard Damon <Richard@Damon-Family.org> writes:
> On 12/7/20 3:16 PM, Keith Thompson wrote:
>> Richard Damon <Richard@Damon-Family.org> writes:
>>> On 12/6/20 6:49 PM, Keith Thompson wrote:
>>>> Richard Damon <Richard@Damon-Family.org> writes:
>>>>> On 12/6/20 5:07 PM, Keith Thompson wrote:
>>>>>> Richard Damon <Richard@Damon-Family.org> writes:
>>>>>> [...]
>>>>>>> The issue with making them part of the basic character set is that it
>>>>>>> makes any system that can't do this, because it uses a strange character
>>>>>>> set, non-conforming. Since systems ARE allowed to add any characters
>>>>>>> they want to the source or execution character set, those that currently
>>>>>>> support them can do so. Forcing them to be included drops some system
>>>>>>> from being able to have a conforming implementation, and the committee
>>>>>>> has traditionally avoided gratuitously making systems non-conforming.
>>>>>>
>>>>>> (Context: The ASCII characters '@', '$', and '`'.)
>>>>>>
>>>>>> I'd be interested in seeing an implementation for which this would
>>>>>> be relevant.  Such an implementation (a) would be unable to (easily)
>>>>>> represent those three character in source code and/or during
>>>>>> execution *and* (b) would otherwise conform to the hypothetical
>>>>>> edition of the C standard that would add them to the basic character
>>>>>> set if it were not for this change.
>>>>>
>>>>> As was mentioned, all that you need is to want to support ISO/IEC 646
>>>>> for a national character set that doesn't define code point 64 as @
>>>>>
>>>>> This includes Canadian, French, German, Irish, and a number of others.
>>>>>
>>>>> See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.
>>>>
>>>> What C implementations support those character sets (and are likely to
>>>> attempt to conform to a future C standard that adds '@' to the basic
>>>> character set)?
>>>
>>> gcc (and many others) with the right choice of file encoding options.
>>> The key point here is that this change would be telling a number of
>>> national bodies that their whole national character set (and thus in
>>> some respects their language) will no longer be supported.
>> 
>> OK.  Can you explain precisely how to invoke gcc with the right choice
>> of file encoding options?  I've found this option in the gcc manual:
>> 
>> '-finput-charset=CHARSET'
>>      Set the input character set, used for translation from the
>>      character set of the input file to the source character set used by
>>      GCC.  If the locale does not specify, or GCC cannot get this
>>      information from the locale, the default is UTF-8.  This can be
>>      overridden by either the locale or this command-line option.
>>      Currently the command-line option takes precedence if there's a
>>      conflict.  CHARSET can be any encoding supported by the system's
>>      'iconv' library routine.
>> 
>> but I had never used it.
>> 
>> I just used "iconv -l" to get what I presume is a list of valid CHARSET
>> values (there are over 1000 of them), which led me to this:
>> 
>>     gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c
>> 
>> With this source file:
>> 
>>     #include <stdio.h>
>>     int main(void) {
>>         puts("$@`");
>>     }
>> 
>> it produced a cascade of errors, starting with:
>> 
>>     In file included from <command-line>:31:
>>     /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
>>        18 | #ifndef _STDC_PREDEF_H
>>           | ^
>> 
>> It looks like something translated the # character to \302 (0xc2).
>> I have no idea why.  (And it didn't complain about "$@`".)
>> 
>> If there's a way to invoke gcc telling it to use a character set that
>> doesn't include those characters, that would be a good refutation
>> to my point.  If doing so is actually useful in some contexts,
>> it would be an even better refutation.  So far I'm not convinced,
>> but I'm prepared to be.
>> 
>> My impression is that the old 7-bit national character sets are
>> no longer relevant, and that dropping support for them in the
>> C standard (more precisely, updating the C standard in a manner
>> that's inconsistent with those character sets) would be very nearly
>> harmless.  I'm looking for evidence that that's not the case.
>> 
>> [...]
>> 
>
> One problem is that file is NOT compatible with ISO646-FR as the '#'
> character in it would not be a HashTag (or Pound Sign), but would be the
> character £ which is illegal in C. It is one of the encodings that NEEDS
> the trigraphs or digraphs in the files to use C.

The first file it complains about, /usr/include/stdc-predef.h,
is part of the implementation (specifically part of glibc).
Either the implementation doesn't support ISO646-FR, or there's
some configuration I would need to perform to make it support it.

I'd still be interested in seeing an existing implementation that
does support ISO646-FR or something similar, and that would become
non-conforming if '@' were made part of the basic character set.

I recognize that the burden of proof is on any proposal to make a
change to the standard, but so far I've seen no evidence that such a
change would actually break anything (at least anything that isn't
already broken).

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

[toc] | [prev] | [next] | [standalone]

#6187

From	Andreas Schwab <schwab@linux-m68k.org>
Date	2020-12-07 23:52 +0100
Message-ID	<87k0ttjkkq.fsf@igel.home>
In reply to	#6185

On Dec 07 2020, Keith Thompson wrote:

> The first file it complains about, /usr/include/stdc-predef.h,
> is part of the implementation (specifically part of glibc).
> Either the implementation doesn't support ISO646-FR, or there's
> some configuration I would need to perform to make it support it.

The system files are encoded in UTF-8, so if you want to use them in a
ISO646-FR context, you have to convert them first.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

[toc] | [prev] | [next] | [standalone]

#6188

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-12-07 15:27 -0800
Message-ID	<87mtypdwok.fsf@nosuchdomain.example.com>
In reply to	#6187

Andreas Schwab <schwab@linux-m68k.org> writes:
> On Dec 07 2020, Keith Thompson wrote:
>> The first file it complains about, /usr/include/stdc-predef.h,
>> is part of the implementation (specifically part of glibc).
>> Either the implementation doesn't support ISO646-FR, or there's
>> some configuration I would need to perform to make it support it.
>
> The system files are encoded in UTF-8, so if you want to use them in a
> ISO646-FR context, you have to convert them first.

I suppose that would work (and would break the implementation for my
normal use).

That's not a reasonable thing to expect a user to do.  If that's the
simplest way to get the implementation to support ISO646-FR, then I'd
say the implementation doesn't support ISO646-FR.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

[toc] | [prev] | [next] | [standalone]

#6190

From	Richard Damon <Richard@Damon-Family.org>
Date	2020-12-07 18:54 -0500
Message-ID	<ugzzH.4855$Am.2699@fx45.iad>
In reply to	#6188

On 12/7/20 6:27 PM, Keith Thompson wrote:
> Andreas Schwab <schwab@linux-m68k.org> writes:
>> On Dec 07 2020, Keith Thompson wrote:
>>> The first file it complains about, /usr/include/stdc-predef.h,
>>> is part of the implementation (specifically part of glibc).
>>> Either the implementation doesn't support ISO646-FR, or there's
>>> some configuration I would need to perform to make it support it.
>>
>> The system files are encoded in UTF-8, so if you want to use them in a
>> ISO646-FR context, you have to convert them first.
> 
> I suppose that would work (and would break the implementation for my
> normal use).
> 
> That's not a reasonable thing to expect a user to do.  If that's the
> simplest way to get the implementation to support ISO646-FR, then I'd
> say the implementation doesn't support ISO646-FR.
> 

Actually, unless the files use characters outside the basic set, all
that it requires is encoding the problematic characters as trigraphs or
digraphs, which will work for all users.
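Richard's suggestion amounts to rewriting each of the nine non-invariant characters in the headers as its trigraph, so that ??= appears wherever # did. A minimal C sketch of such a rewrite follows; the function names are hypothetical. It leaves @, $ and ` alone, since no trigraphs exist for them, and it does not guard against question-mark pairs already present in the input. Keep in mind that gcc ignores trigraphs in its default gnu modes unless -trigraphs (or a strict -std=cNN mode) is given, which is the breakage Keith mentions below.

```c
#include <stddef.h>

/* Third character of the trigraph that spells c, or 0 if c has no
   trigraph (that is, it is not one of the nine replaced characters). */
static char trigraph_for(char c)
{
    switch (c) {
    case '#':  return '=';
    case '[':  return '(';
    case '\\': return '/';
    case ']':  return ')';
    case '^':  return '\'';
    case '{':  return '<';
    case '|':  return '!';
    case '}':  return '>';
    case '~':  return '-';
    default:   return 0;
    }
}

/* Copy the NUL-terminated string src to dst, spelling each of the nine
   characters as a trigraph. dst needs room for 3 * strlen(src) + 1
   bytes in the worst case. Returns the length written, excluding NUL. */
size_t encode_trigraphs(const char *src, char *dst)
{
    size_t n = 0;
    for (; *src != '\0'; ++src) {
        char t = trigraph_for(*src);
        if (t != 0) {
            dst[n++] = '?';
            dst[n++] = '?';
            dst[n++] = t;
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```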

[toc] | [prev] | [next] | [standalone]

#6191

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-12-07 16:10 -0800
Message-ID	<87im9ddup1.fsf@nosuchdomain.example.com>
In reply to	#6190

Richard Damon <Richard@Damon-Family.org> writes:
> On 12/7/20 6:27 PM, Keith Thompson wrote:
>> Andreas Schwab <schwab@linux-m68k.org> writes:
>>> On Dec 07 2020, Keith Thompson wrote:
>>>> The first file it complains about, /usr/include/stdc-predef.h,
>>>> is part of the implementation (specifically part of glibc).
>>>> Either the implementation doesn't support ISO646-FR, or there's
>>>> some configuration I would need to perform to make it support it.
>>>
>>> The system files are encoded in UTF-8, so if you want to use them in a
>>> ISO646-FR context, you have to convert them first.
>> 
>> I suppose that would work (and would break the implementation for my
>> normal use).
>> 
>> That's not a reasonable thing to expect a user to do.  If that's the
>> simplest way to get the implementation to support ISO646-FR, then I'd
>> say the implementation doesn't support ISO646-FR.
>
> Actually, unless the files use characters outside the basic set, all
> that it requires is encoding the problematic characters as trigraphs or
> digraphs, which will work for all users.

Yes, that could work too.  (It would break the implementation for
modes in which trigraphs are disabled, but since such modes are
non-conforming it's not relevant to the current discussion.)
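For comparison, here is a sketch of the other route Richard mentions. Digraphs and the <iso646.h> operator macros (both added to C in 1995) are ordinary tokens and macros rather than a phase-1 replacement, so gcc accepts them in its default modes, where trigraphs are ignored. They only cover # [ ] { } plus, through the macros, operators such as | ^ ~; backslash still has no spelling other than ??/, and @ $ ` have none at all. The function names are hypothetical, and the source below avoids every non-invariant character.

```c
%:include <stddef.h>
%:include <iso646.h>

/* Sum of n ints, written without brackets, braces or number signs. */
long sum_ints(const int *a, size_t n)
<%
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += a<:i:>;
    return total;
%>

/* Keep only the low bits of x; bits must be below the width of
   unsigned. compl and bitand are the iso646.h spellings of the
   complement and bitwise-and operators. */
unsigned low_bits(unsigned x, unsigned bits)
<%
    return x bitand compl(compl 0u << bits);
%>
```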

Still, it doesn't provide what I'm looking for: an example of an
existing real-world conforming implementation that would not be
conforming if '@' et al were added to the basic character set.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

[toc] | [prev] | [next] | [standalone]

Page 1 of 2 [1] 2 Next page →

csiph-web

Add @ to basic character set?

Contents

#6165 — Add @ to basic character set?

#6167

#6168

#6169

#6170

#6171

#6172

#6173

#6174

#6175

#6176

#6179

#6180

#6182

#6184

#6185

#6187

#6188

#6190

#6191