Path: csiph.com!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups: comp.std.c
Subject: Re: Add @ to basic character set?
Date: Mon, 07 Dec 2020 12:16:14 -0800
Organization: None to speak of
Lines: 94
Message-ID: <874kkxfk35.fsf@nosuchdomain.example.com>
References: <rqfeip$lrr$1@solani.org> <rqgae4$1u1$1@dont-email.me> <87pn3ngao5.fsf@nosuchdomain.example.com> <rqiik6$58p$1@dont-email.me> <rqijtd$dbr$1@dont-email.me> <ud5zH.388173$gR8.7685@fx45.iad> <87lfeafv1r.fsf@nosuchdomain.example.com> <D9dzH.185712$ql4.125078@fx39.iad> <87czzmfqb3.fsf@nosuchdomain.example.com> <4apzH.236649$xe4.230701@fx41.iad>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: reader02.eternal-september.org; posting-host="8c1d1df3fcf9f83d93d1ccf626f65b8d"; logging-data="27800"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+ZAzc4hiq9E2bsvKNhiDoe"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
Cancel-Lock: sha1:l2bpvnf62SUtgMMVq5tG6uR0rPg= sha1:lcETnk6iEVof5nX3gprZGyxEaDs=
Xref: csiph.com comp.std.c:6182

Richard Damon <Richard@Damon-Family.org> writes:
> On 12/6/20 6:49 PM, Keith Thompson wrote:
>> Richard Damon <Richard@Damon-Family.org> writes:
>>> On 12/6/20 5:07 PM, Keith Thompson wrote:
>>>> Richard Damon <Richard@Damon-Family.org> writes:
>>>> [...]
>>>>> The issue with making them part of the basic character set is that it
>>>>> makes any system that can't do this, because it uses a strange character
>>>>> set, non-conforming. Since systems ARE allowed to add any characters
>>>>> they want to the source or execution character set, those that currently
>>>>> support them can do so. Forcing them to be included drops some systems
>>>>> from being able to have a conforming implementation, and the committee
>>>>> has traditionally avoided gratuitously making systems non-conforming.
>>>>
>>>> (Context: The ASCII characters '@', '$', and '`'.)
>>>>
>>>> I'd be interested in seeing an implementation for which this would
>>>> be relevant.  Such an implementation (a) would be unable to (easily)
>>>> represent those three characters in source code and/or during
>>>> execution *and* (b) would otherwise conform to the hypothetical
>>>> edition of the C standard that would add them to the basic character
>>>> set if it were not for this change.
>>>
>>> As was mentioned, all that you need is to want to support ISO/IEC 646
>>> for a national character set that doesn't define code point 64 as @
>>>
>>> This includes Canadian, French, German, Irish, and a number of others.
>>>
>>> See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.
>> 
>> What C implementations support those character sets (and are likely to
>> attempt to conform to a future C standard that adds '@' to the basic
>> character set)?
>
> gcc (and many others) with the right choice of file encoding options.
> The key point here is that this change would be telling a number of
> national bodies that their whole national character set (and thus in
> some respects their language) will no longer be supported.

OK.  Can you explain precisely how to invoke gcc with the right choice
of file encoding options?  I've found this option in the gcc manual:

'-finput-charset=CHARSET'
     Set the input character set, used for translation from the
     character set of the input file to the source character set used by
     GCC.  If the locale does not specify, or GCC cannot get this
     information from the locale, the default is UTF-8.  This can be
     overridden by either the locale or this command-line option.
     Currently the command-line option takes precedence if there's a
     conflict.  CHARSET can be any encoding supported by the system's
     'iconv' library routine.

but I had never used it.

I just used "iconv -l" to get what I presume is a list of valid CHARSET
values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
       18 | #ifndef _STDC_PREDEF_H
          | ^

It looks like something translated the # character to \302 (0xc2).
I have no idea why.  (And it didn't complain about "$@`".)

If there's a way to invoke gcc telling it to use a character set that
doesn't include those characters, that would be a good refutation
of my point.  If doing so is actually useful in some contexts,
it would be an even better refutation.  So far I'm not convinced,
but I'm prepared to be.

My impression is that the old 7-bit national character sets are
no longer relevant, and that dropping support for them in the
C standard (more precisely, updating the C standard in a manner
that's inconsistent with those character sets) would be very nearly
harmless.  I'm looking for evidence that that's not the case.

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */