From: Keith Thompson
Newsgroups: comp.std.c
Subject: Re: Add @ to basic character set?
Date: Mon, 07 Dec 2020 12:16:14 -0800
Message-ID: <874kkxfk35.fsf@nosuchdomain.example.com>

Richard Damon writes:
> On 12/6/20 6:49 PM, Keith Thompson wrote:
>> Richard Damon writes:
>>> On 12/6/20 5:07 PM, Keith Thompson wrote:
>>>> Richard Damon writes:
>>>> [...]
>>>>> The issue with making them part of the basic character set is that it
>>>>> makes any system that can't do this, because it uses a strange character
>>>>> set, non-conforming. Since systems ARE allowed to add any characters
>>>>> they want to the source or execution character set, those that currently
>>>>> support them can do so. Forcing them to be included drops some systems
>>>>> from being able to have a conforming implementation, and the committee
>>>>> has traditionally avoided gratuitously making systems non-conforming.
>>>>
>>>> (Context: The ASCII characters '@', '$', and '`'.)
>>>>
>>>> I'd be interested in seeing an implementation for which this would
>>>> be relevant.
>>>> Such an implementation (a) would be unable to (easily)
>>>> represent those three characters in source code and/or during
>>>> execution *and* (b) would otherwise conform to the hypothetical
>>>> edition of the C standard that would add them to the basic character
>>>> set if it were not for this change.
>>>
>>> As was mentioned, all that you need is to want to support ISO/IEC 646
>>> for a national character set that doesn't define code point 64 as @.
>>>
>>> This includes Canadian, French, German, Irish, and a number of others.
>>>
>>> See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.
>>
>> What C implementations support those character sets (and are likely to
>> attempt to conform to a future C standard that adds '@' to the basic
>> character set)?
>
> gcc (and many others) with the right choice of file encoding options.
> The key point here is that this change would be telling a number of
> national bodies that their whole national character set (and thus in
> some respects their language) will no longer be supported.

OK.  Can you explain precisely how to invoke gcc with the right choice
of file encoding options?

I've found this option in the gcc manual:

    '-finput-charset=CHARSET'
         Set the input character set, used for translation from the
         character set of the input file to the source character set used
         by GCC.  If the locale does not specify, or GCC cannot get this
         information from the locale, the default is UTF-8.  This can be
         overridden by either the locale or this command-line option.
         Currently the command-line option takes precedence if there's a
         conflict.  CHARSET can be any encoding supported by the system's
         'iconv' library routine.

but I had never used it.
I just used "iconv -l" to get what I presume is a list of valid CHARSET
values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
       18 | #ifndef _STDC_PREDEF_H
          | ^

It looks like something translated the # character to \302 (0xc2).  I
have no idea why.  (And it didn't complain about "$@`".)

If there's a way to invoke gcc telling it to use a character set that
doesn't include those characters, that would be a good refutation of my
point.  If doing so is actually useful in some contexts, it would be an
even better refutation.  So far I'm not convinced, but I'm prepared to
be.

My impression is that the old 7-bit national character sets are no
longer relevant, and that dropping support for them in the C standard
(more precisely, updating the C standard in a manner that's inconsistent
with those character sets) would be very nearly harmless.  I'm looking
for evidence that that's not the case.

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
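
A possible reading of the stray '\302' reported above, offered as a sketch and not a confirmed diagnosis: if the input really is decoded as ISO 646-FR and then re-encoded as UTF-8, the byte 0x23 (ASCII '#') would become POUND SIGN, whose UTF-8 encoding starts with 0xC2. The mapping below is an assumption taken from the ISO/IEC 646 chart cited earlier in the thread, not from the GCC or iconv sources, and both function names are hypothetical.

```c
#include <stddef.h>

/* Assumed mapping (from the ISO/IEC 646 chart, not from GCC or iconv
   sources): the code point the French variant assigns to the bytes
   discussed in this thread.  Other positions that differ from ASCII
   are deliberately not modeled. */
unsigned iso646_fr_to_unicode(unsigned char byte)
{
    switch (byte) {
    case 0x23: return 0x00A3u; /* ASCII '#' position: POUND SIGN */
    case 0x40: return 0x00E0u; /* ASCII '@' position: a with grave */
    case 0x60: return 0x00B5u; /* ASCII '`' position: MICRO SIGN */
    default:   return byte;    /* unmodeled bytes, including 0x24
                                  (DOLLAR SIGN), pass through */
    }
}

/* Encode a code point below U+0800 as UTF-8.  Returns the number of
   bytes written, or 0 if the code point is outside this sketch's range. */
size_t utf8_encode_small(unsigned cp, unsigned char out[2])
{
    if (cp < 0x80u) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    return 0;
}
```

Under these assumptions, byte 0x23 comes out as the two bytes 0xC2 0xA3, which is \302 \243 in octal and matches the first stray byte in the diagnostic. Bytes 0x40 and 0x60 come out as ordinary non-ASCII letters, which might be why the string literal drew no complaint, while 0x24 is unchanged.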