From: Keith Thompson
Newsgroups: comp.std.c
Subject: Re: Add @ to basic character set?
Date: Thu, 11 Mar 2021 15:40:28 -0800
Message-ID: <8735x1i7ir.fsf@nosuchdomain.example.com>

Philipp Klaus Krause writes:
> On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
>> I wonder if it would make sense to add @ to the basic character set.
>> Virtually everyone is using it in comments and strings already anyway
>> (for email addresses), and I don't see anything preventing
>> implementations from supporting it, as it is available in both ASCII and
>> common EBCDIC code pages:
>>
>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>
> After some discussion and thought, IMO, the way forward is to add @ to
> the source and execution character sets, but not the basic source
> character set:
>
> http://www.colecovision.eu/stuff/proposal-@.html
>
> Do you think this proposal makes sense as is? If yes, do you have a
> preference for adding them as single bytes vs. not specifying if they
> are single bytes? If yes, why?

It's not *necessary*, but I wouldn't object to it.

If this change is going to be made, I'd advocate also adding $
(mentioned in the proposal) and ` (not mentioned). None of @, $, and `
are required for any C tokens, but many implementations allow $ in
identifiers.

@, $, and ` are the only printable ASCII characters that are not part
of the C basic character sets.
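
The claim about @, $, and ` can be checked mechanically. The following
is a minimal sketch, not part of the original post. It assumes an
ASCII-based implementation (so that letters are contiguous and character
constants equal their ASCII codes), and the function names in_basic_set
and ascii_outside_basic are made up for illustration. The list of
29 graphic characters is the one the C standard gives for the basic
source character set.

```c
#include <stddef.h>
#include <string.h>

/* The 29 graphic characters that the C standard lists in the basic
   source character set, in addition to letters, digits, and space. */
static const char basic_punct[] = "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* Returns nonzero if c, a printable ASCII code, is in the basic source
   character set. Assumption: the implementation is ASCII-based, so the
   letter ranges below are contiguous. */
int in_basic_set(int c)
{
    if (c == ' ') return 1;
    if (c >= '0' && c <= '9') return 1;
    if (c >= 'A' && c <= 'Z') return 1;
    if (c >= 'a' && c <= 'z') return 1;
    /* strchr would also match the terminating null, so exclude it. */
    return c != '\0' && strchr(basic_punct, c) != NULL;
}

/* Stores every printable ASCII character (0x20 through 0x7E) that is
   not in the basic set into out, null-terminated, and returns how many
   were stored. out must have room for at least 96 chars. */
size_t ascii_outside_basic(char *out)
{
    size_t n = 0;
    for (int c = 0x20; c <= 0x7E; c++) {
        if (!in_basic_set(c))
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Under those assumptions, calling ascii_outside_basic(buf) with a
96-byte buffer should return 3 and leave $, @, and ` in buf, in ASCII
order.
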
All three are commonly used in character constants and string literals.
(`, backtick, is used in Markdown and some other languages.)

The *basic* characters are those that are required for all
implementations. The set of *extended* characters is
implementation-defined, and may be empty. The @, $, and ` characters
are extended characters in most or all current implementations.

If @, $, and ` are going to be required, I think they should be in the
basic character set. That's the point of the distinction between basic
and extended characters.

Both ASCII and the EBCDIC code pages that support them represent all
these characters in one byte. Their representations should be required
to fit in a byte, since that already applies to all the other basic
characters; allowing them to be multi-byte wouldn't help portability and
would add complexity.

The vast majority of implementations already conform to this proposal,
except perhaps for a minor documentation update.

The only reasons I can think of *not* to make this change are (a) *any*
change to the standard needs to justify the work needed to make the
change and this one isn't really necessary, and (b) apparently some
EBCDIC code pages don't support all these characters. If the latter
affects any actual implementations, they could pick some other printable
characters to stand in (similar things have been done in the past for
old ASCII variants).

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */