Add @ to basic character set?

Started by	Philipp Klaus Krause <pkk@spth.de>
First post	2020-12-05 08:58 +0100
Last post	2021-07-10 08:46 -0700
Articles	11 on this page of 31 — 9 participants

  Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-05 08:58 +0100
    Re: Add @ to basic character set? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-12-05 10:53 -0500
      Re: Add @ to basic character set? David Brown <david.brown@hesbynett.no> - 2020-12-05 17:15 +0100
        Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-05 20:55 +0100
      Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-05 14:17 -0800
        Re: Add @ to basic character set? Francis Glassborow <francis.glassborow@btinternet.com> - 2020-12-06 12:25 +0000
          Re: Add @ to basic character set? David Brown <david.brown@hesbynett.no> - 2020-12-06 13:47 +0100
            Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-06 08:42 -0500
              Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-06 14:07 -0800
                Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-06 17:44 -0500
                  Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-06 15:49 -0800
                    Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:31 +0100
                    Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 07:24 -0500
                      Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 12:16 -0800
                        Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 15:51 -0500
                          Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 13:10 -0800
                            Re: Add @ to basic character set? Andreas Schwab <schwab@linux-m68k.org> - 2020-12-07 23:52 +0100
                              Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 15:27 -0800
                                Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 18:54 -0500
                                  Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 16:10 -0800
                              Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 18:31 -0500
                        Re: Add @ to basic character set? Andreas Schwab <schwab@linux-m68k.org> - 2020-12-07 23:08 +0100
                Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:30 +0100
        Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:17 +0100
    Re: Add @ to basic character set? Thomas David Rivers <rivers@dignus.com> - 2020-12-06 16:11 -0500
      Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 12:19 -0800
        Re: Add @ to basic character set? Thomas David Rivers <rivers@dignus.com> - 2020-12-07 17:02 -0500
    Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2021-03-11 22:50 +0100
      Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-03-11 15:40 -0800
        Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2021-03-12 15:25 +0100
      Re: Add @ to basic character set? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-10 08:46 -0700

Page 2 of 2 — ← Prev page 1 [2]

#6189

From	Richard Damon <Richard@Damon-Family.org>
Date	2020-12-07 18:31 -0500
Message-ID	<YWyzH.1118$ED6.419@fx38.iad>
In reply to	#6187

On 12/7/20 5:52 PM, Andreas Schwab wrote:
> On Dec 07 2020, Keith Thompson wrote:
> 
>> The first file it complains about, /usr/include/stdc-predef.h,
>> is part of the implementation (specifically part of glibc).
>> Either the implementation doesn't support ISO646-FR, or there's
>> some configuration I would need to perform to make it support it.
> 
> The system files are encoded in UTF-8, so if you want to use them in an
> ISO646-FR context, you have to convert them first.
> 
> Andreas.
> 

It is perhaps a weakness in GCC that it seems there is just one
global file encoding parameter, so you need different versions of the
system files for each encoding of your source files. I think you can
make parallel directories and change the system directory path for each.

Now, you likely don't REALLY need all those different copies, as you
could make just one copy for the encodings that are missing any of the
needed characters and replace those characters with trigraph or digraph
encodings.

You could of course just always use that encoded file, but that version
would be less readable to those using the more 'normal' character sets.

#6186

From	Andreas Schwab <schwab@linux-m68k.org>
Date	2020-12-07 23:08 +0100
Message-ID	<87o8j5jmlu.fsf@igel.home>
In reply to	#6182

On Dec 07 2020, Keith Thompson wrote:

> I just used "iconv -l" to get what I presume is a list of valid CHARSET
> values (there are over 1000 of them), which led me to this:
>
>     gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c
>
> With this source file:
>
>     #include <stdio.h>
>     int main(void) {
>         puts("$@`");
>     }
>
> it produced a cascade of errors, starting with:
>
>     In file included from <command-line>:31:
>     /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
>        18 | #ifndef _STDC_PREDEF_H
>              | ^
>
> It looks like something translated the # character to \302 (0xc2).
> I have no idea why.  (And it didn't complain about "$@`".)

That is the first byte of the UTF-8 representation of <U00A3>, which is
what 0x23 translates to in ISO646-FR.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

#6178

From	Philipp Klaus Krause <pkk@spth.de>
Date	2020-12-07 09:30 +0100
Message-ID	<rqkp6h$qhb$1@solani.org>
In reply to	#6174

On 06.12.20 at 23:07, Keith Thompson wrote:

> 
> Implementations that can't support […] are likely to be
> for tiny exotic target systems,
I made that mistake before, with N2576. Spoiler: ctype.h would be hard
to provide for freestanding implementations targeting IBM mainframes.

I don't expect @ $ ` to be a problem for tiny targets. But I am not
familiar with IBM mainframes using EBCDIC and I am not familiar with
weird character sets that might still be in use in parts of Asia.

#6177

From	Philipp Klaus Krause <pkk@spth.de>
Date	2020-12-07 09:17 +0100
Message-ID	<rqkof3$pup$1@solani.org>
In reply to	#6170

On 05.12.20 at 23:17, Keith Thompson wrote:

> 
> There are three printable ASCII characters that aren't in C's basic
> character set: '$', '`', and '@'.  A guarantee that all three can be
> used in string literals, character constants, and comments could be
> useful.  (Most programmers probably already assume they can be.)
> 

` is a bit different from the other two: Some EBCDIC code pages that
contain $ and @ do not contain it, e.g. codepage 410 Cyrillic. AFAIK,
one can currently write the basic character set (with use of digraphs
for { and }) in EBCDIC codepage 410, which would no longer be possible
when ` gets added.

#6181

From	Thomas David Rivers <rivers@dignus.com>
Date	2020-12-06 16:11 -0500
Message-ID	<rqlaj0$1s77$1@gioia.aioe.org>
In reply to	#6165

Philipp Klaus Krause wrote:

>I wonder if it would make sense to add @ to the basic character set.
>Virtually everyone is using it in comments and strings already anyway
>(for email addresses), and I don't see anything preventing
>implementations from supporting it, as it is available in both ASCII and
>common EBCDIC code pages:
>
>http://www.colecovision.eu/stuff/proposal-basic-@.html
>  
>
Just to add to the "used as an extension" list of compilers: the Dignus
compilers (and the SAS/C compilers) for the mainframe use @ as an operator
similar to &, except that it can accept an rvalue.  If an rvalue is present
after a @, then the address of a copy is generated.  The copy is declared
within the inner-most scope.

This is helpful in some situations on the mainframe where pass-by-reference
is the norm, as in:

   FOO(@1, @2);

(where FOO is defined in some other language, e.g. PL/I, where the
parameters are pass-by-reference.)

     - Dave R. -

-- 
rivers@dignus.com                        Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com

#6183

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2020-12-07 12:19 -0800
Message-ID	<87zh2pe5cr.fsf@nosuchdomain.example.com>
In reply to	#6181

Thomas David Rivers <rivers@dignus.com> writes:
> Philipp Klaus Krause wrote:
>
>>I wonder if it would make sense to add @ to the basic character set.
>>Virtually everyone is using it in comments and strings already anyway
>>(for email addresses), and I don't see anything preventing
>>implementations from supporting it, as it is available in both ASCII and
>>common EBCDIC code pages:
>>
>>http://www.colecovision.eu/stuff/proposal-basic-@.html
>>  
> Just to add to the "used as an extension" list of compilers; the Dignus
> compilers (and the SAS/C compilers) for the mainframe use @ to be similar
> to &, except that it can accept an rvalue.   If an rvalue is present
> after a @, then the address of a copy is generated.  The copy is
> declared within
> the inner-most scope.
>
> This is helpful in some situations on the mainframe where pass-by-reference
> is the norm, as in:
>
>   FOO(@1, @2);
>
> (where FOO is defined in some other language, e.g. PL/I, where the
> parameters
> are pass-by-reference.)

You can do the same thing with a compound literal starting in C99:

#include <stdio.h>

void FOO(int *a, int *b) {
    printf("%d %d\n", *a, *b);
}

int main(void) {
    FOO(&(int){1}, &(int){2});
}

I suspect the extension predates compound literals.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#6192

From	Thomas David Rivers <rivers@dignus.com>
Date	2020-12-07 17:02 -0500
Message-ID	<5FCEA667.2000108@dignus.com>
In reply to	#6183

Keith Thompson wrote:

>Thomas David Rivers <rivers@dignus.com> writes:
>>Philipp Klaus Krause wrote:
>>
>>>I wonder if it would make sense to add @ to the basic character set.
>>>Virtually everyone is using it in comments and strings already anyway
>>>(for email addresses), and I don't see anything preventing
>>>implementations from supporting it, as it is available in both ASCII and
>>>common EBCDIC code pages:
>>>
>>>http://www.colecovision.eu/stuff/proposal-basic-@.html
>>>
>>Just to add to the "used as an extension" list of compilers; the Dignus
>>compilers (and the SAS/C compilers) for the mainframe use @ to be similar
>>to &, except that it can accept an rvalue.   If an rvalue is present
>>after a @, then the address of a copy is generated.  The copy is
>>declared within the inner-most scope.
>>
>>This is helpful in some situations on the mainframe where pass-by-reference
>>is the norm, as in:
>>
>>  FOO(@1, @2);
>>
>>(where FOO is defined in some other language, e.g. PL/I, where the
>>parameters are pass-by-reference.)
>
>You can do the same thing with a compound literal starting in C99:
>
>#include <stdio.h>
>
>void FOO(int *a, int *b) {
>    printf("%d %d\n", *a, *b);
>}
>
>int main(void) {
>    FOO(&(int){1}, &(int){2});
>}
>
>I suspect the extension predates compound literals.
>
Yep - this extension predates those.

And - very clever use of them!  It certainly does what someone would need
in this situation.

    - Dave R. -



-- 
rivers@dignus.com                        Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com

#6216

From	Philipp Klaus Krause <pkk@spth.de>
Date	2021-03-11 22:50 +0100
Message-ID	<s2e3av$429$1@solani.org>
In reply to	#6165

On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
> I wonder if it would make sense to add @ to the basic character set.
> Virtually everyone is using it in comments and strings already anyway
> (for email addresses), and I don't see anything preventing
> implementations from supporting it, as it is available in both ASCII and
> common EBCDIC code pages:
> 
> http://www.colecovision.eu/stuff/proposal-basic-@.html
> 

After some discussion and thought, IMO, the way forward is to add @ to
the source and execution character sets, but not the basic source
character set:

http://www.colecovision.eu/stuff/proposal-@.html

Do you think this proposal makes sense as is? If yes, do you have a
preference for adding them as single bytes vs. not specifying if they
are single bytes? If yes, why?

Philipp

#6217

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2021-03-11 15:40 -0800
Message-ID	<8735x1i7ir.fsf@nosuchdomain.example.com>
In reply to	#6216

Philipp Klaus Krause <pkk@spth.de> writes:
> On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
>> I wonder if it would make sense to add @ to the basic character set.
>> Virtually everyone is using it in comments and strings already anyway
>> (for email addresses), and I don't see anything preventing
>> implementations from supporting it, as it is available in both ASCII and
>> common EBCDIC code pages:
>> 
>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>
> After some discussion and thought, IMO, the way forward is to add @ to
> the source and execution character sets, but not the basic source
> character set:
>
> http://www.colecovision.eu/stuff/proposal-@.html
>
> Do you think this proposal makes sense as is? If yes, do you have a
> preference for adding them as single bytes vs. not specifying if they
> are single bytes? If yes, why?

It's not *necessary*, but I wouldn't object to it.

If this change is going to be made, I'd advocate also adding $
(mentioned in the proposal) and ` (not mentioned).  None of @,
$, and ` are required for any C tokens, but many implementations
allow $ in identifiers.  @, $, and ` are the only printable ASCII characters
that are not part of the C basic character sets.  All are commonly
used in character constants and string literals.  (`, backtick,
is used in Markdown and some other languages.)

The *basic* characters are those that are required for all
implementations.  The set of *extended* characters is
implementation-defined, and may be empty.  The @, $, and ` characters
are extended characters in most or all current implementations. If @, $,
and ` are going to be required, I think they should be in the basic
character set.  That's the point of the distinction between basic and
extended characters.

Both ASCII and the EBCDIC code pages that support them represent
all these characters in one byte.  Their representations should be
required to fit in a byte, since that already applies to all the
other basic characters; allowing them to be multi-byte wouldn't
help portability and would add complexity.

The vast majority of implementations already conform to this proposal,
except perhaps for a minor documentation update.

The only reasons I can think of *not* to make this change are (a) *any*
change to the standard needs to justify the work needed to make the
change and this one isn't really necessary, and (b) apparently some
EBCDIC codepages don't support all these characters.  If the latter
affects any actual implementations, they could pick some other printable
characters to stand in (similar things have been done in the past for
old ASCII variants).

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#6218

From	Philipp Klaus Krause <pkk@spth.de>
Date	2021-03-12 15:25 +0100
Message-ID	<s2ftki$brt$1@solani.org>
In reply to	#6217

On 12.03.21 at 00:40, Keith Thompson wrote:
> Philipp Klaus Krause <pkk@spth.de> writes:
>> On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
>>> I wonder if it would make sense to add @ to the basic character set.
>>> Virtually everyone is using it in comments and strings already anyway
>>> (for email addresses), and I don't see anything preventing
>>> implementations from supporting it, as it is available in both ASCII and
>>> common EBCDIC code pages:
>>>
>>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>>
>> After some discussion and thought, IMO, the way forward is to add @ to
>> the source and execution character sets, but not the basic source
>> character set:
>>
>> http://www.colecovision.eu/stuff/proposal-@.html
>>
>> Do you think this proposal makes sense as is? If yes, do you have a
>> preference for adding them as single bytes vs. not specifying if they
>> are single bytes? If yes, why?
> 
> It's not *necessary*, but I wouldn't object to it.
> 
> If this change is going to be made, I'd advocate also adding $
> (mentioned in the proposal) and ` (not mentioned).  None of @,
> $, and ` are required for any C tokens, but many implementations
> allow $ in identifiers.  @, $, and ` are the only ASCII characters
> that are not part of the C basic character sets.  All are commonly
> used in character constants and string literals.  (`, backtick,
> is used in Markdown and some other languages.)

` makes sense. However, I don't know if WG14 wants it, so I'd make that
a separate question in the same paper.

> 
> The *basic* characters are those that are required for all
> implementations.  The set of *extended* characters is
> implementation-defined, and may be empty.  The @, $, and ` characters
> are extended characters in most or all current implementations. If @, $,
> and ` are going to be required, I think they should be in the basic
> character set.  That's the point of the distinction between basic and
> extended characters.

On the other hand, currently, using universal character names for
characters in the basic source character set is not allowed, so moving
characters into the basic source character set can actually break things.

Also, there is undefined behaviour when a character outside the basic
source character set is encountered in a source file, except in an
identifier, a character constant, a string literal, a header name, a
comment, or a preprocessing token that is never converted to a token.
Since some implementations use @ and $ for special purposes, it makes
sense to keep this undefined behaviour.

Philipp

#6263

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2021-07-10 08:46 -0700
Message-ID	<86zguu894g.fsf@linuxsc.com>
In reply to	#6216

Philipp Klaus Krause <pkk@spth.de> writes:

> On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
>
>> I wonder if it would make sense to add @ to the basic character set.
>> Virtually everyone is using it in comments and strings already anyway
>> (for email addresses), and I don't see anything preventing
>> implementations from supporting it, as it is available in both ASCII and
>> common EBCDIC code pages:
>>
>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>
> After some discussion and thought, IMO, the way forward is to add @ to
> the source and execution character sets, but not the basic source
> character set:
>
> http://www.colecovision.eu/stuff/proposal-@.html
>
> Do you think this proposal makes sense as is?  If yes, do you have a
> preference for adding them as single bytes vs. not specifying if they
> are single bytes?  If yes, why?

I would vote against the proposal, because it does nothing useful.
