Groups > comp.std.c > #6165 > unrolled thread
| Started by | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| First post | 2020-12-05 08:58 +0100 |
| Last post | 2021-07-10 08:46 -0700 |
| Articles | 11 on this page of 31 — 9 participants |
Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-05 08:58 +0100
Re: Add @ to basic character set? James Kuyper <jameskuyper@alumni.caltech.edu> - 2020-12-05 10:53 -0500
Re: Add @ to basic character set? David Brown <david.brown@hesbynett.no> - 2020-12-05 17:15 +0100
Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-05 20:55 +0100
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-05 14:17 -0800
Re: Add @ to basic character set? Francis Glassborow <francis.glassborow@btinternet.com> - 2020-12-06 12:25 +0000
Re: Add @ to basic character set? David Brown <david.brown@hesbynett.no> - 2020-12-06 13:47 +0100
Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-06 08:42 -0500
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-06 14:07 -0800
Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-06 17:44 -0500
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-06 15:49 -0800
Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:31 +0100
Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 07:24 -0500
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 12:16 -0800
Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 15:51 -0500
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 13:10 -0800
Re: Add @ to basic character set? Andreas Schwab <schwab@linux-m68k.org> - 2020-12-07 23:52 +0100
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 15:27 -0800
Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 18:54 -0500
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 16:10 -0800
Re: Add @ to basic character set? Richard Damon <Richard@Damon-Family.org> - 2020-12-07 18:31 -0500
Re: Add @ to basic character set? Andreas Schwab <schwab@linux-m68k.org> - 2020-12-07 23:08 +0100
Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:30 +0100
Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2020-12-07 09:17 +0100
Re: Add @ to basic character set? Thomas David Rivers <rivers@dignus.com> - 2020-12-06 16:11 -0500
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2020-12-07 12:19 -0800
Re: Add @ to basic character set? Thomas David Rivers <rivers@dignus.com> - 2020-12-07 17:02 -0500
Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2021-03-11 22:50 +0100
Re: Add @ to basic character set? Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-03-11 15:40 -0800
Re: Add @ to basic character set? Philipp Klaus Krause <pkk@spth.de> - 2021-03-12 15:25 +0100
Re: Add @ to basic character set? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-10 08:46 -0700
| From | Richard Damon <Richard@Damon-Family.org> |
|---|---|
| Date | 2020-12-07 18:31 -0500 |
| Message-ID | <YWyzH.1118$ED6.419@fx38.iad> |
| In reply to | #6187 |
On 12/7/20 5:52 PM, Andreas Schwab wrote:
> On Dec 07 2020, Keith Thompson wrote:
>
>> The first file it complains about, /usr/include/stdc-predef.h,
>> is part of the implementation (specifically part of glibc).
>> Either the implementation doesn't support ISO646-FR, or there's
>> some configuration I would need to perform to make it support it.
>
> The system files are encoded in UTF-8, so if you want to use them in a
> ISO646-FR context, you have to convert them first.
>
> Andreas.
>

It is perhaps a weakness in GCC that it seems there is just one global
file encoding parameter, so you need a different version of the system
headers for each encoding of your source files. I think you can make
parallel directories and change the system directory path for each.

Now, likely you don't REALLY need all those different copies, as you
could make just one copy of the headers that use any of the missing
characters and replace those characters with trigraph or digraph
encodings. You could of course just always use that encoded file, but
that version would be less readable to those using the more 'normal'
character sets.
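
As a rough illustration of the last suggestion, here is a minimal sketch of a header rewritten with digraphs so that it no longer depends on byte 0x23 meaning #. The guard name DEMO_PREDEF_H, the macro DEMO_FEATURE_LEVEL and the function demo_feature_level are hypothetical and are not taken from glibc. Digraphs only cover # [ ] and the braces, so a header that also needs \ ^ | or ~ would still need trigraphs or a converted copy.

```c
%:ifndef DEMO_PREDEF_H              /* hypothetical include guard */
%:define DEMO_PREDEF_H

%:define DEMO_FEATURE_LEVEL 1       /* hypothetical macro, illustration only */

/* Returns the macro's value; the braces are spelled with digraphs too. */
int demo_feature_level(void)
<%
    return DEMO_FEATURE_LEVEL;
%>

%:endif /* DEMO_PREDEF_H */
```

A compiler that accepts digraphs (standard since the 1995 amendment to C90) treats this exactly like the same text written with # and braces.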
| From | Andreas Schwab <schwab@linux-m68k.org> |
|---|---|
| Date | 2020-12-07 23:08 +0100 |
| Message-ID | <87o8j5jmlu.fsf@igel.home> |
| In reply to | #6182 |
On Dec 07 2020, Keith Thompson wrote:
> I just used "iconv -l" to get what I presume is a list of valid CHARSET
> values (there are over 1000 of them), which led me to this:
>
> gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c
>
> With this source file:
>
> #include <stdio.h>
> int main(void) {
>     puts("$@`");
> }
>
> it produced a cascade of errors, starting with:
>
> In file included from <command-line>:31:
> /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
> 18 | #ifndef _STDC_PREDEF_H
> | ^
>
> It looks like something translated the # character to \302 (0xc2).
> I have no idea why. (And it didn't complain about "$@`".)
That is the first byte of the UTF-8 representation of <U00A3>, which is
what 0x23 translates to in ISO646-FR.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
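
The byte values in the explanation above can be checked with a short sketch. It assumes, as described, that the input byte 0x23 is first mapped to U+00A3 under ISO646-FR and that the result is then handled in UTF-8 form. utf8_encode_small is a hypothetical helper written for this illustration; it is not part of GCC or iconv.

```c
#include <assert.h>
#include <stddef.h>

/* Encodes a code point below U+0800 as UTF-8 into out.
   Returns the number of bytes written (1 or 2).
   Hypothetical helper, illustration only. */
size_t utf8_encode_small(unsigned cp, unsigned char out[2])
{
    assert(cp < 0x800);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;            /* plain 7-bit value */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* continuation: 10xxxxxx */
    return 2;
}
```

Calling it with 0xA3 yields the two bytes 0xC2 0xA3, which are 302 and 243 in octal; the first of these is the stray '\302' in the diagnostic.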
| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Date | 2020-12-07 09:30 +0100 |
| Message-ID | <rqkp6h$qhb$1@solani.org> |
| In reply to | #6174 |
On 06.12.20 at 23:07, Keith Thompson wrote:
>
> Implementations that can't support […] are likely to be
> for tiny exotic target systems,

I made that mistake before, with N2576. Spoiler: ctype.h would be hard
to provide for freestanding implementations targeting IBM mainframes.

I don't expect @ $ ` to be a problem for tiny targets. But I am not
familiar with IBM mainframes using EBCDIC and I am not familiar with
weird character sets that might still be in use in parts of Asia.
| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Date | 2020-12-07 09:17 +0100 |
| Message-ID | <rqkof3$pup$1@solani.org> |
| In reply to | #6170 |
On 05.12.20 at 23:17, Keith Thompson wrote:
>
> There are three printable ASCII characters that aren't in C's basic
> character set: '$', '`', and '@'. A guarantee that all three can be
> used in string literals, character constants, and comments could be
> useful. (Most programmers probably already assume they can be.)
>
` is a bit different from the other two: Some EBCDIC code pages that
contain $ and @ do not contain it, e.g. codepage 410 Cyrillic. AFAIK,
one can currently write the basic character set (with use of digraphs
for { and }) in EBCDIC codepage 410, which would no longer be possible
when ` gets added.
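
For readers who have not seen the digraph spelling mentioned above, a minimal sketch: the function below contains no brace characters at all. sum_pair is a hypothetical name, and whether every other character used here exists in codepage 410 has not been checked.

```c
/* <% and %> are the digraph spellings of the opening and closing brace. */
int sum_pair(int a, int b)
<%
    int total = a;
    if (b != 0) <%
        total += b;
    %>
    return total;
%>
```

There is no comparable alternative spelling for ` in the language, which is why requiring it would be a real constraint on such a code page.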
| From | Thomas David Rivers <rivers@dignus.com> |
|---|---|
| Date | 2020-12-06 16:11 -0500 |
| Message-ID | <rqlaj0$1s77$1@gioia.aioe.org> |
| In reply to | #6165 |
Philipp Klaus Krause wrote:
>I wonder if it would make sense to add @ to the basic character set.
>Virtually everyone is using it in comments and strings already anyway
>(for email addresses), and I don't see anything preventing
>implementations from supporting it, as it is available in both ASCII and
>common EBCDIC code pages:
>
>http://www.colecovision.eu/stuff/proposal-basic-@.html
>
>
Just to add to the "used as an extension" list of compilers; the Dignus
compilers (and the SAS/C compilers) for the mainframe use @ to be similar
to &, except that it can accept an rvalue. If an rvalue is present
after a @, then the address of a copy is generated. The copy is
declared within the inner-most scope.

This is helpful in some situations on the mainframe where pass-by-reference
is the norm, as in:

    FOO(@1, @2);

(where FOO is defined in some other language, e.g. PL/I, where the
parameters are pass-by-reference.)
- Dave R. -
--
rivers@dignus.com Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2020-12-07 12:19 -0800 |
| Message-ID | <87zh2pe5cr.fsf@nosuchdomain.example.com> |
| In reply to | #6181 |
Thomas David Rivers <rivers@dignus.com> writes:
> Philipp Klaus Krause wrote:
>
>>I wonder if it would make sense to add @ to the basic character set.
>>Virtually everyone is using it in comments and strings already anyway
>>(for email addresses), and I don't see anything preventing
>>implementations from supporting it, as it is available in both ASCII and
>>common EBCDIC code pages:
>>
>>http://www.colecovision.eu/stuff/proposal-basic-@.html
>>
> Just to add to the "used as an extension" list of compilers; the Dignus
> compilers (and the SAS/C compilers) for the mainframe use @ to be similar
> to &, except that it can accept an rvalue. If an rvalue is present
> after a @, then the address of a copy is generated. The copy is
> declared within
> the inner-most scope.
>
> This is helpful in some situations on the mainframe where pass-by-reference
> is the norm, as in:
>
> FOO(@1, @2);
>
> (where FOO is defined in some other language, e.g. PL/I, where the
> parameters
> are pass-by-reference.)
You can do the same thing with a compound literal starting in C99:

#include <stdio.h>

void FOO(int *a, int *b) {
    printf("%d %d\n", *a, *b);
}

int main(void) {
    FOO(&(int){1}, &(int){2});
}

I suspect the extension predates compound literals.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
| From | Thomas David Rivers <rivers@dignus.com> |
|---|---|
| Date | 2020-12-07 17:02 -0500 |
| Message-ID | <5FCEA667.2000108@dignus.com> |
| In reply to | #6183 |
Keith Thompson wrote:
>Thomas David Rivers <rivers@dignus.com> writes:
>
>
>>Philipp Klaus Krause wrote:
>>
>>
>>
>>>I wonder if it would make sense to add @ to the basic character set.
>>>Virtually everyone is using it in comments and strings already anyway
>>>(for email addresses), and I don't see anything preventing
>>>implementations from supporting it, as it is available in both ASCII and
>>>common EBCDIC code pages:
>>>
>>>http://www.colecovision.eu/stuff/proposal-basic-@.html
>>>
>>>
>>>
>>Just to add to the "used as an extension" list of compilers; the Dignus
>>compilers (and the SAS/C compilers) for the mainframe use @ to be similar
>>to &, except that it can accept an rvalue. If an rvalue is present
>>after a @, then the address of a copy is generated. The copy is
>>declared within
>>the inner-most scope.
>>
>>This is helpful in some situations on the mainframe where pass-by-reference
>>is the norm, as in:
>>
>> FOO(@1, @2);
>>
>>(where FOO is defined in some other language, e.g. PL/I, where the
>>parameters
>>are pass-by-reference.)
>>
>>
>
>You can do the same thing with a compound literal starting in C99:
>
>#include <stdio.h>
>
>void FOO(int *a, int *b) {
> printf("%d %d\n", *a, *b);
>}
>
>int main(void) {
> FOO(&(int){1}, &(int){2});
>}
>
>I suspect the extension predates compound literals.
>
>
>
Yep - this extension predates those.
And - very clever use of them! It certainly does what someone would need
in this situation.
- Dave R. -
--
rivers@dignus.com Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com
| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Date | 2021-03-11 22:50 +0100 |
| Message-ID | <s2e3av$429$1@solani.org> |
| In reply to | #6165 |
On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
> I wonder if it would make sense to add @ to the basic character set.
> Virtually everyone is using it in comments and strings already anyway
> (for email addresses), and I don't see anything preventing
> implementations from supporting it, as it is available in both ASCII and
> common EBCDIC code pages:
>
> http://www.colecovision.eu/stuff/proposal-basic-@.html
>

After some discussion and thought, IMO, the way forward is to add @ to
the source and execution character sets, but not the basic source
character set:

http://www.colecovision.eu/stuff/proposal-@.html

Do you think this proposal makes sense as is? If yes, do you have a
preference for adding them as single bytes vs. not specifying if they
are single bytes? If yes, why?

Philipp
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2021-03-11 15:40 -0800 |
| Message-ID | <8735x1i7ir.fsf@nosuchdomain.example.com> |
| In reply to | #6216 |
Philipp Klaus Krause <pkk@spth.de> writes:
> On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
>> I wonder if it would make sense to add @ to the basic character set.
>> Virtually everyone is using it in comments and strings already anyway
>> (for email addresses), and I don't see anything preventing
>> implementations from supporting it, as it is available in both ASCII and
>> common EBCDIC code pages:
>>
>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>
> After some discussion and thought, IMO, the way forward is to add @ to
> the source and execution character sets, but not the basic source
> character set:
>
> http://www.colecovision.eu/stuff/proposal-@.html
>
> Do you think this proposal makes sense as is? If yes, do you have a
> preference for adding them as single bytes vs. not specifying if they
> are single bytes? If yes, why?
It's not *necessary*, but I wouldn't object to it.

If this change is going to be made, I'd advocate also adding $
(mentioned in the proposal) and ` (not mentioned). None of @,
$, and ` are required for any C tokens, but many implementations
allow $ in identifiers. @, $, and ` are the only printable ASCII
characters that are not part of the C basic character sets. All are
commonly used in character constants and string literals. (`, backtick,
is used in Markdown and some other languages.)

The *basic* characters are those that are required for all
implementations. The set of *extended* characters is
implementation-defined, and may be empty. The @, $, and ` characters
are extended characters in most or all current implementations. If @, $,
and ` are going to be required, I think they should be in the basic
character set. That's the point of the distinction between basic and
extended characters.

Both ASCII and the EBCDIC code pages that support them represent
all these characters in one byte. Their representations should be
required to fit in a byte, since that already applies to all the
other basic characters; allowing them to be multi-byte wouldn't
help portability and would add complexity.

The vast majority of implementations already conform to this proposal,
except perhaps for a minor documentation update.

The only reasons I can think of *not* to make this change are (a) *any*
change to the standard needs to justify the work needed to make the
change and this one isn't really necessary, and (b) apparently some
EBCDIC codepages don't support all these characters. If the latter
affects any actual implementations, they could pick some other printable
characters to stand in (similar things have been done in the past for
old ASCII variants).
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
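
The count of three can be checked mechanically. The sketch below assumes an ASCII-compatible execution character set and the basic character set as it stood when this was posted; missing_from_basic and basic_printable are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Printable members of the basic character set: letters, digits,
   the 29 graphic characters, and space. */
static const char basic_printable[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
    " ";

/* Collects the printable ASCII characters (0x20..0x7E) that are absent
   from basic_printable into out, NUL-terminated, and returns how many
   were stored. Characters that do not fit in cap are skipped. */
size_t missing_from_basic(char *out, size_t cap)
{
    size_t n = 0;
    for (int c = 0x20; c <= 0x7E; c++) {
        if (strchr(basic_printable, c) == NULL && n + 1 < cap) {
            out[n++] = (char)c;
        }
    }
    if (cap > 0) {
        out[n] = '\0';
    }
    return n;
}
```

With a buffer of eight bytes this stores the three characters $ @ ` in code-point order, matching the list given in the post.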
| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Date | 2021-03-12 15:25 +0100 |
| Message-ID | <s2ftki$brt$1@solani.org> |
| In reply to | #6217 |
On 12.03.21 at 00:40, Keith Thompson wrote:
> Philipp Klaus Krause <pkk@spth.de> writes:
>> On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
>>> I wonder if it would make sense to add @ to the basic character set.
>>> Virtually everyone is using it in comments and strings already anyway
>>> (for email addresses), and I don't see anything preventing
>>> implementations from supporting it, as it is available in both ASCII and
>>> common EBCDIC code pages:
>>>
>>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>>
>> After some discussion and thought, IMO, the way forward is to add @ to
>> the source and execution character sets, but not the basic source
>> character set:
>>
>> http://www.colecovision.eu/stuff/proposal-@.html
>>
>> Do you think this proposal makes sense as is? If yes, do you have a
>> preference for adding them as single bytes vs. not specifying if they
>> are single bytes? If yes, why?
>
> It's not *necessary*, but I wouldn't object to it.
>
> If this change is going to be made, I'd advocate also adding $
> (mentioned in the proposal) and ` (not mentioned). None of @,
> $, and ` are required for any C tokens, but many implementations
> allow $ in identifiers. @, $, and ` are the only ASCII characters
> that are not part of the C basic character sets. All are commonly
> used in character constants and string literals. (`, backtick,
> is used in Markdown and some other languages.)

` makes sense. However, I don't know if WG14 wants it, so I'd make that a
separate question in the same paper.

> The *basic* characters are those that are required for all
> implementations. The set of *extended* characters is
> implementation-defined, and may be empty. The @, $, and ` characters
> are extended characters in most or all current implementations. If @, $,
> and ` are going to be required, I think they should be in the basic
> character set. That's the point of the distinction between basic and
> extended characters.

On the other hand, currently, using universal character names for
characters in the basic source character set is not allowed, so moving
characters into the basic source character set can actually break things.

Also, there is undefined behaviour when a character outside the basic
source character set is encountered in a source file, except in an
identifier, a character constant, a string literal, a header name, a
comment, or a preprocessing token that is never converted to a token.
Since some implementations use @ and $ for special purposes, it makes
sense to keep this undefined behaviour.

Philipp
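
The point about universal character names can be made concrete. In C11, a universal character name below U+00A0 is permitted only for U+0024, U+0040 and U+0060, which are the three printable ASCII characters outside the basic character set. The sketch below relies on that permission, and is the kind of code that could be put at risk if the exemption were dropped along with a move into the basic set. The function names are hypothetical, and the comparison assumes the execution character set has an encoding for @.

```c
#include <string.h>

/* Spells '@' with a universal character name instead of the character. */
const char *at_sign_via_ucn(void)
{
    return "\u0040";
}

/* Returns 1 if the UCN spelling and the literal spelling agree. */
int ucn_matches_literal(void)
{
    return strcmp(at_sign_via_ucn(), "@") == 0;
}
```

By contrast, writing "\u0041" for the letter A is a constraint violation today, since A is already in the basic character set.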
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2021-07-10 08:46 -0700 |
| Message-ID | <86zguu894g.fsf@linuxsc.com> |
| In reply to | #6216 |
Philipp Klaus Krause <pkk@spth.de> writes:

> On 05.12.20 at 08:58, Philipp Klaus Krause wrote:
>
>> I wonder if it would make sense to add @ to the basic character set.
>> Virtually everyone is using it in comments and strings already anyway
>> (for email addresses), and I don't see anything preventing
>> implementations from supporting it, as it is available in both ASCII and
>> common EBCDIC code pages:
>>
>> http://www.colecovision.eu/stuff/proposal-basic-@.html
>
> After some discussion and thought, IMO, the way forward is to add @ to
> the source and execution character sets, but not the basic source
> character set:
>
> http://www.colecovision.eu/stuff/proposal-@.html
>
> Do you think this proposal makes sense as is? If yes, do you have a
> preference for adding them as single bytes vs. not specifying if they
> are single bytes? If yes, why?

I would vote against the proposal, because it does nothing useful.