From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Wed, 30 Aug 2023 23:36:48 -0700
Message-ID: <86a5u7u39b.fsf@linuxsc.com>

Keith Thompson writes:

[...syntax for identifiers...]

> One odd thing (in both N1570 and N3096) is that the Semantics
> subsection uses "shall". For example, N1570 6.4.2.1p3 says:
>
>     Each universal character name in an identifier shall designate
>     a character whose encoding in ISO/IEC 10646 falls into one of
>     the ranges specified in D.1.  The initial character shall not
>     be a universal character name designating a character whose
>     encoding falls into one of the ranges specified in D.2.
>
> This implies that a violation of such a requirement has undefined
> behavior. I would have expected it to be a syntax error.

Clearly the idea is that implementations be allowed to choose
what other universal character names, if any, are to be permitted
in identifiers. As an example, consider an implementation that
supports the common extension of allowing dollar signs in
identifiers. It would make sense, in case a keyboard is being
used that doesn't have a dollar sign key, to allow the universal
character name for dollar sign (\u0024, IIANM). Other universal
character names might serve some other purpose, not being part
of the identifier but not necessarily causing an error either.
Apparently anything less than undefined behavior was thought to
be too limiting.
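
To make the distinction concrete, here is a rough sketch of how a front end
might sort universal character names in identifiers into "the standard
defines this" and "undefined behavior, so the implementation may do as it
likes". The range tables are my own transcription of N1570 Annex D and
should be checked against the standard before anyone relies on them. The
names classify_ucn and impl_accepts are made up for illustration, and the
policy of accepting \u0024 is the hypothetical extension described above,
not the documented behavior of any particular compiler.

```c
#include <stddef.h>

/* Inclusive range of code points. */
struct range { unsigned long lo, hi; };

/* N1570 D.1, code points below 0x10000 (hand-transcribed; verify). */
static const struct range d1_bmp[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF}, {0x0100, 0x167F}, {0x1681, 0x180D},
    {0x180F, 0x1FFF}, {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040},
    {0x2054, 0x2054}, {0x2060, 0x206F}, {0x2070, 0x218F}, {0x2460, 0x24FF},
    {0x2776, 0x2793}, {0x2C00, 0x2DFF}, {0x2E80, 0x2FFF}, {0x3004, 0x3007},
    {0x3021, 0x302F}, {0x3031, 0x303F}, {0x3040, 0xD7FF}, {0xF900, 0xFD3D},
    {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD}
};

/* N1570 D.2: ranges disallowed as the initial character (verify). */
static const struct range d2[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F}
};

/* Linear search: nonzero if cp lies in any of the n ranges. */
static int in_ranges(const struct range *r, size_t n, unsigned long cp)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (cp >= r[i].lo && cp <= r[i].hi)
            return 1;
    return 0;
}

/* D.1 membership.  Above the BMP, D.1 lists 10000-1FFFD, 20000-2FFFD,
   and so on through E0000-EFFFD, which is computed rather than tabulated. */
static int in_d1(unsigned long cp)
{
    if (cp >= 0x10000UL && cp <= 0xEFFFDUL)
        return (cp & 0xFFFFUL) <= 0xFFFDUL;
    return in_ranges(d1_bmp, sizeof d1_bmp / sizeof d1_bmp[0], cp);
}

enum ucn_status {
    UCN_CONFORMING,  /* satisfies both "shall" requirements of 6.4.2.1p3 */
    UCN_UNDEFINED    /* violates one of them: behavior is undefined */
};

/* is_initial is nonzero when the UCN is the identifier's first character. */
enum ucn_status classify_ucn(unsigned long cp, int is_initial)
{
    if (!in_d1(cp))
        return UCN_UNDEFINED;
    if (is_initial && in_ranges(d2, sizeof d2 / sizeof d2[0], cp))
        return UCN_UNDEFINED;
    return UCN_CONFORMING;
}

/* Hypothetical implementation policy: accept everything the standard
   requires, plus \u0024 as an alternate spelling of '$'. */
int impl_accepts(unsigned long cp, int is_initial)
{
    if (classify_ucn(cp, is_initial) == UCN_CONFORMING)
        return 1;
    return cp == 0x0024UL;
}
```

With these tables, classify_ucn(0x0024, 0) yields UCN_UNDEFINED, since the
dollar sign is in no D.1 range, yet impl_accepts(0x0024, 0) returns 1. An
implementation can make that choice only because the violation is
undefined behavior rather than a syntax error or constraint violation.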