Path: csiph.com!news.mixmin.net!eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Wed, 30 Aug 2023 23:36:48 -0700
Organization: A noiseless patient Spider
Lines: 27
Message-ID: <86a5u7u39b.fsf@linuxsc.com>
References: <576801fa-2842-40dc-bf19-221a5b1cf660n@googlegroups.com> <20230826123929.770@kylheku.com> <20230826210521.20@kylheku.com> <20230827151627.814@kylheku.com> <87edjocbqj.fsf@nosuchdomain.example.com> <86edjnxo81.fsf@linuxsc.com> <87ledubyeh.fsf@nosuchdomain.example.com> <861qfmwwvy.fsf@linuxsc.com> <20230828182115.305@kylheku.com> <875y4xboly.fsf@nosuchdomain.example.com> <87o7io9xsv.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: dont-email.me; posting-host="49a0c7fba7d7c0f06cea865d80b29294"; logging-data="3311199"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/6yAWsRoJ5k4i68CLsR14+0UMgxZrzz4s="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:wMzkdbio37vOnphN8Fl9EYhRiNU= sha1:oO1iax3n9vB1tkTNQ1ptyh+OaiA=
Xref: csiph.com comp.lang.c:173379

Keith Thompson writes:

[..syntax for identifiers...]

> One odd thing (in both N1570 and N3096) is that the Semantics
> subsection uses "shall".  For example, N1570 6.4.2.1p3 says:
>
>     Each universal character name in an identifier shall designate
>     a character whose encoding in ISO/IEC 10646 falls into one of
>     the ranges specified in D.1.  The initial character shall not
>     be a universal character name designating a character whose
>     encoding falls into one of the ranges specified in D.2.
>
> This implies that a violation of such a requirement has undefined
> behavior.  I would have expected it to be a syntax error.

Clearly the idea is that implementations be allowed to choose
what other universal character names, if any, are to be
permitted in identifiers.

As an example, consider an implementation that supports the common extension of allowing dollar signs in identifiers. It would make sense, in case a keyboard is being used that doesn't have a dollar sign key, to allow the universal character name for dollar sign (\u0024, IIANM). Other universal character names might serve some other purpose, not being part of the identifier but not necessarily causing an error either. Apparently anything less than undefined behavior was thought to be too limiting.
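
To make the distinction concrete, here is a rough sketch of the check that 6.4.2.1p3 describes. The function and type names are my own invention, and the range tables are my transcription of N1570 Annex D (D.1 and D.2), so they should be verified against the standard before being relied on. A result of false does not mean "error"; it means the standard imposes no requirement, which is exactly the latitude discussed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type: an inclusive range of code points. */
struct ucn_range { unsigned long lo, hi; };

/* D.1 ranges below U+10000, transcribed from N1570 Annex D.
   Assumption: check these against the standard text. */
static const struct ucn_range d1_ranges[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF}, {0x0100, 0x167F}, {0x1681, 0x180D},
    {0x180F, 0x1FFF}, {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040},
    {0x2054, 0x2054}, {0x2060, 0x206F}, {0x2070, 0x218F}, {0x2460, 0x24FF},
    {0x2776, 0x2793}, {0x2C00, 0x2DFF}, {0x2E80, 0x2FFF}, {0x3004, 0x3007},
    {0x3021, 0x302F}, {0x3031, 0x303F}, {0x3040, 0xD7FF}, {0xF900, 0xFD3D},
    {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD}
};

/* D.2 ranges (characters not allowed initially), same caveat. */
static const struct ucn_range d2_ranges[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F}
};

/* Linear search; the tables are small enough that this is fine. */
static bool in_ranges(unsigned long cp, const struct ucn_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= r[i].lo && cp <= r[i].hi)
            return true;
    return false;
}

static bool ucn_in_d1(unsigned long cp)
{
    /* D.1 also lists 10000-1FFFD, 20000-2FFFD, ..., E0000-EFFFD;
       those are computed here rather than tabulated. */
    if (cp >= 0x10000UL && cp <= 0xEFFFFUL)
        return (cp & 0xFFFFUL) <= 0xFFFDUL;
    return in_ranges(cp, d1_ranges, sizeof d1_ranges / sizeof d1_ranges[0]);
}

static bool ucn_in_d2(unsigned long cp)
{
    return in_ranges(cp, d2_ranges, sizeof d2_ranges / sizeof d2_ranges[0]);
}

/* True when the "shall" requirements of 6.4.2.1p3 are met for a
   universal character name at this position in an identifier.
   False means the behavior is undefined, so an implementation may
   reject the identifier, accept it as an extension, or do
   something else entirely. */
static bool ucn_meets_requirements(unsigned long cp, bool initial)
{
    if (!ucn_in_d1(cp))
        return false;
    if (initial && ucn_in_d2(cp))
        return false;
    return true;
}
```

Under this sketch, ucn_meets_requirements(0x0024, false) is false, since U+0024 is outside D.1; that is the case where an implementation supporting dollar signs in identifiers is free to accept the universal character name as well. A combining mark such as U+0300 meets the requirements anywhere except as the initial character.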