From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: C vs Haskell for XML parsing
Date: Tue, 03 Oct 2023 03:19:24 -0700
Message-ID: <86wmw4c8k3.fsf@linuxsc.com>

Keith Thompson writes:

> Tim Rentsch writes:
>
>> Keith Thompson writes:
>>
>>> Tim Rentsch writes:
>>>
>>>> Keith Thompson writes:
>>>>
>>>>> Keith Thompson writes:
>>>>>
>>>>>> David Brown writes:
>>>>>> [...]
>>>>>>
>>>>>>> Being able to accept $ in identifiers is a convenient extension.
>>>>>>
>>>>>> Quibble: $ in identifiers is not an extension as specified in section 4
>>>>>> of the standard. Starting in C99, the set of characters accepted in
>>>>>> identifiers is implementation-defined. (I'm not sure what difference
>>>>>> that makes.)
>>>>>
>>>>> On further thought, there is a significant difference.
>>>>>
>>>>> An implementation that supports $ in identifiers via the
>>>>> "other implementation-defined characters" wording in the syntax
>>>>> of an identifier can accept foo$bar as an identifier without
>>>>> issuing a diagnostic. If it's an extension as defined in section 4
>>>>> (Conformance) of the standard, it can accept foo$bar but it must
>>>>> still issue a diagnostic (presumably a non-fatal warning).
>>>>
>>>> I'm not sure that's right. Section 5.1.1.3 paragraph 1 says
>>>>
>>>>     A conforming implementation shall produce at least one
>>>>     diagnostic message (identified in an implementation-defined
>>>>     manner) if a preprocessing translation unit or translation
>>>>     unit contains a violation of any syntax rule or constraint,
>>>>     even if the behavior is also explicitly specified as
>>>>     undefined or implementation-defined.
>>>>
>>>> Note the last clause: "even if the behavior is also explicitly
>>>> specified as undefined or implementation-defined." This clause
>>>> suggests that accepting $ as one of the implementation-defined
>>>> characters still warrants a diagnostic.
>>>
>>> Ah, but the "implementation-defined characters" are part of the
>>> syntax. [...]
>>
>> I know that. Since the sentence in 5.1.1.3 p1 specifically calls
>> out cases that are explicitly implementation-defined, it should
>> take precedence. There is a violation of a syntax rule; any
>> argument that there isn't has to rely on implementation-defined
>> behavior, and thus a diagnostic is required.
>
> I disagree. A diagnostic is required if:
> - A syntax rule or constraint is violated; or
> - A syntax rule or constraint is violated and the behavior is explicitly
>   undefined; or
> - A syntax rule or constraint is violated and the behavior is
>   explicitly implementation-defined.
>
> (The last two cases are arguably covered by the first.)
>
> In this case, no syntax rule is violated, so none of the cases apply,
> so no diagnostic is required.
>
> The syntax rule is not violated because the implementation-defined
> character is part of the syntax rule. (See the parent article where I
> quoted the syntax rule, or see the standard.) It's not the *behavior*
> that's implementation-defined, it's the *syntax rule*.
>
> I do find it a bit odd, and potentially inconvenient, that one
> implementation can quietly accept foo$bar as an identifier and another
> can reject it as a syntax error, but that's what the standard says.

I appreciate your comments and the thoughtfulness of the message.
I think there is more to be said in this discussion, but I have
decided not to pursue it for now.