From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Re: // comments and \
Date: Tue, 01 Dec 2015 11:07:58 -0800

supercat@casperkitty.com writes:
> On Tuesday, December 1, 2015 at 11:15:51 AM UTC-6, Ben Bacarisse wrote:
>> By the way, handling them before anything else sounds unwise to me since
>> you'd interfere with trigraphs and the implementation-defined
>> translation to the source character set. I think you mean they would be
>> done in a new translation phase between 1 and 2. Even so, some existing
>> conforming code would be broken.
>
> Actually, I meant before phase 1. A stand-alone utility which feeds each
> line through a simple state machine and echoes everything as it goes along
> unless it reaches a "strip remainder" state.
[...]

To be clear, you're suggesting handling // comments before phase 1.

Phase 1 is:

    Physical source file multibyte characters are mapped, in an
    implementation-defined manner, to the source character set
    (introducing new-line characters for end-of-line indicators) if
    necessary. Trigraph sequences are replaced by corresponding
    single-character internal representations.

Before phase 1, the input isn't even necessarily in the source character
set. String literals and character constants aren't identified until
phase 3, so to handle:

    puts("// This is not a comment");

you'd have to recognize string literals and character constants in
phases 0 *and* 3. Phase 0 would also have to recognize trigraphs.

Would your phase 0 handle both "//" and "/*...*/" comments? It would
at least have to *recognize* "/*...*/" comments:

    /* This // is a valid comment */ puts("This is not a comment");

Here's an alternative suggestion to solve the (IMHO minor) problem of
line-spliced "//" comments:

Currently, in phase 2 a backslash immediately followed by a newline is
deleted.

Proposal: Do the same thing for a backslash followed by whitespace
followed by a newline.

Proposal: Each backslash-whitespace-newline sequence, rather than simply
being deleted, is replaced by a special marker (which could be
implemented by the compiler remembering where the deletion occurred).
In phase 3, any occurrence of that marker within a // comment is a
syntax error. The marker may appear in the middle of a token, and is
otherwise ignored in this and all following phases.

The changes in behavior from the current standard would be:

1. Invisible trailing whitespace doesn't affect line splicing (gcc and
   clang already behave this way).

2. Line splicing within a // comment is an unambiguous error.

It's likely such a change would affect some existing code -- but most
such code is likely incorrect anyway. The most common case is probably
Windows-style file paths at the end of // comments.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
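
For illustration only, and not taken from the post itself, the two-step proposal above (a marker left by phase 2, a check in phase 3) could be sketched roughly as below. The names SPLICE_MARK, splice_with_markers and marker_in_line_comment are hypothetical, the marker is assumed to be a byte that never occurs in the input, "whitespace" is taken to mean spaces and tabs, and trigraphs and the phase 1 character mapping are ignored entirely. A marker that falls between the two slashes of "//" is not flagged by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SPLICE_MARK '\x01'  /* hypothetical marker, assumed absent from input */

/* Phase 2 sketch: replace each backslash + optional spaces/tabs + newline
   with SPLICE_MARK instead of deleting it.  `out` needs room for
   strlen(in) + 1 bytes.  Returns the number of splices performed. */
static size_t splice_with_markers(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    size_t splices = 0;
    while (*in != '\0') {
        if (*in == '\\') {
            const char *p = in + 1;
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p == '\n') {           /* backslash-whitespace-newline */
                *out++ = SPLICE_MARK;
                in = p + 1;
                splices++;
                continue;
            }
        }
        *out++ = *in++;
    }
    *out = '\0';
    return splices;
}

/* Phase 3 sketch: return true if a marker lies inside a // comment.
   Markers anywhere else (strings, block comments, code) are ignored. */
static bool marker_in_line_comment(const char *s)
{
    enum { CODE, STRING, CHARCONST, BLOCK, LINE } state = CODE;
    bool escaped = false;   /* previous character was an unescaped backslash */
    char prev = '\0';       /* previous non-marker character */

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        char c = *s;
        if (c == SPLICE_MARK) {
            if (state == LINE)
                return true;
            continue;       /* ignored everywhere else */
        }
        switch (state) {
        case CODE:
            if (c == '"') state = STRING;
            else if (c == '\'') state = CHARCONST;
            else if (prev == '/' && c == '/') state = LINE;
            else if (prev == '/' && c == '*') { state = BLOCK; c = '\0'; }
            break;
        case STRING:
        case CHARCONST:
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == (state == STRING ? '"' : '\'')) state = CODE;
            break;
        case BLOCK:
            if (prev == '*' && c == '/') { state = CODE; c = '\0'; }
            break;
        case LINE:
            if (c == '\n') state = CODE;
            break;
        }
        prev = c;
    }
    return false;
}
```

Under these assumptions, a line such as `int x = 1; // C:\temp\` followed by trailing spaces and a newline yields one splice and is reported by the second function, while a backslash-newline inside the string literal or the block comment from the examples above is spliced silently.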