From: Tim Rentsch
Newsgroups: comp.lang.c
Subject: Re: Programming exercise/challenge
Date: Tue, 29 Dec 2020 19:18:44 -0800
Message-ID: <86v9ckugkr.fsf@linuxsc.com>

Richard Damon writes:

> On 12/27/20 7:17 AM, Tim Rentsch wrote:
>
>> Keith Thompson writes:
>>
>>> Tim Rentsch writes:
>>>
>>>> Richard Damon writes:
>>>>
>>>>> On 12/9/20 1:55 AM, Tim Rentsch wrote:
>>>>>
>>>>>> Bart writes:
>>>>>> [...]
>>>>>>
>>>>>>> The spec did say to make your own decisions on corner cases.
>>>>>>
>>>>>> Corner cases are meant to be only for input that the C
>>>>>> standard specifies as undefined behavior.
>>>>>
>>>>> or implementation defined or unspecified behavior, like the
>>>>> \<space><newline> case.
>>>>
>>>> No, I meant what I said.  Furthermore any compiler that
>>>> accepts \<space><newline> as a line continuation is not
>>>> conforming, as I have explained else-thread.
>>>
>>> I thought I had seen (and perhaps even made) an argument that phase 1:
>>>
>>>     Physical source file multibyte characters are mapped, in an
>>>     implementation-defined manner, to the source character set
>>>     (introducing new-line characters for end-of-line indicators) if
>>>     necessary.
>>>
>>> could include removing trailing spaces.  I admit it's a bit of a
>>> stretch of the meaning of "mapped".
>>
>> There is no way to make that work.  Let me call the two kinds
>> of spaces [PSF] and [SCS].  If we have a physical
>> source file with a line
>>
>>     int[PSF]x;[PSF][PSF][PSF]
>>
>> presumably you would want that to map to
>>
>>     int[SCS]x;[SCS]
>>
>> which means [PSF] would be a single-byte character and
>> also part of a non-single-byte multi-byte character.  It can't
>> be both.
>
> Who says that is prohibited at a system level?  Note that this sort of
> stuff DOES happen in some character sets, that you need to use a bit of
> look ahead to decode what a character means.

Yes, it wouldn't be surprising to see such a thing in cases where
for example the unadorned character is an 'e' and the adorned
character is an 'e' with an accent.  On the other hand, it's a
safe bet that no existing character set has a multi-byte
character that is simply a redundant representation of a
single-byte character, or has unbounded lookahead.  The point of
mapping physical source multi-byte characters is to conform to an
externally chosen representation, not to let a C implementation
transmogrify the input according to its whims.

Short summary:  probably technically within the letter of the C
standard, but surely not the intended meaning.

> Note also, that the issue does NOT exist at the level of the Source
> Character Set, as by the point we get to that, the spaces before the
> newline have been removed, so the source character set does not have
> that issue.

The source /character set/ certainly does have the problem.
A bizarro mapping can eliminate the possibility of a particular
input during phase 1, but the source character set still has the
ability to represent the undesired inputs.

>>> Aside from what the standard actually says, I certainly think it's
>>> more convenient to be able to ignore spaces at the end of a line
>>> following a \ character.  Treating backslash+space at the end of a
>>> line differently than backslash at the end of a line is awkward,
>>> even if it's conforming.  I'd like to see a modification to phase 1
>>> saying that white space between \ and the end of a line is
>>> discarded.
>>
>> I oppose such a change.  It's a needless complication and an
>> unnecessary incompatibility.  I have compiled probably tens of
>> millions of lines, if not hundreds of millions of lines, of C
>> code, and I don't remember ever seeing this problem except in
>> cases constructed specifically to test this rule.  Moreover if
>> someone wants to guard against end-of-line spaces it's almost
>> trivial to do that with a single grep or sed command.  Or just
>> compile with gcc, which will give a diagnostic in the particular
>> case of spaces/tabs between \ and newline.  Any change to the C
>> standard, and especially any change that causes incompatibility
>> between different versions, should yield substantial benefits to
>> justify its introduction.  The case we're talking about here
>> occurs so rarely that it is nowhere close to reaching the bar.
>
> Maybe I am just 'luckier' than you, as I HAVE seen cases in the wild
> where it made a difference.  Code was copied and pasted from an article
> and cleaned up.  The result was that much of the code had trailing
> spaces, and the lines that had multi-line macros just didn't compile
> right.
>
> The person who did this was TOTALLY puzzled by the error messages,
> because in the editor it was clear that this was a continuation line,
> but only with careful operation of the editor could the space after the
> \ be detected.
Even so, the ROI is very close to zero.  It isn't like this
happens every day, or even every year; you get puzzled once,
and after it happens the makefiles are fixed so it doesn't happen
again.  Done.

> As far as I know, the only way valid code would be broken with this
> would be a line oriented comment that ends with a \ followed by spaces
> then the next line is not a comment.  (the most likely cause would be a
> comment ending with a Linux path name to a directory, especially to root

    #define BACKSLANT \
    #define SOMETHING ...

(with spaces after the \ on the first line)

> I am not sure that such a line would be considered good practice anyway,
> as if you ever DID one of the cleanups you propose, it would break that
> line.

Oh, nonsense.  Using grep would flag the line but not change
anything.  An intentional white-space-after-backslant could be done
using a TAB character rather than a space, assuming one wanted to
do that.  Or a sed command could change spaces after a backslant
into '\/*!*/', and then a subsequent grep could look for that
pattern.  Etc.