From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Re: "C's Biggest Mistake"
Date: Sat, 07 Apr 2018 13:55:51 -0700
Organization: None to speak of
References: <9c7013a2-17bd-4f94-a378-1c45151d0d45@googlegroups.com> <2%LxC.241942$Oy5.103980@fx11.am4>

bartc writes:
> On 07/04/2018 21:13, Keith Thompson wrote:
[...]
>> Both your version and bartc's example (in his own language) using
>> ranges are non-portable.  The language guarantees that '0'..'9' are
>> contiguous, but it makes no such guarantee for 'A'..'Z' or 'a'..'z'.
>> And in fact in EBCDIC there are extra characters between 'A' and
>> 'Z', and between 'a' and 'z'.
>
> The Linux kernel sources include these lines (randomly combined):
[snip]

The Linux kernel sources are not portable code.  That's fine, they
don't have to be.

[...]
> If Linux programmers don't care about EBCDIC then neither do I.
>
> Same story with Tiny C sources. And with Seed 7 sources. And CPython.
> And numerous other programs. And all mine for over 3 decades
>
> It seems nobody cares about EBCDIC (except perhaps you). (Isn't everyone
> using Unicode now anyway? The first 128 Unicode characters, and the
> first 128 codes in UTF8, are ASCII.)

My point is that the C standard is designed to allow for EBCDIC.
There have been, and there still are, C implementations for
EBCDIC-based systems.  Nobody says you have to care about EBCDIC.
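
To make the portability distinction concrete, here is a minimal sketch
(not code from this thread; the function names digit_value,
is_basic_upper and is_upper_by_range are made up for illustration).
It relies only on the guarantee that '0'..'9' are contiguous, and
tests letters by membership in a string literal rather than by range,
so it should not depend on how the execution character set orders
'A'..'Z'.

```c
#include <stddef.h>
#include <string.h>

/* Portable: the C standard guarantees that '0'..'9' are contiguous
   and in ascending order, so the subtraction is well defined.
   Returns -1 if c is not a decimal digit. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Portable test for the 26 uppercase letters of the basic character
   set.  It makes no assumption that the letters are contiguous.
   The explicit check for '\0' is needed because strchr() would
   otherwise match the string's terminating null character. */
int is_basic_upper(char c)
{
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    return c != '\0' && strchr(upper, c) != NULL;
}

/* Non-portable: assumes nothing but letters lies between 'A' and 'Z'.
   True for ASCII; in EBCDIC the range also contains other characters. */
int is_upper_by_range(char c)
{
    return c >= 'A' && c <= 'Z';
}
```

On an ASCII-based implementation is_basic_upper() and
is_upper_by_range() agree for every char value; they can differ only
where the letters are not contiguous, which is exactly the case the
standard allows for.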

And since you mentioned Unicode, `c >= 'a' && c <= 'z'` does not
tell you whether c is a lower case letter unless you already know
that c is restricted to something like the ASCII character set,
excluding accented and non-Latin letters.  That may be part of the
reason C doesn't have case ranges, though that's just speculation
on my part.  See also the standard islower() and isupper() functions,
which have locale-specific behavior.

Again, if *you* want to assume ASCII, you're free to do so.  I merely
said your code is non-portable.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"