Path: csiph.com!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Re: Rationale for aligning data on even bytes in a Unix shell file?
Date: Thu, 08 May 2025 14:13:43 -0700
Organization: None to speak of
Lines: 41
Message-ID: <87v7qaerg8.fsf@nosuchdomain.example.com>
References: <20250507202430.00005bb9@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 08 May 2025 23:13:45 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="1ef4b2b5ec94e94b8636022ae34fa37c"; logging-data="2264471"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19TJ7FNEmRc0oG34bCoD1R2"
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:BSwdb/q185u7NzUC0PZ+Zw33TAk= sha1:90V4/2Lnuuno3Fv7WhaP2lA7VPw=
Xref: csiph.com comp.lang.c:393263

BGB writes:
> On 5/8/2025 6:13 AM, Janis Papanagnou wrote:
>> On 08.05.2025 05:30, BGB wrote:
>>> [...]
>>>
>>> Though, even for the Latin alphabet, once one goes much outside of ASCII
>>> and Latin-1, it gets messy.
>> I noticed that in several places you were referring to
>> Latin-1. Since
>> decades that has been replaced by the Latin-9 (ISO 8859-15) character
>> set[*] for practical reasons ('€' sign, for example).
>> Why is your focus still on the old Latin-1 (ISO 8859-1) character
>> set?
>> Janis, just curious
>> [*] Unless Unicode and its encodings are used.
>>
>
> U+00A0..U+00FF are designated as Latin-1 in Unicode.

I don't think that's accurate.  Do you have a reference for that?
It's true that those characters have the same names in Unicode as
in Latin-1.

Though the Wikipedia article says that the ranges 0x00..0x1F and
0x7F..0x9F are *undefined*.  (That doesn't match my recollection;
I thought they were defined as control characters.)  In any case,
Latin-1 and Latin-9 treat those ranges in the same way.
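
As a rough illustration of how the two sets relate to Unicode, here
is a minimal C sketch.  The function names are hypothetical (made up
for this example, not taken from any library), and it assumes the
control ranges map to the same-numbered code points, as in the
IANA-registered charsets rather than the bare ISO standards.

```c
#include <stdint.h>

/* Hypothetical helper: Latin-1 (ISO 8859-1) byte to Unicode code point.
   Assumption: every byte value, including the control ranges, maps to
   the code point with the same number. */
uint32_t latin1_to_codepoint(unsigned char byte)
{
    return byte;
}

/* Hypothetical helper: Latin-9 (ISO 8859-15) byte to Unicode code point.
   Same as Latin-1 except for the eight positions listed below. */
uint32_t latin9_to_codepoint(unsigned char byte)
{
    switch (byte) {
    case 0xA4: return 0x20AC; /* EURO SIGN */
    case 0xA6: return 0x0160; /* LATIN CAPITAL LETTER S WITH CARON */
    case 0xA8: return 0x0161; /* LATIN SMALL LETTER S WITH CARON */
    case 0xB4: return 0x017D; /* LATIN CAPITAL LETTER Z WITH CARON */
    case 0xB8: return 0x017E; /* LATIN SMALL LETTER Z WITH CARON */
    case 0xBC: return 0x0152; /* LATIN CAPITAL LIGATURE OE */
    case 0xBD: return 0x0153; /* LATIN SMALL LIGATURE OE */
    case 0xBE: return 0x0178; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
    default:   return byte;   /* everything else matches Latin-1 */
    }
}

/* Count the byte values on which the two mappings above disagree. */
int latin9_difference_count(void)
{
    int count = 0;
    for (int b = 0; b < 256; b++) {
        if (latin1_to_codepoint((unsigned char)b) !=
            latin9_to_codepoint((unsigned char)b)) {
            count++;
        }
    }
    return count;
}
```

With these definitions, latin9_to_codepoint(0xA4) gives 0x20AC (the
euro sign), while latin1_to_codepoint(0xA4) gives 0xA4 (the currency
sign).
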
Both can be seen as encodings for small subsets of Unicode.

[...]

> CP-1252, is the dominant remaining ASCII character set in use, is
> based on Latin-1, with a few characters from Latin-15 shoved into the
> places where control codes previously went.

CP-1252 is not an ASCII character set.  ASCII is a 7-bit character set.
CP-1252 is an 8-bit character set as are the Latin-* sets.  Most 8-bit
sets are *based on* ASCII.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */