Path: csiph.com!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Keith Thompson <Keith.S.Thompson+u@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Rationale for aligning data on even bytes in a Unix shell file?
Date: Thu, 08 May 2025 14:13:43 -0700
Organization: None to speak of
Lines: 41
Message-ID: <87v7qaerg8.fsf@nosuchdomain.example.com>
References: <vuih43$2agfa$1@dont-email.me> <vunbgo$2q5u8$1@dont-email.me> <vunbjg$2q72n$1@raubtier-asyl.eternal-september.org> <vund1f$2rh3j$1@dont-email.me> <vungko$2uoa2$1@raubtier-asyl.eternal-september.org> <X9MPP.1383458$f81.819466@fx48.iad> <vuobri$3o38b$1@raubtier-asyl.eternal-september.org> <XtOPP.2986761$t84d.2537581@fx11.iad> <vuohq9$3tlhf$1@raubtier-asyl.eternal-september.org> <vuoig5$3ub4j$1@dont-email.me> <vuorpf$6tnn$1@raubtier-asyl.eternal-september.org> <vup2nt$bi1k$2@dont-email.me> <vupofl$13pg2$2@raubtier-asyl.eternal-september.org> <vuprce$15sqo$2@dont-email.me> <vvd6n5$353gs$1@raubtier-asyl.eternal-september.org> <vvfbnj$ulpc$1@dont-email.me> <vvflec$11b72$1@dont-email.me> <20250507202430.00005bb9@yahoo.com> <vvh8qg$1ha26$2@dont-email.me> <vvi3k6$1o09d$1@dont-email.me> <vvj3qe$246ff$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 08 May 2025 23:13:45 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="1ef4b2b5ec94e94b8636022ae34fa37c"; logging-data="2264471"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19TJ7FNEmRc0oG34bCoD1R2"
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:BSwdb/q185u7NzUC0PZ+Zw33TAk= sha1:90V4/2Lnuuno3Fv7WhaP2lA7VPw=
Xref: csiph.com comp.lang.c:393263

BGB <cr88192@gmail.com> writes:
> On 5/8/2025 6:13 AM, Janis Papanagnou wrote:
>> On 08.05.2025 05:30, BGB wrote:
>>> [...]
>>>
>>> Though, even for the Latin alphabet, once one goes much outside of ASCII
>>> and Latin-1, it gets messy.
>> I noticed that in several places you were referring to Latin-1. Since
>> decades that has been replaced by the Latin-9 (ISO 8859-15) character
>> set[*] for practical reasons ('€' sign, for example).
>>
>> Why is your focus still on the old Latin-1 (ISO 8859-1) character set?
>>
>> Janis, just curious
>>
>> [*] Unless Unicode and its encodings are used.
>> 
>
> U+00A0..U+00FF are designated as Latin-1 in Unicode.

I don't think that's accurate.  Do you have a reference for that?
It's true that those characters have the same names in Unicode
as in Latin-1.  However, the Wikipedia article says that in Latin-1
the ranges 0x00..0x1F and 0x7F..0x9F are *undefined*.  (That doesn't
match my recollection; I thought they were defined as control
characters.)

In any case, Latin-1 and Latin-9 treat those ranges in the same way.
Both can be seen as encodings for small subsets of Unicode.

[...]

> CP-1252, is the dominant remaining ASCII character set in use, is
> based on Latin-1, with a few characters from Latin-15 shoved into the
> places where control codes previously went.

CP-1252 is not an ASCII character set.  ASCII is a 7-bit character set.
CP-1252 is an 8-bit character set, as are the Latin-* sets.  Most 8-bit
sets are *based on* ASCII.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */