Path: csiph.com!weretis.net!feeder9.news.weretis.net!panix!.POSTED.panix5.panix.com!qz!not-for-mail
From: Eli the Bearded <*@eli.users.panix.com>
Newsgroups: comp.os.linux.misc,alt.folklore.computers
Subject: Re: ISO 8859-1 ("Latin 1") (was: Recent history of vi)
Date: Thu, 20 Nov 2025 02:09:45 -0000 (UTC)
Organization: Some absurd concept
Message-ID: <eli$2511192048@qaz.wtf>
References: <CtydnVqzjrtfd5P0nZ2dnZfqn_adnZ2d@giganews.com> <10fig07$8oe$1@news.misty.com> <akjvulxcnk.ln2@Telcontar.valinor> <AABpHczoTjwAAAKA.A3.flnews@WStation5.stz-e.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Injection-Date: Thu, 20 Nov 2025 02:09:45 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="panix5.panix.com:166.84.1.5"; logging-data="17539"; mail-complaints-to="abuse@panix.com"
User-Agent: Vectrex rn 2.1 (beta)
X-Liz: It's actually happened, the entire Internet is a massive game of Redcode
X-Motto: "Erosion of rights never seems to reverse itself." -- kenny@panix
X-US-Congress: Moronic Fucks.
X-Attribution: EtB
XFrom: is a real address
Encrypted: double rot-13
Xref: csiph.com comp.os.linux.misc:77749 alt.folklore.computers:232243

In comp.os.linux.misc, Michael Bäuerle <michael.baeuerle@gmx.net> wrote:
> ISO 8859-1 ("Latin 1") is a special case. No mapping table is required
> for conversion to Unicode, because all ISO 8859-1 codepoints have 1:1
> mappings to Unicode codepoints. This means any UTF can be directly
> applied to ISO 8859-1 codepoints.
...
> The MIME declaration "ISO-8859-1" includes C0 and C1 control characters.

Be technical. The MIME charset ISO-8859-1 includes the C0 and C1
control characters and has all of its characters at the same codepoints
as Unicode, but the character encoding is different from all Unicode
character encodings.
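
A minimal sketch of that distinction, in Python (the language and the
sample character U+00E9 are picked for illustration only): the codepoint
is the same number in both character sets, but the octets differ from
what any of the Unicode encodings shown here produce.

```python
# Sample character chosen for illustration: U+00E9 (e with acute accent).
ch = "\u00e9"

codepoint = ord(ch)               # the Unicode codepoint, 0xE9
latin1 = ch.encode("iso-8859-1")  # one octet whose value is the codepoint
utf8 = ch.encode("utf-8")         # two octets
utf16 = ch.encode("utf-16-be")    # two octets, big-endian, no BOM

# Every one of the 256 ISO-8859-1 octets, C0 and C1 controls included,
# decodes to the Unicode codepoint with the same number.
same_codepoints = all(
    bytes([i]).decode("iso-8859-1") == chr(i) for i in range(256)
)

print(hex(codepoint), latin1.hex(), utf8.hex(), utf16.hex(), same_codepoints)
# prints: 0xe9 e9 c3a9 00e9 True
```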

"charset" is a very specific term from MIME and it conflates character
set with character encoding. In a world where all characters fit in
eight bits, that's a very easy mistake to make, but since the MIME
designers were aware of (and specifically working to accommodate) worlds
where 8-bit encodings might not be used, that was a poor choice.

charset="utf-8" is an encoding using variable lengths for all of the
codepoints in the Unicode character set. In UTF-8, codepoints that
are under 128 are encoded in a single octet with the highbit unset. All
codepoints over 127 are encoded in multiple octets all with the highbit
set.
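
The highbit rule can be checked with a short Python sketch. The helper
name utf8_octets and the sample codepoints are made up for this
illustration; the only library call is the standard str.encode.

```python
def utf8_octets(cp):
    """Return the UTF-8 octets for one Unicode codepoint (hypothetical helper)."""
    return chr(cp).encode("utf-8")

# Sample codepoints on either side of the 127/128 boundary, plus larger ones.
samples = (0x41, 0x7F, 0x80, 0xE9, 0x20AC, 0x1F600)

for cp in samples:
    octets = utf8_octets(cp)
    # True for each octet that has the highbit (0x80) set.
    highbits = [bool(b & 0x80) for b in octets]
    print(f"U+{cp:04X}", octets.hex(), highbits)
```

Codepoints up to 0x7F should come out as one octet with the highbit
clear, and everything from 0x80 up as two to four octets, every one of
them with the highbit set.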

charset="utf-7" is an encoding using variable lengths for the
codepoints in the Unicode character set. In UTF-7 some characters are
left as is, and the rest (including those above codepoint 65535, which
go in as UTF-16 surrogate pairs) become multibyte sequences of modified
base64. But critically, none of the bytes have the highbit set.
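
A rough check of the "no highbit" property, using the utf-7 codec that
ships with Python. The sample strings are arbitrary, and this only
exercises that one codec, not every UTF-7 implementation.

```python
# Arbitrary sample strings: pure ASCII, one Latin-1 letter, one BMP symbol.
samples = ["plain ASCII", "caf\u00e9", "\u20ac 5"]

results = []
for text in samples:
    encoded = text.encode("utf-7")
    # Highest octet value in the output; below 0x80 means no highbit anywhere.
    results.append((text, encoded, max(encoded)))

for text, encoded, top in results:
    print(repr(text), "->", encoded, "max octet:", hex(top))
```

The ASCII-only string passes through unchanged, while the others pick
up a base64 run bracketed by "+" and "-", all of it 7-bit clean.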

charset="utf-ebcdic" is an encoding using variable lengths for all of
the codepoints in the Unicode character set. In UTF-EBCDIC an encoding
very similar to UTF-8 encodes Unicode codepoints five bits at a time
into EBCDIC. Codepoints that are under 160 are encoded in a single octet 
and codepoints above 159 are encoded in multiple octets all with the
highbit set. Only the C1 control characters are native highbit set EBCDIC.

Elijah
------
here is the map to the map you want