| From | Aidan Kehoe <kehoea@parhasard.net> |
|---|---|
| Newsgroups | comp.emacs |
| Subject | Re: Fixing Garbled Encodings |
| Date | 2026-02-25 13:55 +0000 |
| Message-ID | <87cy1tvsfd.fsf@parhasard.net> |
| References | (3 earlier) <slrn10p932m.5h9c.jcb@kotte.inf.ed.ac.uk> <10nj0eh$3g050$1@dont-email.me> <10nja5p$3j6lg$1@dont-email.me> <87ldgiwr02.fsf@parhasard.net> <10nl1du$6bih$1@dont-email.me> |
On the twenty-fourth day of February, Lawrence D’Oliveiro wrote:
> On Tue, 24 Feb 2026 07:16:29 +0000, Aidan Kehoe wrote:
>
> > XEmacs doesn’t have either (no “bytes” object, no differential between
> > unibyte and multibyte) and these problems do not really arise, which is, I
> > think, the better approach.
>
> That seems a bit implausible. Can you confirm that with an experiment?
>
> First, capture the output of the following Python code
>
> s = "“±”"
> print("".join(chr(b) for b in s.encode()))
>
> directly
As Henry comments, that (“directly”) is the catch. The characters that end up
in XEmacs depend on the coding system (roughly equivalent to the MIME charset)
associated with the process object, and that in turn depends on your language environment.
Usually UTF-8 on POSIX these days, of course, but there’s still plenty of
variation about.
As my language environment is currently set up, the coding system for decoding
process output is “undecided”. This does autodetection, and the python3 output
(I don’t have python2 on this machine) is detected correctly as UTF-8.
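A rough Python analogy of why the coding system matters. The bytes the quoted script writes are fixed, but the characters that reach the buffer depend on how the process output is decoded. The sketch assumes Python writes stdout as UTF-8 (a typical POSIX locale), and `decode_undecided` is a hypothetical, much-simplified stand-in for autodetection (try strict UTF-8, fall back to Latin-1), not XEmacs's actual algorithm.

```python
# The quoted experiment: each UTF-8 byte of s becomes one character.
s = "“±”"
mangled = "".join(chr(b) for b in s.encode())

# Assumption: Python writes its stdout as UTF-8 (typical POSIX locale),
# so these are the bytes the process object actually receives.
output = (mangled + "\n").encode("utf-8")


def decode_undecided(data: bytes) -> str:
    """Hypothetical, simplified autodetection: strict UTF-8, else Latin-1.

    An illustration only, not XEmacs's actual detection algorithm.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps each byte to the code point of the same value,
        # so this fallback never fails.
        return data.decode("latin-1")


as_utf8 = decode_undecided(output)
as_latin1 = output.decode("latin-1")

print([ord(c) for c in as_utf8.rstrip("\n")])
# [226, 128, 156, 194, 177, 226, 128, 157]
print(len(as_latin1.rstrip("\n")))
# 16: each of the eight characters arrives as two separate characters
```

Decoded as UTF-8, the buffer receives exactly the eight characters the experiment intends. Decoded as Latin-1, every character is split again into two.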
> into an Emacs buffer. It should look like
>
> â\200\234±â\200\235
>
> except that each backslash-octal sequence is a single character.
It does, but we display control characters between ?\x80 and ?\x9e in a
separate face, using their windows-1252 values, since that is usually what is
intended. It therefore displays as follows (ignoring the face):
â€œÂ±â€\235
with a corresponding underlying character code for each as follows:
(mapcar #'+ "â\200\234Â±â\200\235")
=> (226 128 156 194 177 226 128 157)
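A Python sketch of the display rule described above, to show why 157 comes out as "\235" while 128 and 156 get glyphs. It assumes the rule amounts to: characters from 0x80 up to 0x9E are drawn with their windows-1252 glyph when cp1252 defines one, and as backslash-octal otherwise. The function name `display_like_xemacs` is hypothetical, and this mimics only the visible result, not XEmacs's redisplay code.

```python
def display_like_xemacs(text: str) -> str:
    """Hypothetical rendering of C1 control characters, as described above.

    Assumption: characters in the range 0x80..0x9E are drawn using their
    windows-1252 glyph when cp1252 defines one, and as backslash-octal
    otherwise (cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined).
    All other characters are shown unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9E:
            try:
                # Display-only substitution; the buffer keeps the original code.
                out.append(bytes([code]).decode("cp1252"))
            except UnicodeDecodeError:
                out.append("\\%03o" % code)
        else:
            out.append(ch)
    return "".join(out)


buffer_text = "".join(map(chr, [226, 128, 156, 194, 177, 226, 128, 157]))
print(display_like_xemacs(buffer_text))  # â€œÂ±â€\235
```

The underlying character codes are unaffected; only what is drawn on screen changes.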
> Then try using the decode-coding-string function on that string, and see if
> you get the original text (the value of the s variable in the Python code)
> back.
(decode-coding-string (string 226 128 156 194 177 226 128 157) 'utf-8)
=> "“±”"
The same result comes back when I use the string taken from the buffer itself.
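For comparison, a Python approximation of what `(decode-coding-string ... 'utf-8)` does in this case: treat each character below 256 as one byte, then decode those bytes as UTF-8. Encoding to Latin-1 is the usual Python idiom for that character-to-byte step. The helper name `fix_garbled` is illustrative, and the sketch assumes every character in the input is below 256 (otherwise it raises `UnicodeEncodeError`).

```python
def fix_garbled(text: str) -> str:
    """Undo one round of "UTF-8 bytes read as single characters".

    Illustrative helper (the name is hypothetical). Assumes every character
    in `text` has a code below 256; otherwise encoding to Latin-1 raises
    UnicodeEncodeError.
    """
    # Latin-1 maps code points 0-255 one-to-one onto bytes 0-255.
    raw = text.encode("latin-1")
    # Reinterpret those bytes as UTF-8, analogous to decoding with 'utf-8.
    return raw.decode("utf-8")


garbled = "".join(map(chr, [226, 128, 156, 194, 177, 226, 128, 157]))
print(fix_garbled(garbled))  # “±”
```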
> When I try it in GNU Emacs, I don’t. It just returns the string unchanged.
--
‘As I sat looking up at the Guinness ad, I could never figure out /
How your man stayed up on the surfboard after fourteen pints of stout’
(C. Moore)