| Started by | István Pásztor <pasztorpisti@gmail.com> |
|---|---|
| First post | 2019-11-10 14:07 +0000 |
| Last post | 2019-11-10 14:07 +0000 |
| Articles | 1 — 1 participant |
Minor utf32-to-utf8 bug István Pásztor <pasztorpisti@gmail.com> - 2019-11-10 14:07 +0000
| From | István Pásztor <pasztorpisti@gmail.com> |
|---|---|
| Date | 2019-11-10 14:07 +0000 |
| Subject | Minor utf32-to-utf8 bug |
| Message-ID | <mailman.1182.1573396992.13325.bug-bash@gnu.org> |
[Multipart message; attachments visible in raw view]
Hi,

The encoding of six-byte UTF-8 sequences is buggy. Today Unicode requires UTF-8 sequences of at most 4 bytes, but if we handle 5- and 6-byte sequences too, then let's do it the right way.

The attached patch was created against a fresh master clone (d894cfd104086ddf68c286e67a5fb2e02eb43b7b).

I'm writing a tool (pxargs, an xargs variant) that can accept input strings in shell-quoted format, and I used the bash manual and source code as references. I haven't compiled the latest bash sources to check the bug, but it is likely to affect ANSI-C quoted strings like $'\U7fffffff'.

Best Regards,
Istvan Pasztor
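For readers without the attachment, here is a minimal sketch of what a correct encoder for the original 1- to 6-byte UTF-8 scheme (RFC 2279, covering values up to 0x7fffffff) looks like. It is neither the attached patch nor bash's actual code; the function name `utf32_to_utf8_extended` is hypothetical, and Python is used only for illustration.

```python
def utf32_to_utf8_extended(cp: int) -> bytes:
    """Encode a 31-bit value with the original 1..6 byte UTF-8 scheme.

    Hypothetical helper for illustration only; not taken from bash.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII, one byte

    # (exclusive upper limit, lead-byte marker, number of continuation bytes)
    layouts = (
        (0x800, 0xC0, 1),       # 2 bytes: 110xxxxx
        (0x10000, 0xE0, 2),     # 3 bytes: 1110xxxx
        (0x200000, 0xF0, 3),    # 4 bytes: 11110xxx
        (0x4000000, 0xF8, 4),   # 5 bytes: 111110xx
        (0x80000000, 0xFC, 5),  # 6 bytes: 1111110x
    )
    for limit, lead, ncont in layouts:
        if cp < limit:
            # Lead byte carries the marker plus the topmost payload bits.
            out = [lead | (cp >> (6 * ncont))]
            # Each continuation byte is 10xxxxxx with the next 6 payload bits.
            for i in range(ncont - 1, -1, -1):
                out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(out)
    raise AssertionError("unreachable: range was checked above")


if __name__ == "__main__":
    # The value from the ANSI-C quoting example, $'\U7fffffff'.
    print(utf32_to_utf8_extended(0x7FFFFFFF).hex(" "))
```

Under this scheme the six-byte lead byte is 0xfc ORed with the top bit of the value, followed by five continuation bytes, so 0x7fffffff should come out as fd bf bf bf bf bf. For values within today's Unicode range the result matches ordinary UTF-8.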