| From | Kalle Olavi Niemitalo <kon@iki.fi> |
|---|---|
| Newsgroups | gnu.bash.bug |
| Subject | printf '\uFEFF' outputs invalid UTF-8 on Windows |
| Date | Mon, 05 Nov 2018 19:09:06 +0200 |
| Message-ID | <mailman.3504.1541441576.1284.bug-bash@gnu.org> |
| To | bug-bash@gnu.org |
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: msys
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash.exe' -DCONF_HOSTTYPE='x86_64'
-DCONF_OSTYPE='msys' -DCONF_MACHTYPE='x86_64-pc-msys' -DCONF_VENDOR='pc'
-DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H
-DRECYCLES_PIDS -I. -I. -I./include -I./lib -DWORDEXP_OPTION
-Wno-discarded-qualifiers -march=x86-64 -mtune=generic -O2 -pipe
-Wno-parentheses -Wno-format-security -D_STATIC_BUILD -g
uname output: MINGW64_NT-6.1 fjkallen 2.10.0(0.325/5/3) 2018-07-25 13:06
x86_64 Msys
Machine Type: x86_64-pc-msys
Bash Version: 4.4
Patch Level: 19
Release Status: release
Description:
The builtin printf '\uFEFF' outputs ED 9F BF ED BB BF in a
UTF-8 locale on Microsoft Windows, where sizeof(wchar_t) == 2.
It should output EF BB BF, like printf (GNU coreutils) 8.30
does.
The incorrect output ED 9F BF ED BB BF is a UTF-8-like encoding
of U+D7FF U+DEFF. This resembles a UTF-16 surrogate pair, but only
U+DEFF is a surrogate (a low one); U+D7FF lies just below the
high-surrogate range U+D800...U+DBFF.
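The byte arithmetic behind this can be checked by hand. The following is a minimal sketch in C; the helper names decode3 and encode3 are hypothetical and do not come from bash. decode3 unpacks one three-byte UTF-8-style sequence without rejecting surrogates, and encode3 produces the three-byte UTF-8 form of a BMP code point.

```c
#include <stdint.h>

/* Hypothetical helper: decode one 3-byte UTF-8-style sequence
   (1110xxxx 10xxxxxx 10xxxxxx). It deliberately does not reject
   surrogates, so that malformed output can be inspected. */
static uint32_t decode3(const unsigned char *b)
{
    return ((uint32_t)(b[0] & 0x0F) << 12)
         | ((uint32_t)(b[1] & 0x3F) << 6)
         |  (uint32_t)(b[2] & 0x3F);
}

/* Hypothetical helper: encode a code point in U+0800..U+FFFF
   as three UTF-8 bytes. */
static void encode3(uint32_t cp, unsigned char out[3])
{
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
}
```

Applied to the reported bytes, decode3 yields 0xD7FF for ED 9F BF and 0xDEFF for ED BB BF, while encode3(0xFEFF, ...) yields EF BB BF, the expected output.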
Repeat-By:
Install Git for Windows 2.19.1, on Windows 7 SP1.
Start "Git Bash" from the Start menu.
Run the command:
    env --ignore-environment LANG=en_US.UTF-8 \
        /usr/bin/bash --noprofile -c 'builtin printf "\ufeff"' \
        | od -t x1
Fix:
In lib/sh/unicode.c, change u32toutf16 to treat characters in the
U+E000...U+FFFF range just like the U+0000...U+D7FF range, i.e.
copy them unchanged to the output and not make a surrogate pair.
I did not test that change, but the function clearly has a bug
that matches the symptoms perfectly.
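
To illustrate why such a bug would produce exactly U+D7FF U+DEFF, here is a sketch in C. It is not bash's source: both function names are hypothetical, and the first function is only an assumed model of the behavior described above, in which every code point from U+E000 upward goes through the surrogate-pair formula. The second function applies the proposed change. The caller is assumed to supply a buffer of at least three 16-bit units.

```c
#include <stdint.h>

/* Assumed model of the reported behavior (hypothetical, not bash's code):
   U+E000..U+FFFF wrongly takes the surrogate-pair branch. */
static int to_utf16_as_reported(uint32_t c, uint16_t *s)
{
    int l = 0;
    if (c < 0xD800) {
        s[0] = (uint16_t)c;
        l = 1;
    } else if (c >= 0xE000 && c <= 0x10FFFF) {
        c -= 0x10000;   /* wraps around when c < 0x10000 */
        s[0] = (uint16_t)((c >> 10) + 0xD800);
        s[1] = (uint16_t)((c & 0x3FF) + 0xDC00);
        l = 2;
    }
    s[l] = 0;           /* terminate; l == 0 means "not encodable" */
    return l;
}

/* Sketch of the proposed fix: U+E000..U+FFFF is copied unchanged,
   and only U+10000..U+10FFFF becomes a surrogate pair. */
static int to_utf16_proposed(uint32_t c, uint16_t *s)
{
    int l = 0;
    if (c < 0xD800 || (c >= 0xE000 && c <= 0xFFFF)) {
        s[0] = (uint16_t)c;
        l = 1;
    } else if (c >= 0x10000 && c <= 0x10FFFF) {
        c -= 0x10000;
        s[0] = (uint16_t)((c >> 10) + 0xD800);
        s[1] = (uint16_t)((c & 0x3FF) + 0xDC00);
        l = 2;
    }
    s[l] = 0;
    return l;
}
```

For c = 0xFEFF the first function computes 0xFEFF - 0x10000, which wraps to 0xFFFFFEFF in 32-bit unsigned arithmetic; after truncation to 16 bits the two units are 0xD7FF and 0xDEFF, the values seen in the output. The second function returns the single unit 0xFEFF.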