
printf '\uFEFF' outputs invalid UTF-8 on Windows

From Kalle Olavi Niemitalo <kon@iki.fi>
Newsgroups gnu.bash.bug
Subject printf '\uFEFF' outputs invalid UTF-8 on Windows
Date Mon, 05 Nov 2018 19:09:06 +0200
Message-ID <mailman.3504.1541441576.1284.bug-bash@gnu.org>



Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: msys
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash.exe' -DCONF_HOSTTYPE='x86_64' 
-DCONF_OSTYPE='msys' -DCONF_MACHTYPE='x86_64-pc-msys' -DCONF_VENDOR='pc' 
-DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H 
-DRECYCLES_PIDS   -I.  -I. -I./include -I./lib  -DWORDEXP_OPTION 
-Wno-discarded-qualifiers -march=x86-64 -mtune=generic -O2 -pipe 
-Wno-parentheses -Wno-format-security -D_STATIC_BUILD -g
uname output: MINGW64_NT-6.1 fjkallen 2.10.0(0.325/5/3) 2018-07-25 13:06 
x86_64 Msys
Machine Type: x86_64-pc-msys

Bash Version: 4.4
Patch Level: 19
Release Status: release

Description:
        The builtin printf '\uFEFF' outputs ED 9F BF ED BB BF in a
        UTF-8 locale on Microsoft Windows, where sizeof(wchar_t) == 2.
        It should output EF BB BF, like printf (GNU coreutils) 8.30
        does.

        The incorrect output ED 9F BF ED BB BF is a UTF-8-like encoding
        of U+D7FF U+DEFF, which resembles a UTF-16 surrogate pair, but
        only U+DEFF is a surrogate: U+D7FF lies just below the
        high-surrogate range (U+D800...U+DBFF).
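
The byte sequences above can be checked independently of bash. The following is a minimal Python sketch (the variable names are illustrative, not taken from bash) that decodes the observed bytes and compares them with the expected encoding of U+FEFF. It relies on Python's "surrogatepass" error handler, because a strict UTF-8 decoder rejects the second three-byte sequence.

```python
# Bytes reported by od for the bash builtin, and the bytes that were expected.
observed = bytes.fromhex("ED 9F BF ED BB BF")
expected = "\uFEFF".encode("utf-8")

# "surrogatepass" lets the decoder accept the surrogate that strict UTF-8 forbids.
code_points = [f"U+{ord(ch):04X}" for ch in observed.decode("utf-8", "surrogatepass")]
print(code_points)      # ['U+D7FF', 'U+DEFF']
print(expected.hex())   # efbbbf

# A strict decoder refuses the observed bytes, which is why the output is invalid UTF-8.
try:
    observed.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)        # False
```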

Repeat-By:
        Install Git for Windows 2.19.1, on Windows 7 SP1.
        Start "Git Bash" from the Start menu.
        Run the command:
          env --ignore-environment LANG=en_US.UTF-8 \
          /usr/bin/bash --noprofile -c 'builtin printf "\ufeff"' \
          | od -t x1

Fix:
        In lib/sh/unicode.c, change u32toutf16 to treat characters in the
        U+E000...U+FFFF range just like the U+0000...U+D7FF range, i.e.
        copy them unchanged to the output instead of making a surrogate
        pair. I have not tested that change, but the function clearly has
        a bug, and it matches the symptoms exactly.
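
To illustrate why such a range check would produce exactly these symptoms, here is a Python model of the arithmetic. It is a sketch, not a copy of lib/sh/unicode.c: both function names are hypothetical, and the "buggy" variant is only an assumption about the shape of the check, namely that every code point from U+E000 upward is sent down the surrogate-pair path. The "fixed" variant applies the change proposed above.

```python
def utf16_units_buggy(c):
    """Assumed model of the bug: U+E000..U+FFFF is treated as needing a surrogate pair."""
    if c < 0xD800:
        return [c]
    if 0xE000 <= c <= 0x10FFFF:
        c -= 0x10000  # negative for BMP characters; the 16-bit masks mimic truncation
        return [((c >> 10) + 0xD800) & 0xFFFF, ((c & 0x3FF) + 0xDC00) & 0xFFFF]
    return []


def utf16_units_fixed(c):
    """Proposed behaviour: copy every non-surrogate BMP character unchanged."""
    if c < 0xD800 or 0xE000 <= c <= 0xFFFF:
        return [c]
    if 0x10000 <= c <= 0x10FFFF:
        c -= 0x10000
        return [(c >> 10) + 0xD800, (c & 0x3FF) + 0xDC00]
    return []  # surrogate code points and values above U+10FFFF have no encoding


def units_to_utf8(units):
    """Encode each 16-bit unit on its own, as if it were a complete character."""
    return "".join(chr(u) for u in units).encode("utf-8", "surrogatepass")


print(units_to_utf8(utf16_units_buggy(0xFEFF)).hex())  # ed9fbfedbbbf
print(units_to_utf8(utf16_units_fixed(0xFEFF)).hex())  # efbbbf
```

Under this model the buggy path yields the units D7FF and DEFF for U+FEFF, which matches the bytes seen in the report, while characters outside the BMP still get a real surrogate pair in the fixed version.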
