From: isabella parakiss <izaberina@gmail.com>
Newsgroups: gnu.bash.bug
Subject: Re: bash-4.3: casemod word expansions broken with UTF-8
Date: Tue, 17 Nov 2015 01:28:45 +0100
Message-ID: <mailman.25.1447720378.31583.bug-bash@gnu.org>
References: <22088.36043.764500.752406@a1i15.kph.uni-mainz.de>
Cc: bug-bash@gnu.org
To: Ulrich Mueller <ulm@gentoo.org>
In-Reply-To: <22088.36043.764500.752406@a1i15.kph.uni-mainz.de>

On 11/15/15, Ulrich Mueller <ulm@gentoo.org> wrote:
> Description:
> 	In a UTF-8 locale like en_US.UTF-8, the case-modifying
> 	parameter expansions sometimes return invalid UTF-8 encodings.
>
> 	This seems to happen when the UTF-8 byte sequences that are
> 	encoding upper and lower case have different lengths.
>
> Repeat-By:
> 	$ LC_ALL=en_US.UTF-8
> 	$ x=$'\xc4\xb1' # LATIN SMALL LETTER DOTLESS I
> 	$ echo -n "${x^}" | od -t x1
> 	0000000 49 b1
> 	0000002
>
> 	This should have output "49" for "I" only. The "b1" is illegal
> 	as the first byte of a UTF-8 sequence.
>
> 	$ x=$'\xe1\xba\x9e' # LATIN CAPITAL LETTER SHARP S
> 	$ echo -n "${x,}" | od -t x1
> 	0000000 c3 9f 9e
> 	0000003
>
> 	This should have output "c3 9f" (for "sharp s") only.
>

Both examples should work as expected in bash-4.4-beta.
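
To check a particular build, something like the following might help. It is only a sketch: hex_of and check_casemod are names made up for this example, and it assumes the en_US.UTF-8 locale is actually installed (without it bash handles the bytes one at a time and the results will differ).

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the bytes of a string as space-separated hex.
hex_of() {
    printf '%s' "$1" | od -An -t x1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: rerun the two cases from the report.
# The body is a subshell, so the locale change does not leak to the caller.
check_casemod() (
    LC_ALL=en_US.UTF-8                # assumption: this locale is installed
    dotless_i=$'\xc4\xb1'             # LATIN SMALL LETTER DOTLESS I
    capital_sharp_s=$'\xe1\xba\x9e'   # LATIN CAPITAL LETTER SHARP S
    printf 'dotless i, ${x^}: %s\n' "$(hex_of "${dotless_i^}")"
    printf 'sharp S,   ${x,}: %s\n' "$(hex_of "${capital_sharp_s,}")"
)
```

Run check_casemod in the bash you want to test. Going by the report, a fixed build should print 49 for the first line and c3 9f for the second, while an affected 4.3 prints 49 b1 and c3 9f 9e.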


---
xoxo iza