Re: bash-4.3: casemod word expansions broken with UTF-8

Started by	Ulrich Mueller <ulm@gentoo.org>
First post	2015-11-15 17:56 +0100
Last post	2015-11-15 17:56 +0100
Articles	1 — 1 participant

This discussion started before the archive's indexed window, so earlier articles aren't shown. The "Started by" entry above is the oldest visible article, not the original post.

From	Ulrich Mueller <ulm@gentoo.org>
Date	2015-11-15 17:56 +0100
Subject	Re: bash-4.3: casemod word expansions broken with UTF-8
Message-ID	<mailman.1.1447720277.31583.bug-bash@gnu.org>

>>>>> On Sun, 15 Nov 2015, Ulrich Mueller wrote:

> Description:
> 	In a UTF-8 locale like en_US.UTF-8, the case-modifying
> 	parameter expansions sometimes return invalid UTF-8 encodings.

> 	This seems to happen when the UTF-8 byte sequences that
> 	encode the upper- and lowercase forms have different lengths.

Even more interesting effects happen if the string contains a
character whose UTF-8 encoding gets *longer* after case conversion,
because the converted string then runs past the end of the original
and overwrites its terminating null byte.

For example, U+0250 "LATIN SMALL LETTER TURNED A" is represented by a
two-byte sequence in UTF-8, while its uppercase equivalent U+2C6F
needs three bytes:

	$ LC_ALL=en_US.UTF-8
	$ x=$'aaaaa\xc9\x90'
	$ y=${x^^}
	$ echo -n "$y" | od -t x1
	0000000 41 41 41 41 41 e2 90 af 6f 6d 65 2f 75 6c 6d
	0000017

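y contains trailing garbage: the extra bytes 6f 6d 65 2f 75 6c 6d are
the ASCII string "ome/ulm" (could be part of $HOME or $PWD). The
converted character itself is also wrong, since U+2C6F is encoded as
e2 b1 af, not e2 90 af.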
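The byte lengths involved can be checked without relying on the locale or on bash's own case conversion. The following is a minimal sketch: utf8_hex is a hypothetical helper written for illustration (it is not part of bash). It encodes a code point by hand using the standard UTF-8 bit layout and prints the bytes in the same form as od -t x1:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of bash: encode a Unicode code point
# (given in hex, e.g. 2C6F) as UTF-8 and print its bytes as lowercase
# hex, separated by spaces, like "od -t x1" does.
utf8_hex() {
    local cp=$((16#$1))
    if (( cp < 0x80 )); then
        # 0xxxxxxx: one byte (ASCII)
        printf '%02x\n' "$cp"
    elif (( cp < 0x800 )); then
        # 110xxxxx 10xxxxxx: two bytes
        printf '%02x %02x\n' "$(( 0xC0 | cp >> 6 ))" "$(( 0x80 | cp & 0x3F ))"
    elif (( cp < 0x10000 )); then
        # 1110xxxx 10xxxxxx 10xxxxxx: three bytes
        printf '%02x %02x %02x\n' "$(( 0xE0 | cp >> 12 ))" \
            "$(( 0x80 | (cp >> 6) & 0x3F ))" "$(( 0x80 | cp & 0x3F ))"
    else
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: four bytes
        # (no check for values above U+10FFFF)
        printf '%02x %02x %02x %02x\n' "$(( 0xF0 | cp >> 18 ))" \
            "$(( 0x80 | (cp >> 12) & 0x3F ))" \
            "$(( 0x80 | (cp >> 6) & 0x3F ))" "$(( 0x80 | cp & 0x3F ))"
    fi
}

utf8_hex 0250   # c9 90     (LATIN SMALL LETTER TURNED A)
utf8_hex 2C6F   # e2 b1 af  (LATIN CAPITAL LETTER TURNED A)
```

So the lowercase letter takes two bytes and its uppercase form takes three, and a correct ${x^^} of the example string would be the 8 bytes 41 41 41 41 41 e2 b1 af.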
