Re: bash-4.3: casemod word expansions broken with UTF-8

From isabella parakiss <izaberina@gmail.com>
Newsgroups gnu.bash.bug
Subject Re: bash-4.3: casemod word expansions broken with UTF-8
Date 2015-11-17 01:28 +0100
Message-ID <mailman.25.1447720378.31583.bug-bash@gnu.org>
References <22088.36043.764500.752406@a1i15.kph.uni-mainz.de>


On 11/15/15, Ulrich Mueller <ulm@gentoo.org> wrote:
> Description:
> 	In a UTF-8 locale like en_US.UTF-8, the case-modifying
> 	parameter expansions sometimes return invalid UTF-8 encodings.
>
> 	This seems to happen when the UTF-8 byte sequences that are
> 	encoding upper and lower case have different lengths.
>
> Repeat-By:
> 	$ LC_ALL=en_US.UTF-8
> 	$ x=$'\xc4\xb1' # LATIN SMALL LETTER DOTLESS I
> 	$ echo -n "${x^}" | od -t x1
> 	0000000 49 b1
> 	0000002
>
> 	This should have output "49" for "I" only. The "b1" is illegal
> 	as the first byte of a UTF-8 sequence.
>
> 	$ x=$'\xe1\xba\x9e' # LATIN CAPITAL LETTER SHARP S
> 	$ echo -n "${x,}" | od -t x1
> 	0000000 c3 9f 9e
> 	0000003
>
> 	This should have output "c3 9f" (for "sharp s") only.
>

Both examples should work as expected in 4.4-beta.
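
For anyone who wants to see the length mismatch described in the report, here is a minimal sketch that only counts bytes, so it does not depend on the locale or on the bash version. The helper name byte_len is made up for this illustration; the octal escapes are the same bytes as the hex values quoted above.

```shell
#!/bin/sh
# byte_len is a hypothetical helper: it prints the number of bytes
# produced by a string of printf octal escapes.
byte_len() {
    # tr strips the padding that some wc implementations add
    printf "$1" | wc -c | tr -d ' '
}

# U+0131 dotless i (c4 b1) vs. U+0049 I (49)
echo "dotless i: $(byte_len '\304\261') bytes, I: $(byte_len '\111') bytes"

# U+1E9E capital sharp S (e1 ba 9e) vs. U+00DF sharp s (c3 9f)
echo "capital sharp S: $(byte_len '\341\272\236') bytes, sharp s: $(byte_len '\303\237') bytes"
```

In both pairs the converted character is one byte shorter than the original, and the stray trailing byte in the reported output (b1, 9e) is the last byte of the original sequence, which fits the different-lengths explanation in the report.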


---
xoxo iza
