From: isabella parakiss
Newsgroups: gnu.bash.bug
Subject: Re: bash-4.3: casemod word expansions broken with UTF-8
Date: Tue, 17 Nov 2015 01:28:45 +0100
To: Ulrich Mueller
Cc: bug-bash@gnu.org
In-Reply-To: <22088.36043.764500.752406@a1i15.kph.uni-mainz.de>

On 11/15/15, Ulrich Mueller wrote:
> Description:
>     In an UTF-8 locale like en_US.UTF-8, the case-modifying
>     parameter expansions sometimes return invalid UTF-8 encodings.
>
>     This seems to happen when the UTF-8 byte sequences that are
>     encoding upper and lower case have different lengths.
>
> Repeat-By:
>     $ LC_ALL=en_US.UTF-8
>     $ x=$'\xc4\xb1'  # LATIN SMALL LETTER DOTLESS I
>     $ echo -n "${x^}" | od -t x1
>     0000000 49 b1
>     0000002
>
>     This should have output "49" for "I" only. The "b1" is illegal
>     as the first byte of an UTF-8 sequence.
>
>     $ x=$'\xe1\xba\x9e'  # LATIN CAPITAL LETTER SHARP S
>     $ echo -n "${x,}" | od -t x1
>     0000000 c3 9f 9e
>     0000003
>
>     This should have output "c3 9f" (for "sharp s") only.
>

Both examples should work as expected in 4.4-beta.

---
xoxo iza
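
For anyone who wants to check a particular build, here is a minimal sketch of the length mismatch described in the report. The helper names hexbytes and byte_len are made up for this example and are not part of bash. They use only printf, od, wc, tr and sed, and the test strings are written with octal escapes so the snippet does not depend on $'...' quoting.

```shell
# Hypothetical helper: print the bytes of a string as space-separated hex.
hexbytes() {
    printf '%s' "$1" | od -An -t x1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: count the bytes (not characters) in a string.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# The characters from the report, spelled out as raw UTF-8 bytes.
dotless_i=$(printf '\304\261')         # U+0131, bytes c4 b1
cap_sharp_s=$(printf '\341\272\236')   # U+1E9E, bytes e1 ba 9e
small_sharp_s=$(printf '\303\237')     # U+00DF, bytes c3 9f

# Each case pair has a different encoded length, which is the
# situation the report identifies as triggering the bug.
echo "dotless i:       $(byte_len "$dotless_i") bytes ($(hexbytes "$dotless_i"))"
echo "capital I:       $(byte_len I) bytes ($(hexbytes I))"
echo "capital sharp s: $(byte_len "$cap_sharp_s") bytes ($(hexbytes "$cap_sharp_s"))"
echo "small sharp s:   $(byte_len "$small_sharp_s") bytes ($(hexbytes "$small_sharp_s"))"
```

In bash with a UTF-8 locale set, running hexbytes "${dotless_i^}" and hexbytes "${cap_sharp_s,}" shows what the expansions actually produce. Going by the report, the results should be "49" and "c3 9f", and any extra trailing byte means the build still has the bug.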