| Path | csiph.com!xmission!news.glorb.com!usenet.stanford.edu!not-for-mail |
|---|---|
| From | Ulrich Mueller <ulm@gentoo.org> |
| Newsgroups | gnu.bash.bug |
| Subject | Re: bash-4.3: casemod word expansions broken with UTF-8 |
| Date | Sun, 15 Nov 2015 17:56:59 +0100 |
| Lines | 26 |
| Approved | bug-bash@gnu.org |
| Message-ID | <mailman.1.1447720277.31583.bug-bash@gnu.org> |
| References | <22088.36043.764500.752406@a1i15.kph.uni-mainz.de> |
>>>>> On Sun, 15 Nov 2015, Ulrich Mueller wrote:
> Description:
> In an UTF-8 locale like en_US.UTF-8, the case-modifying
> parameter expansions sometimes return invalid UTF-8 encodings.
> This seems to happen when the UTF-8 byte sequences that are
> encoding upper and lower case have different lengths.
Even more interesting effects occur if the string contains a
character whose UTF-8 encoding gets *longer* after case conversion,
because then the terminating null byte is overwritten.

For example, U+0250 "LATIN SMALL LETTER TURNED A" is represented by a
two-byte sequence in UTF-8, while its uppercase equivalent U+2C6F
needs three bytes:
    $ LC_ALL=en_US.UTF-8
    $ x=$'aaaaa\xc9\x90'
    $ y=${x^^}
    $ echo -n "$y" | od -t x1
    0000000 41 41 41 41 41 e2 90 af 6f 6d 65 2f 75 6c 6d
    0000017
y contains trailing garbage after the converted character: the bytes
6f 6d 65 2f 75 6c 6d are ASCII "ome/ulm", which could be part of $HOME
or $PWD.
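
The byte counts above can be checked with a short sketch like the
following. It assumes a POSIX shell with printf, wc and tr; the
variable names are made up for illustration, and octal escapes stand
in for the hex bytes c9 90 (U+0250) and e2 b1 af (U+2C6F).

```shell
# Count the bytes in the UTF-8 encodings of the two characters.
# Octal escapes are used because not every printf understands \x.
# tr strips the padding that some wc implementations emit.
small_len=$(printf '\311\220' | wc -c | tr -d ' ')        # U+0250: c9 90
capital_len=$(printf '\342\261\257' | wc -c | tr -d ' ')  # U+2C6F: e2 b1 af

# Number of extra bytes the uppercased character needs.
growth=$((capital_len - small_len))

echo "U+0250: $small_len bytes, U+2C6F: $capital_len bytes, growth: $growth"
```

If the converted string is written into a buffer sized for the
original, that single extra byte would be enough to displace the
terminating null.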