

Newsgroup: gnu.bash.bug

Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution

Started by: Stephane Chazelas <stephane.chazelas@gmail.com>
First post: 2019-11-19 07:56 +0000
Last post: 2019-11-19 07:56 +0000
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#15611 — Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution

From: Stephane Chazelas <stephane.chazelas@gmail.com>
Date: 2019-11-19 07:56 +0000
Subject: Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution
Message-ID: <mailman.1918.1574150225.13325.bug-bash@gnu.org>

2019-11-18 20:46:26 +0000, Stephane Chazelas:
[...]
> > printf -v B '\u204B'
> > set -- ${B//?()/ }
> > echo "${@@Q}"       #-> $'\342' $'\201' $'\213'
[...]
> It seems to me that zsh's approach is best:
> 
> $ A=$'\u2048\201\u2048' zsh  -c "printf '%q\n' \"\${A//$'\201'/:}\""
> ⁈:⁈
> 
> That is replace that \201 byte, except when it's part of a
> properly encoded character.
[...]

Actually, zsh would also break a character if the byte to be
replaced is the first byte of that character:

$ A=$'\u2048\342\u2048' zsh -c "printf '%q\n' \"\${A//$'\342'/:}\""
:$'\201'$'\210'::$'\201'$'\210'
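
For reference, the byte arithmetic behind that output can be reproduced
with a purely byte-wise substitution. This is only a sketch: it assumes
bash is on PATH and runs it in the C locale, where every byte counts as
one character, so it emulates a shell that ignores the encoding rather
than showing what zsh does internally. U+2048 is E2 81 88 in UTF-8
(octal 342 201 210); the variable names are illustrative.

```shell
# Byte-wise replacement of the UTF-8 lead byte 0xE2 (octal 342).
# LC_ALL=C makes bash treat each byte as a separate character.
hex=$(
  LC_ALL=C bash -c '
    A=$(printf "\342\201\210\342\342\201\210")   # U+2048, stray 0xE2, U+2048
    lead=$(printf "\342")                        # the byte to replace
    printf "%s" "${A//$lead/:}"
  ' | od -An -tx1 | tr -d " \n"
)
echo "$hex"   # 3a81883a3a8188
```

Each 0xE2 becomes ":" (0x3a), leaving the continuation bytes 0x81 0x88
orphaned, which is the same byte sequence as in the zsh output above.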

Note that in charsets like BIG5 or GB18030, where the encoding of
some characters contains the encoding of other characters, bash
seems to behave better than it does in UTF-8.

For instance, é is encoded in BIG5-HKSCS as 0x88 0x6d, where
0x6d is also the encoding of "m", as in ASCII.

$ printf é | iconv -t big5-hkscs | od -tc -tx1
0000000 210   m
         88  6d
0000002
$ LC_ALL=zh_HK.big5hkscs luit
$ U=Stéphane bash -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane ksh93 -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane zsh -c 'printf "%s\n" "${U//m}"'
Stéphane

All 3 shells are OK there, but:

$ U=Stéphane bash -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane ksh  -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane zsh  -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane

All 3 shells "break" that é character there.
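
To see both cases at the byte level, here is a rough sketch that again
assumes bash is on PATH and uses the C locale to emulate an
encoding-unaware substitution. remove_bytewise is a hypothetical helper
written for this illustration; the input bytes are "Stéphane" as
encoded in BIG5-HKSCS (é = 0x88 0x6d, octal 210 155).

```shell
# remove_bytewise (hypothetical helper): delete every occurrence of the
# byte(s) given as a printf escape in $1 from BIG5-HKSCS "Stéphane",
# working byte by byte (C locale), and print the result as hex.
remove_bytewise() {
  LC_ALL=C bash -c '
    U=$(printf "St\210mphane")    # S t 0x88 0x6d p h a n e
    pat=$(printf "$1")
    printf "%s" "${U//$pat}"
  ' bash "$1" | od -An -tx1 | tr -d " \n"
}

no_lead=$(remove_bytewise '\210')   # drop the lead byte of é
no_m=$(remove_bytewise 'm')         # drop every 0x6d byte

echo "$no_lead"   # 53746d7068616e65  ("Stmphane")
echo "$no_m"      # 5374887068616e65  (0x88 left without its trail byte)
```

The first result matches what all three shells print above. The second
is what a byte-wise ${U//m} would give, and it differs from the intact
"Stéphane" that the shells produce in the zh_HK.big5hkscs locale.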

-- 
Stephane
