| Started by | Stephane Chazelas <stephane.chazelas@gmail.com> |
|---|---|
| First post | 2019-11-19 07:56 +0000 |
| Last post | 2019-11-19 07:56 +0000 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution Stephane Chazelas <stephane.chazelas@gmail.com> - 2019-11-19 07:56 +0000
| From | Stephane Chazelas <stephane.chazelas@gmail.com> |
|---|---|
| Date | 2019-11-19 07:56 +0000 |
| Subject | Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution |
| Message-ID | <mailman.1918.1574150225.13325.bug-bash@gnu.org> |
2019-11-18 20:46:26 +0000, Stephane Chazelas:
[...]
> > printf -v B '\u204B'
> > set -- ${B//?()/ }
> > echo "${@@Q}" #-> $'\342' $'\201' $'\213'
[...]
> It seems to me that zsh's approach is best:
>
> $ A=$'\u2048\201\u2048' zsh -c "printf '%q\n' \"\${A//$'\201'/:}\""
> ⁈:⁈
>
> That is replace that \201 byte, except when it's part of a
> properly encoded character.
[...]
Actually, zsh would also break a character if the byte to be
replaced is the first byte of that character:
$ A=$'\u2048\342\u2048' zsh -c "printf '%q\n' \"\${A//$'\342'/:}\""
:$'\201'$'\210'::$'\201'$'\210'
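To see why the output above has that shape, it helps to look at the raw bytes. The following is only a sketch, not what zsh does internally: it uses `tr` in the C locale as a stand-in for a purely byte-wise substitution, and `hex_bytes` is a helper name made up for this illustration. U+2048 is encoded in UTF-8 as 0xe2 0x81 0x88 (octal 342 201 210), so replacing every 0xe2 byte also hits the lead byte of each ⁈ and leaves the two continuation bytes orphaned.

```shell
# Hypothetical helper: print stdin as space-separated hex bytes.
hex_bytes() {
    LC_ALL=C od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# UTF-8 encoding of U+2048: lead byte 0xe2, continuation bytes 0x81 0x88.
printf '\342\201\210' | hex_bytes; echo

# Byte-wise replacement of 0xe2 with ":" in <U+2048><0xe2><U+2048>.
# Each lead byte becomes ":" and the continuation bytes are left behind,
# which matches the :\201\210::\201\210 shape shown above.
printf '\342\201\210\342\342\201\210' | LC_ALL=C tr '\342' ':' | hex_bytes; echo
```

The first command should print `e2 81 88` and the second `3a 81 88 3a 3a 81 88`.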
Note that in multibyte charsets such as BIG5 or GB18030, where
the encoding of some characters contains the encoding of other
characters, bash seems to behave better than it does in UTF-8.
For instance, the encoding of é in BIG5-HKSCS is 0x88 0x6d, where
0x6d is also the encoding of "m", as in ASCII.
$ printf é | iconv -t big5-hkscs | od -tc -tx1
0000000 210   m
         88  6d
0000002
$ LC_ALL=zh_HK.big5hkscs luit
$ U=Stéphane bash -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane ksh93 -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane zsh -c 'printf "%s\n" "${U//m}"'
Stéphane
All 3 shells get this right (the 0x6d trail byte of é is left alone), but:
$ U=Stéphane bash -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane ksh -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane zsh -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
All 3 shells "break" that é character there.
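The two cases can be reproduced at the byte level without a BIG5-HKSCS locale or any particular shell. This is a sketch under assumptions: `tr -d` in the C locale stands in for a purely byte-wise deletion, the input is the BIG5-HKSCS byte sequence for "Stéphane" written out by hand (using the 0x88 0x6d encoding of é given above), and `hex_bytes` is a hypothetical helper.

```shell
# Hypothetical helper: print stdin as space-separated hex bytes.
hex_bytes() {
    LC_ALL=C od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# "Stéphane" in BIG5-HKSCS: S t <0x88 0x6d> p h a n e
big5_name() { printf 'St\210mphane'; }

# A naive byte-wise deletion of "m" (0x6d) would eat the trail byte of é
# and leave a dangling 0x88. The three shells above avoid this.
big5_name | LC_ALL=C tr -d 'm' | hex_bytes; echo

# Deleting the lead byte 0x88 leaves the trail byte 0x6d, which then
# reads as a plain "m": this is the "Stmphane" result shown above.
big5_name | LC_ALL=C tr -d '\210'; echo
```

The first pipeline should print `53 74 88 70 68 61 6e 65` and the second `Stmphane`.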
--
Stephane