


Re: Bash removes unrequested characters in bracket expressions (not a range).

Started by: Bize Ma <binaryzebra@gmail.com>
First post: 2018-11-24 17:34 -0400
Last post: 2018-11-24 17:34 -0400
Articles 1 — 1 participant





#14851 — Re: Bash removes unrequested characters in bracket expressions (not a range).

From: Bize Ma <binaryzebra@gmail.com>
Date: 2018-11-24 17:34 -0400
Subject: Re: Bash removes unrequested characters in bracket expressions (not a range).
Message-ID: <mailman.4547.1543095311.1284.bug-bash@gnu.org>

Chet Ramey (<chet.ramey@case.edu>) wrote:

> On 11/23/18 6:09 PM, Bize Ma wrote:
>
> > Bash Version: 4.4
> > Patch Level: 12
> > Release Status: release
>


> > Description:
> >
> > Bash is removing characters not explicitly listed in a bracket
> > expression (character range).
> > In this example, it is removing digits from other languages.
>
> What is your locale?
>
>
The locale used was en_US.utf-8, but the same happens with 459 of the
868 locales available under Debian (not in C, for example).

Also, in all affected locales, setting either LC_ALL=$loc or
LC_COLLATE=$loc gave the same result. The one exception is
zh_CN.gb18030.
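
A survey like the one described above could be sketched as follows. This
is only an illustration, not the script that produced the counts: the
helper names `matches_digit_list` and `list_affected_locales` are
hypothetical, and the test character (ARABIC-INDIC DIGIT ONE, U+0661) is
an assumption, since the original test string is not shown in this
message.

```shell
# Hypothetical helper, not from the original report.
# Succeeds if the string $2 matches the explicit list [0123456789]
# while LC_ALL is set to $1. The function body is a subshell, so the
# locale change does not leak into the caller.
matches_digit_list() (
    LC_ALL=$1
    export LC_ALL
    case $2 in
        [0123456789]) exit 0 ;;
        *)            exit 1 ;;
    esac
)

# Hypothetical survey loop: print every installed locale in which a
# non-ASCII digit matches the ASCII-only list.
# Assumption: U+0661 is used as the probe; its UTF-8 bytes are written
# in octal so the script does not depend on the file's encoding.
list_affected_locales() {
    probe=$(printf '\331\241')
    for loc in $(locale -a); do
        if matches_digit_list "$loc" "$probe"; then
            printf '%s\n' "$loc"
        fi
    done
}
```

Calling `list_affected_locales | wc -l` would give a count comparable to
the one above, though the result depends on the Bash version, the
installed locales and their collation data. Replacing LC_ALL with
LC_COLLATE in the helper would cover the second setting mentioned.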

But IMO locale collation should not be used for an explicit list.

I have been made aware that there is a
      cstart = cend = FOLD (cstart);
inside the `sm_loop.c` file that turns each individual character into a
one-character range. If that understanding is correct, that is the
source of the difference with other shells.

I have the perception that a collation table *must have a "total order"*,
in fact, a strict total order. If two characters `a` and `b` could sort as
equal, the order would fail to confirm that a character is absent from
the list. Consider characters `a`, `b` and `c`: if `a` and `b` sort as
equal, a sorted list in which we find `a` followed by `c` doesn't confirm
that `b` is absent, as the order could well be `b a c`.

With a strict total order, there must not be any character other than
`a` in the range `a-a`, and using a range `a-a` is equivalent (just
slower and more complex) to the single character `a`.
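
The claim about `a-a` can be checked for one character at a time with a
small sketch like the one below. The helper name `range_is_exact` is
hypothetical and the code is not taken from Bash. It only compares what
the shell's pattern matcher says about `[a-a]` with a plain string
comparison.

```shell
# Hypothetical helper, not taken from Bash or from the original report.
# Succeeds if, under locale $1, the range [a-a] matches the string $2
# exactly when $2 is the literal string "a". A failure would mean some
# other character collates as equal to "a" in that locale.
range_is_exact() (
    LC_ALL=$1
    export LC_ALL
    # Verdict of the pattern matcher for the one-character range.
    case $2 in
        [a-a]) in_range=yes ;;
        *)     in_range=no ;;
    esac
    # Verdict of a plain string comparison, independent of collation.
    if [ "$2" = a ]; then
        identical=yes
    else
        identical=no
    fi
    [ "$in_range" = "$identical" ]
)
```

For example, `range_is_exact en_US.utf-8 "$c"` could be run over a set of
candidate characters `$c`. Whether any of them fails depends on the shell
version and on the locale's collation data.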

If this is not the case, the error is in the collation table, not in
matching single characters directly (which is faster). IMO, it is the
collation table that should be updated.
