Re: Bash removes unrequested characters in bracket expressions (not a range).

From Bize Ma <binaryzebra@gmail.com>
Newsgroups gnu.bash.bug
Subject Re: Bash removes unrequested characters in bracket expressions (not a range).
Date Sat, 24 Nov 2018 17:34:55 -0400
Message-ID <mailman.4547.1543095311.1284.bug-bash@gnu.org>
References <CAFra36hcAjBHGgd_8sHjOV4wSzjmdCyLV2aQo8Ww1bwJqkxYQA@mail.gmail.com> <1c24a279-f439-a13c-be60-901096ccd4e1@case.edu>
In-Reply-To <1c24a279-f439-a13c-be60-901096ccd4e1@case.edu>
To Chester Ramey <chet.ramey@case.edu>
Cc bug-bash <bug-bash@gnu.org>, bash@packages.debian.org



Chet Ramey (<chet.ramey@case.edu>) wrote:

> On 11/23/18 6:09 PM, Bize Ma wrote:
>
> > Bash Version: 4.4
> > Patch Level: 12
> > Release Status: release
>


> > Description:
> >
> > Bash is removing characters not explicitly listed in a bracket
> > expression (character range).
> > In this example, it is removing digits from other languages.
>
> What is your locale?
>
>
The locale used was en_US.utf-8, but the same thing happens in 459
of the 868 locales available under Debian (though not in C, for example).

Also, in every affected locale except one (zh_CN.gb18030), setting
either LC_ALL=$loc or LC_COLLATE=$loc gave the same result.
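
A survey like the one described above could be sketched as follows. This is
only an illustration, not the script actually used: the helper name
`check_locale` and the choice of probe character (ARABIC-INDIC DIGIT ZERO,
U+0660) are assumptions. It runs a child bash under the given locale and
reports whether an explicit list of ASCII digits removes a digit it does not
list.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report).
#   $1: locale name to test
#   $2: variable used to select it (assumed default: LC_ALL)
check_locale() {
    local loc=$1 var=${2:-LC_ALL}
    # Assumed probe: ARABIC-INDIC DIGIT ZERO (U+0660) as UTF-8 bytes.
    local probe=$'\xd9\xa0'
    local out
    # A child bash starts with the locale already in its environment.
    # Warnings about unavailable locales are discarded, so such a
    # locale quietly falls back to the default behaviour.
    out=$(env "$var=$loc" bash -c \
        'v=$1; printf %s "${v//[0123456789]/}"' _ "$probe" 2>/dev/null)
    if [[ $out == "$probe" ]]; then
        echo "$loc: kept"       # only listed characters were removed
    else
        echo "$loc: removed"    # an unlisted character was removed
    fi
}
```

To cover every installed locale, one might run
`for loc in $(locale -a); do check_locale "$loc"; done`, and repeat with
`LC_COLLATE` as the second argument. Note that an `LC_ALL` already set in
the calling environment would override `LC_COLLATE` in the child.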

But IMO locale collation should not be used for an explicit list.

I have been made aware that there is a
      cstart = cend = FOLD (cstart);
inside the `sm_loop.c` file that converts many individual characters
into ranges. If that understanding is correct, that is the
source of the difference with other shells.

As I understand it, a collation table *must define a "total order"*,
in fact a strict total order. If two characters `a` and `b` could sort as
equal, the order could no longer confirm that a character is
absent from a list. Consider characters `a`, `b` and `c`: if `a` and `b`
sort as equal, finding `a` followed by `c` in a sorted list doesn't
confirm that `b` is absent, as the order could just as well be `b a c`.

Under a strict total order, no character other than `a` can fall in the
range `a-a`, so using the range `a-a` is equivalent (just slower and
more complex) to the single character `a`.
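
The claimed equivalence can be checked directly for any pair of characters.
The sketch below uses a hypothetical helper, `same_as_range`, which is an
assumption for illustration only. It matches a test character against the
single-character class and against the degenerate range built from the same
listed character, in whatever locale is current.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report).
#   $1: character being tested
#   $2: character listed in the bracket expression
# Caveat: $2 must not be a character that is special inside brackets,
# such as ']', '^', '!' or '-'.
same_as_range() {
    local test_char=$1 listed=$2
    local single=no range=no
    # Unquoted right-hand side, so it is treated as a pattern.
    [[ $test_char == [$listed] ]] && single=yes
    [[ $test_char == [$listed-$listed] ]] && range=yes
    echo "single=$single range=$range"
}
```

With a strict total order, `same_as_range a a` should answer yes twice and
`same_as_range b a` should answer no twice. If a listed character really is
handled as a one-character range, the two answers will always agree within
bash itself, so the interesting comparison is between locales or between
shells.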

If this is not the case, the error is in the collation table, not in using
single (faster) characters. What should be updated is that
collation table, IMO.
