| Path | csiph.com!xmission!news.snarked.org!news.linkpendium.com!news.linkpendium.com!panix!usenet.stanford.edu!not-for-mail |
|---|---|
| From | Harald van Dijk <harald@gigawatt.nl> |
| Newsgroups | gnu.bash.bug |
| Subject | Equivalence classes handled differently in mb vs non-mb patterns |
| Date | Tue, 28 Jul 2020 09:17:57 +0100 |
| Lines | 51 |
| Approved | bug-bash@gnu.org |
| Message-ID | <mailman.186.1595924290.2739.bug-bash@gnu.org> (permalink) |
| References | <e988ae23-36bf-43c2-8c4c-a9abad806ce8@gigawatt.nl> |
| NNTP-Posting-Host | lists.gnu.org |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=utf-8; format=flowed |
| Content-Transfer-Encoding | 8bit |
| X-Trace | usenet.stanford.edu 1595924290 5698 209.51.188.17 (28 Jul 2020 08:18:10 GMT) |
| X-Complaints-To | action@cs.stanford.edu |
| To | bug-bash@gnu.org |
| Envelope-to | bug-bash@gnu.org |
| User-Agent | Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Thunderbird/79.0 |
| Content-Language | en-US |
| X-BeenThere | bug-bash@gnu.org |
| X-Mailman-Version | 2.1.23 |
| Precedence | list |
| List-Id | Bug reports for the GNU Bourne Again SHell <bug-bash.gnu.org> |
| List-Unsubscribe | <https://lists.gnu.org/mailman/options/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=unsubscribe> |
| List-Archive | <https://lists.gnu.org/archive/html/bug-bash> |
| List-Post | <mailto:bug-bash@gnu.org> |
| List-Help | <mailto:bug-bash-request@gnu.org?subject=help> |
| List-Subscribe | <https://lists.gnu.org/mailman/listinfo/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=subscribe> |
| X-Mailman-Original-Message-ID | <e988ae23-36bf-43c2-8c4c-a9abad806ce8@gigawatt.nl> |
| Xref | csiph.com gnu.bash.bug:16631 |
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnux32
Compiler: gcc-10.1.0 -mx32
Compilation CFLAGS: -O2 -Wno-parentheses -Wno-format-security
uname output: Linux loucetios 5.7.9 #1 SMP @1590968955 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnux32
Bash Version: 5.0
Patch Level: 18
Release Status: release
Description:
In lib/glob/smatch.c, there are two functions that are used to
check equivalence classes in patterns: collequiv, and
collequiv_wc. The former is used if the pattern does not contain
any multi-byte characters, the latter otherwise (with exceptions
that are not relevant to this bug). The two functions do not
give the same results: collequiv lacks the fnmatch() fallback
that collequiv_wc implements, which leads to inconsistent
matching for ASCII-only equivalence classes.
(This is not something I encountered in a real script. I am
implementing equivalence class support myself, using fnmatch()
as the main check rather than as a fallback, and comparing the
results to those of other shells.)
Repeat-By:
case a in [[=A=]]) echo match 1 ;; esac
case aá in [[=A=]]á) echo match 2 ;; esac
In locales where A and a are not in the same equivalence class,
this should print nothing. glibc's ja_JP.UTF-8 is such a locale.
The C locale is such a locale as well, but it cannot represent
the á character, so it may be unsuitable for testing.
In locales where A and a are in the same equivalence class, this
should print "match 1" and "match 2". glibc's en_US.UTF-8 is
such a locale.
What actually happens in glibc's en_US.UTF-8 locale is that only
"match 2" is printed.
Fix:
Copy the FNMATCH_EQUIV_FALLBACK logic from collequiv_wc to
collequiv. _fnmatch_fallback_wc can be copied to create a non-wc
version of it; alternatively, collequiv can call
_fnmatch_fallback_wc after converting its arguments to wide
characters.
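For illustration, here is a minimal sketch of what a single-byte version of the fallback could look like. It is not bash's code: the function name collequiv_with_fallback is hypothetical, it handles only single-byte characters, and its first check is plain equality (bash's own functions compare characters through a collation routine first). The fallback part follows the idea described above: build a one-character equivalence-class pattern "[[=X=]]" from the pattern character and let the C library's POSIX fnmatch() decide whether the string character matches it.

```c
#include <assert.h>
#include <fnmatch.h>
#include <limits.h>

/* Hypothetical helper, NOT bash's collequiv: returns 1 if the string
   character C1 is in the same equivalence class as the pattern
   character C2 in the current locale, 0 otherwise.
   Both arguments must be single-byte characters in 1..UCHAR_MAX. */
static int
collequiv_with_fallback (int c1, int c2)
{
  char str[2];   /* C1 as a one-character string */
  char pat[8];   /* "[[=" + C2 + "=]]" + NUL = 8 bytes */

  assert (c1 > 0 && c1 <= UCHAR_MAX && c2 > 0 && c2 <= UCHAR_MAX);

  /* Fast path (simplified): identical characters are always equivalent. */
  if (c1 == c2)
    return 1;

  str[0] = (char) c1;
  str[1] = '\0';

  /* Fallback: reconstruct a bracket expression holding only [=C2=]. */
  pat[0] = '[';
  pat[1] = '[';
  pat[2] = '=';
  pat[3] = (char) c2;
  pat[4] = '=';
  pat[5] = ']';
  pat[6] = ']';
  pat[7] = '\0';

  /* fnmatch() returns 0 on a match; FNM_NOMATCH or any error value is
     treated as "not equivalent". */
  return fnmatch (pat, str, 0) == 0;
}
```

Without a setlocale() call a program runs in the C locale, where an equivalence class contains only the character itself, so collequiv_with_fallback('a', 'A') returns 0. After setlocale(LC_ALL, "") under glibc's en_US.UTF-8, the result depends on the locale's collation data; the report's "match 2" result indicates that glibc's fnmatch() treats a and A as equivalent there.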