Equivalence classes handled differently in mb vs non-mb patterns

From Harald van Dijk <harald@gigawatt.nl>
Newsgroups gnu.bash.bug
Subject Equivalence classes handled differently in mb vs non-mb patterns
Date Tue, 28 Jul 2020 09:17:57 +0100
Lines 51
Approved bug-bash@gnu.org
Message-ID <mailman.186.1595924290.2739.bug-bash@gnu.org>
References <e988ae23-36bf-43c2-8c4c-a9abad806ce8@gigawatt.nl>
To bug-bash@gnu.org



Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnux32
Compiler: gcc-10.1.0 -mx32
Compilation CFLAGS: -O2 -Wno-parentheses -Wno-format-security
uname output: Linux loucetios 5.7.9 #1 SMP @1590968955 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnux32

Bash Version: 5.0
Patch Level: 18
Release Status: release

Description:
	In lib/glob/smatch.c, two functions check equivalence classes in
	patterns: collequiv and collequiv_wc. The former is used if the
	pattern does not contain any multi-byte characters, the latter
	otherwise, with exceptions that are not relevant to this bug. The
	two functions do not give the same results: collequiv does not
	implement the fnmatch() fallback code that collequiv_wc does
	implement, leading to inconsistent matching for ASCII-only
	equivalence classes.

	(This is not something I encountered in a real script. I am
	implementing equivalence class support myself, using fnmatch()
	as the main check rather than as a fallback, and comparing the
	results to those of other shells.)

Repeat-By:
	case a  in [[=A=]])  echo match 1 ;; esac
	case aá in [[=A=]]á) echo match 2 ;; esac

	In locales where A and a are not in the same equivalence class,
	this should print nothing. glibc's ja_JP.UTF-8 is such a locale.
	The C locale is such a locale as well, but it cannot represent
	the á character, so it is not suitable for testing.

	In locales where A and a are in the same equivalence class, this
	should print "match 1" and "match 2". glibc's en_US.UTF-8 is
	such a locale.

	What actually happens in glibc's en_US.UTF-8 locale is that only
	"match 2" is printed.

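To cross-check what the C library itself considers a match, independently of bash's own matcher, the same patterns can be handed to fnmatch() directly. The following is only a minimal sketch: the helper name fnmatch_in_locale is made up for illustration, and it assumes the requested locale is installed on the system.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of bash): match STRING against PATTERN
   with fnmatch() under the named locale.  Returns 1 on a match, 0 on no
   match, and -1 if the locale is unavailable or fnmatch() reports an
   error.  The process locale is reset to "C" afterwards. */
int
fnmatch_in_locale (const char *locale, const char *pattern, const char *string)
{
  int r;

  if (setlocale (LC_ALL, locale) == NULL)
    return -1;			/* locale not installed */

  r = fnmatch (pattern, string, 0);
  setlocale (LC_ALL, "C");	/* leave the process in a known state */

  if (r == 0)
    return 1;
  return r == FNM_NOMATCH ? 0 : -1;
}
```

Calling it as fnmatch_in_locale ("en_US.UTF-8", "[[=A=]]", "a") and fnmatch_in_locale ("en_US.UTF-8", "[[=A=]]á", "aá") gives the library's answer for the two cases above, which can then be compared with what the shell prints. The second call assumes the source file is encoded in UTF-8.
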
Fix:
	Copy the FNMATCH_EQUIV_FALLBACK logic from collequiv_wc to
	collequiv. _fnmatch_fallback_wc may be copied to create a non-wc
	version of it, but it also works to have collequiv call
	_fnmatch_fallback_wc by converting characters to wide
	characters.
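
The second option, converting single-byte characters to wide characters and reusing a wide-character fallback, might look roughly like the sketch below. This is not the code in lib/glob/smatch.c: the names equiv_fallback_wc and equiv_fallback are hypothetical, and the sketch only assumes that the fallback works by matching a one-character string against a constructed "[[=x=]]" pattern with fnmatch().

```c
#include <fnmatch.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wide-character fallback: ask fnmatch() whether C belongs
   to the equivalence class named by EQUIV, by matching the string "C"
   against the pattern "[[=EQUIV=]]".  Returns 1 if fnmatch() reports a
   match, 0 otherwise (including when a character cannot be encoded in
   the current locale). */
int
equiv_fallback_wc (wchar_t c, wchar_t equiv)
{
  char str[MB_LEN_MAX + 1];
  char pat[MB_LEN_MAX + 7];	/* "[[=" + character + "=]]" + NUL */
  int slen, plen;

  if (c == L'\0' || equiv == L'\0')
    return 0;

  slen = wctomb (str, c);
  if (slen <= 0)
    return 0;
  str[slen] = '\0';

  memcpy (pat, "[[=", 3);
  plen = wctomb (pat + 3, equiv);
  if (plen <= 0)
    return 0;
  memcpy (pat + 3 + plen, "=]]", 4);	/* also copies the trailing NUL */

  return fnmatch (pat, str, 0) == 0;
}

/* Hypothetical single-byte wrapper: widen both bytes with btowc() and
   defer to the wide-character version. */
int
equiv_fallback (int c, int equiv)
{
  wint_t wc = btowc ((unsigned char) c);
  wint_t wequiv = btowc ((unsigned char) equiv);

  if (wc == WEOF || wequiv == WEOF)
    return 0;
  return equiv_fallback_wc ((wchar_t) wc, (wchar_t) wequiv);
}
```

With something along these lines, a single-byte matcher could call equiv_fallback (c, equiv) where its own comparison fails, so that ASCII-only and multi-byte patterns go through the same fnmatch() check. The sketch ignores stateful encodings and does not special-case pattern metacharacters in EQUIV.
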
