Equivalence classes handled differently in mb vs non-mb patterns

From Harald van Dijk <harald@gigawatt.nl>
Newsgroups gnu.bash.bug
Subject Equivalence classes handled differently in mb vs non-mb patterns
Date 2020-07-28 09:17 +0100
Message-ID <mailman.186.1595924290.2739.bug-bash@gnu.org>
References <e988ae23-36bf-43c2-8c4c-a9abad806ce8@gigawatt.nl>

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnux32
Compiler: gcc-10.1.0 -mx32
Compilation CFLAGS: -O2 -Wno-parentheses -Wno-format-security
uname output: Linux loucetios 5.7.9 #1 SMP @1590968955 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnux32

Bash Version: 5.0
Patch Level: 18
Release Status: release

Description:
	In lib/glob/smatch.c, there are two functions that are used to
	check equivalence classes in patterns: collequiv and
	collequiv_wc. The former is used if the pattern does not contain
	any multi-byte characters, the latter otherwise, with
	exceptions that are not relevant to this bug. The two functions
	do not give the same results: collequiv does not implement the
	fnmatch() fallback code that collequiv_wc does implement,
	leading to inconsistent matching for ASCII-only equivalence
	classes.

	(This is not something I encountered in a real script. I am
	implementing equivalence class support myself, using fnmatch()
	as the main check rather than as a fallback, and comparing the
	results to those of other shells.)

Repeat-By:
	case a  in [[=A=]])  echo match 1 ;; esac
	case aá in [[=A=]]á) echo match 2 ;; esac

	In locales where A and a are not in the same equivalence class,
	this should print nothing. glibc's ja_JP.UTF-8 is such a locale.
	The C locale is such a locale as well, but it does not allow
	for the á character, so it is poorly suited for testing.

	In locales where A and a are in the same equivalence class, this
	should print "match 1" and "match 2". glibc's en_US.UTF-8 is
	such a locale.

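	What actually happens in glibc's en_US.UTF-8 locale is that only
	"match 2" is printed. The second pattern contains a multi-byte
	character, so it goes through collequiv_wc and its fnmatch()
	fallback; the first pattern goes through collequiv, which has
	no fallback.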
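
To check independently of bash whether a given locale puts A and a in the same equivalence class, the C library can be asked directly. The following is a minimal sketch, not code from bash; the function name a_matches_equiv_A is made up for illustration, and the result depends entirely on the C library and on which locales are installed.

```c
#include <fnmatch.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of bash.
   Returns 1 if "a" matches the pattern [[=A=]] under the named locale,
   0 if it does not, and -1 if the locale cannot be selected.
   Side effect: changes the locale of the calling process. */
int
a_matches_equiv_A (const char *locale_name)
{
  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;

  /* fnmatch() returns 0 on a match. */
  return fnmatch ("[[=A=]]", "a", 0) == 0;
}
```

Going by the report above, a glibc system with both locales installed would be expected to return 1 for "en_US.UTF-8" and 0 for "ja_JP.UTF-8"; a return value of -1 means the locale is not available and nothing can be concluded.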

Fix:
	Copy the FNMATCH_EQUIV_FALLBACK logic from collequiv_wc to
	collequiv. _fnmatch_fallback_wc may be copied to create a non-wc
	version of it, but it also works to have collequiv call
	_fnmatch_fallback_wc by converting characters to wide
	characters.
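
The first option (a non-wc copy of the fallback) could look roughly like the sketch below. This is an illustration under stated assumptions, not a patch against smatch.c: the names fnmatch_fallback_sb and collequiv_sketch are hypothetical, the strcoll() comparison is only a stand-in for whatever direct comparison collequiv already performs, and the FNMATCH_EQUIV_FALLBACK conditional is omitted.

```c
#include <fnmatch.h>
#include <string.h>

/* Hypothetical single-byte counterpart of _fnmatch_fallback_wc.
   Builds the pattern "[[=X=]]" for X = equiv and returns fnmatch()'s
   result for the one-character string containing c (0 means match). */
int
fnmatch_fallback_sb (int c, int equiv)
{
  char subject[2];
  char pattern[8];

  subject[0] = (char) c;
  subject[1] = '\0';

  pattern[0] = pattern[1] = '[';
  pattern[2] = '=';
  pattern[3] = (char) equiv;
  pattern[4] = '=';
  pattern[5] = pattern[6] = ']';
  pattern[7] = '\0';

  return fnmatch (pattern, subject, 0);
}

/* Sketch of an equivalence check with the fallback added.
   Assumption: the direct comparison is modelled with strcoll() here. */
int
collequiv_sketch (int c, int equiv)
{
  char s1[2] = { (char) c, '\0' };
  char s2[2] = { (char) equiv, '\0' };

  if (c == equiv || strcoll (s1, s2) == 0)
    return 1;

  /* Test explicitly for 0 so that error returns count as "no match". */
  return fnmatch_fallback_sb (c, equiv) == 0;
}
```

With such a fallback in place, the single-byte path would defer to the C library's notion of equivalence classes in the same way the wide-character path does, so both case statements in Repeat-By would be decided by the same rule.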
