From: Harald van Dijk
Newsgroups: gnu.bash.bug
Subject: Equivalence classes handled differently in mb vs non-mb patterns
Date: Tue, 28 Jul 2020 09:17:57 +0100
To: bug-bash@gnu.org

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnux32
Compiler: gcc-10.1.0 -mx32
Compilation CFLAGS: -O2 -Wno-parentheses -Wno-format-security
uname output: Linux loucetios 5.7.9 #1 SMP @1590968955 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnux32

Bash Version: 5.0
Patch Level: 18
Release Status: release

Description:
        In lib/glob/smatch.c, there are two functions that are used to check
        equivalence classes in patterns: collequiv and collequiv_wc. The
        former is used if the pattern does not contain any multi-byte
        characters, the latter otherwise, with exceptions that are not
        relevant to this bug.

        The two functions do not give the same results: collequiv does not
        implement the fnmatch() fallback code that collequiv_wc does
        implement, leading to inconsistent matching for ASCII-only
        equivalence classes.

        (This is not something I encountered in a real script. I am
        implementing equivalence class support myself, using fnmatch() as
        the main check rather than as a fallback, and comparing the results
        to those of other shells.)

Repeat-By:
        case a in [[=A=]]) echo match 1 ;; esac
        case aá in [[=A=]]á) echo match 2 ;; esac

        In locales where A and a are not in the same equivalence class, this
        should print nothing. glibc's ja_JP.UTF-8 is such a locale. The C
        locale is such a locale as well, but it does not allow for the á
        character, so it is a poor choice for testing.

        In locales where A and a are in the same equivalence class, this
        should print "match 1" and "match 2". glibc's en_US.UTF-8 is such a
        locale.

        What actually happens in glibc's en_US.UTF-8 locale is that only
        "match 2" is printed.

Fix:
        Copy the FNMATCH_EQUIV_FALLBACK logic from collequiv_wc to collequiv.
        _fnmatch_fallback_wc may be copied to create a non-wc version of it,
        but it also works to have collequiv call _fnmatch_fallback_wc by
        converting characters to wide characters.
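
To make the first option in the Fix concrete, here is a rough sketch of what a single-byte fallback could look like. It is not the code in lib/glob/smatch.c: the names build_equiv_pattern and fnmatch_equiv_fallback are hypothetical, and the sketch assumes both characters are plain single-byte characters that are not special inside a bracket expression. It builds the pattern "[[=c2=]]" and asks the C library's fnmatch() whether c1 matches it in the current locale.

```c
#include <fnmatch.h>

/* Hypothetical helper (not part of bash): write the pattern "[[=c=]]"
   for the single byte c into buf, which must hold at least 8 bytes. */
static void
build_equiv_pattern (char *buf, unsigned char c)
{
  buf[0] = buf[1] = '[';
  buf[2] = '=';
  buf[3] = (char) c;
  buf[4] = '=';
  buf[5] = buf[6] = ']';
  buf[7] = '\0';
}

/* Hypothetical single-byte fallback (not part of bash).  Returns 0 if
   fnmatch() says c1 is in the equivalence class of c2 in the current
   locale, and a non-zero value (normally FNM_NOMATCH) otherwise.
   Assumption: c1 and c2 are non-NUL single-byte characters that have
   no special meaning inside a bracket expression. */
static int
fnmatch_equiv_fallback (unsigned char c1, unsigned char c2)
{
  char str[2];   /* the one-character string to test */
  char pat[8];   /* the constructed "[[=c2=]]" pattern */

  str[0] = (char) c1;
  str[1] = '\0';
  build_equiv_pattern (pat, c2);
  return fnmatch (pat, str, 0);
}
```

Under these assumptions, collequiv could call something like fnmatch_equiv_fallback (c1, c2) after its existing comparison fails, which mirrors what the Description says collequiv_wc already does. The result depends on the locale selected with setlocale(), so whether 'a' matches the class of 'A' will differ between, for example, en_US.UTF-8 and ja_JP.UTF-8.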