Re: Wildcard expansion can fail with nonprinting characters

From Geoff Kuenning <geoff@cs.hmc.edu>
Newsgroups gnu.bash.bug
Subject Re: Wildcard expansion can fail with nonprinting characters
Date Mon, 30 Sep 2019 17:39:18 -0700
Message-ID <mailman.544.1569890367.2651.bug-bash@gnu.org>


$'\361' is a valid character in Latin-1, which is how it happened 
to arise in my case.  Also, I tested with the C locale, which 
should be agnostic to character encodings, and got the same 
result.
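
For illustration, here is a minimal sketch of the kind of case under discussion. It assumes a file system that accepts arbitrary bytes in names (as Linux does); the temporary directory and the names a<byte>b, file1, and file2 are made up for the example and are not taken from the original report. On a bash with the fix, the wildcard should expand to both files; on an affected version it may be left as a literal pattern.

```shell
# Hypothetical reproduction sketch; all names here are illustrative.
tmp=$(mktemp -d) || exit 1

# Byte 0xF1 (octal 361): a valid character in Latin-1, but not a
# complete character on its own in UTF-8.
byte=$(printf '\361')
dir="$tmp/a${byte}b"

mkdir "$dir" && : > "$dir/file1" && : > "$dir/file2"

# Expand a wildcard that follows a quoted variable containing the byte.
set -- "$dir"/*

# When a glob matches nothing, the shell leaves the pattern as a literal
# string, so test whether the first result actually exists.
if [ -e "$1" ]; then
  count=$#
  echo "wildcard expanded to $count entries"
else
  count=0
  echo "wildcard was not expanded"
fi

rm -rf "$tmp"
```

Running the same sketch under different LC_ALL settings (for example C and a UTF-8 locale) is one way to check whether the result depends on the locale.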

The general Unix philosophy, which in this case says "I'm not 
going to pass judgment on the weird things you do even though I 
don't understand them", argues for being able to handle any 
arbitrary sequence of bytes, at least on Linux.  That's one of the 
things that makes the Unix paradigm so powerful.  So I appreciate 
your willingness to fix this.

> On 9/27/19 7:52 PM, Geoff Kuenning wrote:
>> Version:
>> 
>> GNU bash, version 4.4.23(1)-release (x86_64-suse-linux-gnu)
>> Copyright (C) 2016 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> 
>> Behavior:
>> 
>> If a pathname contains nonprinting characters, and is expanded from a
>> variable name, wildcard expansion can sometimes fail.
>
> This is an interesting report. The $'\361' is a unicode combining
> character, which ends up making the entire sequence of characters an
> invalid wide character string in a bunch of different locales.
>
> Some file systems (Mac OS X APFS) don't allow you to create files with
> invalid characters or character sequences in their names. Others (Linux)
> don't have a problem with it.
>
> The code to dequote filenames that's needed for "$x" tries to fall back to
> single-byte character operations in the presence of invalid character or
> byte sequences, but that means you can't use any of the standard wide
> character functions to check for valid and invalid wide character strings.
>
> The change between bash-4.4 and bash-5.0 is that the globbing code doesn't
> bother to try and convert to wide characters to do the dequoting if there
> aren't any valid multibyte characters in the pathname, but uses the single
> byte character code instead. That works for this case, but doesn't work for
> pathnames that have both valid and invalid wide character sequences.
>
> A better fix is to write a symmetric function that will take the output of
> xdupmbstowcs2 (bash's replacement for mbstowcs that handles zero-length
> wide character strings that aren't null wide characters) and handle the
> invalid wide character strings that may result from it. I'll make that fix
> for the next release.
>
> Chet
>
> -- 
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
> 		 ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
>

-- 
    Geoff Kuenning   geoff@cs.hmc.edu 
    http://www.cs.hmc.edu/~geoff/

Orchestra retrospectively extremely satisfied with symphony [No. 1] as
result of barrel of free beer.
	-- Gustav Mahler, post-premiere letter to Arnold Berliner
