Re: built-in regex matches wrong character

From Eric Blake <eblake@redhat.com>
Newsgroups gnu.bash.bug
Subject Re: built-in regex matches wrong character
Date 2018-09-06 09:23 -0500
Organization Red Hat, Inc.
Message-ID <mailman.444.1536243821.1284.bug-bash@gnu.org>
References <201809051850.w85IoClP001449@mamatb-laptop> <5d3e2655-9b29-563e-a3aa-f96f6563f9fc@redhat.com> <cdf3707d-9e10-4be3-94f9-4cb5f5d9b9ed@case.edu>

On 09/06/2018 09:17 AM, Chet Ramey wrote:
> On 9/5/18 4:39 PM, Eric Blake wrote:
> 
>> Or, you can use bash's 'shopt -s globasciiranges' which is
>> supposed to enable Rational Range Interpretation, where even in non-C
>> locales, a character range bounded by two ASCII characters takes on the C
>> locale definition of only the ASCII characters in that range, rather than
>> the locale's definition of whatever other characters might also be
>> equivalent (actually, while I know that shopt affects globbing, I don't
>> know if it also affects regex matching - but if it doesn't, that's probably
>> a bug that should be fixed).
> 
> Since bash uses the C library's regexp engine, and most C libraries don't
> implement RRI, much less expose it as a flags option available via
> regcomp(), there's no reason to expect that globasciiranges would have
> any effect on regular expression matching.

But bash could be taught to convert any regex that contains a range with
both endpoints ASCII into a different bracket expression before handing
things over to regcomp().  That is, if the user is matching against
[a-d], bash hands [abcd] to regcomp() instead.  You don't need a flag in
regcomp() to get RRI, just some pre-processing (and often memory
allocation, as expanding a range into an explicit list tends to
require more characters).
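
A minimal sketch of that pre-processing idea, written in Python for
brevity rather than in bash's C. This is not bash's code: the names
expand_ascii_ranges and _expand_range are hypothetical, and as a
simplifying assumption it only rewrites ranges whose endpoints both fall
within a-z, within A-Z, or within 0-9, so the expansion can never
introduce a character that is special inside a bracket expression. Any
other range, and [:class:], [.x.] and [=x=] elements, are copied through
unchanged.

```python
import string

# Simplifying assumption: only ranges that stay inside one of these
# ASCII runs are expanded.
_CLASSES = (string.ascii_lowercase, string.ascii_uppercase, string.digits)


def _expand_range(lo, hi):
    """Return the explicit characters for lo-hi, or None to leave it alone."""
    for cls in _CLASSES:
        if lo in cls and hi in cls and lo <= hi:
            return cls[cls.index(lo):cls.index(hi) + 1]
    return None


def expand_ascii_ranges(pattern):
    """Rewrite e.g. [a-d] as [abcd] inside POSIX bracket expressions."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        # Outside brackets, a backslash escapes the next character.
        if c == "\\" and i + 1 < n:
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if c != "[":
            out.append(c)
            i += 1
            continue

        # Start of a bracket expression.
        out.append("[")
        i += 1
        if i < n and pattern[i] == "^":
            out.append("^")
            i += 1
        if i < n and pattern[i] == "]":  # a leading ] is a literal
            out.append("]")
            i += 1

        while i < n and pattern[i] != "]":
            # Copy [:alpha:], [.x.] and [=x=] elements verbatim.
            if pattern[i] == "[" and i + 1 < n and pattern[i + 1] in ":.=":
                end = pattern.find(pattern[i + 1] + "]", i + 2)
                if end != -1:
                    out.append(pattern[i:end + 2])
                    i = end + 2
                    continue
            # A range is X-Y where Y is not the closing bracket.
            if i + 2 < n and pattern[i + 1] == "-" and pattern[i + 2] != "]":
                expanded = _expand_range(pattern[i], pattern[i + 2])
                # Ranges we do not handle are copied through as written.
                out.append(expanded if expanded is not None
                           else pattern[i:i + 3])
                i += 3
                continue
            out.append(pattern[i])
            i += 1

        if i < n:  # closing bracket
            out.append("]")
            i += 1
    return "".join(out)
```

With this sketch, expand_ascii_ranges("[a-d]") yields "[abcd]", and the
result is generally longer than the input, which is where the extra
memory allocation comes from.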

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
