Re: built-in regex matches wrong character

Started by: Eric Blake <eblake@redhat.com>
First post: 2018-09-05 15:39 -0500
Last post: 2018-09-05 15:39 -0500
Articles: 1 (1 participant)

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


#14551 — Re: built-in regex matches wrong character

From: Eric Blake <eblake@redhat.com>
Date: 2018-09-05 15:39 -0500
Subject: Re: built-in regex matches wrong character
Message-ID: <mailman.416.1536179989.1284.bug-bash@gnu.org>

On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:

> Description:
> 	It seems like bash's built-in regex matches some symbols that it shouldn't. The following commands show this:
> 		[[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] && echo 'º between o and p but none of them'
> 		[[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] && echo 'ª between a and b but none of them'
> 
> Repeat-By:
> 	I actually found this while developing a bigger bash script, but it can be reproduced with the previous lines. Would you reply to me at amatbaeza@gmail.com to let me know if this is in fact a bug? Thanks.

Not a bug, but a property of your locale.

POSIX leaves the behavior of range expressions in regular expressions 
unspecified outside the C (POSIX) locale, which means [a-b] is free to 
match not just the two ASCII characters 'a' and 'b', but also anything 
else that your current locale sorts between them or considers equivalent.

If you run your script with LC_ALL=C in the environment, you won't have 
that problem (because there, [a-b] is well-defined to be exactly two 
characters).  Or, you can use bash's 'shopt -s globasciiranges', which is 
supposed to enable Rational Range Interpretation: even in non-C 
locales, a character range bounded by two ASCII characters takes on the 
C locale definition of only the ASCII characters in that range, rather 
than the locale's definition of whatever other characters might also be 
equivalent.  (Actually, while I know that shopt affects globbing, I don't 
know if it also affects regex matching; if it doesn't, that's probably a 
bug that should be fixed.)
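
As a rough sketch of the LC_ALL=C suggestion above: the helper name 
match_in_c_locale is made up for illustration (it is not a bash builtin), 
and the example assumes a bash with [[ =~ ]] support.  It confines the 
locale change to a subshell so the rest of the script keeps its normal 
locale.

```shell
#!/usr/bin/env bash
# Sketch: evaluate a bash regex match under the C locale only.
# match_in_c_locale is a hypothetical helper name, not part of bash.
match_in_c_locale() (
    # The function body is a subshell, so this assignment cannot leak
    # into the caller's locale settings.
    LC_ALL=C
    # $2 is left unquoted on purpose so bash treats it as a regex.
    [[ $1 =~ $2 ]]
)

# Try the characters from the original report against [o-p].
for ch in o p 'º'; do
    if match_in_c_locale "$ch" '[o-p]'; then
        echo "$ch: matches [o-p] in the C locale"
    else
        echo "$ch: does not match [o-p] in the C locale"
    fi
done
```

With the C locale forced this way, only 'o' and 'p' should be reported as 
matching, whatever locale the calling shell is using.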

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
