| From | Eric Blake <eblake@redhat.com> |
|---|---|
| Newsgroups | gnu.bash.bug |
| Subject | Re: built-in regex matches wrong character |
| Date | 2018-09-05 15:39 -0500 |
| Organization | Red Hat, Inc. |
| Message-ID | <mailman.416.1536179989.1284.bug-bash@gnu.org> |
| References | <201809051850.w85IoClP001449@mamatb-laptop> |

On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:

> Description:
> It seems like bash built-in regex matches some symbols that shouldn't. The following commands shows this:
>
> [[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] && echo 'º between o and p but none of them'
> [[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] && echo 'ª between a and b but none of them'
>
> Repeat-By:
> Actually found out this while developing a bigger bash script, but it can be reproduced with the previous lines. Would you reply me at amatbaeza@gmail.com to know if this was in fact a bug? Thanks.

Not a bug, but a property of your locale. POSIX says that range expressions in regular expressions are implementation-defined except for in the C locale, which means [a-b] is free to match more than just the two ASCII characters 'a' and 'b', but rather anything that your current locale considers equivalent.

If you run your script with LC_ALL=C in the environment, you won't have that problem (because there, [a-b] is well-defined to be exactly two characters).

Or, you can use bash's 'shopt -s globasciiranges' which is supposed to enable Rational Range Interpretation, where even in non-C locales, a character range bounded by two ASCII characters takes on the C locale definition of only the ASCII characters in that range, rather than the locale's definition of whatever other characters might also be equivalent (actually, while I know that shopt affects globbing, I don't know if it also affects regex matching - but if it doesn't, that's probably a bug that should be fixed).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
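
To make the LC_ALL=C suggestion concrete, here is a minimal sketch. The helper name `matches_in_c_locale` is hypothetical (it is not part of bash or of the original report); it simply runs the `[[ =~ ]]` test in a child bash that has LC_ALL=C in its environment, so the caller's locale is left alone. It assumes `bash` is on the PATH and that the script is saved as UTF-8.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): test STRING against
# REGEX in a child bash started with LC_ALL=C in its environment.
# Returns 0 on a match and 1 otherwise, like [[ =~ ]] itself.
matches_in_c_locale() {
  # $1 = string, $2 = regex; $2 is left unquoted inside [[ ]] so that
  # bash treats it as a regular expression rather than a literal.
  LC_ALL=C bash -c '[[ $1 =~ $2 ]]' _ "$1" "$2"
}

# The two characters from the report, checked against the same ranges.
for pair in 'º [o-p]' 'ª [a-b]' 'a [a-b]'; do
  char=${pair%% *}
  range=${pair#* }
  if matches_in_c_locale "$char" "$range"; then
    echo "$char matches $range with LC_ALL=C"
  else
    echo "$char does not match $range with LC_ALL=C"
  fi
done
```

In the C locale the range [a-b] covers only 'a' and 'b', so only the last pair should report a match. What the same tests do without LC_ALL=C depends on which locale is active and how it defines ranges, which is why the sketch makes no claim about that case.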