From: Eric Blake
Newsgroups: gnu.bash.bug
Subject: Re: built-in regex matches wrong character
Date: Wed, 5 Sep 2018 15:39:01 -0500
Organization: Red Hat, Inc.

On 09/05/2018 01:50 PM, mamatb@mamatb-laptop wrote:
> Description:
> 	It seems like bash built-in regex matches some symbols that shouldn't. The following commands shows this:
> 	[[ 'º' =~ [o-p] ]] && [[ ! 'º' =~ o ]] && [[ ! 'º' =~ p ]] && echo 'º between o and p but none of them'
> 	[[ 'ª' =~ [a-b] ]] && [[ ! 'ª' =~ a ]] && [[ ! 'ª' =~ b ]] && echo 'ª between a and b but none of them'
>
> Repeat-By:
> 	Actually found out this while developing a bigger bash script, but it can be reproduced with the previous lines. Would you reply me at amatbaeza@gmail.com to know if this was in fact a bug? Thanks.

Not a bug, but a property of your locale.

POSIX says that range expressions in regular expressions are
implementation-defined except for in the C locale, which means [a-b] is
free to match more than just the two ASCII characters 'a' and 'b', but
rather anything that your current locale considers equivalent.

If you run your script with LC_ALL=C in the environment, you won't have
that problem (because there, [a-b] is well-defined to be exactly two
characters). Or, you can use bash's 'shopt -s globasciiranges' which is
supposed to enable Rational Range Interpretation, where even in non-C
locales, a character range bounded by two ASCII characters takes on the
C locale definition of only the ASCII characters in that range, rather
than the locale's definition of whatever other characters might also be
equivalent (actually, while I know that shopt affects globbing, I don't
know if it also affects regex matching - but if it doesn't, that's
probably a bug that should be fixed).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
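
For anyone who wants to try the LC_ALL=C suggestion without changing the locale of the whole script, here is a minimal sketch. The helper name match_c_locale is made up for illustration and is not part of bash; it assumes the script file is saved as UTF-8, so that 'º' and 'ª' are multi-byte characters. The match runs in a subshell, so the locale change does not leak into the caller.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the original thread): test a string against
# an extended regex with the C locale forced, so that a range such as [o-p]
# covers only the ASCII characters between its endpoints.
match_c_locale() {
    local string=$1 regex=$2
    # The subshell confines the LC_ALL assignment to this one match.
    ( LC_ALL=C; [[ $string =~ $regex ]] )
}

# Re-run the two checks from the report with the C locale forced.
if match_c_locale 'º' '[o-p]'; then
    echo 'º matched [o-p]'
else
    echo 'º did not match [o-p] in the C locale'
fi

if match_c_locale 'ª' '[a-b]'; then
    echo 'ª matched [a-b]'
else
    echo 'ª did not match [a-b] in the C locale'
fi
```

The pattern is passed unquoted on the right-hand side of =~ so that bash treats it as a regular expression rather than a literal string. Whether 'shopt -s globasciiranges' gives the same result for =~ is left open above, so this sketch relies only on the locale setting.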