


Re: built-in regex matches wrong character

Path csiph.com!goblin1!goblin.stu.neva.ru!usenet.stanford.edu!not-for-mail
From Eric Blake <eblake@redhat.com>
Newsgroups gnu.bash.bug
Subject Re: built-in regex matches wrong character
Date Thu, 6 Sep 2018 12:58:17 -0500
Organization Red Hat, Inc.
Lines 32
Approved bug-bash@gnu.org
Message-ID <mailman.454.1536256705.1284.bug-bash@gnu.org> (permalink)
References <201809051850.w85IoClP001449@mamatb-laptop> <5d3e2655-9b29-563e-a3aa-f96f6563f9fc@redhat.com> <cdf3707d-9e10-4be3-94f9-4cb5f5d9b9ed@case.edu> <mailman.444.1536243821.1284.bug-bash@gnu.org> <pmroop$tv8$1@dont-email.me>
NNTP-Posting-Host lists.gnu.org
Mime-Version 1.0
Content-Type text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding 7bit
X-Trace usenet.stanford.edu 1536256706 21640 208.118.235.17 (6 Sep 2018 17:58:26 GMT)
X-Complaints-To action@cs.stanford.edu
To Aharon Robbins <arnold@skeeve.com>, bug-bash@gnu.org
Envelope-to bug-bash@gnu.org
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
In-Reply-To <pmroop$tv8$1@dont-email.me>
Content-Language en-US
X-BeenThere bug-bash@gnu.org
X-Mailman-Version 2.1.21
Precedence list
List-Id Bug reports for the GNU Bourne Again SHell <bug-bash.gnu.org>
List-Unsubscribe <https://lists.gnu.org/mailman/options/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=unsubscribe>
List-Archive <http://lists.gnu.org/archive/html/bug-bash/>
List-Post <mailto:bug-bash@gnu.org>
List-Help <mailto:bug-bash-request@gnu.org?subject=help>
List-Subscribe <https://lists.gnu.org/mailman/listinfo/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=subscribe>
Xref csiph.com gnu.bash.bug:14558



On 09/06/2018 12:39 PM, Aharon Robbins wrote:
> In article <mailman.444.1536243821.1284.bug-bash@gnu.org>,
> Eric Blake  <eblake@redhat.com> wrote:
>> But bash could be taught to convert any regex that contains a range with
>> both endpoints ASCII into a different bracket expression before handing
>> things over to regcomp().  That is, if the user is matching against
>> [a-d], bash hands [abcd] to regcomp() instead.  You don't need a flag in
>> regcomp() to get RRI, just merely some pre-processing (and often memory
>> allocation, as the expansion of a range into a non-range tends to
>> require more characters).
> 
> This is easy and inexpensive for ASCII only.  Full RRI does the
> same thing for wide character sets as well, though, and there
> the possibility for using very large amounts of memory makes the
> rewrite-the-range idea less palatable.

Indeed. But the bash option is named 'globasciiranges', and I find far 
more use in having ranges with both endpoints in single-byte ASCII 
behaving sanely than I do for ranges with one or more ends resulting in 
a multibyte character (by the time my regex involves multibyte 
characters, I am already admitting that I am in locale-dependent 
territory, and RRI may no longer be the best action anyway).  That is, 
RRI makes the most sense when dealing with ASCII characters (< 128) in 
the first place, and that's a reasonable stopgap for immediate 
implementation, even if we don't get full RRI across all of Unicode 
(assuming that such might later become available via a new regcomp() flag).
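
For illustration only, here is a rough sketch of the ASCII-only pre-processing described above, written in Python rather than in bash's own C source. Everything in it is hypothetical: the names expand_ascii_ranges, _bracket_end and _expand_body are made up for this note, and nothing here reflects actual bash code. It walks a POSIX ERE, finds bracket expressions, and spells out any range whose endpoints are both ASCII, so [a-d] becomes [abcd]. As a simplifying assumption, a range whose expansion would contain one of the characters [ ] ^ - is left untouched, because those characters are position-sensitive inside a bracket expression; ranges with a non-ASCII endpoint are also passed through unchanged.

```python
# Characters whose position matters inside a bracket expression; this
# sketch refuses to expand any range that would introduce one of them.
SPECIAL = set("[]^-")


def _bracket_end(p: str, start: int) -> int:
    """Return the index of the ']' closing the bracket at p[start], or -1."""
    i = start + 1
    if i < len(p) and p[i] == "^":
        i += 1
    if i < len(p) and p[i] == "]":  # a leading ']' is a literal
        i += 1
    while i < len(p):
        if p[i] == "[" and i + 1 < len(p) and p[i + 1] in ":=.":
            close = p.find(p[i + 1] + "]", i + 2)  # skip [:class:] etc.
            if close == -1:
                return -1
            i = close + 2
        elif p[i] == "]":
            return i
        else:
            i += 1
    return -1


def _expand_body(body: str) -> str:
    """Expand ASCII ranges in the text between '[' and its closing ']'."""
    out = []
    i, n = 0, len(body)
    if i < n and body[i] == "^":
        out.append("^")
        i += 1
    while i < n:
        # [:alpha:], [=a=] and [.x.] are copied through unchanged.
        if body[i] == "[" and i + 1 < n and body[i + 1] in ":=.":
            close = body.find(body[i + 1] + "]", i + 2)
            out.append(body[i:close + 2])
            i = close + 2
            continue
        if i + 2 < n and body[i + 1] == "-":
            lo, hi = body[i], body[i + 2]
            run = [chr(c) for c in range(ord(lo), ord(hi) + 1)]
            if ord(lo) < 128 and ord(hi) < 128 and run and not SPECIAL & set(run):
                out.append("".join(run))  # e.g. a-d -> abcd
                i += 3
                continue
            # Not expandable: copy the range verbatim. If the upper endpoint
            # opens a [.x.] element, the next iteration copies it whole.
            if hi == "[" and i + 3 < n and body[i + 3] in ":=.":
                out.append(body[i:i + 2])
                i += 2
            else:
                out.append(body[i:i + 3])
                i += 3
            continue
        out.append(body[i])
        i += 1
    return "".join(out)


def expand_ascii_ranges(pattern: str) -> str:
    """Rewrite ASCII ranges in a POSIX ERE into explicit character lists."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:  # escaped character outside a bracket
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "[":
            end = _bracket_end(pattern, i)
            if end == -1:  # unterminated: leave it for regcomp() to reject
                out.append(pattern[i:])
                break
            out.append("[" + _expand_body(pattern[i + 1:end]) + "]")
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this sketch, expand_ascii_ranges("^x[0-3]+$") gives "^x[0123]+$", while "[A-z]" comes back unchanged because its expansion would include [, ] and ^. A fuller version would reorder those characters (']' first, '-' last) instead of giving up, and the rewritten string is what would then be handed to regcomp().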

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Thread

Re: built-in regex matches wrong character Eric Blake <eblake@redhat.com> - 2018-09-06 09:23 -0500
  Re: built-in regex matches wrong character arnold@skeeve.com (Aharon Robbins) - 2018-09-06 17:39 +0000
    Re: built-in regex matches wrong character Eric Blake <eblake@redhat.com> - 2018-09-06 12:58 -0500
