| From | Eric Blake <eblake@redhat.com> |
|---|---|
| Newsgroups | gnu.bash.bug |
| Subject | Re: built-in regex matches wrong character |
| Date | 2018-09-06 12:58 -0500 |
| Organization | Red Hat, Inc. |
| Message-ID | <mailman.454.1536256705.1284.bug-bash@gnu.org> |
| References | <201809051850.w85IoClP001449@mamatb-laptop> <5d3e2655-9b29-563e-a3aa-f96f6563f9fc@redhat.com> <cdf3707d-9e10-4be3-94f9-4cb5f5d9b9ed@case.edu> <mailman.444.1536243821.1284.bug-bash@gnu.org> <pmroop$tv8$1@dont-email.me> |
On 09/06/2018 12:39 PM, Aharon Robbins wrote:
> In article <mailman.444.1536243821.1284.bug-bash@gnu.org>,
> Eric Blake <eblake@redhat.com> wrote:
>> But bash could be taught to convert any regex that contains a range with
>> both endpoints ASCII into a different bracket expression before handing
>> things over to regcomp(). That is, if the user is matching against
>> [a-d], bash hands [abcd] to regcomp() instead. You don't need a flag in
>> regcomp() to get RRI, just merely some pre-processing (and often memory
>> allocation, as the expansion of a range into a non-range tends to
>> require more characters).
>
> This is easy and inexpensive for ASCII only. Full RRI does the
> same thing for wide character sets as well, though, and there
> the possibility for using very large amounts of memory makes the
> rewrite-the-range idea less palatable.

Indeed. But the bash option is named 'globasciiranges', and I find far more use in having ranges with both endpoints in single-byte ASCII behaving sanely than I do for ranges with one or more ends resulting in a multibyte character (by the time my regex involves multibyte characters, I am already admitting that I am in locale-dependent territory, and RRI may no longer be the best action anyway).

That is, RRI makes the most sense when dealing with ASCII characters (< 128) in the first place, and that's a reasonable stopgap for immediate implementation, even if we don't get full RRI across all of Unicode (assuming that such might later become available via a new regcomp() flag).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
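The pre-processing step proposed in the message above (turning [a-d] into [abcd] before the pattern reaches regcomp()) can be illustrated with a small sketch. This is not bash's code and not a proposed patch: `expand_ascii_ranges` is a hypothetical helper, written in Python only to show the string rewrite. As a simplifying assumption, it expands a range only when every character it covers is an ASCII letter or digit, so the expansion can never introduce `]`, `^`, or `-` into the bracket expression; all other ranges, including any with a non-ASCII endpoint, are passed through unchanged.

```python
import string

# Assumption of this sketch: only ASCII letters and digits are expanded,
# because they never have special meaning inside a bracket expression.
_SAFE = set(string.ascii_letters + string.digits)


def expand_ascii_ranges(pattern):
    """Rewrite ranges such as [a-d] as [abcd] (hypothetical helper)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # Escaped character outside a bracket expression: copy both.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if c != "[":
            out.append(c)
            i += 1
            continue
        # Start of a bracket expression: keep '[', optional '^', and a
        # leading ']' (which is a literal member) exactly as written.
        j = i + 1
        if j < n and pattern[j] == "^":
            j += 1
        if j < n and pattern[j] == "]":
            j += 1
        out.append(pattern[i:j])
        while j < n and pattern[j] != "]":
            if pattern[j] == "[" and pattern[j + 1:j + 2] in (":", ".", "="):
                # [:class:], [.coll.], [=equiv=]: copy verbatim.
                end = pattern.find(pattern[j + 1] + "]", j + 2)
                stop = n if end == -1 else end + 2
                out.append(pattern[j:stop])
                j = stop
                continue
            if j + 2 < n and pattern[j + 1] == "-" and pattern[j + 2] != "]":
                lo, hi = pattern[j], pattern[j + 2]
                span = [chr(k) for k in range(ord(lo), ord(hi) + 1)]
                if span and all(ch in _SAFE for ch in span):
                    out.append("".join(span))      # expand the range
                else:
                    out.append(pattern[j:j + 3])   # leave it untouched
                j += 3
                continue
            out.append(pattern[j])
            j += 1
        if j < n:
            out.append("]")  # closing bracket
            j += 1
        i = j
    return "".join(out)


print(expand_ascii_ranges("^[0-3]+x[^A-C]$"))
```

The rewritten pattern is longer than the original (for instance, `[a-z]` grows from 5 to 28 characters), which is the extra allocation the quoted text mentions. For ranges confined to ASCII that growth is bounded by the size of the ASCII set, whereas a range spanning a large part of a wide character set could expand to a far longer bracket expression.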
Re: built-in regex matches wrong character Eric Blake <eblake@redhat.com> - 2018-09-06 09:23 -0500
Re: built-in regex matches wrong character arnold@skeeve.com (Aharon Robbins) - 2018-09-06 17:39 +0000
Re: built-in regex matches wrong character Eric Blake <eblake@redhat.com> - 2018-09-06 12:58 -0500