From: L A Walsh
Newsgroups: gnu.bash.bug
Subject: Re: Unicode range and enumeration support.
Date: Mon, 23 Dec 2019 12:57:47 -0800

On 2019/12/23 05:20, Greg Wooledge wrote:
> On Fri, Dec 20, 2019 at 04:35:05PM -0800, L A Walsh
> wrote:
>
> You can't simply translate $start and $end to single Unicode code point
> values, enumerate the Unicode characters between those two points,
> and translate those characters back to the user's locale. That doesn't
> give you the correct answer. There will be extra characters in the
> Unicode code point range that don't fit the solution,

You would have to limit your enumeration to the locale range as well --
i.e. check whether each character matches the locale you wanted.

But NOTE -- I never suggested doing locale matching. I just suggested
Unicode code-point enumeration in Unicode CP order as a first delivered
feature. I thought that would be much easier.

> The only way to do it is to iterate over the ENTIRE code point space,
> however many millions or billions of characters that is today.
>

It took less than a tenth of a second in perl, so probably a fraction
of that in 'C'.

> Is that what you are proposing bash should do, in order to get a working
> brace expansion outside of the C locale? I don't believe this is an
> acceptable solution.
>

I said I'd probably go with enumeration between two code points as a
first step, but even going through the entire Unicode code space is
trivially fast on modern computers.
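
The two approaches discussed in this message can be sketched roughly as follows. This is only an illustration in Python, not the perl test mentioned above and not anything bash does today. The function names (`codepoint_range`, `encodable`, `scan_all`) are hypothetical, and a Python codec name stands in for "the user's locale" as a simplifying assumption: the sketch checks only whether a character can be represented in that charset and does not model locale collation order.

```python
import time

MAX_CODE_POINT = 0x10FFFF            # highest Unicode code point
SURROGATES = range(0xD800, 0xE000)   # reserved for UTF-16, not characters


def codepoint_range(start, end):
    """Yield characters from start to end (inclusive) in code point order."""
    lo, hi = ord(start), ord(end)
    if lo > hi:
        raise ValueError("start must not come after end")
    for cp in range(lo, hi + 1):
        if cp in SURROGATES:
            continue  # skip code points that are not encodable characters
        yield chr(cp)


def encodable(chars, encoding):
    """Keep only the characters the given charset can represent.

    ASSUMPTION: 'encoding' is a stand-in for the locale's charset.
    """
    for ch in chars:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            continue
        yield ch


def scan_all(encoding):
    """Walk the whole code space; return (representable count, seconds)."""
    t0 = time.perf_counter()
    everything = codepoint_range(chr(0), chr(MAX_CODE_POINT))
    count = sum(1 for _ in encodable(everything, encoding))
    return count, time.perf_counter() - t0


if __name__ == "__main__":
    # First step: plain enumeration between two code points.
    print("".join(codepoint_range("α", "ε")))   # αβγδε
    # Optional second step: drop what the target charset cannot hold.
    print(list(encodable(codepoint_range("\u00fe", "\u0101"), "latin-1")))
    # Full scan of the code space; the timing depends on the machine.
    count, seconds = scan_all("utf-8")
    print(count, "characters scanned in", round(seconds, 3), "s")
```

The first function is the "enumeration between two code points" step on its own. The filter is kept separate so that the locale question can be added later without changing the enumeration. How long the full scan takes depends on the machine and the language used.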