Path: csiph.com!goblin1!goblin.stu.neva.ru!usenet.stanford.edu!not-for-mail
From: L A Walsh <bash@tlinx.org>
Newsgroups: gnu.bash.bug
Subject: Re: Unicode range and enumeration support.
Date: Mon, 23 Dec 2019 12:57:47 -0800
Lines: 34
Approved: bug-bash@gnu.org
Message-ID: <mailman.1336.1577134680.1979.bug-bash@gnu.org>
References: <568aeaaa-22b3-c7b9-0e18-a92bef6d2ffb@iki.fi> <5DF2FE31.9070406@tlinx.org> <0ff3a920-94c2-b0c9-5631-0964955657aa@archlinux.org> <5DF3D78B.4090208@tlinx.org> <20191213184213.GO851@eeg.ccf.org> <5DF4BDF0.6000402@tlinx.org> <20191216163906.GV851@eeg.ccf.org> <5DFA7AE2.2060504@tlinx.org> <20191218194651.GH851@eeg.ccf.org> <5DFD68B9.3050202@tlinx.org> <20191223132049.GW851@eeg.ccf.org> <5E012A4B.1090304@tlinx.org>
NNTP-Posting-Host: lists.gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: usenet.stanford.edu 1577134680 11567 209.51.188.17 (23 Dec 2019 20:58:00 GMT)
X-Complaints-To: action@cs.stanford.edu
To: bug-bash <bug-bash@gnu.org>
Envelope-to: bug-bash@gnu.org
User-Agent: Thunderbird
In-Reply-To: <20191223132049.GW851@eeg.ccf.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic] [fuzzy]
X-Received-From: 173.164.175.65
X-BeenThere: bug-bash@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Bug reports for the GNU Bourne Again SHell <bug-bash.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/bug-bash>
List-Post: <mailto:bug-bash@gnu.org>
List-Help: <mailto:bug-bash-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=subscribe>
X-Mailman-Original-Message-ID: <5E012A4B.1090304@tlinx.org>
X-Mailman-Original-References: <568aeaaa-22b3-c7b9-0e18-a92bef6d2ffb@iki.fi> <5DF2FE31.9070406@tlinx.org> <0ff3a920-94c2-b0c9-5631-0964955657aa@archlinux.org> <5DF3D78B.4090208@tlinx.org> <20191213184213.GO851@eeg.ccf.org> <5DF4BDF0.6000402@tlinx.org> <20191216163906.GV851@eeg.ccf.org> <5DFA7AE2.2060504@tlinx.org> <20191218194651.GH851@eeg.ccf.org> <5DFD68B9.3050202@tlinx.org> <20191223132049.GW851@eeg.ccf.org>
Xref: csiph.com gnu.bash.bug:15770

On 2019/12/23 05:20, Greg Wooledge wrote:
> On Fri, Dec 20, 2019 at 04:35:05PM -0800, L A Walsh wrote:
>   
> You can't simply translate $start and $end to single Unicode code point
> values, enumerate the Unicode characters between those two points,
> and translate those characters back to the user's locale.  That doesn't
> give you the correct answer.  There will be extra characters in the
> Unicode code point range that don't fit the solution, 
You would have to limit your enumeration to the locale range as well --
i.e., check whether each character matches the locale you wanted.

But NOTE -- I never suggested doing locale matching.

I just suggested Unicode code-point enumeration in Unicode CP order as
a first delivered feature.  I thought that would be much easier.
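
To make "code-point enumeration in code-point order" concrete, here is a
minimal sketch in Python rather than bash. The function name
codepoint_range is hypothetical, and the choices to count downward when
the endpoints are reversed and to skip the surrogate block are my
assumptions, not anything bash does today.

```python
def codepoint_range(start: str, end: str) -> list:
    """Return the characters from start to end, inclusive, in code-point order.

    Hypothetical helper: it ignores the locale and its collation order and
    walks the numeric code points only.
    """
    lo, hi = ord(start), ord(end)
    step = 1 if lo <= hi else -1  # assumed: reversed endpoints count downward
    return [
        chr(cp)
        for cp in range(lo, hi + step, step)
        if not 0xD800 <= cp <= 0xDFFF  # surrogates are not characters; skip them
    ]


if __name__ == "__main__":
    # Greek alpha through epsilon, shown as code points to keep the output ASCII.
    greek = codepoint_range("\u03b1", "\u03b5")
    print(" ".join(f"U+{ord(c):04X}" for c in greek))
```

Nothing here consults the locale; the result depends only on the two
endpoints.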
 

> The only way to do it is to iterate over the ENTIRE code point space,
> however many millions or billions of characters that is today.
>   
It took less than a tenth of a second in Perl, so probably a fraction
of that in C.

> Is that what you are proposing bash should do, in order to get a working
> brace expansion outside of the C locale?  I don't believe this is an
> acceptable solution.
>   
I said I'd probably go with enumeration between two code points as a
first step, but even going through the entire Unicode code space
is trivially fast on modern computers.
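
For scale: the Unicode code space runs from U+0000 to U+10FFFF, which is
1,114,112 code points, not billions. A rough sketch of a full scan, again
in Python rather than bash, follows. The name chars_in_charset is
hypothetical, and "encodable in a named charset" is only a stand-in I am
assuming for "matches the locale"; it says nothing about collation order.

```python
import time

MAX_CODE_POINT = 0x10FFFF  # last Unicode code point; 0x110000 values in total


def chars_in_charset(charset: str, lo: int = 0, hi: int = MAX_CODE_POINT) -> list:
    """Scan code points lo..hi and keep those the given charset can encode.

    Hypothetical helper: encodability is an assumed proxy for locale
    membership and does not reproduce any locale's sort order.
    """
    kept = []
    for cp in range(lo, hi + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters
        try:
            chr(cp).encode(charset)
        except UnicodeEncodeError:
            continue  # not representable in this charset
        kept.append(chr(cp))
    return kept


if __name__ == "__main__":
    began = time.perf_counter()
    found = chars_in_charset("latin-1")
    elapsed = time.perf_counter() - began
    print(f"{len(found)} of {MAX_CODE_POINT + 1} code points kept in {elapsed:.3f}s")
```

The elapsed time will vary by machine and interpreter, so measure it
rather than take any particular figure on trust.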