From: L A Walsh
Newsgroups: gnu.bash.bug
Subject: Re: Unicode range and enumeration support.
Date: Mon, 23 Dec 2019 21:34:56 -0800
To: Greg Wooledge, bug-bash

On 2019/12/23 12:58, Greg Wooledge wrote:
> On Mon, Dec 23, 2019 at 12:52:00PM -0800, L A Walsh wrote:
>
>> But it wasn't.  It was about generating characters between two
>> characters that were given.  In unicode, that would be two code points.
>> Nothing about enumeration.
>>
>
> Please give an example, with a starting character and an ending
> character, and the resulting output.  Explain why a bash user who
> uses your implementation to echo {ñ..💩} (N WITH TILDE to PILE OF
> POO) or whatever will feel that your answer is correct and sensible.
>

#!/usr/bin/perl
use warnings;use strict; use v5.16;
use utf8;

use constant sch => q(Ⅰ);   # start at roman numeral one
use constant ech => q(ⅿ);   # end at small roman numeral 1000
use constant scp => ord(sch);
use constant ecp => ord(ech);

my %range;
$range{$_} = $_ for scp .. ecp;
my @range = sort {$a cmp $b} keys %range;

my $cnt=1000000;
my $RE=qr{\pN};

for (1 .. $cnt) {
  my $out="";
  for my $v (@range) {
    my $ch=chr($v);
    $out .= $ch.q( ) if $ch =~ m{$RE};   #match unicode property "is_num"
  }
  print $out."\n";
}
------------

1 million runs of the central loop takes about 31s.  So 1 run would be
pretty fast.

Unicode has 17 planes of 64K chars each = 1,114,112 chars of which about
10% are currently used.
(https://www.babelstone.co.uk/Unicode/HowMany.html).

I'm not sure what you want me to say about the range you chose, other
than it would be about 128,000 characters.

It would be about the same argument, for or against in using
{241..128169}.

I know you are trying to make some point, but I'm missing it.

It would be helpful if one could use hex in the ranges, like
{0x20..0x110000} to enumerate all code points, leaving out the
control-chars area.

>
>> It is in unicode code point order.  Which is what you would use
>> for unicode.  If you want to sort via unicode, use the -u switch.
>>
>
> That isn't what the sort -u option does, and you know it.  I hope.
>

Yeah...though, _I don't remember it_, _if_ I'm not entering it in a
command to use it.  I.e., just as I'm about to type "|uniq", I'll
remember that '-u' is a recently added 'shortcut' to get the same output.
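
For what it's worth, the hex-range idea mentioned earlier can be approximated in current bash without any change to brace expansion. The following is only a rough sketch under some assumptions: `codepoint_range` is a made-up helper name (not a bash builtin), it relies on the `\UHHHHHHHH` escape that the printf builtin has supported since bash 4.2, and what gets printed for non-ASCII code points depends on the locale in effect. The last valid code point is U+10FFFF, so that is where an "all of Unicode" run should stop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bash): print the characters whose code
# points lie between two integers, inclusive, separated by spaces.
# Arguments may be decimal or hex (0x...); bash arithmetic accepts both.
codepoint_range() {
    local first=$(( $1 )) last=$(( $2 ))
    local cp hex ch out=""
    for (( cp = first; cp <= last; cp++ )); do
        # Skip the surrogate block U+D800..U+DFFF, which holds no characters.
        (( cp >= 0xD800 && cp <= 0xDFFF )) && continue
        # Build the 8-digit hex form expected by the \U escape.
        printf -v hex '%08x' "$cp"
        # %b expands the \UHHHHHHHH escape (bash 4.2 or later); the bytes
        # produced for non-ASCII code points depend on the current locale.
        printf -v ch '%b' "\\U$hex"
        out+="$ch "
    done
    # Drop the single trailing separator before printing.
    printf '%s\n' "${out% }"
}
```

With that loaded, `codepoint_range 0x2160 0x217F` walks the same Roman-numeral range as the Perl script (without the \pN filter), and `echo $(( 128169 - 241 + 1 ))` gives 127929, the exact size of the ñ-to-💩 range that I rounded to 128,000.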