From: Eli Schwartz
Newsgroups: gnu.bash.bug
Subject: Re: Unicode range and enumeration support.
Date: Sun, 22 Dec 2019 01:38:13 -0500

On 12/20/19 7:35 PM, L A Walsh wrote:
> On 2019/12/18 11:46, Greg Wooledge wrote:
>> To put it another way: you can write code that determines whether
>> an input character $c matches a glob or regex like [Z-a].  (Maybe.)
>>
>> But, you CANNOT write code to generate all of the characters from Z to a
>>
> This generates characters from decimal 8300 - 8400 (because that range
> includes raised and lowered digits which have the number and value
> properties equivalent to 0-9.
>
> ----
>
> No?
> 8300, 8400 arbitrary code points that contain raised and lowered
> numbers
> that have the number property (as does 0..9):
>
> perl -we' use strict; use v5.16;
> my $c;
> for ($c=8300;$c<8400;++$c) {
> my $o=chr $c;
> printf "%s", $o if $o=~/\pN/;   #match unicode property "is_num"
> };printf "\n"'
> ⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉
>
> Q.E.D.
>
>
> Is that sufficient proof?

It's sufficient proof that you're wrong, yes.

Given the discussion was about collation, not simply enumerating
codepoints in order of their codepoint values, it would be helpful to
actually, you know, collate them.

Given your sample text range:

$ printf %s\\n ⁰ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ | sort
⁰
₀
₁
₂
₃
⁴
₄
⁵
₅
⁶
₆
⁷
₇
⁸
₈
⁹
₉

This is plainly not in byte order.

Now you need to ask yourself the question: which locale do you want to
sort according to? I used en_US.UTF-8. Please don't say "C.UTF-8",
because that's not actually a thing. And the plain C locale won't work
for obvious reasons...
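
For anyone who wants to reproduce the comparison, here is a rough bash sketch (not something posted in the thread; the `digits` helper is a hypothetical name). It writes the same seventeen characters as raw UTF-8 bytes given in octal, so the input itself does not depend on the current locale. It then sorts them twice. The first pass uses LC_ALL=C, which compares bytes and, because UTF-8 byte order matches code point order, gives plain code point order. The second pass uses LC_ALL=en_US.UTF-8 and runs only if `locale -a` lists that locale. The collated result depends on the locale data your libc ships, so it may differ from the en_US.UTF-8 output shown above.

```shell
#!/usr/bin/env bash
# Rough sketch: compare byte (code point) order with locale collation order
# for the superscript/subscript digits discussed above.
# Assumption: en_US.UTF-8 may not be installed; that half is skipped if not.

# Hypothetical helper: print U+2070, U+2074..U+2079 (superscript digits) and
# U+2080..U+2089 (subscript digits), one per line, as raw UTF-8 bytes given
# in octal, so the input does not depend on the current locale.
digits() {
  printf '\342\201\260\n'                               # U+2070 SUPERSCRIPT ZERO
  for b in 264 265 266 267 270 271; do                  # U+2074..U+2079
    printf "\\342\\201\\${b}\\n"
  done
  for b in 200 201 202 203 204 205 206 207 210 211; do  # U+2080..U+2089
    printf "\\342\\202\\${b}\\n"
  done
}

# In the C locale, sort compares raw bytes. UTF-8 byte order equals code
# point order, so this reproduces the order the Perl loop printed.
echo "Byte order (LC_ALL=C):"
digits | LC_ALL=C sort | tr -d '\n'
echo

# Collated order, only if the locale actually exists on this system.
if locale -a 2>/dev/null | grep -qiE '^en_US\.utf-?8$'; then
  echo "Collation order (LC_ALL=en_US.UTF-8):"
  digits | LC_ALL=en_US.UTF-8 sort | tr -d '\n'
  echo
fi
```

The byte-order line comes out as ⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉, the same sequence the Perl loop produced. That sequence reflects code point enumeration. It is not a collation.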

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User