Re: Unicode range and enumeration support.

Path csiph.com!goblin1!goblin.stu.neva.ru!usenet.stanford.edu!not-for-mail
From Eli Schwartz <eschwartz@archlinux.org>
Newsgroups gnu.bash.bug
Subject Re: Unicode range and enumeration support.
Date Sun, 22 Dec 2019 01:38:13 -0500
Lines 123
Approved bug-bash@gnu.org
Message-ID <mailman.1257.1576996710.1979.bug-bash@gnu.org> (permalink)
References <9dd3a388-39b1-c059-de99-813f1e411764@case.edu> <5DF2987E.5000309@tlinx.org> <568aeaaa-22b3-c7b9-0e18-a92bef6d2ffb@iki.fi> <5DF2FE31.9070406@tlinx.org> <0ff3a920-94c2-b0c9-5631-0964955657aa@archlinux.org> <5DF3D78B.4090208@tlinx.org> <20191213184213.GO851@eeg.ccf.org> <5DF4BDF0.6000402@tlinx.org> <20191216163906.GV851@eeg.ccf.org> <5DFA7AE2.2060504@tlinx.org> <20191218194651.GH851@eeg.ccf.org> <5DFD68B9.3050202@tlinx.org> <2334eff4-8a88-18ee-b086-4ba4e80af01b@archlinux.org>
In-Reply-To <5DFD68B9.3050202@tlinx.org>


On 12/20/19 7:35 PM, L A Walsh wrote:
> On 2019/12/18 11:46, Greg Wooledge wrote:
>> To put it another way: you can write code that determines whether
>> an input character $c matches a glob or regex like [Z-a].  (Maybe.)
>>
>> But, you CANNOT write code to generate all of the characters from Z to a
>>   
> This generates characters from decimal 8300 - 8400 (because that range
> includes raised and lowered digits which have the number and value
> properties equivalent to 0-9.
> 
> ----
> 
> No? 8300, 8400 arbitrary code points that contain raised and lowered
> numbers
> that have the number property (as does 0..9):
> 
> perl -we' use strict; use v5.16;
> my $c;
> for ($c=8300;$c<8400;++$c) {
> my $o=chr $c;
> printf "%s", $o if $o=~/\pN/;   #match unicode property "is_num"
> };printf "\n"'
> ⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉
> 
> Q.E.D.
> 
> 
> Is that sufficient proof?

It's sufficient proof that you're wrong, yes.

Given the discussion was about collation, not simply enumerating
codepoints in order of their codepoint values, it would be helpful to
actually, you know, collate them.

Given your sample text range:

$ printf %s\\n ⁰ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ | sort
⁰
₀
₁
₂
₃
⁴
₄
⁵
₅
⁶
₆
⁷
₇
⁸
₈
⁹
₉

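This is plainly not in byte order: the superscript and subscript digits
are interleaved by numeric value instead of being grouped by codepoint.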

Now you need to ask yourself the question: which locale do you want to
sort according to? I used en_US.UTF-8. Please don't say "C.UTF-8",
because that's not actually a thing. And the plain C locale won't work
for obvious reasons...
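
To make the difference concrete, here is a minimal sketch that sorts the
same characters twice: once by raw bytes (LC_ALL=C, which for UTF-8 text
is the same as codepoint order) and once under en_US.UTF-8. The variable
names are only illustrative, and the second result is an assumption about
your system: it depends on that locale being installed and on your libc's
collation tables, so no output is claimed for it here.

```shell
#!/bin/sh
# Characters taken from the quoted perl output above.
chars='⁰ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉'

# Byte order: in the C locale sort(1) compares raw bytes, which for
# UTF-8 input is the same as codepoint order.
byte_order=$(printf '%s\n' $chars | LC_ALL=C sort | tr -d '\n')

# Collation order: depends on the locale. If en_US.UTF-8 is not
# installed, sort typically falls back to C-locale behaviour.
locale_order=$(printf '%s\n' $chars | LC_ALL=en_US.UTF-8 sort 2>/dev/null | tr -d '\n')

printf 'byte order:   %s\n' "$byte_order"
printf 'locale order: %s\n' "$locale_order"
```

The first line always comes out as ⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉, i.e. exactly what
the perl loop printed. Whether the second line matches it is the whole
question.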

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User
