From: Eli Schwartz
Newsgroups: gnu.bash.bug
Subject: Re: Unicode range and enumeration support.
Date: Wed, 18 Dec 2019 15:08:20 -0500
To: bug-bash@gnu.org

On 12/18/19 2:46 PM, Greg Wooledge wrote:
> Sorting these characters is also possible, once they have been generated.
> This is (I think!) what allows things like [Z-a] to work at all: you
> can check whether $c is >= 'Z' and <= 'a', without knowing what all of
> the characters in between are.
> But you can't ask "what comes after Z".
>
> wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s\\n "$c"; fi; done | sort | tr -d \\n; echo
> aAªÁÀÂÅÄÃÆbBcCÇdDeEÈfFgGhHiIjJkKlLmMnNoOºpPqQrRsStTuUvVwWxXyYzZµ
>
> Again, this is only PART of the set, and is not intended to be a
> complete enumeration of the :alpha: characters in my system's locale.

There's no need to sort ASCII characters, though, since the collation
order of [A-z] in the C locale is defined by numeric codepoint order.
That guarantee does not carry over to other locales.

So all bash needs to do to print {Z..a} is take Z == ASCII decimal 90
and a == ASCII decimal 97, enumerate the numbers 90 through 97, and
translate each one back into its ASCII character. No locale awareness is
needed, no heuristics, no invocation of the locale subsystem; you don't
even need to hardcode the ASCII range in source code.

And that's why bash can support enumerating a range of ASCII characters
in LC_COLLATE=C order, when it cannot (easily) do so using other locales.
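To make the arithmetic concrete, here is a rough sketch of the same idea in shell. It is only an illustration of "take the two codepoints, count between them, convert back", not a claim about how bash implements brace expansion internally. The function name ascii_range is made up for this example, and it assumes both endpoints are single ASCII characters.

```bash
#!/usr/bin/env bash

# ascii_range START END
# Hypothetical helper (not part of bash): print every ASCII character from
# START to END in numeric codepoint order. Assumes single ASCII characters.
ascii_range() {
    local start end i oct
    # A leading single quote makes printf %d yield the character's numeric value.
    printf -v start '%d' "'$1"
    printf -v end '%d' "'$2"
    for ((i = start; i <= end; i++)); do
        # Turn the number back into a character via a 3-digit octal escape.
        printf -v oct '%03o' "$i"
        printf "\\$oct"
    done
    printf '\n'
}

ascii_range Z a   # prints: Z[\]^_`a
```

Nothing in the loop consults the locale's collation tables: the order falls out of counting from 90 to 97.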
-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User