
Re: Unicode range and enumeration support.

From Eli Schwartz <eschwartz@archlinux.org>
Newsgroups gnu.bash.bug
Subject Re: Unicode range and enumeration support.
Date 2019-12-18 15:08 -0500
Message-ID <mailman.1101.1576699723.1979.bug-bash@gnu.org> (permalink)
References (7 earlier) <5DF4BDF0.6000402@tlinx.org> <20191216163906.GV851@eeg.ccf.org> <5DFA7AE2.2060504@tlinx.org> <20191218194651.GH851@eeg.ccf.org> <c1565d25-c4fc-ff50-4112-c38cee8d2f80@archlinux.org>


On 12/18/19 2:46 PM, Greg Wooledge wrote:
> Sorting these characters is also possible, once they have been generated.
> This is (I think!) what allows things like [Z-a] to work at all: you
> can check whether $c is >= 'Z' and <= 'a', without knowing what all of
> the characters in between are.  But you can't ask "what comes after Z".
> 
> wooledg:~$ for ((i=1; i<=200; i++)); do printf -v tmp %04x "$i"; printf -v c "\\u$tmp"; if [[ $c = [[:alpha:]] ]]; then printf %s\\n "$c"; fi; done | sort | tr -d \\n; echo
> aAªÁÀÂÅÄÃÆbBcCÇdDeEÈfFgGhHiIjJkKlLmMnNoOºpPqQrRsStTuUvVwWxXyYzZµ
> 
> Again, this is only PART of the set, and is not intended to be a
> complete enumeration of the :alpha: characters in my system's locale.

There's no need to sort ASCII characters, though, since the collation
order of [A-z] in the C locale is defined by their numeric codepoint
order. That guarantee does not carry over to other locales.

So all bash needs to do to print {Z..a} is take Z == ASCII decimal 90
and a == ASCII decimal 97, then enumerate the numbers 90-97 and
translate them back into ASCII characters. No locale awareness is
needed, no heuristics, no invocation of the locale subsystem; you don't
even need to hardcode the ASCII range in the source code.
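
As a rough illustration of that enumeration (a sketch only, not how bash
implements brace expansion internally), the same steps can be written in
shell. The function name ascii_range is hypothetical, and the sketch
assumes both endpoints are single ASCII characters:

```shell
#!/usr/bin/env bash
# ascii_range is a hypothetical helper, not part of bash.
# It assumes both arguments are single ASCII characters.
ascii_range() {
    local start end i oct c out=()
    # A leading quote makes printf %d report the character's numeric value.
    printf -v start '%d' "'$1"
    printf -v end '%d' "'$2"
    for ((i = start; i <= end; i++)); do
        # Turn the number back into a character via an octal escape.
        printf -v oct '%03o' "$i"
        printf -v c "\\$oct"
        out+=("$c")
    done
    # Join the characters with single spaces (default IFS).
    printf '%s\n' "${out[*]}"
}

ascii_range Z a
```

Nothing here consults the collation tables; the loop only counts from
90 to 97 and converts each number back to a character.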

And that's why bash can support enumerating a range of ASCII characters
in LC_COLLATE=C order, when it cannot (easily) do so using other locales.

-- 
Eli Schwartz
Arch Linux Bug Wrangler and Trusted User
