| References | <mailman.468.1376201912.1251.python-list@python.org> <520754d7$0$30000$c3e8da3$5496439d@news.astraweb.com> <mailman.474.1376214330.1251.python-list@python.org> <5207722c$0$30000$c3e8da3$5496439d@news.astraweb.com> |
|---|---|
| From | Joshua Landau <joshua@landau.ws> |
| Date | 2013-08-11 12:59 +0100 |
| Subject | Re: Could you verify this, Oh Great Unicode Experts of the Python-List? |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.480.1376222399.1251.python-list@python.org> |

On 11 August 2013 12:14, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> On Sun, 11 Aug 2013 10:44:40 +0100, Joshua Landau wrote:
>
>> On 11 August 2013 10:09, Steven D'Aprano
>> <steve+comp.lang.python@pearwood.info> wrote:
>>> The reason some accented letters have single code point forms is to
>>> support legacy charsets; the reason some only exist as combining
>>> characters is due to the combinational explosion. Some languages allow
>>> you to add up to five or six different accent on any of dozens of
>>> different letters. If each combination needed its own unique code
>>> point, there wouldn't be enough code points. For bonus points, if there
>>> are five accents that can be placed in any combination of zero or more
>>> on any of four characters, how many code points would be needed?
>>
>> 52?
>
> More than double that.
>
> Consider a single character. It can have 0 to 5 accents, in any
> combination. Order doesn't matter, and there are no duplicates, so there
> are:
>
> 0 accent: take 0 from 5 = 1 combination;
> 1 accent: take 1 from 5 = 5 combinations;
> 2 accents: take 2 from 5 = 5!/(2!*3!) = 10 combinations;
> 3 accents: take 3 from 5 = 5!/(3!*2!) = 10 combinations;
> 4 accents: take 4 from 5 = 5 combinations;
> 5 accents: take 5 from 5 = 1 combination
>
> giving a total of 32 combinations for a single character. Since there are
> four characters in this hypothetical language that take accents, that
> gives a total of 4*32 = 128 distinct code points needed.

I didn't see "four characters", and I did (1 + 5 + 10) * 2 and came up with 52... Maybe I should get more sleep.
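
The count quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the quoted post, not anything taken from the thread itself: the names `ACCENTS`, `BASE_CHARACTERS` and `accent_sets_per_character` are made up for illustration, and `math.comb` assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb

ACCENTS = 5          # accents available in the hypothetical language
BASE_CHARACTERS = 4  # characters that can carry those accents


def accent_sets_per_character(n_accents):
    """Count the ways to pick zero or more accents, ignoring order."""
    return sum(comb(n_accents, k) for k in range(n_accents + 1))


per_char = accent_sets_per_character(ACCENTS)
total = BASE_CHARACTERS * per_char

# Cross-check by listing every subset of accents explicitly.
enumerated = sum(
    1
    for k in range(ACCENTS + 1)
    for _ in combinations(range(ACCENTS), k)
)

# Summing "take k from n" over all k is the same as counting subsets: 2**n.
assert per_char == enumerated == 2 ** ACCENTS

print(per_char, total)  # 32 128
```

Because each accent is either present or absent, the per-character count is just 2**5, which is why the six "take k from 5" terms add up to 32.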