From: Joshua Landau
Date: Sun, 11 Aug 2013 12:59:09 +0100
Subject: Re: Could you verify this, Oh Great Unicode Experts of the Python-List?
To: "Steven D'Aprano"
Cc: python-list
Newsgroups: comp.lang.python

On 11 August 2013 12:14, Steven D'Aprano wrote:
> On Sun, 11 Aug 2013 10:44:40 +0100, Joshua Landau wrote:
>
>> On 11 August 2013 10:09, Steven D'Aprano
>> wrote:
>>> The reason some accented letters have single code point forms is to
>>> support legacy charsets; the reason some only exist as combining
>>> characters is due to the combinational explosion. Some languages allow
>>> you to add up to five or six different accents on any of dozens of
>>> different letters. If each combination needed its own unique code
>>> point, there wouldn't be enough code points.
>>> For bonus points, if there are five accents that can be placed in any
>>> combination of zero or more on any of four characters, how many code
>>> points would be needed?
>>
>> 52?
>
> More than double that.
>
> Consider a single character. It can have 0 to 5 accents, in any
> combination. Order doesn't matter, and there are no duplicates, so there
> are:
>
> 0 accents: take 0 from 5 = 1 combination;
> 1 accent: take 1 from 5 = 5 combinations;
> 2 accents: take 2 from 5 = 5!/(2!*3!) = 10 combinations;
> 3 accents: take 3 from 5 = 5!/(3!*2!) = 10 combinations;
> 4 accents: take 4 from 5 = 5 combinations;
> 5 accents: take 5 from 5 = 1 combination
>
> giving a total of 32 combinations for a single character. Since there are
> four characters in this hypothetical language that take accents, that
> gives a total of 4*32 = 128 distinct code points needed.

I didn't see "four characters", and I did (1 + 5 + 10) * 2 and came up
with 52... Maybe I should get more sleep.
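
For anyone who wants to check the counting quoted above, here is a quick sketch in Python (3.8+ for math.comb). The names ACCENTS and BASE_LETTERS are mine, chosen only for illustration; the numbers come straight from the puzzle. It sums "take k from 5" for k = 0..5, cross-checks that by brute-force enumeration, and then multiplies by the four letters.

```python
from itertools import combinations
from math import comb

ACCENTS = 5        # accents available in the hypothetical language
BASE_LETTERS = 4   # letters that can carry accents

# Sum of "take k from 5" for k = 0..5, as in the quoted table.
per_letter = sum(comb(ACCENTS, k) for k in range(ACCENTS + 1))

# Cross-check by actually enumerating every subset of accents.
enumerated = sum(
    1
    for k in range(ACCENTS + 1)
    for _ in combinations(range(ACCENTS), k)
)

# The table is symmetric, so doubling the first half gives the same count.
symmetric_shortcut = (1 + 5 + 10) * 2

total = BASE_LETTERS * per_letter

print(per_letter, enumerated, symmetric_shortcut, total)  # 32 32 32 128
```

The per-letter figure is just 2**5, since each accent is either present or absent, and the (1 + 5 + 10) * 2 shortcut lands on the same 32 before the factor of four is applied.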