From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Could you verify this, Oh Great Unicode Experts of the Python-List?
Date: Sun, 11 Aug 2013 12:45:41 +0100

On Sun, Aug 11, 2013 at 12:14 PM, Steven D'Aprano wrote:
> Consider a single character. It can have 0 to 5 accents, in any
> combination. Order doesn't matter, and there are no duplicates, so there
> are:
>
> 0 accent: take 0 from 5 = 1 combination;
> 1 accent: take 1 from 5 = 5 combinations;
> 2 accents: take 2 from 5 = 5!/(2!*3!) = 10 combinations;
> 3 accents: take 3 from 5 = 5!/(3!*2!) = 10 combinations;
> 4 accents: take 4 from 5 = 5 combinations;
> 5 accents: take 5 from 5 = 1 combination
>
> giving a total of 32 combinations for a single character. Since there are
> four characters in this hypothetical language that take accents, that
> gives a total of 4*32 = 128 distinct code points needed.

There's an easy way to calculate it. Instead of the "take N from 5"
notation, simply look at it as a set of independent bits - each of
your accents may be either present or absent. So it's 1<<5
combinations for a single character, which is the same 32 figure you
came up with, but easier to work with in the ridiculous case.

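A quick way to convince yourself the two methods agree, as a minimal
sketch: it assumes Python 3.8+ for math.comb, and the names ACCENTS and
BASE_CHARS are only illustrative labels for the numbers in the quoted
example.

```python
from math import comb  # available from Python 3.8

ACCENTS = 5      # accents a single character may carry (quoted example)
BASE_CHARS = 4   # accent-taking characters in the hypothetical language

# The "take N from 5" method: sum the combinations for N = 0..5.
by_choose = sum(comb(ACCENTS, n) for n in range(ACCENTS + 1))

# The bit method: each accent is independently present or absent.
by_bits = 1 << ACCENTS

print(by_choose, by_bits)       # 32 32
print(BASE_CHARS * by_bits)     # 128
```

The sum of all the "take N" rows is always a power of two, which is why
the shift gets there in one step.
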
> In reality, Unicode has currently code points U+0300 to U+036F (112 code
> points) to combining characters. It's not really meaningful to combine
> all 112 of them, or even most of 112 of them...

If you *were* to use literally ANY combination, that would be 1<<112
which is... uhh... five billion yottacombinations. Don't bother
working that one out by the "take N" method, it'll take you too long
:)

Oh, and that's 1<<112 possible combining character combinations, so
you then need to multiply that by the number of base characters you
could use....

ChrisA
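
For anyone who wants to check the figures, here is a rough sketch. It
takes the U+0300 to U+036F range from the quoted text at face value; the
names marks, subsets and YOTTA are illustrative, with YOTTA being the SI
prefix 10**24.

```python
first, last = 0x0300, 0x036F   # range quoted above
marks = last - first + 1       # number of code points in that range

# Every mark independently present or absent.
subsets = 1 << marks

YOTTA = 10 ** 24               # SI prefix "yotta"

print(marks)                   # 112
print(subsets)                 # the full integer, no overflow in Python
print(subsets // YOTTA)        # count of "yottacombinations"
```

The last figure comes out a little over five billion. Multiplying by the
number of base characters is left out, since the thread does not say how
many there are.
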