Date: Wed, 4 Jun 2014 17:10:34 +1000
Subject: Re: Micro Python -- a lean and efficient implementation of Python 3
From: Chris Angelico
Newsgroups: comp.lang.python

On Wed, Jun 4, 2014 at 5:00 PM, Terry Reedy wrote:
> On 6/4/2014 1:55 AM, Ian Kelly wrote:
>>
>> On Jun 3, 2014 11:27 PM, "Steven D'Aprano" wrote:
>> > For technical reasons which I don't fully understand, Unicode only
>> > uses 21 of those 32 bits, giving a total of 1114112 available code
>> > points.
>>
>> I think mainly it's to accommodate UTF-16. The surrogate pair scheme is
>> sufficient to encode up to 16 supplementary planes, so if Unicode were
>> allowed to grow any larger than that, UTF-16 would no longer be able to
>> encode all codepoints.
>
> I believe the original utf-8 used up to 6 bytes per char to encode 2**32
> potential chars. Just 4 bytes limits to 2**21 and for whatever reason
> (easier decoding?), utf-8 was revised down (unusual ;-).

I understood it to be UTF-16's fault, per Ian's statement. That is to
say, the entire Unicode standard was warped around the problem that
some people were going around thinking "a character is 16 bits", even
though that's just as fallacious as "a character is 8 bits".

ChrisA
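Ian's surrogate-pair arithmetic can be checked with a short sketch. A surrogate pair carries 10 bits in the high surrogate and 10 bits in the low one, which gives 2**20 values, or exactly 16 planes of 65536 code points on top of the BMP. `to_surrogate_pair` is a hypothetical helper written only for illustration and is not a stdlib function. The only library behaviour the sketch relies on is Python's built-in "utf-16-be" codec, which it uses to cross-check the result.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Hypothetical helper: split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is outside the surrogate-pair range")
    offset = cp - 0x10000             # 20-bit value, 0..0xFFFFF
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


# 10 + 10 payload bits = 2**20 supplementary code points = 16 planes of 2**16.
PLANE = 2 ** 16
SUPPLEMENTARY = 2 ** 20
print(SUPPLEMENTARY // PLANE)          # 16 supplementary planes
print(PLANE + SUPPLEMENTARY)           # 1114112 code points (BMP + 16 planes)
print(hex(PLANE + SUPPLEMENTARY - 1))  # 0x10ffff, the highest code point

# Cross-check the hand-rolled split against Python's own UTF-16 encoder.
cp = 0x1F600
high, low = to_surrogate_pair(cp)
print(hex(high), hex(low))             # 0xd83d 0xde00
assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

This is why 1114112 falls short of a full 2**21: the ceiling is set by what a pair of 16-bit units can reach, not by a round number of bits.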
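Terry's point about UTF-8 sequence lengths can also be sketched. Under the original multi-byte layout, an n-byte sequence has a lead byte with n leading 1 bits and a 0, followed by (n - 1) continuation bytes of the form 10xxxxxx. The payload therefore works out to 5n + 1 bits for n >= 2. So the old six-byte form carried 31 bits (up to 0x7FFFFFFF), and four bytes carry 21. Current UTF-8 (RFC 3629) stops at four bytes and further caps itself at U+10FFFF, the UTF-16 limit. `utf8_payload_bits` is a hypothetical helper for illustration only. The rest uses only the built-in `chr` and `str.encode`.

```python
def utf8_payload_bits(n_bytes: int) -> int:
    """Hypothetical helper: payload bits in an n-byte sequence of the original UTF-8 layout."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("the original design used 1 to 6 bytes")
    if n_bytes == 1:
        return 7                              # 0xxxxxxx
    # Lead byte: n ones, a zero, then (7 - n) payload bits;
    # each continuation byte (10xxxxxx) adds 6 more.
    return (7 - n_bytes) + 6 * (n_bytes - 1)


for n in range(1, 7):
    print(n, utf8_payload_bits(n))            # 7, 11, 16, 21, 26, 31 bits

# The UTF-16 ceiling U+10FFFF needs 21 bits, so four bytes are enough.
print((0x10FFFF).bit_length())                # 21
print([len(chr(cp).encode("utf-8")) for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF)])  # [1, 2, 3, 4]

# Python refuses anything past U+10FFFF outright.
try:
    chr(0x110000)
except ValueError as exc:
    print("rejected:", exc)
```

Read together, the four-byte limit and the U+10FFFF cap both trace back to UTF-16, which fits Ian's explanation.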