From: Chris Angelico
Date: Fri, 2 May 2014 14:54:19 +1000
Subject: Re: Unicode 7
Newsgroups: comp.lang.python

On Fri, May 2, 2014 at 2:42 PM, Rustom Mody wrote:
> Unicode consortium's going from old BMP to current (6.0) SMPs to who-knows-what
> in the future is similar.

Unicode 1.0: "Let's make a single universal character set that can represent all the world's scripts. We'll define 65536 codepoints to do that with."

Unicode 2.0: "Oh. That's not enough. Okay, let's define some more."

It's not a fundamental change, nor is it unhelpful to Unicode's cause. It's simply an acknowledgement that 64K codepoints aren't enough. Yes, that gave us the mess of UTF-16 being called "Unicode". If it hadn't been for Unicode 1.0, I doubt we'd now have so many languages using and exposing UTF-16; it'd be a simple judgment call: pick UTF-8, UTF-16 or UTF-32 based on what you expect your users to want to use. But it doesn't change Unicode's goal, and it also doesn't indicate that there are likely to be any more such changes in the future. (Just look at how little of the Unicode space is allocated so far.)

ChrisA
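For anyone wondering what "define some more" costs in practice, here is a small illustrative Python sketch (not from the post; the helper names `utf16_surrogates` and `encoded_sizes` are hypothetical). The 65536 code points of Unicode 1.0 are what is now called the Basic Multilingual Plane; Unicode 2.0 added surrogates, which let UTF-16 reach code points up to U+10FFFF by splitting each one into a pair of 16-bit units. UTF-8 and UTF-32 need no such trick.

```python
# Illustrative sketch: how code points beyond the original 65536 (the BMP)
# are represented in UTF-8, UTF-16 and UTF-32. Helper names are hypothetical.

BMP_SIZE = 0x10000          # 65536 code points, the Unicode 1.0 design
MAX_CODE_POINT = 0x10FFFF   # highest code point reachable via UTF-16 surrogates


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not BMP_SIZE <= cp <= MAX_CODE_POINT:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - BMP_SIZE              # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


def encoded_sizes(ch: str) -> dict[str, int]:
    """Bytes needed for one character in each UTF encoding (no BOM)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        cp = ord(ch)
        plane = cp >> 16                # plane 0 is the BMP
        print(f"U+{cp:04X} plane {plane}: {encoded_sizes(ch)}")
    high, low = utf16_surrogates(0x1F600)
    print(f"U+1F600 -> surrogates U+{high:04X} U+{low:04X}")
    print("total code points:", MAX_CODE_POINT + 1)
```

The emoji (plane 1) takes four bytes in every encoding, but in UTF-16 those four bytes are two surrogate units, which is exactly the complication a "UTF-16 is Unicode" API ends up exposing.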