From: Terry Reedy
Subject: Re: Python usage numbers
Date: Sun, 12 Feb 2012 22:09:50 -0500
Newsgroups: comp.lang.python

On 2/12/2012 5:14 PM, Chris Angelico wrote:
> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy wrote:
>> The situation before ascii is like where we ended up *before* unicode.
>> Unicode aims to replace all those byte encodings and character sets with
>> *one* byte encoding for *one* character set, which will be a great
>> simplification. It is the idea of ascii applied on a global rather than
>> local basis.
>
> Unicode doesn't deal with byte encodings; UTF-8 is an encoding,

The Unicode Standard specifies 3 UTF storage formats* and 8 UTF
byte-oriented transmission formats. UTF-8 is the most common of all
encodings for web pages. (And ascii pages are utf-8 also.) It is the
only one of the 8 that most of us need to bother with much. Look here
for the list
http://www.unicode.org/glossary/#U
and for details look in various places in
http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf

> but so are UTF-16, UTF-32.
> and as many more as you could hope for.

All the non-UTF 'as many more as you could hope for' encodings are not
part of Unicode.

* The new internal unicode scheme for 3.3 is pretty much a mixture of
the 3 storage formats (I am, of course, skipping some details) by using
the widest one needed for each string. The advantage is avoiding
problems with each of the three.
The disadvantage is greater internal complexity, but that should be
hidden from users. They will not need to care about the internals. They
will be able to forget about 'narrow' versus 'wide' builds and the
possible requirement to code differently for each. There will only be
one scheme that works the same on all platforms. Most apps should
require less space and about the same time.

-- 
Terry Jan Reedy
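
A rough sketch of the two points above, for anyone who wants to poke at
them. It assumes CPython 3.3 or later; the sample characters and the
helper name bytes_per_char are only illustrative, and the per-character
sizes are an implementation detail reported by sys.getsizeof, not
something the language guarantees. The first half encodes one code point
in three UTF forms and checks that ascii text is byte-for-byte utf-8.
The second half estimates how many bytes per character the interpreter
uses internally for strings of different widths.

```python
import sys

# One code point, three UTF encodings. The byte order is given
# explicitly so that no byte order mark is prepended.
ch = "\u20ac"  # EURO SIGN, outside ascii but inside the 16-bit range
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-le")
utf32 = ch.encode("utf-32-le")
print(len(utf8), len(utf16), len(utf32))

# Pure ascii text gives identical bytes under ascii and utf-8.
text = "plain ascii"
print(text.encode("ascii") == text.encode("utf-8"))


def bytes_per_char(sample, n=1000):
    """Estimate internal storage per character for a 1-character sample.

    Subtracting the sizes of two strings of the same kind cancels the
    fixed object header, leaving only the per-character cost.
    (Hypothetical helper, CPython-specific.)
    """
    long_size = sys.getsizeof(sample * (2 * n))
    short_size = sys.getsizeof(sample * n)
    return (long_size - short_size) / n


# Each string is stored with the narrowest width that fits its
# widest character.
for sample in ("a", "\u20ac", "\U0001F600"):
    print(ascii(sample), bytes_per_char(sample))
```

The width is chosen per string, so one wide character makes the whole
string use the wider storage, while ascii-only strings stay compact.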