From: Chris Angelico
Date: Fri, 28 Sep 2012 08:00:16 +1000
Subject: Re: Article on the future of Python
To: python-list@python.org
Newsgroups: comp.lang.python

You're posting to both comp.lang.python and python-list, are you aware
that that's redundant?

On Fri, Sep 28, 2012 at 5:09 AM, wrote:
> This flexible string representation is wrong by design.
> Expecting to divide "Unicode" in chunks and to gain something
> is an illusion.
> It has been created by a computer scientist who thinks "bytes"
> when on that field one has to think "bytes" and usage of the
> characters at the same time.

There's another range of numbers that, in some languages, is divided
for efficiency's sake: integers below 1<<[bit size]. In Python 2, such
numbers were an entirely different data type (int vs long); other
languages let you use the same data type for both, but "(1<<5)+1" will
be executed much faster than "(1<<500)+1". (And as far as I know, a
conforming Python 3 implementation should be allowed to do that; 3.2
on Windows doesn't seem to, though.)

That's all PEP 393 is; it's a performance improvement for a particular
subset of values that happens to fit conveniently into the underlying
machine's data storage. If Python were implemented on a 9-bit
computer, I wouldn't be surprised if the PEP 393 optimizations were
applied differently. It's nothing to do with Latin-1, except insofar
as the narrowest form of string _happens_ to contain everything that's
in Latin-1. Go blame the Unicode Consortium for picking that.

> The latin-1 chunk illustrates this wonderfully.

Aside from replace(), as mentioned in this thread, are there any other
ways that this is so wonderfully illustrated? Or is it "wonderfully"
as in "I wonder if people will believe me if I keep spouting
unsubstantiated claims"?

ChrisA
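
A rough sketch of the "narrowest storage that fits" idea discussed above, assuming CPython 3.3 or later (where PEP 393 applies). The helper name bytes_per_char and the sample characters are made up for illustration, and sys.getsizeof reports CPython-specific object sizes, so other implementations may behave differently.

```python
import sys

def bytes_per_char(ch, n=1000):
    # Hypothetical helper: compare two lengths of the same repeated
    # character so the fixed object header cancels out.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# One sample character from each range that PEP 393 treats separately.
widths = {
    "ascii": bytes_per_char("a"),
    "latin1": bytes_per_char("\xe9"),        # U+00E9, fits in one byte
    "bmp": bytes_per_char("\u20ac"),         # U+20AC, needs two bytes
    "astral": bytes_per_char("\U0001F600"),  # U+1F600, needs four bytes
}

# The integer analogy: a small int and a large one are the same type,
# but the large one takes more storage.
small_int_size = sys.getsizeof((1 << 5) + 1)
big_int_size = sys.getsizeof((1 << 500) + 1)

for label, width in widths.items():
    print(label, width)
print("int sizes:", small_int_size, big_int_size)
```

On such a CPython build the per-character widths should come out as 1, 1, 2 and 4 bytes, with nothing special about Latin-1 beyond fitting in the one-byte form.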