From: wxjmfauth@gmail.com
Newsgroups: comp.lang.python
Subject: Re: New internal string format in 3.3
Date: Sun, 19 Aug 2012 05:14:21 -0700 (PDT)

On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
> On Sun, Aug 19, 2012 at 8:19 PM, wrote:
> > This is precisely the weak point of this flexible
> > representation. It uses latin-1 and latin-1 is for
> > most users simply unusable.
>
> No, it uses Unicode, and as an optimization, attempts to store the
> codepoints in less than four bytes for most strings. The fact that a
> one-byte storage format happens to look like latin-1 is rather
> coincidental.

And this is the common, basic mistake. You do not push your argument far enough. A character may happen to "fall" within latin-1. The problem lies in those European characters that do not fall within this coding. This *is* the cause of the negative side effects. If you use a correct coding scheme, such as cp1252, mac-roman or iso-8859-15, you will never see such a negative side effect. Again, the problem is not the result, the encoded character. The critical part is the character that may cause this side effect.
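
To make the distinction concrete, here is a minimal sketch. It assumes CPython 3.3 or later and only models the width rule described in PEP 393 (1, 2 or 4 bytes per code point, chosen from the highest code point in the string); it is not CPython's own code, and storage_width and fits_in are hypothetical helper names used for illustration.

```python
def storage_width(text):
    """Bytes per code point under the PEP 393 rule (a model, not CPython's code)."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:      # every code point fits in one byte
        return 1
    if highest < 0x10000:    # every code point fits in two bytes
        return 2
    return 4                 # at least one code point beyond the BMP


def fits_in(text, encoding):
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Plain ASCII, e-acute (U+00E9), euro sign (U+20AC), an emoji (U+1F600).
samples = ["abc", "caf\u00e9", "10 \u20ac", "\U0001F600"]
for s in samples:
    print(ascii(s), storage_width(s),
          fits_in(s, "latin-1"), fits_in(s, "cp1252"))
```

The euro sign is the kind of character in question: it can be encoded in cp1252 and iso-8859-15 but not in latin-1, and under the modeled rule a single occurrence moves the whole string to two bytes per code point.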
You should think "character set" and not encoded "code point", insofar as that expression makes sense in an 8-bit coding scheme.

jmf