From: Ian Kelly
Date: Sat, 18 Aug 2012 09:51:37 -0600
Subject: Re: How do I display unicode value stored in a string variable using ord()
Newsgroups: comp.lang.python

On Sat, Aug 18, 2012 at 9:07 AM, wrote:
> On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
>> [...]
>> The problem with UCS-4 is that every character requires four bytes.
>> [...]
>
> I'm aware of this (and all the blah blah blah you are
> explaining). This always the same song. Memory.
>
> Let me ask. Is Python an 'american" product for us-users
> or is it a tool for everybody [*]?
> Is there any reason why non ascii users are somehow penalized
> compared to ascii users?

The change does not just benefit ASCII users. It primarily benefits
anybody using a wide Unicode build with strings mostly containing only
BMP characters.
Even for narrow build users, there is the benefit that, with
approximately the same memory usage in most cases, they no longer have
to worry about non-BMP characters sneaking in and breaking their code.

There is some additional benefit for Latin-1 users, but this has
nothing to do with Python. If Python is going to have the option of a
1-byte representation (and as long as we have the flexible
representation, I can see no reason not to), then it is going to be
Latin-1 by definition, because that's what 1-byte Unicode (UCS-1, if
you will) is. If you have an issue with that, take it up with the
designers of Unicode.

>
> This flexible string representation is a regression (ascii users
> or not).
>
> I recognize in practice the real impact is for many users
> closed to zero (including me) but I have shown (I think) that
> this flexible representation is, by design, not as optimal
> as it is supposed to be. This is in my mind the relevant point.

You've shown nothing of the sort. You've demonstrated only one out of
many possible benchmarks, and other users on this list can't even
reproduce that.
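
For anyone who wants to check the storage widths discussed above, here
is a minimal sketch, assuming CPython 3.3 or later (the first release
with the flexible string representation). The helper name
bytes_per_char and the four sample characters are my own choices for
illustration, and sys.getsizeof reports implementation-specific
numbers, so treat this as a rough probe and not a benchmark. It also
checks the Latin-1 point: encoding each of the first 256 code points
as Latin-1 should give the single byte with the same value.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character for strings made only of `ch`.

    Subtracting the sizes of two strings of different lengths cancels
    out the fixed per-object overhead, leaving only the character data.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n


# Hypothetical sample characters, one per range discussed in the thread.
samples = {
    "ASCII (U+0061)": "a",
    "Latin-1 (U+00E9)": "\u00e9",
    "BMP (U+20AC)": "\u20ac",
    "non-BMP (U+1F600)": "\U0001F600",
}

widths = {label: bytes_per_char(ch) for label, ch in samples.items()}
for label, width in widths.items():
    print("{}: {:.0f} byte(s) per character".format(label, width))

# Latin-1 covers exactly the first 256 Unicode code points, so each
# code point should encode to the byte with the same numeric value.
latin1_is_identity = all(
    chr(i).encode("latin-1") == bytes([i]) for i in range(256)
)
print("Latin-1 maps code points 0-255 to identical bytes:", latin1_is_identity)
```

On a CPython build with the flexible representation I would expect
widths of 1, 1, 2 and 4 bytes respectively. Other implementations, or
older narrow and wide builds, may report different numbers.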