In-Reply-To: <d575737d-c1e3-47db-9c7b-10fe0300cba7@googlegroups.com>
References: <f801e06f-f7b2-4aca-b352-66856a939746@googlegroups.com> <308df2af-abe7-4043-b199-0a39f440e0ab@googlegroups.com> <502f8a2a$0$29978$c3e8da3$5496439d@news.astraweb.com> <d575737d-c1e3-47db-9c7b-10fe0300cba7@googlegroups.com>
From: Ian Kelly <ian.g.kelly@gmail.com>
Date: Sat, 18 Aug 2012 09:51:37 -0600
Subject: Re: How do I display unicode value stored in a string variable using ord()
To: Python <python-list@python.org>
Newsgroups: comp.lang.python
Message-ID: <mailman.3457.1345305136.4697.python-list@python.org>

On Sat, Aug 18, 2012 at 9:07 AM,  <wxjmfauth@gmail.com> wrote:
> On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
>> [...]
>> The problem with UCS-4 is that every character requires four bytes.
>> [...]
>
> I'm aware of this (and all the blah blah blah you are
> explaining). This is always the same song. Memory.
>
> Let me ask. Is Python an 'American' product for US users
> or is it a tool for everybody [*]?
> Is there any reason why non-ASCII users are somehow penalized
> compared to ASCII users?

The change does not just benefit ASCII users.  It primarily benefits
anybody using a wide Unicode build with strings mostly containing only
BMP characters.  Even for narrow build users, there is the benefit
that with approximately the same amount of memory usage in most cases,
they no longer have to worry about non-BMP characters sneaking in and
breaking their code.
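
To make that concrete, here is a rough sketch of how one might look at
the per-character storage on CPython 3.3 or later. It assumes that
sys.getsizeof reflects the internal string layout, which is a CPython
implementation detail and not a language guarantee; bytes_per_char is
just a name made up for this illustration.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by growing a string by one character.

    Relies on CPython 3.3+ reporting the flexible representation through
    sys.getsizeof (an implementation detail, not a language guarantee).
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


samples = {
    "ASCII": "a",               # U+0061
    "Latin-1": "\u00e9",        # U+00E9
    "BMP": "\u20ac",            # U+20AC
    "non-BMP": "\U0001F600",    # U+1F600
}

for label, ch in samples.items():
    # len(ch) is 1 in every case: a non-BMP character is one code point,
    # not a surrogate pair as it was on the old narrow builds.
    print(label, bytes_per_char(ch), "byte(s) per character, len =", len(ch))
```

The cost per character depends on the widest character in the string,
so a string of only BMP characters should need about half of what the
old wide build spent on it.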

There is some additional benefit for Latin-1 users, but this has
nothing to do with Python.  If Python is going to have the option of a
1-byte representation (and as long as we have the flexible
representation, I can see no reason not to), then it is going to be
Latin-1 by definition, because that's what 1-byte Unicode (UCS-1, if
you will) is.  If you have an issue with that, take it up with the
designers of Unicode.
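
If you want to check the Latin-1 claim for yourself, a small sketch
along these lines should do it (Python 3; the variable names are only
for illustration). It decodes every possible byte value as Latin-1 and
compares each resulting code point with the byte it came from.

```python
# Every byte value from 0x00 to 0xFF.
data = bytes(range(256))

# Decode as Latin-1 (ISO-8859-1).
text = data.decode("latin-1")

# Each byte maps to the Unicode code point with the same number.
same_numbers = all(ord(ch) == b for ch, b in zip(text, data))

# Encoding back gives the original bytes unchanged.
round_trips = text.encode("latin-1") == data

print(same_numbers, round_trips)
```

In other words, the first 256 Unicode code points are Latin-1, so any
1-byte-per-character Unicode representation ends up covering exactly
that range.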

>
> This flexible string representation is a regression (ASCII users
> or not).
>
> I recognize that in practice the real impact is close to zero for
> many users (including me), but I have shown (I think) that
> this flexible representation is, by design, not as optimal
> as it is supposed to be. This is in my mind the relevant point.

You've shown nothing of the sort.  You've demonstrated only one out of
many possible benchmarks, and other users on this list can't even
reproduce that.