MIME-Version: 1.0
In-Reply-To: <890ee58d-e93e-42ac-b17e-59b05c6ecacb@googlegroups.com>
References: <2adb4a25-8ea3-441f-b8c0-ee6c87e4b19f@googlegroups.com> <mailman.1051.1355928746.29569.python-list@python.org> <890ee58d-e93e-42ac-b17e-59b05c6ecacb@googlegroups.com>
From: Ian Kelly <ian.g.kelly@gmail.com>
Date: Wed, 19 Dec 2012 14:23:15 -0700
Subject: Re: Py 3.3, unicode / upper()
To: Python <python-list@python.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1074.1355952227.29569.python-list@python.org>

On Wed, Dec 19, 2012 at 1:55 PM,  <wxjmfauth@gmail.com> wrote:
> Yes, it is correct (or can be considered as correct).
> I do not wish to discuss the typographical problematic
> of "Das Grosse Eszett". The web is full of pages on the
> subject. However, I never succeeded to find an "official
> position" from Unicode. The best information I found seem
> to indicate (to converge), U+1E9E is now the "supported"
> uppercase form of U+00DF. (see DIN).

Is this link not official?

http://unicode.org/cldr/utility/character.jsp?a=00DF

That defines a full uppercase mapping to SS and a simple uppercase
mapping to U+00DF itself, not U+1E9E.  My understanding of the simple
mapping is that it is not allowed to map to multiple characters,
whereas the full mapping is so allowed.
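
For what it's worth, here is a minimal sketch of what that looks like from Python, assuming 3.3 or later. As far as I know the standard library does not expose the simple mapping directly, so this only shows the full mapping that str.upper() applies, plus the one-way relationship between U+1E9E and U+00DF. The variable names are mine, purely for illustration.

```python
import unicodedata

SHARP_S = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

# str.upper() applies the full mapping, so one character becomes two.
full_upper = SHARP_S.upper()

# U+1E9E lowercases to U+00DF, but U+00DF does not uppercase to U+1E9E.
lower_of_capital = CAPITAL_SHARP_S.lower()

print(repr(full_upper), len(full_upper))
print(unicodedata.name(CAPITAL_SHARP_S))
print(lower_of_capital == SHARP_S)
print(full_upper == CAPITAL_SHARP_S)
```

So the capital form lowercases to the small one, but nothing in the mapping takes the small form back up to U+1E9E.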

> What is bothering me, is more the implementation. The Unicode
> documentation says roughly this: if something can not be
> honoured, there is no harm, but do not implement a workaround.
> In that case, I'm not sure Python is doing the best.

But this behavior is per the specification, not a workaround.  I think
the worst thing we could do in this regard would be to start diverging
from the specification because we think we know better than the
Unicode Consortium.


> If "wrong", this can be considered as programmatically correct
> or logically acceptable (Py3.2)
>
>>>> 'Straße'.upper().lower().capitalize() == 'Straße'
> True
>
> while this will *always* be problematic (Py3.3)
>
>>>> 'Straße'.upper().lower().capitalize() == 'Straße'
> False

On the other hand (Py3.2):

>>> 'Straße'.upper().isupper()
False

vs. Py3.3:

>>> 'Straße'.upper().isupper()
True

There is probably no one clearly correct way to handle the problem,
but personally this contradiction bothers me more than the example
that you posted.
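
To put the two behaviours side by side, here is a small sketch that can be run as a script. It assumes Python 3.3 or later, where upper() uses the full mapping; under 3.2 both comparisons should come out the other way round, as in the sessions quoted in this thread. The variable names are only illustrative.

```python
word = "Stra\u00dfe"  # 'Straße', escaped so the source encoding does not matter

# Full uppercase mapping: the sharp s expands to 'SS'.
upper = word.upper()

# Lowercasing 'SS' gives 'ss', so the original sharp s is not recovered.
round_trip = upper.lower().capitalize()

print(upper)
print(round_trip)
print(round_trip == word)   # the round trip loses the sharp s
print(upper.isupper())      # but the result of upper() really is uppercase
```

In other words, 3.3 gives up the round trip in exchange for upper() always returning something that isupper() agrees with.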