MIME-Version: 1.0
In-Reply-To: <a6c030b2-25da-47a2-97b5-1e349394d762@googlegroups.com>
References: <f801e06f-f7b2-4aca-b352-66856a939746@googlegroups.com> <mailman.3406.1345161591.4697.python-list@python.org> <a6c030b2-25da-47a2-97b5-1e349394d762@googlegroups.com>
Date: Fri, 17 Aug 2012 14:21:34 -0400
Subject: Re: How do I display unicode value stored in a string variable using ord()
From: Jerry Hill <malaclypse2@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=UTF-8
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3422.1345227697.4697.python-list@python.org>

On Fri, Aug 17, 2012 at 1:49 PM,  <wxjmfauth@gmail.com> wrote:
> The character '…', Unicode name 'HORIZONTAL ELLIPSIS',
> is one of the characters that exist in the cp1252 and mac-roman
> encodings but not in iso-8859-1 (latin-1), and obviously
> not in ascii. It makes Py3.3 run several hundred percent slower
> than Py<3.3 versions because of the flexible string representation
> (ascii/latin-1/ucs-2/ucs-4); I found cases of up to 1000%.
>
>>>> '…'.encode('cp1252')
> b'\x85'
>>>> '…'.encode('mac-roman')
> b'\xc9'
>>>> '…'.encode('iso-8859-1') # latin-1
> Traceback (most recent call last):
>   File "<eta last command>", line 1, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\u2026'
> in position 0: ordinal not in range(256)
>
> Even if one could neglect this (typographically important) glyph,
> what about the characters of European scripts (languages) that are
> present in cp1252 or mac-roman but not in latin-1 (e.g. the
> French script/language)?
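
The "flexible string representation" mentioned in the quote is PEP 393 (Python 3.3): CPython stores each string with 1, 2 or 4 bytes per character, chosen by the widest code point it contains. A minimal sketch, assuming CPython 3.3 or later, that infers that width from sys.getsizeof; storage_width is a hypothetical helper, not a standard API. It only shows that a single '…' (U+2026) moves a string from 1 to 2 bytes per character; it says nothing either way about the speed figures quoted above.

```python
import sys

# storage_width is a hypothetical helper for illustration; it relies on
# CPython's PEP 393 layout (3.3+) and is meaningless on other implementations.
def storage_width(ch: str, n: int = 1000) -> int:
    """Return the bytes per character CPython uses for a string built from ch."""
    # Two fresh strings that differ only in length: their size difference is
    # n characters' worth of storage, so dividing by n gives the width.
    shorter = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(shorter)) // n

# ASCII, Latin-1 (e with acute), HORIZONTAL ELLIPSIS, and an astral emoji
for ch in ("a", "\u00e9", "\u2026", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {storage_width(ch)} byte(s) per character")
```

On CPython this should report 1, 1, 2 and 4 bytes respectively; ASCII and Latin-1 strings share the 1-byte width and differ only in their object header.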

So... Python should change the long-standing definition of the latin-1
character set?  This isn't some sort of Python limitation; it's just
the reality of legacy encodings that actually exist in the real world.
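
The point about definitions can be checked directly. Python's latin-1 codec maps byte n to code point U+00nn, so nothing above U+00FF can ever be encoded in it, while cp1252 reuses most of the 0x80-0x9F range for printable characters such as '…' and the French ligature 'œ', and mac-roman rearranges the upper half altogether. A minimal sketch; try_encode is a hypothetical helper written for this illustration.

```python
from typing import Optional

def try_encode(text: str, codec: str) -> Optional[bytes]:
    """Return text encoded with codec, or None if the codec cannot represent it."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

# Latin-1 is exactly the first 256 code points: byte n decodes to chr(n).
assert bytes(range(256)).decode("latin-1") == "".join(map(chr, range(256)))

# U+2026 HORIZONTAL ELLIPSIS and U+0153 LATIN SMALL LIGATURE OE (used in French)
for ch in ("\u2026", "\u0153"):
    for codec in ("cp1252", "mac-roman", "latin-1"):
        print(f"U+{ord(ch):04X} in {codec}: {try_encode(ch, codec)!r}")
```

Both characters encode in cp1252 and mac-roman and come back as None for latin-1, which is the behaviour of the encoding as defined, not of Python.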


> Very nice. Python 2 was built for ascii users; now Python 3 is
> *optimized* for, let's say, ascii users!
>
> The future is bright for Python. French users are better
> served by Apple or MS products, simply because these
> corporations know you cannot write French with iso-8859-1.
>
> PS When "TeX" moved from the ascii encoding to iso-8859-1
> and the so-called Cork encoding, "they" knew this and provided
> all the complementary packages to work around it. It was
> in 199? (Python was not even born).
>
> Ditto for the foundries (Adobe, Linotype, ...)


I don't understand what any of this has to do with Python.  Just
output your text in UTF-8 like any civilized person in the 21st
century, and none of that is a problem at all.  Python makes that easy.
It also makes it easy to interoperate with older encodings if you
have to.
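
A minimal sketch of that advice: decode legacy bytes once at the boundary, work with str internally, and write UTF-8 out. The helper transcode and the sample word are illustrative assumptions, not part of any library; only standard bytes/str methods and pathlib are used.

```python
import tempfile
from pathlib import Path

def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from a legacy encoding and re-encode them (UTF-8 by default)."""
    return data.decode(source).encode(target)

# Hypothetical input: the word 'Cœur…' as a Windows-1252 application would store it.
legacy = "C\u0153ur\u2026".encode("cp1252")

# Convert at the boundary: legacy bytes in, UTF-8 bytes out.
utf8 = transcode(legacy, "cp1252")

# Or decode once and let the text layer write UTF-8 to a file.
text = legacy.decode("cp1252")
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "out.txt"
    path.write_text(text, encoding="utf-8")
    on_disk = path.read_bytes()

print(legacy, utf8, on_disk == utf8)
```

Going the other way, transcode(utf8, "utf-8", "cp1252") recovers the original bytes, as long as every character exists in the target encoding.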

-- 
Jerry