Path: csiph.com!usenet.pasdenom.info!gegeweb.org!de-l.enfer-du-nord.net!feeder1.enfer-du-nord.net!feeds.phibee-telecom.net!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Newsgroups: comp.lang.python
Date: Fri, 17 Aug 2012 11:45:02 -0700 (PDT)
In-Reply-To: <mailman.3422.1345227697.4697.python-list@python.org>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=83.78.180.128; posting-account=ung4FAoAAAC46zhHJ0Nsnuox7M5gDvs_
References: <f801e06f-f7b2-4aca-b352-66856a939746@googlegroups.com> <mailman.3406.1345161591.4697.python-list@python.org> <a6c030b2-25da-47a2-97b5-1e349394d762@googlegroups.com> <mailman.3422.1345227697.4697.python-list@python.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Subject: Re: How do I display unicode value stored in a string variable using ord()
From: wxjmfauth@gmail.com
To: comp.lang.python@googlegroups.com
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Cc: python-list@python.org
Precedence: list
Message-ID: <mailman.3423.1345229106.4697.python-list@python.org>
Lines: 107
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:27256

On Friday, 17 August 2012 20:21:34 UTC+2, Jerry Hill wrote:
> On Fri, Aug 17, 2012 at 1:49 PM,  <wxjmfauth@gmail.com> wrote:
> > The character '…', Unicode name 'HORIZONTAL ELLIPSIS',
> > is one of these characters existing in the cp1252, mac-roman
> > coding schemes and not in iso-8859-1 (latin-1) and obviously
> > not in ascii. It causes Py3.3 to work a few 100% slower
> > than Py<3.3 versions due to the flexible string representation
> > (ascii/latin-1/ucs-2/ucs-4) (I found cases up to 1000%).
> >
> > >>> '…'.encode('cp1252')
> > b'\x85'
> > >>> '…'.encode('mac-roman')
> > b'\xc9'
> > >>> '…'.encode('iso-8859-1') # latin-1
> > Traceback (most recent call last):
> >   File "<eta last command>", line 1, in <module>
> > UnicodeEncodeError: 'latin-1' codec can't encode character '\u2026'
> > in position 0: ordinal not in range(256)
> >
> > If one could neglect this (typographically important) glyph, what
> > to say about the characters of the European scripts (languages)
> > present in cp1252 or in mac-roman but not in latin-1 (eg. the
> > French script/language)?
>
> So... python should change the longstanding definition of the latin-1
> character set?  This isn't some sort of python limitation, it's just
> the reality of legacy encodings that actually exist in the real world.
>
> > Very nice. Python 2 was built for ascii users, now Python 3 is
> > *optimized* for, let's say, ascii users!
> >
> > The future is bright for Python. French users are better
> > served with Apple or MS products, simply because these
> > corporations know you cannot write French with iso-8859-1.
> >
> > PS When "TeX" moved from the ascii encoding to iso-8859-1
> > and the so-called Cork encoding, "they" knew this and provided
> > all the complementary packages to circumvent this. It was
> > in 199? (Python was not even born).
> >
> > Ditto for the foundries (Adobe, Linotype, ...)
>
> I don't understand what any of this has to do with Python.  Just
> output your text in UTF-8 like any civilized person in the 21st
> century, and none of that is a problem at all.  Python makes that easy.
> It also makes it easy to interoperate with older encodings if you
> have to.

Sorry, you missed the point.

My comment had nothing to do with the source code encoding,
with the encoding of a Python "string" in the source code, or with
the display of a Python 3 <str>.
I wrote about the *internal* Python "coding", that is, the
way Python keeps "strings" in memory. See PEP 393.
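
To make "internal coding" concrete, here is a minimal sketch, assuming CPython 3.3 or later (other implementations may store strings differently). The helper name bytes_per_char is my own invention, not part of any library. It compares the memory size of two strings built from the same character, so the fixed object header cancels out and only the per-character width remains.

```python
import sys

def bytes_per_char(ch):
    """Estimate how many bytes CPython uses per character for strings
    made only of `ch` (a single-character string).

    Hypothetical helper for illustration. It relies on the CPython
    PEP 393 layout and on sys.getsizeof reporting the full object size.
    """
    short = ch * 100
    longer = ch * 200
    # Both strings share the same internal kind, so the header size cancels.
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // 100

samples = [
    ("ascii 'a'", "a"),
    ("latin-1 U+00E9", "\xe9"),
    ("HORIZONTAL ELLIPSIS U+2026", "\u2026"),
    ("non-BMP U+1F600", "\U0001F600"),
]

for label, ch in samples:
    print(label, "->", bytes_per_char(ch), "byte(s) per character")
```

On such an interpreter I would expect 1, 1, 2 and 4 bytes respectively: a single '…' in an otherwise latin-1 string moves the whole string to the 2-byte representation. This shows only the storage width; it measures nothing about speed.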

jmf