| Date | 2012-12-17 13:07 -0500 |
|---|---|
| From | Dave Angel <d@davea.name> |
| Subject | Re: Unicode |
| References | (6 earlier) <CAKhY55P3x9-WS52D5i+E+rJ7y2osGHnqTZwB2TpBK4zUSe0ouw@mail.gmail.com> <CAHzaPEPigppu_OGO+oujXJXNxtq++GeQcPAqOPnwfegC8dE+Tg@mail.gmail.com> <CAKhY55PdHrbqOw=3Gp3Pva57sAYrHuh5pFwy5mOSLFnV0ekoig@mail.gmail.com> <CAHzaPEMECAgRhgWro0mMn4UtimYL6NQX1FFASEvm5dqr5krXyA@mail.gmail.com> <CAKhY55MHefj=RjThxacsdsd8NJoC7WmVzYkahObrgCuA2SMuFg@mail.gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.979.1355767688.29569.python-list@python.org> |
On 12/17/2012 12:43 PM, Anatoli Hristov wrote:
>> Hi,
>> I don't know, what the product ID would look like, for this page, but
>> assuming, the catalog pages are also utf-8 encoded as well as the
>> error page I get, it should work ok; cf.:
> You are right, I get it work on Windows too, but not in Linux. I
> changed the codec of linux, but still I don't get it
>
> Here is what I get from Linux:
>
>>>> import urllib
>>>> opener = urllib.FancyURLopener({})
>>>> ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (14688538))
>>>> src = ffr.read()
>>>> print src.decode("utf-8")
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2122'
> in position 17167: ordinal not in range(256)
I can tell you what's happening, but maybe not how to fix it.
src.decode() creates a unicode string, and the error is not happening
there. But when print is given a unicode string, it has to encode the
data for output. For whatever reason, yours is using latin-1, and the
page contains a character (u'\u2122', the trademark sign) which is not
in the latin-1 encoding.
My Python 2.7 uses utf-8 everywhere (on Linux Ubuntu 11.04).
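
To make the two steps visible, here is a minimal sketch that separates the decode from the encode. The byte string is a made-up stand-in for the downloaded page (I have not fetched the real one), and the variable names are only for illustration. It is not a fix for the terminal setup, just a way to see where the error comes from and what an explicit encode looks like.

```python
# Hypothetical stand-in for ffr.read(): UTF-8 bytes containing U+2122.
src = u"ICEcat\u2122".encode("utf-8")

# Step 1: decoding succeeds, as it did in the original session.
text = src.decode("utf-8")

# Step 2: encoding to latin-1 is what print ends up doing, and it fails
# because U+2122 has no latin-1 equivalent.
try:
    text.encode("latin-1")
    failed = False
    position = None
except UnicodeEncodeError as exc:
    failed = True
    position = exc.start  # index of the offending character

# Encoding explicitly sidesteps the implicit choice made by print.
as_utf8 = text.encode("utf-8")                 # lossless
as_latin1 = text.encode("latin-1", "replace")  # lossy: unmappable chars become "?"
```

Which encoding print picks normally comes from sys.stdout.encoding, so checking that value in the failing session would be a reasonable next step.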
--
DaveA