Re: Unicode

Date Mon, 17 Dec 2012 13:07:46 -0500
From Dave Angel <d@davea.name>
Subject Re: Unicode
Newsgroups comp.lang.python


On 12/17/2012 12:43 PM, Anatoli Hristov wrote:
>> Hi,
>> I don't know, what the product ID would look like, for this page, but
>> assuming, the catalog pages are also utf-8 encoded as well as the
>> error page I get, it should work ok; cf.:
> You are right, I get it work on Windows too, but not in Linux. I
> changed the codec of linux, but still I don't get it
>
> Here is what I get from Linux:
>
>>>> import urllib
>>>> opener = urllib.FancyURLopener({})
>>>> ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (14688538))
>>>> src = ffr.read()
>>>> print src.decode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2122'
> in position 17167: ordinal not in range(256)

I can tell you what's happening, but maybe not how to fix it.

src.decode() creates a unicode string, and the error is not happening
there.  Note that the traceback says UnicodeEncodeError, not
UnicodeDecodeError.  When print is given a unicode string, it has to
encode the data for output.  For whatever reason, your setup is using
latin-1 for that, and the text contains a character (u'\u2122', the
trade mark sign) which does not exist in the latin-1 encoding.
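
The decode/encode distinction can be reproduced without the network.
This is only a minimal sketch: the byte string below is made up for
illustration (it is not the real page content), and the variable names
are hypothetical.

```python
# Made-up sample bytes: UTF-8 for "Acme" followed by U+2122 TRADE MARK SIGN.
raw = b"Acme\xe2\x84\xa2"

# Step 1: decoding the UTF-8 bytes succeeds and yields a unicode string.
text = raw.decode("utf-8")

# Step 2: encoding to latin-1 fails, which is what print does implicitly
# when the output encoding is latin-1.
try:
    text.encode("latin-1")
    encode_failed = False
    failure_reason = None
except UnicodeEncodeError as exc:
    encode_failed = True
    failure_reason = exc.reason

print("latin-1 encode failed: %s (%s)" % (encode_failed, failure_reason))

# Encoding explicitly avoids the implicit step: UTF-8 can represent the
# character, and "replace" gives a lossy latin-1 fallback.
as_utf8 = text.encode("utf-8")
as_latin1_lossy = text.encode("latin-1", "replace")
```

Whether writing the UTF-8 bytes to the terminal displays correctly still
depends on what the terminal itself expects.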

My Python 2.7 uses utf-8 everywhere (on Ubuntu Linux 11.04).
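
One way to see which encoding print will use is to ask the interpreter
directly.  This is a sketch, not a diagnosis of the poster's machine; on
Linux these values are typically derived from the locale environment
variables (LANG, LC_ALL), and sys.stdout.encoding can be None in
Python 2 when output is piped.

```python
import locale
import sys

# Encoding Python attached to standard output (used by print for unicode).
stdout_encoding = sys.stdout.encoding

# Encoding the current locale settings ask for.
preferred = locale.getpreferredencoding()

print("stdout encoding: %s" % stdout_encoding)
print("locale preferred encoding: %s" % preferred)
```

If either value reports a latin-1 variant such as ISO-8859-1, that would
be consistent with the traceback quoted above.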


-- 

DaveA

