Date: Mon, 17 Dec 2012 12:56:06 +0100
Subject: Re: Unicode
From: Vlastimil Brom <vlastimil.brom@gmail.com>
To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.957.1355745371.29569.python-list@python.org>

2012/12/17 Anatoli Hristov <tolidtm@gmail.com>:
>> this seems to be an encoding error of your terminal on printing.
>> You may need to describe (or better, post the relevant parts of the
>> source) where the text is coming from (external text file, database
>> entry, hardcoded in the Python source ...), how it is stored, retrieved
>> and possibly manipulated before you insert it into the database.
>>
> Here is how I get the data using the urllib opener:
>
> def GetSpecsFR(icecat_prod_id):
>     opener = urllib.FancyURLopener({})
>     ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr"
>                       % icecat_prod_id)
>     specsfr = ffr.read()
>     #specsfr = specsfr.decode('utf-8')
>     specsfr = RemoveHTML(specsfr)
>     ##specsfr = "%r" % specsfr
> ##    if specsfr:
> ##        try:
> ##            specsfr = str(specsfr)
> ##        except UnicodeEncodeError:
> ##            specsfr = str(specsfr.encode('utf-16'))
>     return specsfr

Hi,
I don't know what a product ID looks like for this page, but
assuming the catalog pages are UTF-8 encoded like the error page
I get, it should work OK; cf.:

>>> import urllib
>>> opener = urllib.FancyURLopener({})
>>> ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (1234,))
>>> src = ffr.read()
>>> print src.decode("utf-8")


<!-- This Icecat template is used as head of all pages in Product finder -->


<HTML>
<HEAD>

[... - shortened]

<div align="center">"Désolé, pour ce produit, nous n'avons pas trouvé
d'autres informations produit.<br>Si vous n'êtes pas redirigés
automatiquement, veuillez cliquer" <a href="#" style="font-size:80%"
onclick="history.back()">ici</a>
</div>
<!--
            <td bgcolor="" width="230" align="center"><img
src="/imgs/logo.gif" width="180" height="58"></td>
-->



>>>
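
For reference, here is a minimal offline sketch of the same decode step, plus the
kind of failure a non-Unicode terminal would produce. The byte string is a
made-up sample standing in for the result of ffr.read(), not actual Icecat
output, and "ascii" only stands in for whatever limited encoding your terminal
might be using:

```python
# Made-up sample bytes standing in for ffr.read(); not actual Icecat output.
raw = b"D\xc3\xa9sol\xc3\xa9, pour ce produit"

# Decode once, right after reading, and work with text from then on.
text = raw.decode("utf-8")

# Decoding with the wrong codec does not raise; it silently yields mojibake.
wrong = raw.decode("latin-1")

# Encoding for an output that cannot represent the characters is what raises.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# A lossy but safe fallback for such an output: escape what does not fit.
fallback = text.encode("ascii", "backslashreplace")
```

If the decode succeeds but printing still fails, the data itself is most likely
fine and only the output encoding is the problem.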

Printing in a Unicode-capable shell works OK (wx PyShell in my case);
inserting into the database should be straightforward too (although I
don't have experience with the specific database you are using).
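
Since I don't know your database, the following is only a sketch using the
sqlite3 module from the standard library as a stand-in; the table name "specs",
its columns and the sample text are hypothetical. The idea is to pass the
decoded text as a bound parameter and let the driver handle it, rather than
encoding it by hand or formatting it into the SQL string:

```python
import sqlite3

# Made-up sample: UTF-8 bytes decoded to text before touching the database.
text = b"D\xc3\xa9sol\xc3\xa9".decode("utf-8")

# In-memory database as a stand-in; your actual db and driver may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specs (product_id INTEGER, body TEXT)")

# Parameter binding: the text is passed as-is, no manual encoding.
conn.execute("INSERT INTO specs VALUES (?, ?)", (1234, text))

# Read it back to check that the round trip preserves the characters.
row = conn.execute(
    "SELECT body FROM specs WHERE product_id = ?", (1234,)
).fetchone()
stored = row[0]
conn.close()
```

Other drivers may need the connection charset set explicitly, so check the
documentation of the one you use.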

Are you getting other Unicode errors elsewhere in the process,
or do the above steps behave differently on your computer?

hth,
  vbr