| Header | Value |
|---|---|
| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!news.mixmin.net!feed.xsnews.nl!border-3.ams.xsnews.nl!xlned.com!feeder7.xlned.com!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
| Return-Path | <vlastimil.brom@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.016 |
| MIME-Version | 1.0 |
| In-Reply-To | <CAKhY55PdHrbqOw=3Gp3Pva57sAYrHuh5pFwy5mOSLFnV0ekoig@mail.gmail.com> |
| References | <mailman.941.1355692240.29569.python-list@python.org> <50ceb674$0$29868$c3e8da3$5496439d@news.astraweb.com> <CAKhY55MLBeT-xLwqy59gusU3H2o_pceLKDxY-8XVifU_Ns2yrg@mail.gmail.com> <CAMuTYXis2vH5xjmHAgrESquPDQsAYWkzFnWGfDeyE9K5-Nwiww@mail.gmail.com> <CAKhY55OE+FdjR-EyXToE5TuWEMLwAi4n-NeuFhtKnOtZ=ey2DA@mail.gmail.com> <CAHzaPEM_sp=0aEtbxVPYYvea=_DuE36P9ZwtNGAVjnXCaykNaw@mail.gmail.com> <CAKhY55P3x9-WS52D5i+E+rJ7y2osGHnqTZwB2TpBK4zUSe0ouw@mail.gmail.com> <CAHzaPEPigppu_OGO+oujXJXNxtq++GeQcPAqOPnwfegC8dE+Tg@mail.gmail.com> <CAKhY55PdHrbqOw=3Gp3Pva57sAYrHuh5pFwy5mOSLFnV0ekoig@mail.gmail.com> |
| Date | Mon, 17 Dec 2012 12:56:06 +0100 |
| Subject | Re: Unicode |
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
| To | python-list@python.org |
| Content-Type | text/plain; charset=ISO-8859-1 |
| Content-Transfer-Encoding | quoted-printable |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.957.1355745371.29569.python-list@python.org> (permalink) |
| Lines | 71 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1355745371 news.xs4all.nl 6972 [2001:888:2000:d::a6]:43899 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:34972 |
2012/12/17 Anatoli Hristov <tolidtm@gmail.com>:
>> this seems to be an encoding error of your terminal on printing.
>> You may need to describe (or better post the respective parts of the
>> source) where the text is coming from (external text file, database
>> entry, hardcoded in the Python source ...), how it is stored, retrieved
>> and possibly manipulated before you insert it into the database.
>>
> Here is how I get the data using the urllib opener:
>
> def GetSpecsFR(icecat_prod_id):
>     opener = urllib.FancyURLopener({})
>     ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr"
>                       % icecat_prod_id)
>     specsfr = ffr.read()
>     #specsfr = specsfr.decode('utf-8')
>     specsfr = RemoveHTML(specsfr)
>     ##specsfr = "%r" % specsfr
>     ## if specsfr:
>     ##     try:
>     ##         specsfr = str(specsfr)
>     ##     except UnicodeEncodeError:
>     ##         specsfr = str(specsfr.encode('utf-16'))
>     return specsfr
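The quoted function passes the raw result of `ffr.read()` (a byte string) to `RemoveHTML`, with the `decode('utf-8')` line commented out. A minimal Python 3 sketch of the same pipeline, assuming the pages are UTF-8, decodes first and strips markup only afterwards. `remove_html` and `specs_from_page` are hypothetical stand-ins for the poster's `RemoveHTML` and `GetSpecsFR`, built on the standard `html.parser` module; the download step is left out.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text content; tags and <!-- comments --> are dropped."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &eacute; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def remove_html(text):
    """Hypothetical RemoveHTML replacement: expects str (already decoded), not bytes."""
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any buffered text
    return "".join(collector.parts)


def specs_from_page(raw, encoding="utf-8"):
    """Decode the downloaded bytes once, at the boundary, then strip the markup."""
    return remove_html(raw.decode(encoding))


page = "<div>Désolé, <b>produit</b></div>".encode("utf-8")  # what read() would return
print(specs_from_page(page))
```

Feeding the raw bytes straight to the parser instead raises `TypeError` in Python 3. In Python 2, the equivalent mistake mixes byte strings with `unicode` and tends to surface later as a `UnicodeDecodeError` or `UnicodeEncodeError`.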
Hi,
I don't know what the product ID would look like for this page, but
assuming the catalog pages are UTF-8 encoded, like the error page I
get, it should work OK; cf.:
>>> import urllib
>>> opener = urllib.FancyURLopener({})
>>> ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (1234,))
>>> src = ffr.read()
>>> print src.decode("utf-8")
<!-- This Icecat template is used as head of all pages in Product finder -->
<HTML>
<HEAD>
[... - shortened]
<div align="center">"Désolé, pour ce produit, nous n'avons pas trouvé
d'autres informations produit.<br>Si vous n'êtes pas redirigés
automatiquement, veuillez cliquer" <a href="#" style="font-size:80%"
onclick="history.back()">ici</a>
</div>
<!--
<td bgcolor="" width="230" align="center"><img
src="/imgs/logo.gif" width="180" height="58"></td>
-->
>>>
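The session above is Python 2, where `read()` returns a byte string and `.decode("utf-8")` produces `unicode`. A minimal Python 3 sketch of the same step follows. Rather than hard-coding UTF-8, it honours the charset the server declares in its `Content-Type` header and falls back to UTF-8 only when none is given. `charset_from_content_type`, `decode_body` and `fetch_text` are hypothetical helper names. `urllib.request.urlopen` and `email.message.Message.get_content_charset` are standard-library calls.

```python
import urllib.request
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str], default: str = "utf-8") -> str:
    """Return the (lower-cased) charset parameter of a Content-Type value, or `default`."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(raw: bytes, content_type: Optional[str], default: str = "utf-8") -> str:
    """Turn the raw response bytes into text using the declared charset."""
    return raw.decode(charset_from_content_type(content_type, default))


def fetch_text(url: str) -> str:
    """Python 3 counterpart of opener.open(url).read().decode(...); needs network access."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()  # bytes, exactly as sent by the server
        return decode_body(raw, resp.headers.get("Content-Type"))
```

Usage, e.g. `fetch_text("http://prf.icecat.biz/index.cgi?product_id=1234;mi=start;smi=product;shopname=openICEcat-url;lang=fr")`, mirrors the interactive session without the explicit `.decode("utf-8")`.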
Printing on a Unicode-capable shell works OK (wx PyShell in my case),
and inserting into the database should be straightforward too (although I
have no experience with the specific database you are using).
Are you getting other Unicode errors in other parts of the process,
or do the above steps behave differently on your computer?
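A small Python 3 illustration of both points follows. The same decoded `str` fails on an output stream whose encoding cannot represent "é" and prints fine on a UTF-8 stream. On a real console, that encoding comes from the terminal and locale; Python 3 lets you override it with the `PYTHONIOENCODING` environment variable. The text also survives a parameterized INSERT unchanged. The database part uses `sqlite3` as an assumed stand-in, since the thread does not show which database is used, and the real driver may need its own connection-charset setting. `write_to_stream` and `store_and_load` are hypothetical helper names.

```python
import io
import sqlite3


def write_to_stream(text, encoding, errors="strict"):
    """Print `text` to an in-memory stream standing in for a terminal using `encoding`."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding, errors=errors)
    print(text, file=stream, end="")  # raises UnicodeEncodeError if errors="strict" and unencodable
    stream.flush()
    return buf.getvalue()


def store_and_load(text):
    """Insert text with a ? placeholder and read it back (sqlite3 as a stand-in DB)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE specs (body TEXT)")
        conn.execute("INSERT INTO specs (body) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT body FROM specs").fetchone()
        return stored
    finally:
        conn.close()


text = "Désolé, nous n'avons pas trouvé"
print(write_to_stream(text, "utf-8"))                    # works: UTF-8 can encode "é"
print(write_to_stream(text, "ascii", errors="replace"))  # lossy: "é" becomes "?"
print(store_and_load(text) == text)
```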
hth,
vbr
Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-16 22:10 +0100
Re: Unicode Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-17 06:06 +0000
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 09:59 +0100
Re: Unicode Benjamin Kaplan <benjamin.kaplan@case.edu> - 2012-12-17 01:28 -0800
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 10:45 +0100
Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 11:02 +0100
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 11:17 +0100
Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 11:55 +0100
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 12:14 +0100
Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 12:56 +0100
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 18:43 +0100
Re: Unicode Dave Angel <d@davea.name> - 2012-12-17 13:07 -0500
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 19:36 +0100
Re: Unicode Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-18 00:07 +0000
Re: Unicode Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-17 20:55 +0100
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 21:00 +0100
Re: Unicode Dave Angel <d@davea.name> - 2012-12-17 16:09 -0500
Re: Unicode Hans Mulder <hansmu@xs4all.nl> - 2012-12-17 23:02 +0100
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 23:33 +0100
Re: Unicode Terry Reedy <tjreedy@udel.edu> - 2012-12-17 17:03 -0500
Re: Unicode Anatoli Hristov <tolidtm@gmail.com> - 2012-12-17 23:31 +0100