From: Vlastimil Brom
To: python-list@python.org
Newsgroups: comp.lang.python
Date: Mon, 17 Dec 2012 12:56:06 +0100
Subject: Re: Unicode

2012/12/17 Anatoli Hristov:
>> This seems to be an encoding error of your terminal when printing.
>> You may need to describe (or, better, post the respective parts of the
>> source) where the text is coming from (external text file, database
>> entry, hardcoded in the Python source, ...), how it is stored, retrieved
>> and possibly manipulated before you insert it into the database.
> Here is how I get the data using the urllib opener:
>
> def GetSpecsFR(icecat_prod_id):
>     opener = urllib.FancyURLopener({})
>     ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr"
>                       % icecat_prod_id)
>     specsfr = ffr.read()
>     #specsfr = specsfr.decode('utf-8')
>     specsfr = RemoveHTML(specsfr)
>     ##specsfr = "%r" % specsfr
>     ## if specsfr:
>     ##     try:
>     ##         specsfr = str(specsfr)
>     ##     except UnicodeEncodeError:
>     ##         specsfr = str(specsfr.encode('utf-16'))
>     return specsfr

Hi,
I don't know what the product ID would look like for this page. However,
the error page I get is UTF-8 encoded, so assuming the catalog pages are
UTF-8 encoded as well, it should work fine; cf.:

>>> import urllib
>>> opener = urllib.FancyURLopener({})
>>> ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (1234,))
>>> src = ffr.read()
>>> print src.decode("utf-8")
>>>

Printing in a Unicode-capable shell works fine (wx PyShell in my case).
Inserting into the database should be straightforward too, although I
don't have experience with the specific database you are using.
Are you getting other Unicode errors in other parts of the process, or
do the above steps work differently on your computer?

hth,
vbr
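The session above decodes the page as UTF-8 explicitly. Below is a minimal Python 3 sketch of the same fetch-and-decode step. Instead of hardcoding UTF-8, it reads the charset from the response's Content-Type header and falls back to UTF-8 when none is declared. The function names `decode_body` and `fetch_text` are hypothetical helpers, not part of any library.

```python
import urllib.request
from email.message import Message


def decode_body(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode raw HTTP body bytes using the charset named in a Content-Type
    header value, falling back to `default` (hypothetical helper)."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    charset = msg.get_content_charset() or default
    try:
        return raw.decode(charset)
    except LookupError:
        # Unknown charset name in the header: fall back to the default.
        return raw.decode(default)


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Fetch `url` and return its body as str (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        content_type = resp.headers.get("Content-Type", "")
    return decode_body(raw, content_type)
```

Calling `fetch_text()` with the catalog URL from the session would need network access. The decoded `str` it returns is what should be passed on to HTML stripping, printing and the database, instead of the raw bytes.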
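The error discussed in this thread comes from the terminal's encoding at print time, and printing works in a Unicode-capable shell. If changing the shell is not an option, one workaround is to replace unrepresentable characters before printing. This is a Python 3 sketch with hypothetical helpers `printable` and `safe_print`. It only changes what is displayed; the underlying string, and anything stored in the database, is untouched.

```python
import sys
from typing import Optional, TextIO


def printable(text: str, encoding: Optional[str]) -> str:
    """Return `text` with characters that `encoding` cannot represent
    replaced by '?' (hypothetical helper)."""
    if not encoding:
        # Streams such as io.StringIO report no encoding: nothing to replace.
        return text
    # Round-trip through the target encoding with errors="replace" so that
    # unsupported characters become '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text: str, stream: Optional[TextIO] = None) -> None:
    """Print `text` to `stream` (default: sys.stdout) using its encoding."""
    stream = sys.stdout if stream is None else stream
    print(printable(text, getattr(stream, "encoding", None)), file=stream)
```

On an ASCII-only terminal, `safe_print("trouvé")` shows `trouv?` rather than raising an exception. A Unicode-capable shell shows the text unchanged.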
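The post notes that inserting the decoded text into the database should be straightforward, although the original poster's database is not named. This Python 3 sketch uses the standard-library `sqlite3` module purely as a stand-in. The table `specs`, its columns and the helper functions are hypothetical. The point is to pass the decoded `str` through query placeholders, rather than converting it with `str()` or `encode('utf-16')` as in the commented-out lines of the quoted code.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table for per-language product specs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS specs ("
        "product_id INTEGER, lang TEXT, body TEXT)"
    )


def store_specs(conn: sqlite3.Connection, product_id: int,
                lang: str, specs: str) -> None:
    """Insert already-decoded spec text; `specs` must be str, not bytes."""
    # Parameterized query: sqlite3 stores a Python str as TEXT itself,
    # so no manual encode() or str() conversion is needed.
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO specs (product_id, lang, body) VALUES (?, ?, ?)",
            (product_id, lang, specs),
        )


def load_specs(conn: sqlite3.Connection, product_id: int, lang: str) -> str:
    """Read the stored text back; it comes out as str again."""
    row = conn.execute(
        "SELECT body FROM specs WHERE product_id = ? AND lang = ?",
        (product_id, lang),
    ).fetchone()
    return row[0]
```

Other database drivers use different placeholder styles (for example `%s`), but the principle is the same: decode once, then hand the driver a Unicode string.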