From: Vlastimil Brom
To: python-list@python.org
Newsgroups: comp.lang.python
Date: Mon, 17 Dec 2012 11:55:10 +0100
Subject: Re: Unicode

2012/12/17 Anatoli Hristov :
>> If you only see encoding problems when printing results to your
>> terminal, its settings or Unicode capability might be the cause;
>> however, if you also get badly encoded items in the database, you are
>> likely using an inappropriate encoding in some step.
>
> I get badly encoded text in my DB
>
>> You seem to be doing something like the following (explicitly or
>> partly implicitly, based on your system defaults):
>>
>>>>> print u"étroits, en utilisant un portable extrêmement puissant".encode("utf-8").decode("windows-1252")
>> Ã©troits, en utilisant un portable extrÃªmement puissant
>>>>>
>>
>> i.e. encoding a text using UTF-8 and handling it as windows-1252
>> afterwards (or taking an already encoded text and decoding it with an
>> inappropriate ANSI encoding).
>
> Thank you Vlastimil,
>
> I tried to print it as you showed me, but I get an error:
>>>> print u"étroits, en utilisant un portable extrêmement puissant".encode("utf-8").decode("windows-1252")
> Traceback (most recent call last):
>   File "", line 1, in ?
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0192'
> in position 1: ordinal not in range(256)
>>>>

Hi,
this seems to be an encoding error of your terminal on printing: print
converts the unicode string to the terminal's encoding (apparently
latin-1 here), and latin-1 has no slot for u'\u0192'. So this error by
itself says nothing about what ends up in the database.
You may need to describe (or, better, post the relevant parts of the
source) where the text comes from (external text file, database entry,
hardcoded in the Python source ...), and how it is stored, retrieved and
possibly manipulated before you insert it into the database.
You may also print the repr(...) of the string to be inserted into the
database, to see whether it is already mangled at some earlier step of
the processing.

hth,
   vbr
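For anyone following the thread, here is a small sketch of the round trip described in the quoted explanation (encode as UTF-8, then decode as windows-1252) and of how it can be undone. It is written for Python 3, where str behaves like Python 2's u"..." strings, and it assumes the mangled text was not modified after the wrong decode; ascii() is used instead of print-ing the text directly so the output is safe on any terminal.

```python
# Python 3 sketch: reproduce the mis-decoding quoted above, then undo it.
# Text: 'etroits ... extremement ...' with e-acute and e-circumflex, written as escapes.
original = "\u00e9troits, en utilisant un portable extr\u00eamement puissant"

# The step that goes wrong: UTF-8 bytes are read back as windows-1252.
utf8_bytes = original.encode("utf-8")
mangled = utf8_bytes.decode("windows-1252")

# ascii() escapes every non-ASCII code point, so this prints on any terminal.
# Each accented letter became two code points equal to its UTF-8 bytes.
print(ascii(mangled))
# '\xc3\xa9troits, en utilisant un portable extr\xc3\xaamement puissant'

# Reversal, valid only if nothing else touched the text: re-encode with the
# wrong codec to recover the original bytes, then decode with the right one.
repaired = mangled.encode("windows-1252").decode("utf-8")
print(repaired == original)  # True
```

Seeing pairs like `\xc3\xa9` where a single `\xe9` is expected is a typical sign of UTF-8 data that was decoded with a single-byte codec somewhere along the way.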
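The advice above (tell terminal errors apart from stored data, and inspect the repr of the value before inserting it) can be sketched roughly as follows. This is a Python 3 sketch under stated assumptions: `can_encode` is a hypothetical helper, not a library API, and ascii() stands in for Python 2's repr() of a unicode string (Python 3's repr() shows printable non-ASCII characters unescaped, so it can hit the same terminal error).

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`.

    Hypothetical helper for this sketch, not part of any library.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The character named in the traceback: U+0192 LATIN SMALL LETTER F WITH HOOK.
hook_f = "\u0192"

# latin-1 covers only U+0000..U+00FF, hence "ordinal not in range(256)".
print(can_encode(hook_f, "latin-1"))       # False
# windows-1252 does contain it (as byte 0x83).
print(can_encode(hook_f, "windows-1252"))  # True

# The encoding print() uses for this output stream; it varies per system.
print("stdout encoding:", sys.stdout.encoding)

# ascii() escapes all non-ASCII code points, so it never fails to print and
# makes mangling visible: correct 'e-acute' vs. the two-character mojibake.
print(ascii("\u00e9troits"))        # '\xe9troits'
print(ascii("\u00c3\u00a9troits"))  # '\xc3\xa9troits'
```

Printing the escaped form right before the database insert shows whether the value is already damaged, independently of what the terminal can display.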