
Re: Unicode

Date Mon, 17 Dec 2012 11:55:10 +0100
Subject Re: Unicode
From Vlastimil Brom <vlastimil.brom@gmail.com>
To python-list@python.org
Newsgroups comp.lang.python
Message-ID <mailman.955.1355741714.29569.python-list@python.org>



2012/12/17 Anatoli Hristov <tolidtm@gmail.com>:
>> if you only see encoding problems on printing results to your
>> terminal, its settings or unicode capability might be the cause,
>> however, if you also get badly encoded items in the database, you are
>> likely using an inappropriate encoding in some step.
>
> I get badly encoded text in my DB
>
>> you seem to be doing something like the following (explicitly or
>> partly implicitly, based on your system defaults):
>>
>>>>> print u"étroits, en utilisant un portable extrêmement puissant".encode("utf-8").decode("windows-1252")
>> Ã©troits, en utilisant un portable extrÃªmement puissant
>>>>>
>>
>> i.e. encoding a text using utf-8 and handling it as windows-1252
>> afterwards (or taking an already encoded text and decoding it with the
>> inappropriate ANSI encoding).
>
> Thank you Vlastimil,
>
> I tried to print it as you showed me, but I receive an error:
>>>> print u"étroits, en utilisant un portable extrêmement puissant".encode("utf-8").decode("windows-1252")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0192'
> in position 1: ordinal not in range(256)
>>>>

Hi,
this seems to be an encoding error raised by your terminal when printing.
You may need to describe (or, better, post the respective parts of the
source) where the text is coming from (external text file, database
entry, hardcoded in the Python source ...), how it is stored, retrieved
and possibly manipulated before you insert it into the database.
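
To illustrate why the traceback points at printing rather than at the
decode call, here is a minimal sketch. It is written in Python 3 syntax
(the session quoted above is Python 2), and the helper name
encode_for_terminal and the latin-1 terminal encoding are assumptions
made for the example, taken only from the codec named in the error
message.

```python
# Python 3 sketch; in Python 2 the print statement does the encoding implicitly.
original = "\u00e9troits"  # the word "étroits", written with an escape

# Step 1: the round trip from the quoted example. It succeeds without any
# error and yields a valid, but wrong, string (mojibake).
mangled = original.encode("utf-8").decode("windows-1252")


def encode_for_terminal(text, terminal_encoding="latin-1"):
    """Emulate printing: return the encoded bytes, or the error message."""
    try:
        return text.encode(terminal_encoding)
    except UnicodeEncodeError as exc:
        return "UnicodeEncodeError: %s" % exc


# Every character of this mangled string fits into latin-1, so it encodes.
print(encode_for_terminal(mangled))

# A string containing U+0192 cannot be encoded to latin-1; this is the kind
# of failure reported in the traceback, and it happens at output time.
print(encode_for_terminal("\u00c3\u0192"))
```

The second call fails only in the encoding step, so the exception says
something about the terminal's codec, not necessarily about the data
that ends up in the database.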

You may try to print a repr(...) of the string to be inserted into the
database to see whether it is already mangled in some previous
part of the processing.
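
A minimal sketch of such a check, again in Python 3 syntax. In Python 3
the built-in ascii() gives the escaped form that repr() gives for
unicode strings in Python 2, so the output does not depend on the
terminal. The function looks_like_utf8_read_as_cp1252 is a hypothetical
helper and only a heuristic for this one kind of mangling (utf-8 bytes
decoded as windows-1252); it is not a general detector.

```python
def looks_like_utf8_read_as_cp1252(text):
    """Heuristic: True if undoing a utf-8 -> windows-1252 mix-up changes the text."""
    try:
        repaired = text.encode("windows-1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The reverse round trip is impossible, so this mix-up is unlikely.
        return False
    return repaired != text


clean = "\u00e9troits"                                   # "étroits"
mangled = clean.encode("utf-8").decode("windows-1252")   # utf-8 read as cp1252

for value in (clean, mangled):
    # ascii() shows the exact code points without involving the terminal.
    print(ascii(value), looks_like_utf8_read_as_cp1252(value))
```

Running the same check on the value just before the database insert
would show whether it is already mangled at that point.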

hth,

    vbr


