From: Oscar Benjamin
Newsgroups: comp.lang.python
Subject: Re: Unicode failure
Date: Mon, 07 Dec 2015 10:48:35 +0000

On Sun, 6 Dec 2015 at 23:11 Quivis wrote:

> On Fri, 04 Dec 2015 13:07:38 -0500, D'Arcy J.M. Cain wrote:
>
> > I thought that going to Python 3.4 would solve my Unicode issues but it
> > seems I still don't understand this stuff. Here is my script.
> >
> > #! /usr/bin/python3
> > # -*- coding: UTF-8 -*-
> > import sys
> > print(sys.getdefaultencoding())
> > print(u"\N{TRADE MARK SIGN}")
> >
> > And here is my output.
> >
> > utf-8
> > Traceback (most recent call last):
> >   File "./g", line 5, in <module>
> >     print(u"\N{TRADE MARK SIGN}")
> > UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
> > position 0: ordinal not in range(128)
>
> Hmmmm, interesting:
>
> Python 2.7.3 (default, Jun 22 2015, 19:43:34)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> print sys.getdefaultencoding()
> ascii
> >>> print u'\N{TRADE MARK SIGN}'
> ™

sys.getdefaultencoding() returns the default encoding Python uses when converting between text and bytes without an explicit codec, for example str.encode() called with no argument. In Python 3 it is always utf-8, whatever the locale, so it tells you nothing about printing. (The default for open() without an encoding argument comes from the locale, via locale.getpreferredencoding().) What matters here is the encoding associated with stdout, which is sys.stdout.encoding.
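To make that concrete, here is a minimal Python 3 sketch of what print() effectively does with the stream's encoding. The helper name write_with_encoding and the in-memory io.BytesIO target are illustrative assumptions, not part of the original script; a real sys.stdout picks its encoding from the environment at startup rather than from an argument.

```python
import codecs
import io

TRADE_MARK = "\N{TRADE MARK SIGN}"  # U+2122


def write_with_encoding(text, encoding, errors="strict"):
    """Write text through a text stream and return the bytes it produced.

    Hypothetical helper: it stands in for sys.stdout by wrapping an
    in-memory byte buffer with the given encoding and error handler.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    # A UTF-8 stream can represent the character as three bytes.
    print(write_with_encoding(TRADE_MARK, "utf-8"))  # b'\xe2\x84\xa2'

    # The name reported under LANG=C is an alias for the ascii codec,
    # which is why the error message says 'ascii'.
    print(codecs.lookup("ANSI_X3.4-1968").name)  # ascii

    # An ASCII stream cannot encode U+2122, so the write fails.
    try:
        write_with_encoding(TRADE_MARK, "ANSI_X3.4-1968")
    except UnicodeEncodeError as exc:
        print("failed:", exc)

    # A lenient error handler trades fidelity for not crashing.
    print(write_with_encoding(TRADE_MARK, "ascii", errors="backslashreplace"))
```

The string itself is the same in every case; only the codec attached to the output stream decides whether the write succeeds.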
$ python2.7 -c 'import sys; print(sys.stdout.encoding); print(u"\u2122")'
UTF-8
™
$ LANG=C python2.7 -c 'import sys; print(sys.stdout.encoding); print(u"\u2122")'
ANSI_X3.4-1968
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)

--
Oscar