From: Oscar Benjamin
Newsgroups: comp.lang.python
Subject: Re: Unicode failure
Date: Fri, 4 Dec 2015 22:54:49 +0000
References: <20151204130738.76313c43@imp>

On 4 Dec 2015 22:34, "D'Arcy J.M. Cain" wrote:
>
> I thought that going to Python 3.4 would solve my Unicode issues but it
> seems I still don't understand this stuff. Here is my script.
>
> #! /usr/bin/python3
> # -*- coding: UTF-8 -*-
> import sys
> print(sys.getdefaultencoding())
> print(u"\N{TRADE MARK SIGN}")
>
> And here is my output.
>
> utf-8
> Traceback (most recent call last):
>   File "./g", line 5, in
>     print(u"\N{TRADE MARK SIGN}")
> UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
> position 0: ordinal not in range(128)
>
> What am I missing?

The important thing is not the default encoding but the encoding
associated with stdout. Try printing sys.stdout.encoding to see what
that is. It may depend on which terminal you're printing to. Are you
using cmd.exe? If you're on Unix, what is the value of the LANG
environment variable?

--
Oscar
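
A minimal sketch of the check suggested above, assuming Python 3. The helper names describe_stdout and can_encode are made up for illustration and are not part of the original script. It reports the encoding attached to stdout next to the default encoding and the locale variables, and falls back to an escaped form when stdout cannot represent the character.

```python
import os
import sys


def describe_stdout():
    """Collect the settings that decide how print() encodes text."""
    return {
        "default encoding": sys.getdefaultencoding(),
        # May be None when stdout is not a regular text stream.
        "stdout encoding": getattr(sys.stdout, "encoding", None),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


if __name__ == "__main__":
    for name, value in describe_stdout().items():
        print(name, "=", value)

    tm = "\N{TRADE MARK SIGN}"
    # Assumption: treat a missing stdout encoding as ASCII.
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    if can_encode(tm, encoding):
        print(tm)
    else:
        # Print an ASCII-safe escape instead of raising UnicodeEncodeError.
        print(tm.encode("ascii", "backslashreplace").decode("ascii"))
```

If the stdout encoding turns out to be ASCII (or ANSI_X3.4-1968) while LANG is unset or "C", that would explain the traceback; setting LANG to a UTF-8 locale or setting PYTHONIOENCODING=utf-8 before running the script is one way to change it.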