Re: Unicode failure

From Oscar Benjamin <oscar.j.benjamin@gmail.com>
Newsgroups comp.lang.python
Subject Re: Unicode failure
Date Mon, 07 Dec 2015 10:48:35 +0000
Lines 49
Message-ID <mailman.12.1449485333.12405.python-list@python.org>
References <mailman.205.1449268365.14615.python-list@python.org> <ye39y.824840$FM6.212312@fx42.am4>


On Sun, 6 Dec 2015 at 23:11 Quivis <quivis@domain.invalid> wrote:

> On Fri, 04 Dec 2015 13:07:38 -0500, D'Arcy J.M. Cain wrote:
>
> > I thought that going to Python 3.4 would solve my Unicode issues but it
> > seems I still don't understand this stuff.  Here is my script.
> >
> > #! /usr/bin/python3
> > # -*- coding: UTF-8 -*-
> > import sys
> > print(sys.getdefaultencoding())
> > print(u"\N{TRADE MARK SIGN}")
> >
> > And here is my output.
> >
> > utf-8
> > Traceback (most recent call last):
> >   File "./g", line 5, in <module>
> >     print(u"\N{TRADE MARK SIGN}")
> > UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
> > position 0: ordinal not in range(128)
>
> Hmmmm, interesting:
>
> Python 2.7.3 (default, Jun 22 2015, 19:43:34)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> print sys.getdefaultencoding()
> ascii
> >>> print u'\N{TRADE MARK SIGN}'
> ™
>
>
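sys.getdefaultencoding() is not the encoding used for I/O. It returns the
default string encoding of the Unicode implementation: the one used for
implicit str/unicode conversions in Python 2 (ascii) and as the default for
str.encode() and bytes.decode() in Python 3 (always utf-8). What matters
here is the encoding associated with stdout, which is sys.stdout.encoding.
When stdout is a terminal, it is normally derived from the locale settings.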

$ python2.7 -c 'import sys; print(sys.stdout.encoding); print(u"\u2122")'
UTF-8
™

$ LANG=C python2.7 -c 'import sys; print(sys.stdout.encoding); print(u"\u2122")'
ANSI_X3.4-1968
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in
position 0: ordinal not in range(128)
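
The same effect can be reproduced in Python 3 without touching the locale.
The following is only a minimal sketch: write_text is a hypothetical helper
(not part of the standard library) that pushes text through an
io.TextIOWrapper with a chosen encoding, which is roughly what print() does
with sys.stdout.

```python
import io
import sys


def write_text(text, encoding, errors="strict"):
    """Hypothetical helper: return the bytes a text stream with the given
    encoding and error handler would emit for `text`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    tm = "\N{TRADE MARK SIGN}"

    # The two values discussed above; only the second one affects print().
    print("default string encoding:", sys.getdefaultencoding())
    print("stdout encoding:", sys.stdout.encoding)

    # A UTF-8 stream can represent the character.
    print(repr(write_text(tm, "utf-8")))

    # An ASCII stream cannot, which is the failure from the original post.
    try:
        write_text(tm, "ascii")
    except UnicodeEncodeError as exc:
        print("ascii stream failed:", exc.reason)

    # A lenient error handler trades the exception for an escape sequence.
    print(repr(write_text(tm, "ascii", errors="backslashreplace")))
```

If the locale cannot be changed, setting the PYTHONIOENCODING environment
variable (for example PYTHONIOENCODING=utf-8) before running the script
overrides the encoding Python picks for stdin, stdout and stderr.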

--
Oscar
