

Groups > comp.lang.python > #100013 > unrolled thread

Unicode failure

Started by: "D'Arcy J.M. Cain" <darcy@VybeNetworks.com>
First post: 2015-12-04 13:07 -0500
Last post: 2015-12-07 10:48 +0000
Articles: 7 (6 participants)



Contents

  Unicode failure "D'Arcy J.M. Cain" <darcy@VybeNetworks.com> - 2015-12-04 13:07 -0500
    Re: Unicode failure Dave Farrance <df@see.replyto.invalid> - 2015-12-06 09:06 +0000
      Re: Unicode failure Dave Farrance <df@see.replyto.invalid> - 2015-12-06 09:16 +0000
      Re: Unicode failure Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-06 09:34 +0000
      Re: Unicode failure Random832 <random832@fastmail.com> - 2015-12-06 15:36 -0500
    Re: Unicode failure Quivis <quivis@domain.invalid> - 2015-12-06 23:09 +0000
      Re: Unicode failure Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-12-07 10:48 +0000

#100013 — Unicode failure

From: "D'Arcy J.M. Cain" <darcy@VybeNetworks.com>
Date: 2015-12-04 13:07 -0500
Subject: Unicode failure
Message-ID: <mailman.205.1449268365.14615.python-list@python.org>

I thought that going to Python 3.4 would solve my Unicode issues but it
seems I still don't understand this stuff.  Here is my script.

#! /usr/bin/python3
# -*- coding: UTF-8 -*-
import sys 
print(sys.getdefaultencoding()) 
print(u"\N{TRADE MARK SIGN}") 

And here is my output.

utf-8
Traceback (most recent call last):
  File "./g", line 5, in <module>
    print(u"\N{TRADE MARK SIGN}")
UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
position 0: ordinal not in range(128)

What am I missing?

TIA.

-- 
D'Arcy J.M. Cain
Vybe Networks Inc.
http://www.VybeNetworks.com/
IM:darcy@Vex.Net VoIP: sip:darcy@VybeNetworks.com



#100052

From: Dave Farrance <df@see.replyto.invalid>
Date: 2015-12-06 09:06 +0000
Message-ID: <69u76b9spvoql5eeh4h2686pmhigfvmivv@4ax.com>
In reply to: #100013

"D'Arcy J.M. Cain" <darcy@VybeNetworks.com> wrote:

>...
>utf-8
>Traceback (most recent call last):
>  File "./g", line 5, in <module>
>    print(u"\N{TRADE MARK SIGN}")
>UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
>position 0: ordinal not in range(128)

I *presume* that you're using Linux since you've got a hashbang, so...

You can *check* that it's the local environment that's the issue with
the *test* of setting the PYTHONIOENCODING environment variable. But if
that works, then it tells you that you must fix the underlying
environment's character encoding to get a permanent fix.

$ PYTHONIOENCODING=UTF-8 python3 -c 'print(u"\u00A9")'
©

$ PYTHONIOENCODING=ascii python3 -c 'print(u"\u00A9")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xa9' in
position 0: ordinal not in range(128)
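
The same test can be sketched from Python itself. This is only an illustration, not part of the original post: the helper name run_with_io_encoding is made up, and it assumes the child interpreter honours PYTHONIOENCODING the way the two shell commands above show.

```python
import os
import subprocess
import sys


def run_with_io_encoding(encoding):
    """Run a child interpreter that prints U+00A9 with PYTHONIOENCODING set.

    Hypothetical helper for illustration; returns (returncode, stdout, stderr)
    as raw bytes so nothing is re-encoded by the parent process.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    proc = subprocess.Popen(
        [sys.executable, "-c", 'print("\\u00a9")'],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


for enc in ("UTF-8", "ascii"):
    code, out, err = run_with_io_encoding(enc)
    # repr() keeps the report printable even on an ASCII-only terminal.
    print(enc, "exit status:", code, "stdout bytes:", repr(out))
```

With UTF-8 the child should exit cleanly and emit the encoded copyright sign; with ascii it should fail with the UnicodeEncodeError shown in the traceback above.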



#100053

From: Dave Farrance <df@see.replyto.invalid>
Date: 2015-12-06 09:16 +0000
Message-ID: <r1v76b92hj164djoc8eqfvqefqhsh6egnh@4ax.com>
In reply to: #100052

I was taking it for granted that you knew how to set environment
variables, but just in case you don't: in the shell (are you using
Bash?), enter this:

export PYTHONIOENCODING=UTF-8

...then run your script.

Remember that this is *not* a permanent fix.
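
To see why the variable changes the outcome, it may help to look at what print() writes through: a text wrapper that encodes to bytes using one fixed encoding. The sketch below is an illustration only, with a made-up helper name (write_with_encoding) and an in-memory byte buffer standing in for the terminal.

```python
import io


def write_with_encoding(text, encoding):
    """Write text through a TextIOWrapper, much as print() does via sys.stdout.

    Hypothetical helper: the BytesIO object stands in for the terminal, and
    the function returns the bytes that would have been sent to it.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# U+2122 TRADE MARK SIGN encodes to three bytes in UTF-8.
print(repr(write_with_encoding("\u2122", "utf-8")))

try:
    write_with_encoding("\u2122", "ascii")
except UnicodeEncodeError as exc:
    # Same failure as in the original traceback, reproduced without a terminal.
    print("ascii wrapper failed:", exc.reason)
```

A script could in principle rebind sys.stdout to such a wrapper around sys.stdout.buffer, but that is an option nobody in this thread proposed, and it hides a misconfigured locale rather than fixing it.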



#100054

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2015-12-06 09:34 +0000
Message-ID: <mailman.1.1449394477.2247.python-list@python.org>
In reply to: #100052

On 06/12/2015 09:06, Dave Farrance wrote:
> "D'Arcy J.M. Cain" <darcy@VybeNetworks.com> wrote:
>
>> ...
>> utf-8
>> Traceback (most recent call last):
>>   File "./g", line 5, in <module>
>>     print(u"\N{TRADE MARK SIGN}")
>> UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
>> position 0: ordinal not in range(128)
>
> I *presume* that you're using Linux since you've got a hashbang, so...
>

Not really a good presumption as the hashbang has been used in Python 
scripts on Windows ever since "PEP 397 -- Python launcher for Windows", 
see https://www.python.org/dev/peps/pep-0397/

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#100070

From: Random832 <random832@fastmail.com>
Date: 2015-12-06 15:36 -0500
Message-ID: <mailman.4.1449434224.12405.python-list@python.org>
In reply to: #100052

Mark Lawrence <breamoreboy@yahoo.co.uk> writes:
> On 06/12/2015 09:06, Dave Farrance wrote:
>> "D'Arcy J.M. Cain" <darcy@VybeNetworks.com> wrote:
>>> utf-8
>>> Traceback (most recent call last):
>>>   File "./g", line 5, in <module>
>>>     print(u"\N{TRADE MARK SIGN}")
>>> UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
>>> position 0: ordinal not in range(128)
>>
>> I *presume* that you're using Linux since you've got a hashbang, so...
>
> Not really a good presumption as the hashbang has been used in Python
> scripts on Windows ever since "PEP 397 -- Python launcher for
> Windows", see https://www.python.org/dev/peps/pep-0397/

However, on Windows it would typically be codepage 437, 850, or the
like, and the error message would call it a 'charmap' codec. The 'ascii'
codec error is associated with being in a UNIX environment with an unset
(or "C" or "POSIX") locale.
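
The difference in codec names can be checked without a Windows machine by encoding the character directly. A minimal sketch, with codec_failure as a made-up helper name:

```python
def codec_failure(text, encoding):
    """Return (codec name, reason) from the UnicodeEncodeError, or None.

    Hypothetical helper for illustration: it reports the codec name that
    would appear at the start of the error message.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.encoding, exc.reason
    return None


for name in ("ascii", "cp437", "cp850", "utf-8"):
    # The code pages are table-driven, so their failures name 'charmap'.
    print(name, codec_failure("\u2122", name))
```

The two code pages report 'charmap' because they are implemented as lookup tables that have no entry for U+2122, while the 'ascii' codec reports its own name, which is what the original traceback shows.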



#100073

From: Quivis <quivis@domain.invalid>
Date: 2015-12-06 23:09 +0000
Message-ID: <ye39y.824840$FM6.212312@fx42.am4>
In reply to: #100013

On Fri, 04 Dec 2015 13:07:38 -0500, D'Arcy J.M. Cain wrote:

> I thought that going to Python 3.4 would solve my Unicode issues but it
> seems I still don't understand this stuff.  Here is my script.
> 
> #! /usr/bin/python3
> # -*- coding: UTF-8 -*-
> import sys
> print(sys.getdefaultencoding())
> print(u"\N{TRADE MARK SIGN}")
> 
> And here is my output.
> 
> utf-8
> Traceback (most recent call last):
>   File "./g", line 5, in <module>
>     print(u"\N{TRADE MARK SIGN}")
> UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
> position 0: ordinal not in range(128)

Hmmmm, interesting:

Python 2.7.3 (default, Jun 22 2015, 19:43:34) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.getdefaultencoding()
ascii
>>> print u'\N{TRADE MARK SIGN}'
™

-- 
  _____  __ __ __ __ __ __   __
 ((   )) || || || \\ // ||  ((
  \\_/X| \\_// ||  \V/  || \_))
   Omnia paratus  *~*~*~*~*~*~*



#100081

From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: 2015-12-07 10:48 +0000
Message-ID: <mailman.12.1449485333.12405.python-list@python.org>
In reply to: #100073

On Sun, 6 Dec 2015 at 23:11 Quivis <quivis@domain.invalid> wrote:

> On Fri, 04 Dec 2015 13:07:38 -0500, D'Arcy J.M. Cain wrote:
>
> > I thought that going to Python 3.4 would solve my Unicode issues but it
> > seems I still don't understand this stuff.  Here is my script.
> >
> > #! /usr/bin/python3
> > # -*- coding: UTF-8 -*-
> > import sys
> > print(sys.getdefaultencoding())
> > print(u"\N{TRADE MARK SIGN}")
> >
> > And here is my output.
> >
> > utf-8
> > Traceback (most recent call last):
> >   File "./g", line 5, in <module>
> >     print(u"\N{TRADE MARK SIGN}")
> > UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
> > position 0: ordinal not in range(128)
>
> Hmmmm, interesting:
>
> Python 2.7.3 (default, Jun 22 2015, 19:43:34)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> print sys.getdefaultencoding()
> ascii
> >>> print u'\N{TRADE MARK SIGN}'
> ™
>
>
sys.getdefaultencoding() returns the default encoding used for implicit
conversions between text and byte strings, such as encode() called with
no argument; it says nothing about the terminal. What matters here is the
encoding associated with stdout, which is sys.stdout.encoding.

$ python2.7 -c 'import sys; print(sys.stdout.encoding); print(u"\u2122")'
UTF-8
™

$ LANG=C python2.7 -c 'import sys; print(sys.stdout.encoding); print(u"\u2122")'
ANSI_X3.4-1968
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in
position 0: ordinal not in range(128)

--
Oscar
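
For anyone comparing their own environment with the outputs in this thread, a small sketch that reports the relevant encodings side by side may be useful. The function name encoding_report is invented for this example, and the locale entry is included only as a related value that the thread does not discuss directly.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that are easy to confuse with one another.

    Hypothetical helper for illustration; sys.stdout.encoding is read with
    getattr because stdout may have been replaced by an object without it.
    """
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


for name, value in sorted(encoding_report().items()):
    print("{}: {}".format(name, value))
```

Running it once normally and once with LANG=C should show sys.getdefaultencoding() unchanged while the stdout encoding follows the environment, as in the two python2.7 commands above.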




