Re: Problem with Unicode char in Python 3.3.0

From Terry Reedy <tjreedy@udel.edu>
Subject Re: Problem with Unicode char in Python 3.3.0
Date 2013-01-08 03:40 -0500
References <nobody-672426.17430906012013@news.free.fr> <mailman.175.1357492817.2939.python-list@python.org> <nobody-940F87.13572507012013@news.free.fr> <kcehls$9gc$1@ger.gmane.org>
Newsgroups comp.lang.python
Message-ID <mailman.266.1357634500.2939.python-list@python.org> (permalink)



On 1/7/2013 8:12 AM, Terry Reedy wrote:
> On 1/7/2013 7:57 AM, Franck Ditter wrote:
>
>> >>> print('\U0001d11e')
>> Traceback (most recent call last):
>>    File "<pyshell#1>", line 1, in <module>
>>      print('\U0001d11e')
>> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
>> in position 0: Non-BMP character not supported in Tk
>
> The message comes from printing to a tk text widget (the IDLE shell),
> not from creating the 1 char string. c = '\U0001d11e' works fine. When
> you have problems with creating and printing unicode, *separate*
> creating from printing to see where the problem is. (I do not know if
> the brand new tcl/tk 8.6 is any better.)
>
> The windows console also chokes, but with a different message.
>
>  >>> c='\U0001d11e'
>  >>> print(c)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>    File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
>      return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e'
> in position 0: character maps to <undefined>
>
> Yes, this is very annoying, especially in Win 7.

The above is in 3.3, in which '\U0001d11e' is translated to a
length 1 string. In 3.2 and earlier, on narrow builds (as on Windows),
that literal is translated to a length 2 string: a surrogate pair, each
half of which is a code point in the BMP. On printing, the pair of
surrogates was displayed as the square box used for all characters for
which the font does not have a glyph. 𝄞 When cut and pasted, it shows
in this mail composer as a weird music sign with peculiar behavior.
Here it is as 3 -s, 3 spaces, paste, 3 spaces, 3 -s, but it may disappear.
---   𝄞   ---
So 3.3 is the first version on Windows to get the UnicodeEncodeError on
printing.
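
For anyone who wants to check this on their own build, here is a minimal sketch, assuming Python 3.3 or later. The helper names utf16_code_units and printable are made up for illustration; they are not part of the standard library. The first shows the two UTF-16 code units that a narrow build would have stored for this character, and the second is one possible way to keep creating the string separate from printing it, by escaping whatever the target encoding (cp437 here, as in the traceback above) cannot represent.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode('utf-16-be')
    return [int.from_bytes(data[i:i + 2], 'big')
            for i in range(0, len(data), 2)]


def printable(text, encoding):
    """Return text with characters the encoding cannot represent
    replaced by backslash escapes (hypothetical helper)."""
    return text.encode(encoding, errors='backslashreplace').decode(encoding)


c = '\U0001d11e'   # creating the string works regardless of the console

print(len(c))                                   # 1 on Python 3.3+
print([hex(u) for u in utf16_code_units(c)])    # ['0xd834', '0xdd1e']
print(printable(c, 'cp437'))                    # \U0001d11e
```

The last line prints the escape sequence rather than the glyph, so it only sidesteps the UnicodeEncodeError; it does not make the console able to display the character.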

-- 
Terry Jan Reedy


