


Problem with Unicode char in Python 3.3.0

Started by: Franck Ditter <nobody@nowhere.org>
First post: 2013-01-06 17:43 +0100
Last post: 2013-01-08 03:40 -0500
Articles: 7 (5 participants)



Contents

  Problem with Unicode char in Python 3.3.0 Franck Ditter <nobody@nowhere.org> - 2013-01-06 17:43 +0100
    Re: Problem with Unicode char in Python 3.3.0 Peter Otten <__peter__@web.de> - 2013-01-06 18:03 +0100
    Re: Problem with Unicode char in Python 3.3.0 marduk <marduk@python.net> - 2013-01-06 12:10 -0500
      Re: Problem with Unicode char in Python 3.3.0 Franck Ditter <nobody@nowhere.org> - 2013-01-07 13:57 +0100
        Re: Problem with Unicode char in Python 3.3.0 Chris Angelico <rosuav@gmail.com> - 2013-01-08 00:04 +1100
        Re: Problem with Unicode char in Python 3.3.0 Terry Reedy <tjreedy@udel.edu> - 2013-01-07 08:12 -0500
        Re: Problem with Unicode char in Python 3.3.0 Terry Reedy <tjreedy@udel.edu> - 2013-01-08 03:40 -0500

#36267 — Problem with Unicode char in Python 3.3.0

From: Franck Ditter <nobody@nowhere.org>
Date: 2013-01-06 17:43 +0100
Subject: Problem with Unicode char in Python 3.3.0
Message-ID: <nobody-672426.17430906012013@news.free.fr>
Hi !
I work on MacOS-X Lion and IDLE/Python 3.3.0
I can't get the treble key (U1D11E) !

>>> "\U1D11E"
SyntaxError: (unicode error) 'unicodeescape' codec can't 
decode bytes in position 0-6: end of string in escape sequence

How can I display musical keys ?

Thanks,

   franck



#36269

From: Peter Otten <__peter__@web.de>
Date: 2013-01-06 18:03 +0100
Message-ID: <mailman.174.1357491789.2939.python-list@python.org>
In reply to: #36267
Franck Ditter wrote:

> I work on MacOS-X Lion and IDLE/Python 3.3.0
> I can't get the treble key (U1D11E) !
> 
> >>> "\U1D11E"
> SyntaxError: (unicode error) 'unicodeescape' codec can't
> decode bytes in position 0-6: end of string in escape sequence
> 
> How can I display musical keys ?

Try
>>> "\U0001D11E"
'𝄞'



#36271

From: marduk <marduk@python.net>
Date: 2013-01-06 12:10 -0500
Message-ID: <mailman.175.1357492817.2939.python-list@python.org>
In reply to: #36267

On Sun, Jan 6, 2013, at 11:43 AM, Franck Ditter wrote:
> Hi !
> I work on MacOS-X Lion and IDLE/Python 3.3.0
> I can't get the treble key (U1D11E) !
> 
> >>> "\U1D11E"
> SyntaxError: (unicode error) 'unicodeescape' codec can't 
> decode bytes in position 0-6: end of string in escape sequence
> 

You probably meant:

>>> '\U0001d11e'


For that syntax you must use either '\uXXXX' or '\UXXXXXXXX' (i.e.
specify either 4 or 8 hex digits).

http://docs.python.org/2/howto/unicode#unicode-literals-in-python-source-code
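
As a small sketch of the rule above (not part of the original post): the same code point can be written with the 8-digit \U escape, looked up by its Unicode name, or built from the integer with chr(). The variable names are only illustrative.

```python
import unicodedata

# Three spellings of the same code point, U+1D11E.
by_escape = "\U0001D11E"               # \U takes exactly 8 hex digits
by_name = "\N{MUSICAL SYMBOL G CLEF}"  # lookup by Unicode character name
by_number = chr(0x1D11E)               # built from the integer code point

assert by_escape == by_name == by_number

clef_name = unicodedata.name(by_escape)
clef_length = len(by_escape)           # 1 on Python 3.3 and later
```

None of these lines prints anything, so they should run the same way in IDLE, a terminal, or a script; only displaying the character depends on the output device.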



#36346

From: Franck Ditter <nobody@nowhere.org>
Date: 2013-01-07 13:57 +0100
Message-ID: <nobody-940F87.13572507012013@news.free.fr>
In reply to: #36271
In article <mailman.175.1357492817.2939.python-list@python.org>,
 marduk <marduk@python.net> wrote:

> On Sun, Jan 6, 2013, at 11:43 AM, Franck Ditter wrote:
> > Hi !
> > I work on MacOS-X Lion and IDLE/Python 3.3.0
> > I can't get the treble key (U1D11E) !
> > 
> > >>> "\U1D11E"
> > SyntaxError: (unicode error) 'unicodeescape' codec can't 
> > decode bytes in position 0-6: end of string in escape sequence
> > 
> 
> You probably meant:
> 
> >>> '\U0001d11e'
> 
> 
> For that syntax you must use either '\uXXXX' or '\UXXXXXXXX' (i.e.
> specify either 4 or 8 hex digits).
> 
> http://docs.python.org/2/howto/unicode#unicode-literals-in-python-source-code

<<< print('\U0001d11e')
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print('\U0001d11e')
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e' 
in position 0: Non-BMP character not supported in Tk



#36347

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-01-08 00:04 +1100
Message-ID: <mailman.217.1357563892.2939.python-list@python.org>
In reply to: #36346
On Mon, Jan 7, 2013 at 11:57 PM, Franck Ditter <nobody@nowhere.org> wrote:
> <<< print('\U0001d11e')
> Traceback (most recent call last):
>   File "<pyshell#1>", line 1, in <module>
>     print('\U0001d11e')
> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
> in position 0: Non-BMP character not supported in Tk

That's a different issue; IDLE can't handle non-BMP characters. Try it
from the terminal if you can - on my Linux systems (Debians and
Ubuntus with GNOME and gnome-terminal), the terminal is set to UTF-8
and quite happily accepts the full Unicode range. On Windows, that may
well not be the case, though.
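
One rough way to check what a given terminal is set to, sketched here as an illustration rather than something from the thread: ask Python which encoding it uses for stdout and test whether that codec can represent the character. The helper name can_encode is hypothetical. The check only covers the codec; a display layer such as the Tk widget behind IDLE can still refuse the character even when the reported encoding is fine.

```python
import sys

def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without error."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

clef = "\U0001D11E"

# sys.stdout.encoding may be None when stdout has been replaced; fall back
# to ASCII in that case (an assumption made for this sketch).
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"
stdout_ok = can_encode(clef, stdout_encoding)

print(stdout_encoding, stdout_ok)
```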

ChrisA



#36348

From: Terry Reedy <tjreedy@udel.edu>
Date: 2013-01-07 08:12 -0500
Message-ID: <mailman.218.1357564437.2939.python-list@python.org>
In reply to: #36346
On 1/7/2013 7:57 AM, Franck Ditter wrote:

> <<< print('\U0001d11e')
> Traceback (most recent call last):
>    File "<pyshell#1>", line 1, in <module>
>      print('\U0001d11e')
> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
> in position 0: Non-BMP character not supported in Tk

The message comes from printing to a tk text widget (the IDLE shell), 
not from creating the 1 char string. c = '\U0001d11e' works fine. When 
you have problems with creating and printing unicode, *separate* 
creating from printing to see where the problem is. (I do not know if 
the brand new tcl/tk 8.6 is any better.)
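
A minimal sketch of that advice, added for illustration: create the string first, then inspect it through ASCII-only representations that any shell or console should be able to show, and only afterwards try to print the character itself.

```python
c = "\U0001d11e"           # step 1: create the string (no output involved)

# Step 2: inspect it using ASCII-only representations.
length = len(c)            # 1 on Python 3.3 and later
code_point = hex(ord(c))   # '0x1d11e'
escaped = ascii(c)         # the repr with non-ASCII characters escaped

print(length, code_point, escaped)

# Step 3: only now try print(c). If that raises UnicodeEncodeError while
# the lines above worked, the output device is at fault, not the string.
```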

The windows console also chokes, but with a different message.

 >>> c='\U0001d11e'
 >>> print(c)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e' 
in position 0: character maps to <undefined>

Yes, this is very annoying, especially in Win 7.
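
One possible workaround, sketched here under the assumption that an escaped form is acceptable in the output: encode with the "backslashreplace" error handler so that characters the console codec cannot represent come out as escapes instead of raising. The helper name printable is hypothetical, and cp437 is used only because it is the codec named in the traceback above.

```python
def printable(text, encoding):
    """Return `text` with characters `encoding` cannot represent
    replaced by backslash escapes."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

safe = printable("clef: \U0001d11e", "cp437")
print(safe)   # every character in `safe` is representable in cp437
```

This does not make the console display the clef; it only avoids the exception while keeping the code point readable in the output.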

-- 
Terry Jan Reedy



#36419

From: Terry Reedy <tjreedy@udel.edu>
Date: 2013-01-08 03:40 -0500
Message-ID: <mailman.266.1357634500.2939.python-list@python.org>
In reply to: #36346
On 1/7/2013 8:12 AM, Terry Reedy wrote:
> On 1/7/2013 7:57 AM, Franck Ditter wrote:
>
>> <<< print('\U0001d11e')
>> Traceback (most recent call last):
>>    File "<pyshell#1>", line 1, in <module>
>>      print('\U0001d11e')
>> UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d11e'
>> in position 0: Non-BMP character not supported in Tk
>
> The message comes from printing to a tk text widget (the IDLE shell),
> not from creating the 1 char string. c = '\U0001d11e' works fine. When
> you have problems with creating and printing unicode, *separate*
> creating from printing to see where the problem is. (I do not know if
> the brand new tcl/tk 8.6 is any better.)
>
> The windows console also chokes, but with a different message.
>
>  >>> c='\U0001d11e'
>  >>> print(c)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>    File "C:\Programs\Python33\lib\encodings\cp437.py", line 19, in encode
>      return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\U0001d11e'
> in position 0: character maps to <undefined>
>
> Yes, this is very annoying, especially in Win 7.

The above is in 3.3, in which '\U0001d11e' is actually translated to a 
length 1 string. In 3.2 and earlier, on narrow builds (as on Windows), 
that literal is translated to a length 2 string, a surrogate pair (both 
code units in the BMP). On printing, the pair of surrogates got 
translated to a square box used for all characters for which the font 
does not have a glyph. 𝄞 When cut and pasted, it shows in this mail 
composer as a weird music sign with peculiar behavior.
3 -s, 3 spaces, paste, 3 spaces, 3 -s, but it may disappear.
---   𝄞   ---
So 3.3 is the first Windows version to get the UnicodeEncodeError on 
printing.
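
For illustration (not from the original post), the surrogate pair a narrow build would store for this character can be computed with the standard UTF-16 arithmetic. The function name surrogate_pair is hypothetical.

```python
def surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low

high, low = surrogate_pair(0x1D11E)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
assert "\U0001d11e".encode("utf-16-be") == bytes([high >> 8, high & 0xFF,
                                                  low >> 8, low & 0xFF])
```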

-- 
Terry Jan Reedy


