
Re: what is the unicode?

From Dan Sommers <dan@tombstonezero.net>
Subject Re: what is the unicode?
Date 2012-01-28 15:01 +0000
References <CA+YdQ_5XfYqgxedBXAxOCcPiCaA6uoAnL1LNkcp2oayhZLu-bg@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.5179.1327771679.27778.python-list@python.org>



On Sat, 28 Jan 2012 15:47:17 +0800, contro opinion wrote:


>>>> u'中国'.encode('utf-8')
> '\xe4\xb8\xad\xe5\x9b\xbd'
> 
> so,'\xe4\xb8\xad\xe5\x9b\xbd'  is the  utf-8  of  '中国'

Those bytes are the UTF-8 encoding of those CJK ideographs (I won't even 
pretend to know what those CJK ideographs are or what they mean).

>>>> u'中国'.encode('gbk')
> '\xd6\xd0\xb9\xfa'
> so,'\xd6\xd0\xb9\xfa' is the  utf-8  of  '中国'

Those bytes are the GBK encoding of those CJK ideographs, not the UTF-8 
encoding.
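
To make the difference between the two encodings concrete, here is a small sketch. It assumes Python 3, where every str is Unicode and encode() returns a bytes object (the session quoted above is Python 2, where the result prints without the b prefix). The variable names are only illustrative.

```python
# The same two characters, written with escapes so the source file's own
# encoding does not matter.
text = '\u4e2d\u56fd'

# One piece of text, two different byte sequences.
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')

print(utf8_bytes)  # b'\xe4\xb8\xad\xe5\x9b\xbd'  (3 bytes per character)
print(gbk_bytes)   # b'\xd6\xd0\xb9\xfa'          (2 bytes per character)

# Decoding each byte sequence with the matching codec gives back the same text.
assert utf8_bytes.decode('utf-8') == text
assert gbk_bytes.decode('gbk') == text
```

Decoding with the wrong codec either raises UnicodeDecodeError or quietly produces different characters, which is why the encoding has to be known at the boundary.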

>>>> u'中国'
> u'\u4e2d\u56fd'
> 
> what is the meaning of u'\u4e2d\u56fd'? u'\u4e2d\u56fd'  =  \x4e2d\x56fd
>  ??

Those are the "code points" of those characters.  Unicode assigns code 
points, which are just integers, to characters.  The code points are easier 
to work with inside your application, but you usually have to encode them 
into bytes to transfer them into or out of your application.
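
A short sketch of what "code points are just integers" means in practice, again assuming Python 3 (use unichr() instead of chr() on Python 2). It also shows why '\x4e2d\x56fd' is not the same string: a \x escape takes exactly two hex digits, so '\x4e' is the letter N and the '2d' after it is ordinary text. The variable names are only illustrative.

```python
text = '\u4e2d\u56fd'

# ord() gives the integer code point of each character.
code_points = [ord(ch) for ch in text]
print(code_points)  # [20013, 22269]

# The conventional U+XXXX notation is just those integers in hex.
labels = ['U+%04X' % cp for cp in code_points]
print(labels)  # ['U+4E2D', 'U+56FD']

# chr() goes the other way, from integer back to character.
rebuilt = ''.join(chr(cp) for cp in code_points)
assert rebuilt == text

# \x consumes only two hex digits, so this is six ASCII characters,
# not two CJK ideographs.
not_the_same = '\x4e2d\x56fd'
print(not_the_same)  # N2dVfd
assert not_the_same != text
```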

HTH,
Dan
