| From | Dan Sommers <dan@tombstonezero.net> |
|---|---|
| Subject | Re: what is the unicode? |
| Date | Sat, 28 Jan 2012 15:01:09 +0000 (UTC) |
| References | <CA+YdQ_5XfYqgxedBXAxOCcPiCaA6uoAnL1LNkcp2oayhZLu-bg@mail.gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.5179.1327771679.27778.python-list@python.org> |
On Sat, 28 Jan 2012 15:47:17 +0800, contro opinion wrote:
> >>> u'中国'.encode('utf-8')
> '\xe4\xb8\xad\xe5\x9b\xbd'
>
> so,'\xe4\xb8\xad\xe5\x9b\xbd' is the utf-8 of '中国'
Those bytes are the utf-8 encoding of those CJK ideographs (I won't even
pretend to know what those CJK ideographs are or what they mean).
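A minimal sketch of that round trip, assuming Python 3 (where text is `str` and encoded data is `bytes`, unlike the Python 2 session quoted above); the variable names are only illustrative:

```python
# Python 3 sketch: str holds text, bytes holds encoded data.
text = '中国'

# Encode the text to UTF-8; each of these two characters takes 3 bytes.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)                  # b'\xe4\xb8\xad\xe5\x9b\xbd'
print(len(text), len(utf8_bytes))  # 2 6

# Decoding with the same codec recovers the original text.
assert utf8_bytes.decode('utf-8') == text
```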
> >>> u'中国'.encode('gbk')
> '\xd6\xd0\xb9\xfa'
> so,'\xd6\xd0\xb9\xfa' is the utf-8 of '中国'
Those bytes are the gbk encoding of those CJK ideographs.
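To make the difference concrete, here is a small sketch (again assuming Python 3, with illustrative variable names) showing that the same text gives different bytes under each encoding, and that bytes only make sense together with the codec that produced them:

```python
text = '中国'

gbk_bytes = text.encode('gbk')
utf8_bytes = text.encode('utf-8')
print(gbk_bytes)   # b'\xd6\xd0\xb9\xfa'

# Same characters, different byte sequences.
assert gbk_bytes != utf8_bytes

# Each byte string decodes back to the same text with its own codec.
assert gbk_bytes.decode('gbk') == utf8_bytes.decode('utf-8') == text

# Decoding with the wrong codec fails here: these GBK bytes are not valid UTF-8.
wrong_codec_error = None
try:
    gbk_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    wrong_codec_error = exc
print(type(wrong_codec_error).__name__)
```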
> >>> u'中国'
> u'\u4e2d\u56fd'
>
> what is the meaning of u'\u4e2d\u56fd'? u'\u4e2d\u56fd' = \x4e2d\x56fd
> ??
Those are the "code points" of those characters, written as \uXXXX
escapes. Unicode assigns code points, which are just integers, to
characters. The code points are easier to work with inside your
application, but you usually have to encode them into bytes to transfer
them into or out of your application.
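As a rough illustration (assuming Python 3; the names are just for this example), `ord` and `chr` convert between characters and their code points, and a `\u` escape is not the same thing as a `\x` escape:

```python
# '\u4e2d\u56fd' is just another way to spell the same two-character string.
text = '\u4e2d\u56fd'
assert text == '中国'

# ord() gives the integer code point of a character; chr() goes back.
code_points = [ord(ch) for ch in text]
print([hex(cp) for cp in code_points])   # ['0x4e2d', '0x56fd']
assert ''.join(chr(cp) for cp in code_points) == text

# '\x4e2d' is NOT the same as '\u4e2d': \x takes exactly two hex digits,
# so it is 'N' (0x4e) followed by the literal characters '2' and 'd'.
assert '\x4e2d' == 'N2d'
assert '\u4e2d' != '\x4e2d'
```

So the answer to the `\x4e2d\x56fd` question is no: the `\u` escapes name whole code points, while the bytes you get depend on which encoding you pick.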
HTH,
Dan