Re: Unicode failure

From eryk sun <eryksun@gmail.com>
Newsgroups comp.lang.python
Subject Re: Unicode failure
Date 2015-12-05 07:21 -0600
Message-ID <mailman.223.1449321767.14615.python-list@python.org>
References <20151204130738.76313c43@imp> <n3t7jo$ae3$1@ger.gmane.org> <n3tla3$7p7$1@ger.gmane.org> <n3tutf$6n8$1@ger.gmane.org> <CAPTjJmpqxnZsR8hh5j3qknkq_-ELde85+M6CDthW8yLMT=t3SA@mail.gmail.com>


On Sat, Dec 5, 2015 at 12:10 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Sat, Dec 5, 2015 at 5:06 PM, Terry Reedy <tjreedy@udel.edu> wrote:
>> On 12/4/2015 10:22 PM, Random832 wrote:
>>>
>>> On 2015-12-04, Terry Reedy <tjreedy@udel.edu> wrote:
>>>>
>>>> Tk widgets, and hence IDLE windows, will print any character from \u0000
>>>> to \uffff without raising, even if the result is blank or �.  Higher
>>>> codepoints fail, but allowing the entire BMP is better than any Windows
>>>> codepage.
>>>
>>>
>>> Well, any bar 1200, 1201, 12000, 12001, 65000, 65001, and 54936.
>>
>>
>> Test before you post.
>>
>>>>> for cp in 1200, 1201, 12000, 12001, 65000, 65001, 54936:
>>         print(chr(cp))
>>
>>
>> Ұ
>> ұ
>> ⻠
>> ⻡
>> �
>> �
>> 횘
>
> Those numbers aren't codepoints, they're code pages. Specifically,
> they're UTF-16, UTF-32, UTF-8, and I'm not sure what 54936 is.

Codepage 65000 is UTF-7. Codepage 54936 [1] is GB18030, the official
character set of China. It's both a UTF (it can encode every Unicode
code point) and a superset of GBK. For comparison, codepage 936 is a
subset of GBK (it's missing 95 characters) plus the Euro symbol.
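
A rough way to see the difference from Python, as a sketch only: it
assumes Python's "gbk", "gb18030" and "utf-7" codecs as stand-ins for
the Windows codepages. Python treats "cp936" as an alias of "gbk", so
the 95-character difference mentioned above is not visible here. The
sample strings and variable names are made up for illustration.

```python
# Python codec names standing in for the Windows code pages discussed above.
# Assumption of this sketch: the Python codecs are not guaranteed to be
# byte-for-byte identical to the Windows code page tables.
text_bmp = "中文"            # characters that GBK covers
text_astral = "\U0001F600"   # a code point outside the BMP

# GB18030 is a superset of GBK: text GBK can encode gets the same bytes.
gbk_bytes = text_bmp.encode("gbk")
gb18030_bytes = text_bmp.encode("gb18030")
same_bytes = gbk_bytes == gb18030_bytes

# GB18030 is also a UTF: it round-trips code points GBK has no mapping for.
astral_roundtrip = (
    text_astral.encode("gb18030").decode("gb18030") == text_astral
)
try:
    text_astral.encode("gbk")
    gbk_handles_astral = True
except UnicodeEncodeError:
    gbk_handles_astral = False

# UTF-7 (code page 65000) also round-trips the text, using only ASCII bytes.
utf7_bytes = (text_bmp + text_astral).encode("utf-7")
utf7_roundtrip = utf7_bytes.decode("utf-7") == text_bmp + text_astral

print(same_bytes, astral_roundtrip, gbk_handles_astral, utf7_roundtrip)
```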

[1]: https://msdn.microsoft.com/en-us/library/dd317756
