Path: csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
To: python-list@python.org
From: Terry Reedy <tjreedy@udel.edu>
Subject: Re: Newbie question about text encoding
Date: Thu, 26 Feb 2015 12:02:25 -0500
References: <aae131a7-29a1-4f79-ac16-d1e223616c51@googlegroups.com> <davea@davea.name> <54EC5FA4.6070703@davea.name> <201502241455.t1OEtffT016452@fido.openend.se> <CAPTjJmqT_VnXDRpuX_yRLzUtDzedZqUNx5Zhba+d6ZVD9+PNdg@mail.gmail.com> <201502241507.t1OF7aUm018883@fido.openend.se> <rosuav@gmail.com> <CAPTjJmpg+Ar-83fLPN5Pg3U5udLbkS0tBqF+aGQbiLrCVJ5aSw@mail.gmail.com> <201502241524.t1OFO09k022270@fido.openend.se> <CAPTjJmoSZm8xRxeq-8G5KOKWddQxq23ieWqsY+jjCJXuY3DP0A@mail.gmail.com> <201502241620.t1OGKf4n002146@fido.openend.se> <54ECB134.5090304@davea.name> <201502241945.t1OJjshO013092@fido.openend.se> <201502241957.t1OJvrJS015604@fido.openend.se> <mailman.19148.1424810518.18130.python-list@python.org> <ef520397-b1f0-47bf-8d24-585a9ba230e2@googlegroups.com> <CAPTjJmreaPu7MZQgmbFNnhhg9R6w9dHPPo=yBbMoG85HxK+H_Q@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
In-Reply-To: <CAPTjJmreaPu7MZQgmbFNnhhg9R6w9dHPPo=yBbMoG85HxK+H_Q@mail.gmail.com>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.19274.1424970167.18130.python-list@python.org>
Lines: 69
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:86519

On 2/26/2015 8:24 AM, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rustompmody@gmail.com> wrote:
>> Wrote something up on why we should stop using ASCII:
>> http://blog.languager.org/2015/02/universal-unicode.html

I think that the main point of the post, that many Unicode chars are
truly planetary rather than just national/regional, is excellent.

> From that post:
>
> """
> 5.1 Gibberish
>
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
>
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
>
> To me (a layman) it looks unprofessional – as though they are playing
> games – that billions of computing devices, each having billions of
> storage words should have their storage wasted on blocks such as
> these.
> """
>
> The shift from Unicode as a 16-bit code to having multiple planes came
> in with Unicode 2.0, but the various blocks were assigned separately:
> * Egyptian hieroglyphs: Unicode 5.2
> * Cuneiform: Unicode 5.0
> * Shavian: Unicode 4.0
> * Deseret: Unicode 3.1
> * Mahjong Tiles: Unicode 5.1
> * Klingon: Not part of any current standard

You should add emoticons, but not call them or the above 'gibberish'.
I think that this part of your post is more 'unprofessional' than the
character blocks.  It is very jarring and seems contrary to your main
point.

> However, I don't think historians will appreciate you calling all of
> these "gibberish". To adequately describe and discuss old texts
> without these Unicode blocks, we'd have to either do everything with
> images, or craft some kind of reversible transliteration system and
> have dedicated software to render the texts on screen. Instead, what
> we have is a well-known and standardized system for transliterating
> all of these into numbers (code points), and rendering them becomes a
> simple matter of installing an appropriate font.
>
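
To illustrate the "numbers (code points)" remark, here is a minimal
sketch using Python's standard unicodedata module. The sample code
points and the describe() helper are my own picks for illustration,
one from each standardized block named above; none of this comes
from the blog post.

```python
import unicodedata

# One sample code point per block (chosen for illustration only).
SAMPLES = {
    "Egyptian hieroglyphs": 0x13000,
    "Cuneiform": 0x12000,
    "Shavian": 0x10450,
    "Deseret": 0x10400,
    "Mahjong Tiles": 0x1F000,
}

def describe(code_point):
    """Return 'U+XXXX NAME', or '<unnamed>' if the database has no name."""
    name = unicodedata.name(chr(code_point), "<unnamed>")
    return "U+{:04X} {}".format(code_point, name)

for block, cp in SAMPLES.items():
    print(block, "->", describe(cp))

# U+F8D0 lies in the Private Use Area (U+E000..U+F8FF), so the
# standard assigns it no name.
print(describe(0xF8D0))
```

Whether the characters actually display still depends on having a
suitable font installed; the lookup itself needs nothing beyond the
standard library.
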
> Also, how does assigning meanings to codepoints "waste storage"? As
> soon as Unicode 2.0 hit and 16-bit code units stopped being
> sufficient, everyone needed to allocate storage - either 32 bits per
> character, or some other system - and the fact that some codepoints
> were unassigned had absolutely no impact on that. This is decidedly
> NOT unprofessional, and it's not wasteful either.

I agree.
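
A small sketch of the storage point, assuming nothing beyond Python 3's
built-in codecs (the encoded_sizes() helper is a name I made up for
this example). The bytes needed for a string depend on the encoding
and on the characters actually present, not on which other code
points happen to be assigned.

```python
def encoded_sizes(text):
    """Bytes needed to store text in three common encodings (no BOM)."""
    return {enc: len(text.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# An ASCII letter: 1, 2 and 4 bytes respectively.
print(encoded_sizes("A"))

# One Egyptian hieroglyph (U+13000): 4 bytes in each encoding.
print(encoded_sizes("\U00013000"))
```

Text that never uses a hieroglyph pays nothing for the block's
existence.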

-- 
Terry Jan Reedy