MIME-Version: 1.0
In-Reply-To: <ef520397-b1f0-47bf-8d24-585a9ba230e2@googlegroups.com>
References: <aae131a7-29a1-4f79-ac16-d1e223616c51@googlegroups.com> <davea@davea.name> <54EC5FA4.6070703@davea.name> <201502241455.t1OEtffT016452@fido.openend.se> <CAPTjJmqT_VnXDRpuX_yRLzUtDzedZqUNx5Zhba+d6ZVD9+PNdg@mail.gmail.com> <201502241507.t1OF7aUm018883@fido.openend.se> <rosuav@gmail.com> <CAPTjJmpg+Ar-83fLPN5Pg3U5udLbkS0tBqF+aGQbiLrCVJ5aSw@mail.gmail.com> <201502241524.t1OFO09k022270@fido.openend.se> <CAPTjJmoSZm8xRxeq-8G5KOKWddQxq23ieWqsY+jjCJXuY3DP0A@mail.gmail.com> <201502241620.t1OGKf4n002146@fido.openend.se> <54ECB134.5090304@davea.name> <201502241945.t1OJjshO013092@fido.openend.se> <201502241957.t1OJvrJS015604@fido.openend.se> <mailman.19148.1424810518.18130.python-list@python.org> <ef520397-b1f0-47bf-8d24-585a9ba230e2@googlegroups.com>
Date: Fri, 27 Feb 2015 00:24:03 +1100
Subject: Re: Newbie question about text encoding
From: Chris Angelico <rosuav@gmail.com>
Cc: "python-list@python.org" <python-list@python.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.19255.1424957046.18130.python-list@python.org>
Lines: 54
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:86499

On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rustompmody@gmail.com> wrote:
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

From that post:

"""
5.1 Gibberish

When going from the original 2-byte unicode (around version 3?) to the
one having supplemental planes, the unicode consortium added blocks
such as

* Egyptian hieroglyphs
* Cuneiform
* Shavian
* Deseret
* Mahjong
* Klingon

To me (a layman) it looks unprofessional – as though they are playing
games – that billions of computing devices, each having billions of
storage words should have their storage wasted on blocks such as
these.
"""

The shift from Unicode as a 16-bit code to having multiple planes came
in with Unicode 2.0, but the various blocks were assigned separately:
* Egyptian hieroglyphs: Unicode 5.2
* Cuneiform: Unicode 5.0
* Shavian: Unicode 4.0
* Deseret: Unicode 3.1
* Mahjong Tiles: Unicode 5.1
* Klingon: Not part of any current standard
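
For anyone who wants to poke at these, here is a rough sketch (assuming
Python 3; the sample code points are my own picks, one from each block,
and plane() is just a helper name made up for the illustration). It
shows that every one of the encoded blocks sits in plane 1, beyond the
reach of a single 16-bit code unit:

```python
import unicodedata

# One sample code point from each encoded block listed above
# (illustrative picks). Klingon is left out because it has no
# assigned code points in Unicode.
SAMPLES = {
    "Egyptian hieroglyphs": 0x13000,
    "Cuneiform": 0x12000,
    "Shavian": 0x10450,
    "Deseret": 0x10400,
    "Mahjong Tiles": 0x1F000,
}


def plane(code_point):
    """Return the plane number; each plane holds 0x10000 code points."""
    return code_point >> 16


for block, cp in SAMPLES.items():
    # Pass a default so an interpreter with an older Unicode database
    # prints a placeholder instead of raising ValueError.
    name = unicodedata.name(chr(cp), "<not in this Unicode database>")
    print(f"U+{cp:05X}  plane {plane(cp)}  {block}: {name}")
```

The names printed depend on the Unicode database bundled with your
interpreter; the plane numbers do not.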

However, I don't think historians will appreciate you calling all of
these "gibberish". To adequately describe and discuss old texts
without these Unicode blocks, we'd have to either do everything with
images, or craft some kind of reversible transliteration system and
have dedicated software to render the texts on screen. Instead, what
we have is a well-known and standardized system for transliterating
all of these into numbers (code points), and rendering them becomes a
simple matter of installing an appropriate font.

Also, how does assigning meanings to code points "waste storage"? As
soon as Unicode 2.0 hit and 16-bit code units stopped being
sufficient, everyone needed to allocate storage - either 32 bits per
character, or some variable-width scheme such as UTF-8 or UTF-16 with
surrogate pairs - and whether a given code point was assigned or not
had absolutely no impact on that. This is decidedly NOT
unprofessional, and it's not wasteful either.
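
A small sketch to back that up (assuming Python 3; encoded_sizes() is a
helper name invented for this example, and U+40000 is used as a code
point that is unassigned at the time of writing). The number of bytes a
character occupies depends only on its numeric value and the encoding
chosen, never on whether anything has been assigned there:

```python
def encoded_sizes(ch):
    """Return the byte length of one character in three common encodings."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# An ASCII letter, two BMP characters, an Egyptian hieroglyph, and a
# code point in plane 4 (assumed unassigned; the encoders do not care).
for ch in ("A", "\u00e9", "\u20ac", "\U00013000", chr(0x40000)):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Text that never uses the supplementary planes pays nothing for their
existence: "A" is still one byte in UTF-8, and the hieroglyph costs
four bytes only in documents that actually contain it.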

ChrisA