From: Peter Pearson
Newsgroups: comp.lang.python
Subject: Re: Unicode normalisation [was Re: [beginner] What's wrong?]
Date: 8 Apr 2016 18:03:24 GMT

On Sat, 9 Apr 2016 03:50:16 +1000, Chris Angelico wrote:
> On Sat, Apr 9, 2016 at 3:44 AM, Marko Rauhamaa wrote:
[snip]
>> (As for ligatures, I understand that there might be quite a bit of
>> legacy software that dedicated code points and code pages for ligatures.
>> Translating that legacy software to Unicode was made more
>> straightforward by introducing analogous codepoints to Unicode. Unicode
>> has quite many such codepoints: µ, K, Ω etc.)
>
> More specifically, Unicode solved the problems that *codepages* had
> posed. And one of the principles of its design was that every
> character in every legacy encoding had a direct representation as a
> Unicode codepoint, allowing bidirectional transcoding for
> compatibility. Perhaps if Unicode had existed from the dawn of
> computing, we'd have less characters; but backward compatibility is
> way too important to let a narrow purity argument sway it.

I guess with that historical perspective the current situation
seems almost inevitable. Thanks.
And thanks to Steven D'Aprano for other relevant insights.

-- 
To email me, substitute nowhere->runbox, invalid->com.