From: Rustom Mody
Newsgroups: comp.lang.python
Subject: Re: Everything you did not want to know about Unicode in Python 3
Date: Mon, 12 May 2014 23:09:04 -0700 (PDT)
Message-ID: <72d4f4e7-1bbd-4ceb-8e7f-d8ca18e1c1b2@googlegroups.com>
References: <8P7cv.78617$Sp6.8377@fx15.am4> <537172eb$0$29980$c3e8da3$5496439d@news.astraweb.com> <82899649-014a-4309-b06e-b981fc6921fa@googlegroups.com>

On Tuesday, May 13, 2014 11:09:06 AM UTC+5:30, Mark H. Harris wrote:
> On 5/13/14 12:10 AM, Rustom Mody wrote:
> > I think the most helpful way forward is to accept two things:
> > a. Unicode is a headache
> > b. No-unicode is a non-option
>
> QOTW (so far...)

I said that getting Unicode right straight off is unrealistic.
I should have added this:

Armin makes a (sarcastic?) dig at Python 3, saying that it goofs because
it's mismatched with the assumptions of Unix:

| UNIX is bytes, has been defined that way and will always be that way.
| Unicode on UNIX is only madness if you force it on everything. But that's not
| how Unicode on UNIX works. UNIX does not have a distinction between unicode
| and byte APIs. They are one and the same which makes them easy to deal with.

| Python 3 takes a very different stance on Unicode than UNIX does. Python 3
| says: everything is Unicode ...

This may be right... Or it may be the other way round, as I claim at
http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

At this point I don't believe that anyone is very clear about what is the
right way and what is the wrong way.
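
To make the mismatch concrete, here is a minimal sketch (an illustration, not taken from Armin's post). It assumes a hypothetical filename whose bytes are valid Latin-1 but not valid UTF-8, which Unix accepts without complaint. Python 3 has to turn those bytes into a str somehow; one mechanism it offers is the "surrogateescape" error handler from PEP 383, which smuggles each undecodable byte through as a lone surrogate so the original bytes can be recovered.

```python
# Hypothetical filename: 0xE9 is "e-acute" in Latin-1 but is not valid UTF-8 here.
raw_name = b"caf\xe9.txt"

# A strict decode fails, which is the "everything is Unicode" side of the clash.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape (PEP 383) maps the bad byte 0xE9 to the lone surrogate U+DCE9.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

print(strict_ok)                  # False
print(ascii(as_text))             # 'caf\udce9.txt'
print(round_tripped == raw_name)  # True
```

The round trip is lossless, but the intermediate str is not printable or encodable as strict UTF-8, which is roughly where the "bytes world" and the "Unicode world" rub against each other.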