From: Skip Montanaro
To: Chris Angelico
Cc: python-list@python.org
Newsgroups: comp.lang.python
Date: Tue, 13 May 2014 09:02:29 -0500
Subject: Re: Everything you did not want to know about Unicode in Python 3


On Tue, May 13, 2014 at 3:38 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> Python 2's ambiguity allows me not to answer the tough philosophical
>> questions. I'm not saying it's necessarily a good thing, but it has its
>> benefits.
>
> It's not a good thing. It means that you have the convenience of
> pretending there's no problem, which means you don't notice trouble
> until something happens... and then, in all probability, your app is
> in production and you have no idea why stuff went wrong.

BITD, when I still maintained and developed Musi-Cal (an early online concert calendar, long since gone), I faced a challenge when I first started encountering non-ASCII band names and cities. I resisted UTF-8. After all, if I printed a string containing an "é", it came out looking like

[inline image: e.png]

What kind of mess was that???
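That kind of garbling is usually called mojibake. As a minimal sketch, assuming the display was treating UTF-8 bytes as though they were Latin-1 (ISO-8859-1), the effect can be reproduced in Python 3. The variable names are illustrative, not from the original post.

```python
# Minimal sketch: mojibake from decoding UTF-8 bytes with the wrong codec.
# Assumption: the display treated UTF-8 bytes as Latin-1 (ISO-8859-1).
text = "\u00e9"                        # "é", LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = text.encode("utf-8")      # b"\xc3\xa9": two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # b"\xe9": one byte in Latin-1

# Reading the two UTF-8 bytes as Latin-1 yields two characters, "Ã©".
garbled = utf8_bytes.decode("latin-1")

# No bytes were lost, so reversing the mistake recovers the original.
repaired = garbled.encode("latin-1").decode("utf-8")
```

On a Latin-1 display the single byte 0xE9 shows up as a legible "é", which is why Latin-1 can look like the easy way out.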

I tried to ignore it, or assume Latin-1 would cover all the bases (my first non-ASCII inputs tended to come from Western Europe). If nothing else, at least "é" was legible.
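Assuming Latin-1 goes wrong in two different ways, sketched here with Python 3's built-in `latin-1` codec. Decoding never raises, because every byte value from 0x00 to 0xFF maps to a code point, so misidentified input slips through silently. Encoding text from outside Western Europe, on the other hand, fails. The helper `fits_latin1` and the sample names are hypothetical.

```python
# Sketch: two ways "assume Latin-1" goes wrong (hypothetical helper name).

# 1. Decoding is silent: all 256 byte values map to U+0000..U+00FF,
#    so Latin-1 accepts any input, including bytes that were really UTF-8.
decoded = bytes(range(256)).decode("latin-1")  # never raises
assert len(decoded) == 256


def fits_latin1(s: str) -> bool:
    """Return True if every character in s can be encoded as Latin-1."""
    try:
        s.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# 2. Encoding is strict: characters beyond U+00FF have no Latin-1 byte.
print(fits_latin1("Caf\u00e9"))         # True  ("Café")
print(fits_latin1("Dvo\u0159\u00e1k"))  # False ("Dvořák"; U+0159 is not Latin-1)
print(fits_latin1("\u6771\u4eac"))      # False ("東京")
```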

Needless to say, those approaches didn't work well. After perhaps six months or a year, I broke down and started converting everything coming in or going out to UTF-8 at the boundaries of my system (making educated guesses at input encodings if necessary). My life got a whole lot easier after that. The distinction between bytes and text didn't really matter much, certainly not compared to the mess I had before where strings of unknown data leaked into my system and its database.
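The boundary approach described above (often called a "Unicode sandwich") can be rendered in Python 3 roughly as: decode bytes as they arrive, work with `str` inside, and encode to UTF-8 on the way out. This is a sketch under stated assumptions: the function names and the fallback order (strict UTF-8, then cp1252, then Latin-1 as a catch-all) are illustrative, not what Musi-Cal actually did.

```python
# Sketch of decoding at the boundary: bytes in -> str inside -> UTF-8 out.
# Assumption: this fallback order is illustrative; real inputs may need
# different guesses (e.g. a charset declared by the sender).
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_input(raw: bytes) -> str:
    """Decode incoming bytes, trying the strictest encoding first."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next guess
    # Not reached in practice: Latin-1 accepts every byte sequence.
    raise ValueError("no fallback encoding matched")


def encode_output(text: str) -> bytes:
    """Encode outgoing text as UTF-8 at the system boundary."""
    return text.encode("utf-8")


if __name__ == "__main__":
    samples = (
        b"Caf\xc3\xa9",  # valid UTF-8
        b"Caf\xe9",      # not valid UTF-8; decodes as cp1252
        b"\x81",         # undefined in cp1252; falls through to Latin-1
    )
    for raw in samples:
        text = decode_input(raw)
        print(raw, "->", ascii(text), "->", encode_output(text))
```

Trying UTF-8 first is a common heuristic because non-ASCII text in other encodings rarely happens to form valid UTF-8. The cost is that cp1252 will happily misread text that was really in, say, Latin-2, which is why any such fallback remains an educated guess.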

Skip

P.S. My apologies for the mess this message probably is. Amazing as it may seem, Gmail in Chrome does a crappy job editing anything other than plain text. Also, I'm surprised in this day and age that common tools like Gnome Terminal have little or no encoding support. I wound up having to pop up urxvt to get an encodings-flexible terminal emulator...
