From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Looking for UNICODE to ASCII Conversioni Example Code
Date: Sun, 20 Oct 2013 09:10:02 +1100

On Sun, Oct 20, 2013 at 3:49 AM, Roy Smith wrote:
> So, yesterday, I tracked down an uncaught exception stack in our logs to a user whose username included the unicode character 'SMILING FACE WITH SUNGLASSES' (U+1F60E). It turns out, that's perfectly fine as a user name, except that in one obscure error code path, we try to str() it during some error processing.

How is that a problem? Surely you have to deal with non-ASCII characters all the time - how is that particular one a problem? I'm looking at its UTF-8 and UTF-16 representations and not seeing anything strange, unless it's the \x0e in UTF-16 - but, again, you must surely have had to deal with non-ASCII-encoded-whichever-way-you-do-it.

Or are you saying that that particular error code path did NOT handle non-ASCII characters? If so, that's a strong argument for moving to Python 3, to get full Unicode support in _all_ branches.

ChrisA
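For anyone following along, here is a minimal Python 3 sketch of the representations mentioned above and of the failure mode Roy describes. It prints the UTF-8 and UTF-16 bytes for U+1F60E (the UTF-16 form is the surrogate pair D83D DE0E, which is where the \x0e byte comes from). It then shows that a strict ASCII encode raises UnicodeEncodeError, which is roughly what Python 2's str() does to a unicode object under the default ASCII codec. The helper `safe_ascii` is hypothetical, not anything from Roy's code; it is just one way an error path could produce ASCII-only text without raising, using the standard "backslashreplace" error handler.

```python
import unicodedata

# U+1F60E, the character from the bug report quoted above.
ch = "\U0001F60E"

print(unicodedata.name(ch))     # SMILING FACE WITH SUNGLASSES
print(ch.encode("utf-8"))       # b'\xf0\x9f\x98\x8e'
print(ch.encode("utf-16-be"))   # b'\xd8=\xde\x0e'  (surrogate pair D83D DE0E)
print(ch.encode("utf-16-le"))   # b'=\xd8\x0e\xde'


def safe_ascii(text):
    """Hypothetical helper for an error path: return an ASCII-only str
    that never raises, escaping every non-ASCII code point."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


# A strict ASCII encode fails, much like Python 2's str() on a unicode object.
try:
    ch.encode("ascii")
except UnicodeEncodeError as exc:
    print(type(exc).__name__)   # UnicodeEncodeError

print(safe_ascii("user " + ch))  # user \U0001f60e
```

In Python 3 itself, str(ch) simply returns the string unchanged. So an error path that only calls str() on a username would not raise there, which is the point about getting full Unicode support in all branches.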