From: Chris Angelico
Cc: "python-list@python.org"
Newsgroups: comp.lang.python
Subject: Re: "More About Unicode in Python 2 and 3"
Date: Thu, 9 Jan 2014 10:45:37 +1100

On Thu, Jan 9, 2014 at 10:34 AM, wrote:
> I just meant to say that internet programming using ASCII urls is so
> common and important that it hurts that Python 3 makes it so much
> harder. It sure would be great if Python 3 could be improved to allow
> such programming to be done using ASCII urls without requiring all the
> unicode overhead.
>
> Armin is right. Calling his post a rant doesn't help.

There's one big problem with that theory. We've been looking, on this
list and on python-ideas, at some practical suggestions for adding
something to Py3 that will help. So far, lots of people have suggested
things, and the complainers haven't attempted to explain what they
actually need. Hard facts and examples would help enormously.

Incidentally, before referring to "all the Unicode overhead", it would
help to actually measure the overhead of encoding and decoding.

Python 2.7:
>>> import timeit
>>> timeit.timeit("a.encode().decode()","a=u'a'*1000",number=500000)
8.787162614242874

Python 3.4:
>>> import timeit
>>> timeit.timeit("a.encode().decode()","a=u'a'*1000",number=500000)
1.7354552045022515

Since 3.3, the cost of UTF-8 encoding/decoding an all-ASCII string is
extremely low. So the real cost isn't in run-time performance but in
code complexity. Would it be easier to work with ASCII URLs with a
one-letter-name helper function? I never got an answer to that
question.

ChrisA
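
For concreteness, a one-letter-name helper of the kind asked about might
look something like the sketch below. This is an illustration only: the
names a() and u() are made up here, not an existing or proposed Python
API, and the code assumes Python 3 and strictly ASCII URLs.

```python
def a(text):
    """Hypothetical helper: ASCII-only str -> bytes.

    Bytes are passed through unchanged; non-ASCII text raises
    UnicodeEncodeError rather than being silently mangled.
    """
    if isinstance(text, bytes):
        return text
    return text.encode("ascii")


def u(data):
    """Hypothetical inverse: ASCII-only bytes -> str."""
    if isinstance(data, str):
        return data
    return data.decode("ascii")


url = "http://example.com/path?q=1"
raw = a(url)        # bytes, ready to send on the wire
text = u(raw)       # back to str for string operations
```

Whether wrapping every URL in a call like a(...) would be less of a
burden than calling .encode() and .decode() directly is exactly the
open question.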