From: Mark Lawrence
Newsgroups: comp.lang.python
Subject: Re: Everything you did not want to know about Unicode in Python 3
Date: Tue, 13 May 2014 03:33:02 +0100

On 13/05/2014 02:18, Steven D'Aprano wrote:
> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
>
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>>
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>>
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>>
>> Surely those example programs are not the pythonic way to do things or
>> am I missing something?
>
> Feel free to show us your version of "cat" for Python then. Feel free to
> target any version you like.
> Don't forget to test it against files with
> names and content that:
>
> - aren't valid UTF-8;
>
> - are valid UTF-8, but not valid in the local encoding.
>
>
>> if those code samples are anything to go by this guy makes JMF look
>> sensible.
>
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
>
> Unicode is hard, not because Unicode is hard, but because of legacy
> problems. I can create a file on a machine that uses ISO-8859-7 for the
> file name, put Shift-JIS encoded text inside it, transfer it to a
> machine that uses Windows-1251 as the file system encoding, then SSH into
> that machine from a system using Big5, and try to make sense of it. If
> everybody used UTF-8 any time data touched a disk or network, we'd be
> laughing. It would all be so simple.
>
> Reading Armin's post, I think that all that is needed to simplify his
> Python 3 version is:
>
> - have a bytes version of sys.argv (bargv? argvb?) and read
>   the file names from that;
>
> - have a simple way to write bytes to stdout and stderr.
>
> Most programs won't need either of those, but file system utilities will.
>

I think http://bugs.python.org/issue8776 and http://bugs.python.org/issue8775
are relevant but both were placed in the small round filing cabinet.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
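
For anyone wanting to try the "cat" challenge quoted above, here is a minimal
sketch of a bytes-only version for Python 3. It is not Armin's code and not a
proposed stdlib API: the names cat_bytes and main are made up for this
example. It assumes a POSIX system, where os.fsencode() recovers the original
bytes of each argument from the surrogateescape-decoded sys.argv, and it
assumes sys.stdout and sys.stderr have not been replaced by objects lacking a
.buffer attribute.

```python
import os
import sys


def cat_bytes(paths, out):
    """Copy each file in `paths` (bytes names) to the binary stream `out`.

    Returns 0 on success and 1 if any file could not be read.
    """
    status = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(65536)
                    if not chunk:
                        break
                    # No decoding happens anywhere: bytes in, bytes out.
                    out.write(chunk)
        except OSError as exc:
            # Build the message as bytes so an undecodable file name cannot
            # trigger UnicodeEncodeError while reporting the failure.
            reason = os.fsencode(exc.strerror or "error")
            sys.stderr.buffer.write(b"cat: " + path + b": " + reason + b"\n")
            status = 1
    out.flush()
    return status


def main(argv=None):
    """Hypothetical entry point: treat every argument as a raw bytes path."""
    args = sys.argv[1:] if argv is None else argv
    # On POSIX, os.fsencode() undoes the surrogateescape decoding that
    # Python 3 applied when it built sys.argv.
    return cat_bytes([os.fsencode(a) for a in args], sys.stdout.buffer)
```

To run it as a script, call sys.exit(main()) at the bottom of the file.
Going through sys.stdout.buffer and sys.stderr.buffer sidesteps the text
layer completely, which covers both items on the wish list above on POSIX;
Windows command-line handling is a different story and is not covered here.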