Date: Tue, 9 Jul 2013 03:52:17 +1000
Subject: Re: hex dump w/ or w/out utf-8 chars
From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python
In-Reply-To: <7ef8c0e7-7f7c-4a22-89a9-50f62c4a8064@googlegroups.com>
References: <7ef8c0e7-7f7c-4a22-89a9-50f62c4a8064@googlegroups.com>

On Tue, Jul 9, 2013 at 3:31 AM, wrote:
> Unfortunately (as probably I told you before) I will never pass to
> Python 3... Guido should not always listen only to gurus like him...
> I don't like Python as before...starting from OOP and ending with codecs
> like utf-8. Regarding OOP, much appreciated expecially by experts, he
> could use python 2 for hiding the complexities of OOP (improving, as an
> effect, object's code hiding) moving classes and objects to
> imported methods, leaving in this way the programming style to the
> well known old style: sequential programming and functions.
> About utf-8... the same solution: keep utf-8 but for the non experts, add
> methods to convert to solutions which use the range 128-255 of only one
> byte (I do not give a damn about chinese and "similia"!...)
> I know that is a lost battle (in italian "una battaglia persa")!

Well, there won't be a Python 2.8, so you really should consider
moving at some point. Python 3.3 is already way better than 2.7 in
many ways, 3.4 will improve on 3.3, and the future is pretty clear.
But nobody's forcing you, and 2.7.x will continue to get
bugfix/security releases for a while. (Personally, I'd be happy if
everyone moved off the 2.3/2.4 releases. It's not too hard supporting
2.6+ or 2.7+.)

The thing is, you're thinking about UTF-8, but you should be thinking
about Unicode. I recommend you read these articles:

http://www.joelonsoftware.com/articles/Unicode.html
http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/

So long as you are thinking about different groups of characters as
different, and wanting a solution that maps characters down into the
<256 range, you will never be able to cleanly internationalize. With
Python 3.3+, you can ignore the differences between ASCII, BMP, and
SMP characters; they're all just "characters". Everything works
perfectly with Unicode.

ChrisA
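
A minimal sketch of the point about Python 3.3+ treating ASCII, BMP, and SMP characters alike, and of why a one-byte (<256) mapping cannot cover them. The sample characters and the helper names describe() and fits_in_one_byte() are illustrative assumptions, not something from the thread; latin-1 stands in for "a codec that uses only one byte per character".

```python
# Three sample characters, one per group mentioned above (chosen for illustration).
samples = {
    "ASCII": "A",          # U+0041
    "BMP": "\u4e2d",       # U+4E2D, a CJK ideograph
    "SMP": "\U0001F40D",   # U+1F40D, outside the Basic Multilingual Plane
}


def describe(ch):
    """Return (length in characters, code point, UTF-8 bytes as hex)."""
    utf8 = ch.encode("utf-8")
    hexed = " ".join("{:02x}".format(b) for b in utf8)
    return len(ch), "U+{:04X}".format(ord(ch)), hexed


def fits_in_one_byte(text):
    """True if every character of text maps into the 0-255 range (latin-1)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for group, ch in samples.items():
    # On Python 3.3+ the length is 1 in every case; only the encoded
    # byte sequence differs.
    print(group, describe(ch), fits_in_one_byte(ch))
```

The string is the same kind of object in all three cases; UTF-8 only shows up at the moment the text is encoded to bytes, which is also the point at which a hex dump makes sense.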