From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Pyhon 2.x or 3.x, which is faster?
Date: Thu, 10 Mar 2016 18:55:52 +1100
References: <56df6761$0$1588$c3e8da3$5496439d@news.astraweb.com>

On Thu, Mar 10, 2016 at 6:30 PM, Mark Lawrence wrote:
>> From what I've seen, a lot of software can't get [Unicode] right anyway.
>>
>
> Are you referring to PEP393 having taken notice of the RUE?

Even with PEP 393, there's no guarantee that a Python program will get Unicode right.
The bytes/text split in Python 3 is a huge help, but proper handling of the entire Unicode range implies more than simply being able to represent all characters (although that's a critical prerequisite). There are design considerations with case folding (tip: it's easiest and safest to be case sensitive), collation/sorting (tip: it's impossible to be perfect unless you know which language is involved), text directionality (you probably know that Arabic is written right-to-left, but are you aware that there are also characters with "weak" directionality, distinct from those with "neutral" directionality?) and so on, plus a bunch of relatively straightforward coding considerations (e.g. comparing two strings for equality generally requires NFC/NFD normalization, and might require NFKC/NFKD), which a number of programs still don't get right.

PEP 393 actually isn't very much about correctness; a "wide build" of pre-3.3 Python has the correct behaviour, but is wasteful with memory. By removing the temptation to conserve memory using UTF-16, PEP 393 did improve correctness on Windows, but its main focus is on memory efficiency (and thus performance, thanks to cache locality).

But hey. Just being able to represent all characters is probably about 95% of Unicode correctness. The rest is the little stuff.

ChrisA
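
To make the normalization point concrete, here is a minimal sketch, assuming Python 3 and nothing beyond the standard unicodedata module. The helper names canonical_equal and compat_equal are made up for illustration; they are not from any library, and this is only one way to do the comparison.

```python
import unicodedata


def canonical_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (canonical equivalence)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def compat_equal(a: str, b: str) -> bool:
    """Compare two strings after NFKC normalization (compatibility equivalence)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


composed = "caf\u00e9"     # 'e with acute' as a single code point
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

print(composed == decomposed)                 # False: different code points
print(canonical_equal(composed, decomposed))  # True: same text after NFC

ligature = "\ufb01le"  # LATIN SMALL LIGATURE FI followed by "le"

print(canonical_equal(ligature, "file"))  # False: NFC keeps the ligature
print(compat_equal(ligature, "file"))     # True: NFKC expands it to "fi"
```

Whether NFKC is appropriate depends on the application: it discards distinctions (ligatures, width variants and so on) that some programs need to keep, which is why it only "might" be required.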