From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: The Cost of Dynamism (was Re: Pyhon 2.x or 3.x, which is faster?)
Date: Tue, 22 Mar 2016 02:45:59 +1100

On Tue, Mar 22, 2016 at 12:48 AM, Steven D'Aprano wrote:
> On Mon, 21 Mar 2016 11:59 pm, Chris Angelico wrote:
>
>> On Mon, Mar 21, 2016 at 11:34 PM, BartC wrote:
>>> For Python I would have used a table of 0..255 functions, indexed by the
>>> ord() code of each character. So all 52 letter codes map to the same
>>> name-handling function. (No Dict is needed at this point.)
>>>
>>
>> Once again, you forget that there are not 256 characters - there are
>> 1114112. (Give or take.)
>
> Pardon me, do I understand you correctly? You're saying that the C parser is
> Unicode-aware and allows you to use Unicode in C source code? Because
> Bart's test is for a (simplified?) C tokeniser, and expecting his tokeniser
> to support character sets that C does not would be, well, Not Cricket, my
> good chap.

We nutted part of this out earlier in the thread; Python 3.x code is,
and any other modern language should be, defined to have Unicode
source. (And yes, MRAB, I'm aware that only a tiny fraction of
codepoints are defined; it's still a lot more than 256, and going to
make for a far larger lookup table.)

While you could plausibly define that your source code consists only
of printable ASCII characters (eg 09,10,13,32-126), it is an extremely
bad idea to declare that it has 256 possibilities - you're shackling
your language to a parser definition that includes both more and less
than people will expect.

ChrisA
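
To make the table-size point concrete, here is a rough sketch of one way a tokeniser could keep a small dense table for ASCII and still accept Unicode source. It is not code from this thread: the handler names (handle_name, handle_digit, handle_space, handle_other), ASCII_TABLE and classify are all made up for illustration, and the non-ASCII branch simply leans on Python's own str.isidentifier() and str.isspace() rather than any particular language's grammar.

```python
import sys

# Number of Unicode code points (0 through 0x10FFFF); this is the
# "1114112" figure quoted above.
CODE_POINTS = sys.maxunicode + 1

# Hypothetical handlers; a real tokeniser would consume input here.
def handle_name(ch):
    return "name"

def handle_digit(ch):
    return "digit"

def handle_space(ch):
    return "space"

def handle_other(ch):
    return "other"

# Dense table for the ASCII range only: 128 entries instead of 1114112.
ASCII_TABLE = [handle_other] * 128
for code in range(128):
    ch = chr(code)
    if ch.isalpha() or ch == "_":
        ASCII_TABLE[code] = handle_name      # 52 letters plus underscore
    elif ch.isdigit():
        ASCII_TABLE[code] = handle_digit
    elif ch in " \t\n\r":
        ASCII_TABLE[code] = handle_space

def classify(ch):
    """Dispatch on one character: table lookup for ASCII, rules otherwise."""
    code = ord(ch)
    if code < 128:
        return ASCII_TABLE[code](ch)
    # Assumption: outside ASCII, defer to Python's Unicode-aware predicates
    # instead of a per-code-point table.
    if ch.isidentifier():
        return handle_name(ch)
    if ch.isspace():
        return handle_space(ch)
    return handle_other(ch)

if __name__ == "__main__":
    for sample in ("a", "7", " ", "é", "€"):
        print(repr(sample), classify(sample))
```

The fast path is still a plain list index, as in the ord() table idea quoted above; only the characters outside ASCII pay for the extra checks, and nothing in the definition is tied to a 256-entry byte table.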