From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Performance of int/long in Python 3
Date: Thu, 4 Apr 2013 01:17:28 +1100

On Thu, Apr 4, 2013 at 12:43 AM, Roy Smith wrote:
> This has to inspect the entire string, no?  I posted (essentially) this
> a few days ago:
>
>     if all(ord(c) <= 0xffff for c in s):
>         return "it's all bmp"
>     else:
>         return "it's got astral crap in it"
>
> I'm reasonably sure all() is smart enough to stop at the first False
> value.

Probably, but it still has to scan the body of the string. It'd not be
too bad if it's all astral, but if it's all BMP, it has to scan the
whole string. In the max() case, it has to scan the whole string
anyway, as there's no other way to determine the maximum. I'm thinking
here of this function:

http://pike.lysator.liu.se/generated/manual/modref/ex/7.2_3A_3A/String/width.html

It's implemented as a simple lookup into the header.
(Pike strings, like PEP 393 strings, are stored in the most compact way
possible - 1, 2, or 4 bytes per character - with a conceptually similar
header structure.) Is this something that would be worth having
available? Should I post an issue about it?

ChrisA

more for self-ref than anyone else's: source of Pike's String.width():
http://pike-git.lysator.liu.se/gitweb.cgi?p=pike.git;a=blob;f=src/builtin.cmod;hb=HEAD#l1077
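
For comparison, here is a minimal sketch of the two scanning approaches
discussed above, written in pure Python. The names scan_is_bmp and
scan_width are hypothetical helpers made up for illustration; neither is
part of Python or Pike. scan_width only imitates the 8/16/32 result that
Pike's String.width() reports, and it gets there by walking the string
rather than by reading a header field.

```python
def scan_is_bmp(s):
    """Return True if every character of s is in the BMP.

    all() short-circuits, so this stops at the first astral character,
    but an all-BMP string still has to be scanned to the end.
    """
    return all(ord(c) <= 0xFFFF for c in s)


def scan_width(s):
    """Hypothetical stand-in for a width query: 8, 16, or 32 bits.

    max() cannot stop early, so this is always a full O(n) scan.
    An empty string is treated as width 8 here (an assumption).
    """
    top = max(map(ord, s), default=0)
    if top <= 0xFF:
        return 8
    if top <= 0xFFFF:
        return 16
    return 32


if __name__ == "__main__":
    for sample in ("plain ascii", "euro \u20ac", "astral \U0001F600"):
        print(repr(sample), scan_is_bmp(sample), scan_width(sample))
```

Both helpers cost time proportional to the length of the string in the
worst case. A header lookup of the kind described above would be
constant time, since a PEP 393 string already records its storage width
when it is created (visible at the C level through PyUnicode_KIND).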