From: Chris Angelico
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: Performance of int/long in Python 3
Date: Mon, 1 Apr 2013 16:57:51 +1100

On Mon, Apr 1, 2013 at 4:33 PM, rusi wrote:
> So I really wonder: Is python losing more by supporting SMP with
> performance hit on BMP?

If your strings fit entirely within the BMP, then you should see no
penalty compared to previous versions of Python. If they happen to fit
inside ASCII, then there may well be significant improvements. But
regardless, what you gain is the ability to work with *any* string,
regardless of its content, without worrying about it.
You can count characters regardless of their content. Imagine if a
tuple of integers behaved differently if some of those integers
flipped to being long ints:

x = (1, 2, 4, 8, 1<<30, 1<<300, 1<<10)

Wouldn't you be surprised if len(x) returned 8? I certainly would be.
And that's what a narrow build of Python does with Unicode.

Unicode strings are approximately comparable to tuples of integers. In
fact, they can be interchanged fairly readily:

# 13 BMP characters plus one character from outside the BMP
string = "Treble clef: \U0001D11E"
# one integer (code point) per character
array = tuple(map(ord, string))
assert len(array) == 14
# and back again
out_string = ''.join(map(chr, array))
assert out_string == string

This doesn't work in Python 2.6 on Windows, partly because of
surrogates, but also because chr() in Python 2 isn't designed for
Unicode strings. There's probably a solution to the second (unichr()
is the obvious place to start), but not really to the first. The tuple
of ords should match the way the characters are laid out to a human.

ChrisA
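
For anyone who wants to see the narrow-build count without having a
narrow build to hand, here is a minimal sketch for Python 3.3 or later.
It assumes that a narrow build's len() is the number of UTF-16 code
units in the string. The helper names utf16_units and surrogate_pair
are made up for this illustration and are not from the post or the
standard library.

```python
# Illustration only: both helpers are hypothetical names, not stdlib functions.

def utf16_units(s):
    """Count UTF-16 code units (assumed here to match len() on a narrow build)."""
    # utf-16-le writes no byte-order mark; every code unit is exactly 2 bytes.
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its (high, low) surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


string = "Treble clef: \U0001D11E"
code_points = len(string)         # counts characters on Python 3.3+
code_units = utf16_units(string)  # counts UTF-16 code units instead
high, low = surrogate_pair(0x1D11E)
print(code_points, code_units, hex(high), hex(low))
```

The two counts differ by one because the treble clef is the only
character in the string that lies outside the BMP, so it alone needs
two code units.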