From: Ian Kelly
Date: Thu, 28 Mar 2013 10:11:59 -0600
Subject: Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Newsgroups: comp.lang.python

On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico wrote:
> PEP393 strings have two optimizations, or kinda three:
>
> 1a) ASCII-only strings
> 1b) Latin1-only strings
> 2) BMP-only strings
> 3) Everything else
>
> Options 1a and 1b are almost identical - I'm not sure what the detail
> is, but there's something flagging those strings that fit inside seven
> bits. (Something to do with optimizing encodings later?) Both are
> optimized down to a single byte per character.

The only difference for ASCII-only strings is that they are kept in a struct with a smaller header. The smaller header omits the utf8 pointer (which optionally points to an additional UTF-8 representation of the string) and its associated length variable. These are not needed for ASCII-only strings because an ASCII string can be directly interpreted as a UTF-8 string for the same result.
The smaller header also omits the "wstr_length" field which, according to the PEP, "differs from length only if there are surrogate pairs in the representation." For an ASCII string, of course there would not be any surrogate pairs.
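
For anyone who wants to see this from the Python side, here is a rough sketch that uses sys.getsizeof to estimate the per-character width and the fixed header overhead for each of the string kinds listed above. It assumes CPython 3.3 or later; the helper names bytes_per_char and header_overhead are made up for this illustration, and the exact header sizes vary by version and platform, so only the comparisons are meaningful.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by measuring how much the object
    grows when a string made of `ch` gets n characters longer."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n


def header_overhead(ch, n=1000):
    """Estimate the fixed overhead (struct header plus terminator):
    total object size minus the character payload."""
    s = ch * n
    return sys.getsizeof(s) - int(bytes_per_char(ch, n)) * n


# One sample character per category from the quoted list (assumed samples).
samples = {
    "ASCII": "a",
    "Latin-1": "\xe9",
    "BMP": "\u20ac",
    "non-BMP": "\U0001F600",
}

for name, ch in samples.items():
    print(name, bytes_per_char(ch), header_overhead(ch))

# ASCII text encodes to the same bytes under ASCII and UTF-8, which is
# why a separate cached UTF-8 copy is unnecessary for such strings.
same_bytes = "hello".encode("ascii") == "hello".encode("utf-8")
print(same_bytes)
```

On CPython the ASCII and Latin-1 rows should both report one byte per character, with the ASCII row showing the smaller overhead. That difference is the shorter struct header described above.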