From: Ian Kelly
Date: Thu, 28 Mar 2013 10:33:46 -0600
Subject: Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]
Newsgroups: comp.lang.python

On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wrote:
> The flexible string representation takes the problem from the
> other side, it attempts to work with the characters by using
> their representations and it (can only) fails...

This is false. As I've pointed out to you before, the FSR (PEP 393) does not divide characters up by representation. It divides them up by codepoint or, more specifically, by the *bit-width* of the largest codepoint in the string. We call the internal format of the string "ASCII", "Latin-1", "UCS-2" or "UCS-4" for concision and as a point of reference, but fundamentally all of the FSR formats are simply byte arrays of *codepoints* (you know, those things you keep harping on).

The major optimization performed by the FSR is to consistently truncate the leading zero bytes from each codepoint whenever every codepoint in the string allows that to be done safely. But regardless of the extent to which this truncation is applied, the string is *always* internally just an array of codepoints, and the same algorithms apply for all representations.
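To make the point concrete, here is a toy model in Python. It is a sketch of the idea only, not CPython's actual C implementation, and the names fsr_width and ToyFSRString are hypothetical. The width is chosen from the largest codepoint, each codepoint is stored with its leading zero bytes dropped, and indexing uses the same arithmetic whatever the width turns out to be.

```python
def fsr_width(s: str) -> int:
    """Bytes per codepoint a PEP 393-style layout would pick for s.

    1 if every codepoint fits in 8 bits (ASCII/Latin-1),
    2 if every codepoint fits in 16 bits (UCS-2), else 4 (UCS-4).
    """
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:
        return 1
    if max_cp < 0x10000:
        return 2
    return 4


class ToyFSRString:
    """Hypothetical model: a flat byte array of fixed-width codepoints."""

    def __init__(self, s: str) -> None:
        self.width = fsr_width(s)
        # Keep only the low `width` bytes of each codepoint; the dropped
        # high bytes are guaranteed to be zero by the choice of width.
        self.data = b"".join(ord(c).to_bytes(self.width, "little") for c in s)

    def __len__(self) -> int:
        return len(self.data) // self.width

    def __getitem__(self, i: int) -> str:
        # The same O(1) indexing arithmetic serves every width.
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        start = i * self.width
        cp = int.from_bytes(self.data[start:start + self.width], "little")
        return chr(cp)


# ASCII, Latin-1, BMP (euro sign) and astral (musical G clef) samples.
for text in ["spam", "caf\xe9", "\u20acuro", "\U0001D11E clef"]:
    t = ToyFSRString(text)
    print(t.width, len(t), ascii(t[-1]))
# 1 4 'm'
# 1 4 '\xe9'
# 2 4 'o'
# 4 6 'f'
```

Only the width differs between the four strings; the lookup in __getitem__ never needs to know which "format" it is dealing with.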