From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: RE Module Performance
Date: Thu, 25 Jul 2013 04:15:42 +1000
To: python-list@python.org

On Thu, Jul 25, 2013 at 3:52 AM, Terry Reedy
wrote:
> On 7/24/2013 11:00 AM, Michael Torrie wrote:
>>
>> On 07/24/2013 08:34 AM, Chris Angelico wrote:
>>>
>>> Frankly, Python's strings are a *terrible* internal representation
>>> for an editor widget - not because of PEP 393, but simply because
>>> they are immutable, and every keypress would result in a rebuilding
>>> of the string. On the flip side, I could quite plausibly imagine
>>> using a list of strings;
>
>
> I used exactly this, a list of strings, for a Python-coded text-only mock
> editor to replace the tk Text widget in idle tests. It works fine for the
> purpose. For small test texts, the inefficiency of immutable strings is not
> relevant.
>
> Tk apparently uses a C-coded btree rather than a Python list. All details
> are hidden, unless one finds and reads the source ;-), but it uses C
> arrays rather than Python strings.
>
>
>>> In this usage, the FSR is beneficial, as it's possible to have
>>> different strings at different widths.
>
>
> For my purpose, the mock Text works the same in 2.7 and 3.3+.

Thanks for that report! And yes, it's going to behave exactly the same
way, because its underlying structure is an ordered list of ordered
lists of Unicode codepoints, ergo 3.3/PEP 393 is merely a question of
performance. But if you put your code onto a narrow build, you'll have
issues as seen below.

>> Maybe, but simply thinking logically, FSR and UCS-4 are equivalent in
>> pros and cons,
>
> They both have the pro that indexing is direct *and correct*. The cons are
> different.

They're close enough, though. It's simply a performance tradeoff - use
the memory all the time, or take a bit of overhead to give yourself
the option of using less memory. The difference is negligible compared
to...

>> and the cons of using UCS-2 (the old narrow builds) are
>> well known. UCS-2 simply cannot represent all of unicode correctly.
>
> Python's narrow builds, at least for several releases, were in between UCS-2
> and UTF-16 in that they used surrogates to represent all unicodes but did
> not correct indexing for the presence of astral chars. This is a nuisance
> for those who do use astral chars, such as emotes and CJK name chars, on an
> everyday basis.

... this. If nobody had ever thought of doing a multi-format string
representation, I could well imagine the Python core devs debating
whether the cost of UTF-32 strings is worth the correctness and
consistency improvements... and most likely concluding that narrow
builds get abolished. And if any other language (e.g. ECMAScript)
decides to move from UTF-16 to UTF-32, I would wholeheartedly support
the move, even if it broke code to do so. To my mind, exposing UTF-16
surrogates to the application is a bug to be fixed, not a feature to
be maintained. But since we can get the best of both worlds with only
a small amount of overhead, I really don't see why anyone should be
objecting.

ChrisA
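
A minimal sketch of the points discussed above, assuming CPython 3.3 or
later (PEP 393 strings). The insert_char helper and the sample buffer are
hypothetical, made up for illustration; they are not taken from the mock
Text widget or from IDLE. It shows that a list-of-strings buffer rebuilds
only the line being edited, that lines of different widths can coexist,
and that an astral character counts as one code point even though UTF-16
needs two code units for it.

```python
import sys


def insert_char(lines, row, col, ch):
    """Hypothetical helper: insert ch into one line of a list-of-strings buffer.

    Strings are immutable, so the edited line is rebuilt, but only that
    line; every other line in the buffer is left untouched.
    """
    line = lines[row]
    lines[row] = line[:col] + ch + line[col:]


buffer = ["hello world", "second line"]
insert_char(buffer, 0, 5, ",")
print(buffer[0])

# PEP 393: each string picks the narrowest width that fits its widest
# code point, so lines in one buffer can use different widths.
samples = ["a" * 100, "\u20ac" * 100, "\U0001F600" * 100]
sizes = [sys.getsizeof(s) for s in samples]
for s, size in zip(samples, sizes):
    print(len(s), "code points,", size, "bytes")

# Indexing is by code point on 3.3+, while UTF-16 needs a surrogate
# pair (two code units) for the astral character in the middle.
astral = "a\U0001F600b"
utf16_units = len(astral.encode("utf-16-le")) // 2
print(len(astral), utf16_units, astral[2])
```

On an old narrow build, the same three-character string would have
reported a length of 4, and index 2 would have landed on half of a
surrogate pair, which is the indexing problem described in the quoted
text.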