Path: csiph.com!usenet.pasdenom.info!news.albasani.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <ksp493$m5g$1@ger.gmane.org>
References: <mailman.4618.1373613834.3114.python-list@python.org> <571a6dfe-fd66-42cf-92fc-8b97cbe6e9e4@googlegroups.com> <51DFDE65.5040001@Gmail.com> <CAN1F8qUFP3uX57HhiiUPaYqO3h_HiT8Q_YD=vCYky3EAWsdE7Q@mail.gmail.com> <mailman.4666.1373670835.3114.python-list@python.org> <4f1067f6-bc99-42ad-9166-37fb228b90e8@googlegroups.com> <CAPTjJmoRUUuQgpjukkctnGe4ydHGYr2CEgD9uDtmBRHSejpW0g@mail.gmail.com> <CA+vVgJVxAHEk_Z4uzZFNYtmEFXmyyoo0Jg_rBeqHSDBuF0E1Ug@mail.gmail.com> <CAPTjJmqBs0PfQ5ahfURtSyCimPnFU-1HpFndO8h6T+Vbkj10Bg@mail.gmail.com> <51EFEC17.90303@gmail.com> <ksp493$m5g$1@ger.gmane.org>
Date: Thu, 25 Jul 2013 04:15:42 +1000
Subject: Re: RE Module Performance
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=ISO-8859-1
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.5059.1374689751.3114.python-list@python.org>
Lines: 67
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:51159

On Thu, Jul 25, 2013 at 3:52 AM, Terry Reedy <tjreedy@udel.edu> wrote:
> On 7/24/2013 11:00 AM, Michael Torrie wrote:
>>
>> On 07/24/2013 08:34 AM, Chris Angelico wrote:
>>>
>>> Frankly, Python's strings are a *terrible* internal representation
>>> for an editor widget - not because of PEP 393, but simply because
>>> they are immutable, and every keypress would result in a rebuilding
>>> of the string. On the flip side, I could quite plausibly imagine
>>> using a list of strings;
>
>
> I used exactly this, a list of strings, for a Python-coded text-only mock
> editor to replace the tk Text widget in idle tests. It works fine for the
> purpose. For small test texts, the inefficiency of immutable strings is not
> relevant.
>
> Tk apparently uses a C-coded btree rather than a Python list. All details
> are hidden, unless one finds and reads the source ;-), but but it uses C
> arrays rather than Python strings.
>
>
>>> In this usage, the FSR is beneficial, as it's possible to have
>>> different strings at different widths.
>
>
> For my purpose, the mock Text works the same in 2.7 and 3.3+.

Thanks for that report! And yes, it's going to behave exactly the same
way, because its underlying structure is an ordered list of ordered
lists of Unicode codepoints, ergo 3.3/PEP 393 is merely a question of
performance. But if you put your code onto a narrow build, you'll have
issues as seen below.

>> Maybe, but simply thinking logically, FSR and UCS-4 are equivalent in
>> pros and cons,
>
> They both have the pro that indexing is direct *and correct*. The cons are
> different.

They're close enough, though. It's simply a performance tradeoff - use
the memory all the time, or take a bit of overhead to give yourself
the option of using less memory. The difference is negligible compared
to...

>> and the cons of using UCS-2 (the old narrow builds) are
>> well known.  UCS-2 simply cannot represent all of unicode correctly.
>
> Python's narrow builds, at least for several releases, were in between USC-2
> and UTF-16 in that they used surrogates to represent all unicodes but did
> not correct indexing for the presence of astral chars. This is a nuisance
> for those who do use astral chars, such as emotes and CJK name chars, on an
> everyday basis.

... this. If nobody had ever thought of doing a multi-format string
representation, I could well imagine the Python core devs debating
whether the cost of UTF-32 strings is worth the correctness and
consistency improvements... and most likely concluding that narrow
builds get abolished. And if any other language (eg ECMAScript)
decides to move from UTF-16 to UTF-32, I would wholeheartedly support
the move, even if it broke code to do so. To my mind, exposing UTF-16
surrogates to the application is a bug to be fixed, not a feature to
be maintained. But since we can get the best of both worlds with only
a small amount of overhead, I really don't see why anyone should be
objecting.

ChrisA