Date: Tue, 30 Jul 2013 17:13:40 +0100
From: MRAB <python@mrabarnett.plus.com>
To: python-list@python.org
Subject: Re: RE Module Performance
In-Reply-To: <51F7CFD1.1090403@rece.vub.ac.be>
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.5321.1375200818.3114.python-list@python.org>

On 30/07/2013 15:38, Antoon Pardon wrote:
> On 30-07-13 16:01, wxjmfauth@gmail.com wrote:
>>
>> I am pretty sure that once you have typed your 127504 ASCII
>> characters, you are very happy that your editor's buffer does not
>> waste time re-encoding itself as soon as you enter a €, the
>> 127505th char. Suppose you meant to type z instead of the euro:
>> backspacing over the last char and entering the new one
>> implies re-encoding twice.
>
> Using a single string as an editor buffer is a bad idea in Python for
> the simple reason that strings are immutable.

Using a single string as an editor buffer is a bad idea in _any_
language because an insertion would require all the following
characters to be moved.
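
To make that concrete, here is a minimal Python sketch of insertion
into a flat array of characters. insert_shifting is a hypothetical
helper written for illustration; Python's own list.insert does the same
shifting internally, just in C. Every character after the insertion
point moves one slot, so inserting at the start of a 127504-character
buffer moves all 127504 of them.

```python
def insert_shifting(buf: list[str], pos: int, ch: str) -> int:
    """Insert ch at index pos of a flat array, moving later items right.

    Returns how many existing characters had to be moved.
    """
    buf.append(ch)  # grow by one slot; its value is overwritten below
    moved = 0
    # Walk backwards from the end so nothing is overwritten before it moves.
    for i in range(len(buf) - 1, pos, -1):
        buf[i] = buf[i - 1]
        moved += 1
    buf[pos] = ch
    return moved


buf = list("hello world")
print(insert_shifting(buf, 0, ">"), "".join(buf))  # → 11 >hello world
```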

> So adding characters would mean continuously copying the string
> buffer into a new string with the next character added. Copying
> 127504 characters into a new string will not make that much of a
> difference whether the octets are just copied to octets or are
> unpacked into 32-bit words.
>
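
A rough cost model of that, as a Python sketch (type_by_rebuilding and
type_into_list are hypothetical names): if every keystroke builds a new
string, keystroke k copies k characters, so n keystrokes copy n(n+1)/2
characters in total. The model deliberately ignores CPython's
implementation-specific trick of sometimes resizing a string in place
during concatenation when nothing else refers to it.

```python
def type_by_rebuilding(text: str) -> tuple[str, int]:
    """Simulate typing into an immutable str buffer.

    Each keystroke builds a new string from the old one plus the new
    character. Returns the final buffer and the number of characters
    copied along the way (conceptual cost model only).
    """
    buf = ""
    copied = 0
    for ch in text:
        buf = buf + ch      # conceptually a brand-new str object
        copied += len(buf)  # old contents plus the new character
    return buf, copied


def type_into_list(text: str) -> str:
    """Same keystrokes into a mutable list: amortised O(1) per append."""
    chars: list[str] = []
    for ch in text:
        chars.append(ch)
    return "".join(chars)


final, copied = type_by_rebuilding("z" * 1000)
print(copied)  # → 500500
```

For the 127504 keystrokes in the example the model gives about 8.1
billion character copies, whatever the storage width; appending to a
list keeps each keystroke cheap.
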
>> Somebody wrote that the "FSR" is just an optimization. Yes, but in
>> the case of an editor à la FSR, this optimization takes place every
>> time you enter a char. Your poor editor, in fact the FSR, ends up
>> spending its time optimizing, and in the end it optimizes nothing.
>> (It is even worse.)
>
> Even if you did it this way, it would *not* take place every
> time you enter a char. Once your buffer contained a wide
> character, it would just need to convert the single character that is
> added after each keystroke. It would not need to convert the whole
> buffer after each keystroke.
>
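
Antoon's point can be modelled with a small Python sketch.
WidthTrackingBuffer is hypothetical: it stores nothing at a real fixed
width, it just counts how often a buffer that widens on demand (and
never narrows) would have to re-encode everything it holds. The width
thresholds are the ones PEP 393 uses: 1 byte up to U+00FF, 2 bytes up
to U+FFFF, 4 bytes beyond.

```python
def required_width(ch: str) -> int:
    """Bytes per character that PEP 393 would use for this code point."""
    cp = ord(ch)
    if cp < 0x100:
        return 1  # Latin-1 range (includes ASCII)
    if cp < 0x10000:
        return 2  # rest of the Basic Multilingual Plane, e.g. €
    return 4      # astral code points


class WidthTrackingBuffer:
    """Hypothetical mutable buffer that tracks a fixed storage width.

    Only the bookkeeping is modelled: characters are kept in a plain list,
    and we count how often the whole buffer would have to be re-encoded.
    """

    def __init__(self) -> None:
        self.width = 1
        self.chars: list[str] = []
        self.conversions = 0      # whole-buffer re-encodings
        self.chars_converted = 0  # characters touched by those re-encodings

    def append(self, ch: str) -> None:
        w = required_width(ch)
        if w > self.width:
            # First character wider than the buffer: re-encode what is
            # already stored, once, at the new width.
            self.conversions += 1
            self.chars_converted += len(self.chars)
            self.width = w
        # In every case only the new character itself is encoded here.
        self.chars.append(ch)

    def backspace(self) -> None:
        # Deliberately never narrows, so deleting a € costs nothing extra.
        self.chars.pop()


buf = WidthTrackingBuffer()
for _ in range(127504):
    buf.append("a")
buf.append("€")  # first wide character: one whole-buffer conversion
buf.backspace()  # the typo is removed; width stays at 2
buf.append("z")  # already wide enough: no conversion
print(buf.conversions, buf.chars_converted)  # → 1 127504
```

Typing the € once costs one whole-buffer conversion; backspacing and
typing the intended z costs nothing further, because this buffer never
narrows.
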
>> If you correctly type a z instead of a €, it is not necessary to
>> re-encode the buffer. The problem: how do you know that you do not
>> have to re-encode? Simple, just check it. But the check itself
>> wastes time, since you have to test whether to optimize or not, and
>> that hurts a little more what is supposed to be an optimization.
>
> Your scenario is totally unrealistic: first, because of the
> immutable nature of Python strings; second, because you suggest that
> real-time usage would result in frequent conversions, which is highly
> unlikely.
>
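
As for the cost of the check itself: deciding whether a new character
forces a wider representation only needs that character, not a rescan
of the buffer. The Python sketch below (function names hypothetical,
thresholds as in PEP 393) checks that a constant-time comparison per
keystroke always agrees with a full rescan. Narrowing after a deletion
is the case that would need a rescan, which is why a buffer can simply
choose not to narrow.

```python
import random


def width_for(cp: int) -> int:
    """PEP 393 storage width in bytes for a string whose largest code point is cp."""
    if cp < 0x100:
        return 1
    if cp < 0x10000:
        return 2
    return 4


def full_rescan_width(buf: str) -> int:
    # O(n): re-examine every character in the buffer.
    return width_for(max(map(ord, buf), default=0))


def incremental_width(current: int, ch: str) -> int:
    # O(1): only the newly typed character needs to be looked at.
    return max(current, width_for(ord(ch)))


# Check that the O(1) test always agrees with the O(n) rescan.
random.seed(0)
alphabet = "az\u00e9\u20ac\U0001F600"  # ASCII, Latin-1, BMP, astral
buf, width = "", 1
for _ in range(1000):
    ch = random.choice(alphabet)
    buf += ch
    width = incremental_width(width, ch)
    assert width == full_rescan_width(buf)
```
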

What you would use instead is a list of mutable chunks.

Inserting into a chunk would be fast, because only the rest of that
chunk has to move, and a chunk would be split if it's already full.
Also, small adjacent chunks would be joined together.
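
A minimal Python sketch of such a chunk list, assuming each chunk is a
plain list of characters. ChunkedBuffer, CHUNK_MAX and CHUNK_MIN are
hypothetical names, and the sizes are deliberately tiny so the splitting
and joining are easy to see; locating a position is a linear walk over
the chunks, which a real editor would probably replace with something
faster.

```python
CHUNK_MAX = 8  # hypothetical capacity; a real editor would use much larger chunks
CHUNK_MIN = 2  # hypothetical size below which a chunk tries to join a neighbour


class ChunkedBuffer:
    """Text buffer stored as a list of mutable chunks (lists of characters)."""

    def __init__(self, text: str = "") -> None:
        self.chunks: list[list[str]] = [
            list(text[i:i + CHUNK_MAX]) for i in range(0, len(text), CHUNK_MAX)
        ] or [[]]

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)

    def __str__(self) -> str:
        return "".join("".join(chunk) for chunk in self.chunks)

    def insert(self, pos: int, ch: str) -> None:
        for i, chunk in enumerate(self.chunks):
            if pos <= len(chunk):
                chunk.insert(pos, ch)  # only the tail of this chunk moves
                if len(chunk) > CHUNK_MAX:
                    # Chunk overflowed: split it into two halves.
                    half = len(chunk) // 2
                    self.chunks[i:i + 1] = [chunk[:half], chunk[half:]]
                return
            pos -= len(chunk)
        raise IndexError("insert position out of range")

    def delete(self, pos: int) -> str:
        for i, chunk in enumerate(self.chunks):
            if pos < len(chunk):
                ch = chunk.pop(pos)
                self._join_small(i)
                return ch
            pos -= len(chunk)
        raise IndexError("delete position out of range")

    def _join_small(self, i: int) -> None:
        """Join chunk i with an adjacent chunk if it has become small."""
        if len(self.chunks) == 1 or len(self.chunks[i]) >= CHUNK_MIN:
            return
        j = i + 1 if i + 1 < len(self.chunks) else i - 1  # prefer the right
        lo, hi = min(i, j), max(i, j)
        if len(self.chunks[lo]) + len(self.chunks[hi]) <= CHUNK_MAX:
            self.chunks[lo:hi + 1] = [self.chunks[lo] + self.chunks[hi]]


buf = ChunkedBuffer("hello world")
buf.insert(5, ",")
print(buf)            # → hello, world
print(buf.delete(5))  # → ,
print(buf)            # → hello world
```

An insertion only moves the tail of one chunk, so apart from finding the
chunk and updating the chunk list its cost is bounded by CHUNK_MAX
rather than by the size of the whole buffer.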

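Finally, a chunk could use FSR (the flexible string representation of
PEP 393) to reduce memory usage: each chunk would pick its own width,
so only the chunks that actually contain wide characters would pay for
them.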
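
How much FSR per chunk could save can be estimated with sys.getsizeof
in CPython 3.3 or later (other implementations store strings
differently). The sketch below measures the per-character cost of each
width, then compares one string containing a single € against the same
text held as 4096-character chunks with the € in a chunk of its own.
Immutable str chunks are used only to measure memory; they stand in for
the mutable chunks described earlier. The chunk size is an arbitrary
assumption, and exact byte counts depend on the CPython version and
platform.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Extra bytes CPython uses for one more copy of ch in a string."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


for ch in ("a", "é", "€", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch))  # CPython 3.3+: 1, 1, 2, 4

# One string: a single € forces 2 bytes per character for the whole text.
ascii_text = "a" * 127504
whole = sys.getsizeof(ascii_text + "€")

# Chunks: only the chunk holding the € pays for the wider representation.
chunks = [ascii_text[i:i + 4096] for i in range(0, len(ascii_text), 4096)]
chunks.append("€")
chunked = sum(sys.getsizeof(chunk) for chunk in chunks)
print(whole, chunked)  # chunked comes out at roughly half of whole on CPython
```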