Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder4.news.weretis.net!rt.uk.eu.org!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <43ce1b65-9d6d-47dd-b209-9a3bbafc0b8c@googlegroups.com>
References: <mailman.4618.1373613834.3114.python-list@python.org> <571a6dfe-fd66-42cf-92fc-8b97cbe6e9e4@googlegroups.com> <51DFDE65.5040001@Gmail.com> <CAN1F8qUFP3uX57HhiiUPaYqO3h_HiT8Q_YD=vCYky3EAWsdE7Q@mail.gmail.com> <mailman.4666.1373670835.3114.python-list@python.org> <4f1067f6-bc99-42ad-9166-37fb228b90e8@googlegroups.com> <mailman.5094.1374759404.3114.python-list@python.org> <51f14395$0$29971$c3e8da3$5496439d@news.astraweb.com> <mailman.5106.1374766576.3114.python-list@python.org> <51f15e03$0$29971$c3e8da3$5496439d@news.astraweb.com> <mailman.5127.1374808181.3114.python-list@python.org> <8203e802-9dc5-44c5-9547-6e1947ee224b@googlegroups.com> <mailman.5160.1374890711.3114.python-list@python.org> <f4bb2528-930e-4c0a-820e-66f00ac2b5b6@googlegroups.com> <mailman.5188.1374983652.3114.python-list@python.org> <43ce1b65-9d6d-47dd-b209-9a3bbafc0b8c@googlegroups.com>
Date: Tue, 30 Jul 2013 15:45:57 +0100
Subject: Re: RE Module Performance
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.5312.1375195560.3114.python-list@python.org>
Lines: 51
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:51563

On Tue, Jul 30, 2013 at 3:01 PM,  <wxjmfauth@gmail.com> wrote:
> I am pretty sure that once you have typed your 127504
> ascii characters, you are very happy the buffer of your
> editor does not waste time in reencoding the buffer as
> soon as you enter an €, the 125505th char. Sorry, I wanted
> to say z instead of euro, just to show that backspacing the
> last char and reentering a new char implies twice a reencoding.

You're still thinking that the editor's buffer is a Python string. As
I've shown earlier, this is a really bad idea, and that has nothing to
do with FSR/PEP 393. An immutable string is *horribly* inefficient at
this; if you want to keep concatenating onto a string, the recommended
method is a list of strings that gets join()d at the end, and the same
technique works well here. Here's a little demo class that could make
the basis for such a system:

class EditorBuffer:
	def __init__(self,fn):
		self.fn=fn
		# Start with the whole file as a single chunk
		with open(fn) as f:
			self.buffer=[f.read()]
	def insert(self,pos,char):
		if pos==0:
			# Special case: insertion at beginning of buffer
			if len(self.buffer[0])>1024: self.buffer.insert(0,char)
			else: self.buffer[0]=char+self.buffer[0]
			return
		for idx,part in enumerate(self.buffer):
			l=len(part)
			if pos>l:
				# Cursor is beyond this string; move on to the next one
				pos-=l
				continue
			if pos<l:
				# Cursor is somewhere inside this string
				splitme=self.buffer[idx]
				self.buffer[idx:idx+1]=splitme[:pos],splitme[pos:]
				l=pos
			# Cursor is now at the end of this string
			if l>1024: self.buffer[idx:idx+1]=self.buffer[idx],char
			else: self.buffer[idx]+=char
			return
		raise ValueError("Cannot insert past end of buffer")
	def __str__(self):
		return ''.join(self.buffer)
	def save(self):
		# Write back to the file we were loaded from
		with open(self.fn,"w") as f:
			f.write(str(self))

Once a chunk has been split at the cursor (which copies that chunk
once), further inserts there never need to resize more than about 1KB
of text. As a real basis for an editor, it still sucks, but it's
purely to prove this one point.
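
To tie this back to the backspace example in the quoted text, here's a
rough sketch of the plain list-and-join idiom on its own. The name
apply_keystrokes and the use of "\b" to stand for a backspace key are
made up purely for illustration. Typing appends to a list, backspacing
pops from it, and the full string only gets built once at the end:

```python
def apply_keystrokes(keys):
    # Hypothetical helper: "\b" stands for backspace, anything else is
    # a character to append at the end of the text.
    parts = []
    for key in keys:
        if key == "\b":
            if parts:
                parts.pop()      # drop the last character; nothing else is touched
        else:
            parts.append(key)    # appending never rebuilds earlier text
    # Build the final string exactly once
    return ''.join(parts)

# Type "abc", then a euro sign, backspace over it, and type "z" instead
text = apply_keystrokes(list("abc") + ["\u20ac", "\b", "z"])
print(text)
```

Neither the euro sign nor the backspace forces the earlier characters
to be rebuilt, because no big string exists until the join() call.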

ChrisA