Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <7xfw7j3a1x.fsf@ruckus.brouhaha.com>
References: <f801e06f-f7b2-4aca-b352-66856a939746@googlegroups.com> <308df2af-abe7-4043-b199-0a39f440e0ab@googlegroups.com> <502f8a2a$0$29978$c3e8da3$5496439d@news.astraweb.com> <7xehn4vyya.fsf@ruckus.brouhaha.com> <mailman.3477.1345337181.4697.python-list@python.org> <7xfw7j3a1x.fsf@ruckus.brouhaha.com>
Date: Sun, 19 Aug 2012 12:19:00 +1000
Subject: Re: How do I display unicode value stored in a string variable using ord()
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=ISO-8859-1
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3479.1345342743.4697.python-list@python.org>
Lines: 19
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:27338

On Sun, Aug 19, 2012 at 12:11 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Chris Angelico <rosuav@gmail.com> writes:
>> UTF-8 is highly inefficient for indexing. Given a buffer of (say) a
>> few thousand bytes, how do you locate the 273rd character?
>
> How often do you need to do that, as opposed to traversing the string by
> iteration?  Anyway, you could use a rope-like implementation, or an
> index structure over the string.

Well, imagine if Python strings were stored in UTF-8. How would you slice one?

>>> "asdfqwer"[4:]
'qwer'

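That's a fairly common operation when parsing strings or manipulating
data. With UTF-8 storage, character 4 isn't necessarily at byte 4, so
the slice has to scan from the start of the buffer to find it. To
avoid that scan, you'd need to completely rework your algorithms to
carry a byte position around instead of a character index.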
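
To make the cost concrete, here's a rough sketch of what slicing by
character index might look like over a raw UTF-8 buffer. This assumes
Python 3 (where iterating over bytes yields ints) and valid UTF-8
input; the helper names utf8_char_offset and utf8_slice_from are made
up for illustration and aren't anything CPython provides.

```python
def utf8_char_offset(buf: bytes, n: int) -> int:
    """Return the byte offset where the n-th character (0-based) starts.

    Hypothetical helper: assumes buf holds valid UTF-8.
    """
    count = 0
    for i, b in enumerate(buf):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # begins a new character.
        if (b & 0xC0) != 0x80:
            if count == n:
                return i
            count += 1
    if count == n:
        # n equals the character length, so the offset is end-of-buffer.
        return len(buf)
    raise IndexError("character index out of range")


def utf8_slice_from(buf: bytes, start: int) -> bytes:
    """Rough equivalent of s[start:] on UTF-8 bytes (hypothetical helper)."""
    return buf[utf8_char_offset(buf, start):]


ascii_buf = "asdfqwer".encode("utf-8")
mixed_buf = "a\u00f1dfqwer".encode("utf-8")  # second character takes two bytes

print(utf8_slice_from(ascii_buf, 4))
print(utf8_slice_from(mixed_buf, 4))
```

Both calls should give the same four characters back, but in the
second buffer they start one byte later, and the only way to find that
out is to walk every byte before them. That walk is O(n) per lookup,
versus O(1) for a fixed-width representation.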

ChrisA