MIME-Version: 1.0
In-Reply-To: <7x4nnzmhbn.fsf@ruckus.brouhaha.com>
References: <f801e06f-f7b2-4aca-b352-66856a939746@googlegroups.com> <308df2af-abe7-4043-b199-0a39f440e0ab@googlegroups.com> <502f8a2a$0$29978$c3e8da3$5496439d@news.astraweb.com> <7xehn4vyya.fsf@ruckus.brouhaha.com> <mailman.3477.1345337181.4697.python-list@python.org> <7xfw7j3a1x.fsf@ruckus.brouhaha.com> <mailman.3479.1345342743.4697.python-list@python.org> <7xtxvzehhb.fsf@ruckus.brouhaha.com> <50309d69$0$29978$c3e8da3$5496439d@news.astraweb.com> <7x4nnzmhbn.fsf@ruckus.brouhaha.com>
Date: Sun, 19 Aug 2012 18:24:57 +1000
Subject: Re: How do I display unicode value stored in a string variable using ord()
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=ISO-8859-1
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3487.1345364700.4697.python-list@python.org>

On Sun, Aug 19, 2012 at 6:11 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
>>     result = text[end:]
>
> if end not near the end of the original string, then this is O(N)
> even with fixed-width representation, because of the char copying.
>
> if it is near the end, by knowing where the string data area
> ends, I think it should be possible to scan backwards from
> the end, recognizing what bytes can be the beginning of code points and
> counting off the appropriate number.  This is O(1) if "near the end"
> means "within a constant".

Only if you know exactly where the end is (which requires storing and
maintaining a character length - this may already be happening, I
don't know). But that approach means you need code for both
directions (forward scan and reverse scan), and of course it relies on
your encoding being reverse-scannable in this way (as UTF-8 is, but
not all encodings are).
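
For illustration, here is a rough sketch of the backward scan Paul
describes, assuming the data is valid UTF-8 and the byte length is
known. The function name byte_offset_from_end is made up for this
example; it is not how any Python implementation stores or slices
strings. It relies on the fact that UTF-8 continuation bytes always
have the bit pattern 10xxxxxx, so any other byte starts a code point.

```python
def byte_offset_from_end(data: bytes, chars_from_end: int) -> int:
    """Return the byte offset at which the last `chars_from_end`
    characters of valid UTF-8 `data` begin (hypothetical helper)."""
    pos = len(data)
    for _ in range(chars_from_end):
        if pos == 0:
            raise ValueError("fewer characters than requested")
        # Step back one byte, then keep stepping while we are on a
        # continuation byte (0b10xxxxxx), which cannot start a code point.
        pos -= 1
        while pos > 0 and (data[pos] & 0xC0) == 0x80:
            pos -= 1
    return pos


text = "naïve café ☃ snowman"
data = text.encode("utf-8")

# Find where the last 9 characters start, scanning only from the end.
start = byte_offset_from_end(data, 9)
tail = data[start:].decode("utf-8")
assert tail == text[-9:]
```

The work done is proportional to the number of characters counted
back (at most four bytes each), not to the total length, which is
where the "within a constant" condition comes from.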

And of course, taking the *entire* rest of the string isn't the only
thing you might do. What if you want to take the next six characters
after that index? That would be constant time with a fixed-width
storage format, since both byte offsets can be computed directly from
the indices.
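
To make the "next six characters" case concrete, here is a small
sketch comparing the two storage styles. It assumes UTF-32 (4 bytes
per character, no BOM) as the fixed-width format and valid UTF-8 as
the variable-width one; slice_fixed_width, advance and slice_utf8 are
names invented for this example, not anything from CPython.

```python
def slice_fixed_width(data: bytes, start: int, count: int, width: int = 4) -> bytes:
    """Fixed-width storage: byte offsets follow directly from the indices."""
    return data[start * width:(start + count) * width]


def advance(data: bytes, pos: int, n: int) -> int:
    """Move forward over `n` UTF-8 characters starting at byte `pos`."""
    for _ in range(n):
        if pos >= len(data):
            raise IndexError("ran off the end of the data")
        pos += 1
        # Skip continuation bytes (0b10xxxxxx) belonging to this character.
        while pos < len(data) and (data[pos] & 0xC0) == 0x80:
            pos += 1
    return pos


def slice_utf8(data: bytes, start: int, count: int) -> bytes:
    """Variable-width storage: must scan from the front to find `start`."""
    begin = advance(data, 0, start)
    end = advance(data, begin, count)
    return data[begin:end]


text = "naïve café ☃ snowman"
fixed = slice_fixed_width(text.encode("utf-32-le"), 6, 6).decode("utf-32-le")
variable = slice_utf8(text.encode("utf-8"), 6, 6).decode("utf-8")
assert fixed == variable == text[6:12]
```

Both give the same six characters, but the UTF-8 version has to walk
every byte before the start index, so its cost grows with where in
the string you are, while the fixed-width version does two
multiplications and a copy of the six characters.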

ChrisA