MIME-Version: 1.0
In-Reply-To: <7x7gsv4lw4.fsf@ruckus.brouhaha.com>
References: <f801e06f-f7b2-4aca-b352-66856a939746@googlegroups.com> <308df2af-abe7-4043-b199-0a39f440e0ab@googlegroups.com> <502f8a2a$0$29978$c3e8da3$5496439d@news.astraweb.com> <7xehn4vyya.fsf@ruckus.brouhaha.com> <mailman.3477.1345337181.4697.python-list@python.org> <7xfw7j3a1x.fsf@ruckus.brouhaha.com> <mailman.3479.1345342743.4697.python-list@python.org> <7xtxvzehhb.fsf@ruckus.brouhaha.com> <mailman.3481.1345345309.4697.python-list@python.org> <7x7gsv4lw4.fsf@ruckus.brouhaha.com>
Date: Sun, 19 Aug 2012 13:31:21 +1000
Subject: Re: How do I display unicode value stored in a string variable using ord()
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=ISO-8859-1
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3483.1345347084.4697.python-list@python.org>

On Sun, Aug 19, 2012 at 1:10 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Chris Angelico <rosuav@gmail.com> writes:
>> I don't have a Python example of parsing a huge string, but I've done
>> it in other languages, and when I can depend on indexing being a cheap
>> operation, I'll happily do exactly that.
>
> I'd be interested to know what the context was, where you parsed
> a big unicode string in a way that required random access to
> the nth character in the string.

It's something I've done in C/C++ fairly often. Take one big fat
buffer, slice it and dice it as you get the information you want out
of it. I'll retain and/or calculate indices (when I'm not using
pointers, but that's a different kettle of fish). Generally, I'm
working with pure ASCII, but port those same algorithms to Python and
you'll easily be able to read in a file in some known encoding and
manipulate it as Unicode.
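
A rough Python sketch of that index-based style, in case it helps. The
UTF-8 encoding and the key=value; layout are assumptions made up for
illustration, not taken from any real project, and parse_fields is a
hypothetical helper name:

```python
# Hypothetical input: bytes in a known encoding (UTF-8 assumed here).
raw = "name=Zoë;city=Kraków;".encode("utf-8")
text = raw.decode("utf-8")  # decode once, then index by character

def parse_fields(buf):
    """Walk buf with an integer cursor, slicing out key=value; pairs."""
    fields = {}
    pos = 0
    while pos < len(buf):
        eq = buf.index("=", pos)     # retained index of the separator
        end = buf.index(";", eq)     # retained index of the terminator
        fields[buf[pos:eq]] = buf[eq + 1:end]
        pos = end + 1                # jump past the field just consumed
    return fields

fields = parse_fields(text)
```

The cursor and the two retained indices play the role the pointers or
offsets would in C; the slices are counted in characters, not bytes.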

It's not so much 'random access to the nth character' as an efficient
way of jumping forward. For instance, if I know that the next thing is
a literal string of n characters (that I don't care about), I want to
skip over that and keep parsing. Adobe's Action Message Format (AMF)
is particularly noteworthy for this, but it's a stupid format and I don't
recommend people spend too much time reading up on it (unless you like
that sensation of your brain trying to escape through your ear).
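
To make the "jumping forward" idea concrete, here is a minimal sketch.
The "<n>:<n characters>" layout is invented for the example (it is not
AMF or any other real format), and skip_literal is a hypothetical
helper:

```python
def skip_literal(buf, pos):
    """Return the index just past a '<n>:<n characters>' literal at pos."""
    colon = buf.index(":", pos)      # end of the decimal length prefix
    n = int(buf[pos:colon])          # number of characters to skip
    end = colon + 1 + n
    if end > len(buf):
        raise ValueError("literal runs past end of buffer")
    return end

buf = "5:héllo3:abcEND"
pos = 0
pos = skip_literal(buf, pos)   # skips "5:héllo"
pos = skip_literal(buf, pos)   # skips "3:abc"
rest = buf[pos:]               # whatever follows the two literals
```

The skipped text is never examined; the parser only needs the index
arithmetic to be cheap, which is the whole point about indexing cost.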

ChrisA