Path: csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!npeer.de.kpn-eurorings.net!npeer-ng0.de.kpn-eurorings.net!news.tele.dk!news.tele.dk!small.news.tele.dk!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <1efhl8i0dmr9b.15q8opn6p0cj3.dlg@40tude.net>
References: <1efhl8i0dmr9b.15q8opn6p0cj3.dlg@40tude.net>
Date: Fri, 16 Aug 2013 23:14:20 +0100
Subject: Re: Proper use of the codecs module.
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Content-Type: text/plain; charset=ISO-8859-1
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.7.1376691263.23369.python-list@python.org>
Lines: 13
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:52610

On Fri, Aug 16, 2013 at 3:02 PM, Andrew <andrew@invalid.invalid> wrote:
> I have a mixed binary/text file[0], and the text portions use a radically
> nonstandard character set. I want to read them easily given information
> about the character encoding and an offset for the beginning of a string.

To add to all the information already given: Is the file small enough
to comfortably fit into memory? If so, you'll find it a LOT easier to
play with strings in RAM than files on disk. Even if not, you may find
a lot of tasks simplified by just reading a kay or a meg in and then
working within that. That spares you the fiddliness of read(1) all the
time, at the expense of potentially reading more than you need.

ChrisA