


Re: Proper use of the codecs module.

In-Reply-To: <1efhl8i0dmr9b.15q8opn6p0cj3.dlg@40tude.net>
References: <1efhl8i0dmr9b.15q8opn6p0cj3.dlg@40tude.net>
Date: Fri, 16 Aug 2013 23:14:20 +0100
Subject: Re: Proper use of the codecs module.
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.7.1376691263.23369.python-list@python.org>



On Fri, Aug 16, 2013 at 3:02 PM, Andrew <andrew@invalid.invalid> wrote:
> I have a mixed binary/text file[0], and the text portions use a radically
> nonstandard character set. I want to read them easily given information
> about the character encoding and an offset for the beginning of a string.

To add to all the information already given: is the file small enough
to fit comfortably into memory? If so, you'll find it a LOT easier to
work with strings (or bytes) in RAM than with a file on disk. Even if not,
you may find many tasks simplified by just reading a kilobyte or a megabyte
in and working within that. That spares you the fiddliness of calling
read(1) all the time, at the expense of potentially reading more than you need.
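A minimal sketch of both approaches in Python 3, under loudly stated assumptions: the character table `CUSTOM_TABLE` is a hypothetical placeholder (the real mapping would come from the file format's documentation), and strings are assumed to be NUL-terminated, which the original question does not say. The helper names `decode_string`, `load_whole_file` and `read_string_from_file` are likewise illustrative, not part of any library.

```python
import io

# HYPOTHETICAL byte -> character table for the nonstandard charset.
# Placeholder entries only; substitute the format's real mapping.
CUSTOM_TABLE = {0x01: "A", 0x02: "B", 0x03: "C", 0x20: " "}


def decode_string(buf, offset, table=CUSTOM_TABLE, terminator=0x00):
    """Decode buf[offset:] up to (not including) the terminator byte.

    ASSUMPTION: strings end with a NUL byte; if none is found, decode to
    the end of the buffer. Unknown bytes become U+FFFD.
    """
    end = buf.find(bytes([terminator]), offset)
    if end == -1:
        end = len(buf)
    # Iterating over a bytes slice yields ints in Python 3.
    return "".join(table.get(b, "\ufffd") for b in buf[offset:end])


def load_whole_file(path):
    """Approach 1: read the entire file into memory as bytes."""
    with open(path, "rb") as f:
        return f.read()


def read_string_from_file(f, offset, chunk_size=1 << 20):
    """Approach 2: read one chunk (1 MiB by default) starting at offset.

    Caveat: a string longer than chunk_size would be silently truncated.
    """
    f.seek(offset)
    chunk = f.read(chunk_size)
    return decode_string(chunk, 0)


# Usage with an in-memory file standing in for the real one.
data = b"\x00\x00\x01\x02\x03\x00\x03\x20\x01\x00"
print(decode_string(data, 2))                       # ABC
print(read_string_from_file(io.BytesIO(data), 6))   # C A
```

Once the bytes are in memory, decoding at an offset is just slicing, with no per-byte file reads. The chunked variant trades that simplicity for bounded memory use.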

ChrisA



Thread

Proper use of the codecs module. Andrew <andrew@invalid.invalid> - 2013-08-16 10:02 -0400
  Re: Proper use of the codecs module. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-08-16 19:12 +0000
    Re: Proper use of the codecs module. Andrew <andrew@invalid.invalid> - 2013-08-16 16:16 -0400
  Re: Proper use of the codecs module. Chris Angelico <rosuav@gmail.com> - 2013-08-16 23:14 +0100
