Re: catch UnicodeDecodeError

Date Wed, 25 Jul 2012 13:35:09 +0200
From Philipp Hagemeister <phihag@phihag.de>
To jaroslav.dobrek@gmail.com
Subject Re: catch UnicodeDecodeError
Newsgroups comp.lang.python
Message-ID <mailman.2570.1343216119.4697.python-list@python.org>

Hi Jaroslav,

You can catch a UnicodeDecodeError just like any other exception. Can
you provide a full example program that shows your problem?

This works fine on my system:


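import sys

# Write two bytes that are not valid UTF-8.
with open('tmp', 'wb') as f:
    f.write(b'\xff\xff')

try:
    # Read the raw bytes back, then try to decode them.
    with open('tmp', 'rb') as f:
        buf = f.read()
    buf.decode('utf-8')
except UnicodeDecodeError as ude:
    sys.exit("Found a bad char in file " + "tmp")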


Note that you cannot possibly determine the line number if you don't
know what encoding the file is in (and what EOL it uses).
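
To illustrate why the encoding and the EOL convention matter, here is a small sketch. The sample strings are made up for the demonstration; they are not from your file.

```python
# U+010A (C with dot above) is a single character with no line break,
# but in UTF-16-LE it is stored as the two bytes 0x0A 0x01.
text = "\u010a"
raw = text.encode("utf-16-le")
false_newlines = raw.count(b"\n")   # byte-level count sees a "newline"
real_newlines = text.count("\n")    # the decoded text contains none

# A file using bare carriage returns as EOL (three lines here)
# contains no byte with the value 10 at all.
cr_only = b"one\rtwo\rthree"
lf_count = cr_only.count(b"\n")

print(false_newlines, real_newlines, lf_count)
```

So counting bytes with the value 10 is only a guess unless you know both the encoding and the EOL style.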

What you can do is count the line feeds (bytes with the value 10) before
ude.start, inside the except block where ude is still bound, like this:

lineGuess = buf[:ude.start].count(b'\n') + 1
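
Put together, a minimal sketch could look like the following. The helper name guess_bad_line and the sample bytes are hypothetical, and it assumes the file is meant to be UTF-8 with "\n" line endings.

```python
def guess_bad_line(data, encoding="utf-8"):
    """Return (line_guess, byte_offset) for the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as ude:
        # ude.start is the byte offset at which decoding failed.
        # Count the line feeds (byte value 10) before it; lines are 1-based.
        return data[:ude.start].count(b"\n") + 1, ude.start
    return None


# Hypothetical sample: the invalid byte 0xff sits on the third line.
sample = b"first line\nsecond line\nbad \xff byte\nlast line\n"
result = guess_bad_line(sample)
print(result)
```

The function returns None when the data decodes cleanly, so the caller can tell "no problem" apart from "problem on line 1".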

- Philipp

On 07/25/2012 01:05 PM, jaroslav.dobrek@gmail.com wrote:
> it doesn't work

Thread

catch UnicodeDecodeError jaroslav.dobrek@gmail.com - 2012-07-25 04:05 -0700
  Re: catch UnicodeDecodeError Andrew Berg <bahamutzero8825@gmail.com> - 2012-07-25 06:34 -0500
  Re: catch UnicodeDecodeError Philipp Hagemeister <phihag@phihag.de> - 2012-07-25 13:35 +0200
    Re: catch UnicodeDecodeError jaroslav.dobrek@gmail.com - 2012-07-25 05:09 -0700
      Re: catch UnicodeDecodeError Dave Angel <d@davea.name> - 2012-07-25 14:50 -0400
        Re: catch UnicodeDecodeError Jaroslav Dobrek <jaroslav.dobrek@gmail.com> - 2012-07-26 00:46 -0700
          Re: catch UnicodeDecodeError Stefan Behnel <stefan_ml@behnel.de> - 2012-07-26 10:28 +0200
            Re: catch UnicodeDecodeError Jaroslav Dobrek <jaroslav.dobrek@gmail.com> - 2012-07-26 03:51 -0700
              Re: catch UnicodeDecodeError Stefan Behnel <stefan_ml@behnel.de> - 2012-07-26 13:15 +0200
                Re: catch UnicodeDecodeError jaroslav.dobrek@gmail.com - 2012-07-26 04:58 -0700
              Re: catch UnicodeDecodeError Philipp Hagemeister <phihag@phihag.de> - 2012-07-26 14:17 +0200
              Re: catch UnicodeDecodeError Stefan Behnel <stefan_ml@behnel.de> - 2012-07-26 14:24 +0200
          Re: catch UnicodeDecodeError Chris Angelico <rosuav@gmail.com> - 2012-07-26 19:46 +1000
          Re: catch UnicodeDecodeError wxjmfauth@gmail.com - 2012-07-26 03:19 -0700
      Re: catch UnicodeDecodeError Philipp Hagemeister <phihag@phihag.de> - 2012-07-26 14:43 +0200