From: Philipp Hagemeister
Date: Wed, 25 Jul 2012 13:35:09 +0200
To: jaroslav.dobrek@gmail.com
Cc: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: catch UnicodeDecodeError

Hi Jaroslav,

you can catch a UnicodeDecodeError just like any other exception. Can you provide a full example program that shows your problem?

This works fine on my system:

    import sys

    # Write two bytes that are not valid UTF-8.
    with open('tmp', 'wb') as f:
        f.write(b'\xff\xff')

    try:
        with open('tmp', 'rb') as f:
            buf = f.read()
        buf.decode('utf-8')
    except UnicodeDecodeError as ude:
        sys.exit("Found a bad char in file " + "tmp")

Note that you cannot possibly determine the line number if you don't know what encoding the file is in (and what EOL convention it uses). What you can do is count the number of bytes with the value 10 (b'\n') before ude.start. This has to happen inside the except block, where ude is still defined:

    lineGuess = buf[:ude.start].count(b'\n') + 1

- Philipp

On 07/25/2012 01:05 PM, jaroslav.dobrek@gmail.com wrote:
> it doesn't work
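
For reference, here is a minimal sketch that combines the two steps above (catching the exception and guessing the line) into one function. The function name find_decode_error and the contents of the demo file are made up for illustration and are not from the original post. The line guess assumes an ASCII-compatible encoding with LF or CRLF line endings, so treat it as an estimate only.

```python
import os
import tempfile


def find_decode_error(path, encoding="utf-8"):
    """Return None if the file decodes cleanly, else (byte_offset, line_guess).

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        buf = f.read()
    try:
        buf.decode(encoding)
    except UnicodeDecodeError as ude:
        # ude.start is the byte offset of the first undecodable byte.
        # Counting bytes with value 10 (b"\n") before that offset gives a
        # line estimate; it is only meaningful if the file really uses
        # LF or CRLF line endings in an ASCII-compatible encoding.
        line_guess = buf[:ude.start].count(b"\n") + 1
        return ude.start, line_guess
    return None


# Demo file (assumed content): two valid lines, then an invalid byte on line 3.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\nsecond line\nbad \xff byte\n")
    demo_path = tmp.name

result = find_decode_error(demo_path)
os.remove(demo_path)
print(result)  # (27, 3)
```

Returning the offset and the guess instead of calling sys.exit leaves it to the caller to decide whether a bad byte is fatal.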