To: python-list@python.org
From: Stefan Behnel <stefan_ml@behnel.de>
Subject: Re: catch UnicodeDecodeError
Date: Thu, 26 Jul 2012 13:15:37 +0200
References: <04f7ff8d-9881-4a04-ab2e-b5573b5f3cd1@googlegroups.com> <mailman.2570.1343216119.4697.python-list@python.org> <b8723e64-12fa-4e53-8914-8f2b8e9c0f1d@googlegroups.com> <mailman.2581.1343242258.4697.python-list@python.org> <38f5cdaf-c021-4ccd-8fcb-c68b21d3aeb2@w24g2000vby.googlegroups.com> <mailman.2593.1343291337.4697.python-list@python.org> <17bf754d-b1e9-4bb7-bf42-190325ee969a@q29g2000vby.googlegroups.com>
In-Reply-To: <17bf754d-b1e9-4bb7-bf42-190325ee969a@q29g2000vby.googlegroups.com>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.2599.1343301351.4697.python-list@python.org>

Jaroslav Dobrek, 26.07.2012 12:51:
>>> try:
>>>     for line in f: # here text is decoded implicitly
>>>         do_something()
>>> except UnicodeDecodeError():
>>>     do_something_different()
> 
> the code above (without the brackets) is semantically bad: The
> exception is not caught.

Sure it is. Just to repeat myself: if the above doesn't catch the
exception, then the exception did not originate from the place where you
think it did. Again: look at the traceback.
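
To make that concrete, here is a minimal sketch, assuming Python 3 and UTF-8 input. The helpers count_lines() and write_temp() and the temporary files are made up for illustration and are not taken from the original program. It shows that an except clause naming the class (no parentheses) does catch the error when the decoding really happens inside the loop:

```python
import os
import tempfile

def count_lines(path, encoding="utf-8"):
    """Count lines in a text file; return None if it cannot be decoded."""
    try:
        with open(path, encoding=encoding) as f:
            count = 0
            for line in f:  # text is decoded implicitly here
                count += 1
            return count
    except UnicodeDecodeError:  # the exception class, without parentheses
        return None

def write_temp(data):
    """Write raw bytes to a temporary file and return its path (illustrative helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path

good_path = write_temp("first\nsecond\n".encode("utf-8"))
bad_path = write_temp(b"first\n\xff\xfe broken\n")  # 0xff is never valid in UTF-8

good_result = count_lines(good_path)
bad_result = count_lines(bad_path)

os.remove(good_path)
os.remove(bad_path)
print(good_result, bad_result)
```

If a handler like this does not fire in your program, the decoding is happening somewhere outside the try block, which is exactly what the traceback will show.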


>>> The problem is that the vast majority of the thousands of files that I
>>> process are correctly encoded. But then, suddenly, there is a bad
>>> character in a new file. (This is so because most files today are
>>> generated by people who don't know that there is such a thing as
>>> encodings.) And then I need to rewrite my very complex program just
>>> because of one single character in one single file.
>>
>> Why would that be the case? The places to change should be very local in
>> your code.
> 
> This is the case in a program that has many different functions which
> open and parse different types of files. When I read and parse a
> directory with such different types of files, a program that uses
> 
> for line in f:
> 
> will not exit with any hint as to where the error occurred. It just
> exits with a UnicodeDecodeError.

... that tells you the exact code line where the error occurred. No need to
look around.
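
If you would rather inspect that information in code than read it off the screen, the traceback is attached to the exception object. A rough sketch, assuming Python 3; parse_file() and locate_failure() are hypothetical names standing in for one of your parsing functions and a wrapper around it:

```python
import os
import tempfile
import traceback

def parse_file(path):
    # Hypothetical parser: the implicit decoding happens in the for loop.
    with open(path, encoding="utf-8") as f:
        for line in f:
            pass

def locate_failure(path):
    """Return (function name, line number) for each frame leading to a decode error."""
    try:
        parse_file(path)
    except UnicodeDecodeError as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        return [(frame.name, frame.lineno) for frame in frames]
    return []  # the file decoded cleanly

# Build a small file that is Latin-1 encoded and therefore not valid UTF-8.
fd, bad_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"caf\xe9\n")

frames = locate_failure(bad_path)
os.remove(bad_path)

for name, lineno in frames:
    print(name, lineno)
```

The list contains one entry per stack frame, outermost first, so the function that was iterating over the file shows up by name together with its line number. Frames from the standard library's decoder may follow it, depending on the Python version.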

Stefan