| Date | 2012-07-27 01:03 +0200 |
|---|---|
| From | Alan Franzoni <mailing@franzoni.eu> |
| Subject | codecs.register_error for "strict", unicode.encode() and str.decode() |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2641.1343344059.4697.python-list@python.org> |
Hello,
I think I'm missing a piece here.
I'm trying to register a default error handler that keeps encoding and
decoding errors from being raised (I know how this works and that
making this global is probably not good practice, but I found this
strange behaviour while writing a proof of concept of how to let Python
work in a more forgiving way).
What I discovered is that register_error() for "strict" seems to work
the way I expect for string decoding, but not for unicode encoding.
This is what happens on a Mac with Apple's Python 2.7.1:
melquiades:tmp alan$ cat minimal_test_encode.py
# -*- coding: utf-8 -*-
import codecs
def handle_encode(e):
    return ("ASD", e.end)
codecs.register_error("strict", handle_encode)
print u"à".encode("ascii")
melquiades:tmp alan$ python minimal_test_encode.py
Traceback (most recent call last):
  File "minimal_test_encode.py", line 10, in <module>
    u"à".encode("ascii")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 0: ordinal not in range(128)
OTOH this works properly:
melquiades:tmp alan$ cat minimal_test_decode.py
# -*- coding: utf-8 -*-
import codecs
def handle_decode(e):
    return (u"ASD", e.end)
codecs.register_error("strict", handle_decode)
print "à".decode("ascii")
melquiades:tmp alan$ python minimal_test_decode.py
ASDASD
What piece am I missing? The doc at
http://docs.python.org/library/codecs.html says "For
encoding /error_handler/ will be called with a UnicodeEncodeError
<http://docs.python.org/library/exceptions.html#exceptions.UnicodeEncodeError> instance,
which contains information about the location of the error." Is there
any reason why the standard "strict" handler cannot be replaced?
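For comparison, here is a minimal sketch of the same handler registered
under its own name and passed explicitly as the errors argument, which
is the usage the quoted documentation describes. It is written in
Python 3 syntax (an assumption; the scripts above are Python 2.7), and
the handler name "forgiving" is hypothetical, chosen only for
illustration. It does not answer whether "strict" itself can be
replaced.

```python
import codecs


def handle_error(exc):
    # exc is the UnicodeEncodeError or UnicodeDecodeError describing the
    # failure. Return a (replacement, resume_position) tuple, as in the
    # scripts above.
    return ("ASD", exc.end)


# "forgiving" is a hypothetical name, not one of the built-in handlers.
codecs.register_error("forgiving", handle_error)

# Encoding: the handler is called for the unencodable character.
encoded = "\xe0".encode("ascii", "forgiving")

# Decoding: the two UTF-8 bytes of the same character, read as ASCII.
decoded = b"\xc3\xa0".decode("ascii", "forgiving")

print(encoded)
print(decoded)
```

Because the name is passed on every call, the behaviour stays local to
the call sites that opt in rather than changing the global default.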
Thanks for any clue.
File links:
https://dl.dropbox.com/u/249926/minimal_test_decode.py
https://dl.dropbox.com/u/249926/minimal_test_encode.py
--
Alan Franzoni
contact me at public@[mysurname].eu