| Started by | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| First post | 2013-10-04 13:56 +0000 |
| Last post | 2013-10-04 18:44 -0400 |
| Articles | 6 — 5 participants |
API for custom Unicode error handlers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-04 13:56 +0000
Re: API for custom Unicode error handlers Chris Angelico <rosuav@gmail.com> - 2013-10-05 03:22 +1000
Re: API for custom Unicode error handlers Ethan Furman <ethan@stoneleaf.us> - 2013-10-04 11:05 -0700
Re: API for custom Unicode error handlers Serhiy Storchaka <storchaka@gmail.com> - 2013-10-04 22:08 +0300
Re: API for custom Unicode error handlers Serhiy Storchaka <storchaka@gmail.com> - 2013-10-04 22:35 +0300
Re: API for custom Unicode error handlers Terry Reedy <tjreedy@udel.edu> - 2013-10-04 18:44 -0400
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-10-04 13:56 +0000 |
| Subject | API for custom Unicode error handlers |
| Message-ID | <524ec8fe$0$29984$c3e8da3$5496439d@news.astraweb.com> |
I have some custom Unicode error handlers, and I'm looking for advice on
the right API for dealing with them.
I have a module containing custom Unicode error handlers. For example:
```python
# Python 3
import unicodedata

def namereplace_errors(exc):
    c = exc.object[exc.start]
    try:
        name = unicodedata.name(c)
    except (KeyError, ValueError):
        n = ord(c)
        if n <= 0xFFFF:
            replace = "\\u%04x"
        else:
            assert n <= 0x10FFFF
            replace = "\\U%08x"
        replace = replace % n
    else:
        replace = "\\N{%s}" % name
    return replace, exc.start + 1
```
Before I can use the error handler, I need to register it using this:

```python
import codecs
codecs.register_error('namereplace', namereplace_errors)
```
And now:

```
py> 'abc\u04F1'.encode('ascii', 'namereplace')
b'abc\\N{CYRILLIC SMALL LETTER U WITH DIAERESIS}'
```
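Registration only maps a name to a function; the handler itself is an ordinary callable that receives the `UnicodeEncodeError` and returns a `(replacement, resume_position)` pair. The sketch below calls a condensed copy of the handler directly on a hand-built exception, which approximates what the ascii codec passes in when it hits U+04F1 (the hand-built exception and its reason string are assumptions for illustration):

```python
import unicodedata


def namereplace_errors(exc):
    """Condensed copy of the handler above, so this block runs on its own."""
    c = exc.object[exc.start]
    try:
        name = unicodedata.name(c)
    except ValueError:
        # unicodedata.name() raises ValueError for characters with no name.
        n = ord(c)
        replace = ("\\u%04x" if n <= 0xFFFF else "\\U%08x") % n
    else:
        replace = "\\N{%s}" % name
    # Resume encoding right after the single character we replaced.
    return replace, exc.start + 1


# Hand-built exception mimicking what the ascii codec reports for U+04F1.
exc = UnicodeEncodeError("ascii", "abc\u04F1", 3, 4, "ordinal not in range(128)")
replacement, resume_at = namereplace_errors(exc)
print(replacement)  # \N{CYRILLIC SMALL LETTER U WITH DIAERESIS}
print(resume_at)    # 4
```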
Now, my question:
Should the module holding the error handlers automatically register them?
In other words, if I do:
```python
import error_handlers
```
just importing it will have the side-effect of registering the error
handlers. Normally, I dislike imports that have side-effects of this
sort, but I'm not sure that the alternative is better, that is, to put
responsibility on the caller to register some, or all, of the handlers:
```python
import error_handlers
error_handlers.register(error_handlers.namereplace_errors)
error_handlers.register_all()
```
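A minimal sketch of the explicit-registration alternative, assuming a hypothetical layout for `error_handlers` in which a module-level dict maps each public handler name to its function. The name `namereplace_demo` is also hypothetical, chosen so it does not shadow the built-in `namereplace` handler that later Python versions provide:

```python
import codecs
import unicodedata


def namereplace_errors(exc):
    """Replace an unencodable character with a \\N{...} escape."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    c = exc.object[exc.start]
    name = unicodedata.name(c, None)  # None when the character has no name
    if name is None:
        n = ord(c)
        return ("\\u%04x" if n <= 0xFFFF else "\\U%08x") % n, exc.start + 1
    return "\\N{%s}" % name, exc.start + 1


# Hypothetical registry: public handler name -> handler function.
HANDLERS = {
    "namereplace_demo": namereplace_errors,
}


def register(handler):
    """Register one handler from HANDLERS and return the name it was given."""
    for name, func in HANDLERS.items():
        if func is handler:
            codecs.register_error(name, func)
            return name
    raise ValueError("not a known handler: %r" % (handler,))


def register_all():
    """Register every handler in HANDLERS."""
    for name, func in HANDLERS.items():
        codecs.register_error(name, func)


# Nothing is registered until the caller asks for it.
register_all()
print("abc\u04F1".encode("ascii", "namereplace_demo"))
# b'abc\\N{CYRILLIC SMALL LETTER U WITH DIAERESIS}'
```

With this design, importing the module has no side effect; the caller decides which names enter the process-wide registry.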
As far as I know, there is no way to find out what error handlers are
registered, and no way to deregister one after it has been registered.
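The `codecs` module does offer `codecs.lookup_error()`, which raises `LookupError` for an unknown name. That cannot list every registered handler, but a sketch like the following (`is_registered` is a hypothetical helper name) can at least test one name at a time:

```python
import codecs


def is_registered(name):
    """Return True if an error handler is currently registered under *name*."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True


print(is_registered("strict"))                     # True: built-in handler
print(is_registered("no_such_error_handler_xyz"))  # False
```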
Which API would you prefer if you were using this module?
--
Steven
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-10-05 03:22 +1000 |
| Message-ID | <mailman.727.1380907377.18130.python-list@python.org> |
| In reply to | #55480 |
On Fri, Oct 4, 2013 at 11:56 PM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> Should the module holding the error handlers automatically register them?
> In other words, if I do:
>
> ```python
> import error_handlers
> ```
>
> just importing it will have the side-effect of registering the error
> handlers. Normally, I dislike imports that have side-effects of this
> sort, but I'm not sure that the alternative is better, that is, to put
> responsibility on the caller to register some, or all, of the handlers:
>
> ```python
> import error_handlers
> error_handlers.register(error_handlers.namereplace_errors)
> error_handlers.register_all()
> ```

Caveat: I don't actually use codecs much, so I don't know the specifics.

I'd be quite happy with importing having a side-effect here. If you import a module that implements a numeric type, it should immediately register itself with the Numeric ABC, right? This is IMO equivalent to that.

> As far as I know, there is no way to find out what error handlers are
> registered, and no way to deregister one after it has been registered.

The only risk that I see is of an accidental collision. Having a codec registered that you don't use can't hurt (afaik). Is there any mechanism for detecting a name collision? If not, I wouldn't worry about it.

ChrisA
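One way to detect a name collision, without relying on `codecs.register_error()` to report one, is to look the name up first and refuse to replace a different function. A minimal sketch; `register_unique`, `qmark_errors` and `qmark_demo` are hypothetical names:

```python
import codecs


def register_unique(name, handler):
    """Register *handler* under *name*, refusing to replace a different one.

    Calling it again with the very same function is harmless.
    """
    try:
        existing = codecs.lookup_error(name)
    except LookupError:
        codecs.register_error(name, handler)
        return
    if existing is not handler:
        raise ValueError("error handler %r is already registered" % name)


def qmark_errors(exc):
    """Trivial handler: replace the whole unencodable run with one '?'."""
    return "?", exc.end


register_unique("qmark_demo", qmark_errors)
register_unique("qmark_demo", qmark_errors)  # same function: no error
print("caf\xe9".encode("ascii", "qmark_demo"))  # b'caf?'

try:
    register_unique("strict", qmark_errors)  # clashes with a built-in name
except ValueError as err:
    print(err)
```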
| From | Ethan Furman <ethan@stoneleaf.us> |
|---|---|
| Date | 2013-10-04 11:05 -0700 |
| Message-ID | <mailman.728.1380911272.18130.python-list@python.org> |
| In reply to | #55480 |
On 10/04/2013 06:56 AM, Steven D'Aprano wrote:
>
> Should the module holding the error handlers automatically register them?

I think it should. Registration only needs to happen once, the module is useless without being registered, no threads or processes are being started, and the only reason to import the module is to get the functionality... isn't it?

What about help(), Sphinx, or other introspection tools?

This sounds similar to cgitb -- another module which you only import if you want the HTML-ized traceback, and yet it requires a separate cgitb.enable() call...

I change my mind, it shouldn't. Throw in a .enable() function and call it good. :)

--
~Ethan~
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2013-10-04 22:08 +0300 |
| Message-ID | <mailman.729.1380913714.18130.python-list@python.org> |
| In reply to | #55480 |
04.10.13 20:22, Chris Angelico wrote:
> I'd be quite happy with importing having a side-effect here. If you
> import a module that implements a numeric type, it should immediately
> register itself with the Numeric ABC, right? This is IMO equivalent to
> that.

There is a difference. You can't use a numeric type without importing its module, but you can use an error handler that was registered by some other module. This leads to subtle bugs. Suppose module A imports error_handlers and uses the error handler, while module B uses the error handler but doesn't import error_handlers. C.py imports A and then B, and everything works. D.py imports B and then A, and fails if B uses the handler before A has been imported and registered it.
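The ordering problem arises because handlers are found through a process-wide name, not through an import. A single-file sketch of the failure, in which plain functions stand in for the modules described above (`module_b_work`, `import_module_a` and `order_demo` are hypothetical names):

```python
import codecs


def module_b_work():
    """Stands in for module B: uses the handler by name, never imports it."""
    return "caf\xe9".encode("ascii", "order_demo")


def import_module_a():
    """Stands in for importing module A, which registers the handler."""
    codecs.register_error("order_demo", lambda exc: ("?", exc.end))


# D.py order: B runs before A has registered anything.
try:
    module_b_work()
except LookupError as err:
    print("B before A fails:", err)

# Once A has been imported (as in C.py), the same call in B succeeds.
import_module_a()
print("B after A works:", module_b_work())  # b'caf?'
```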
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2013-10-04 22:35 +0300 |
| Message-ID | <mailman.730.1380916476.18130.python-list@python.org> |
| In reply to | #55480 |
04.10.13 16:56, Steven D'Aprano wrote:
> I have some custom Unicode error handlers, and I'm looking for advice on
> the right API for dealing with them.
>
> I have a module containing custom Unicode error handlers. For example:
>
> ```python
> # Python 3
> import unicodedata
>
> def namereplace_errors(exc):
>     c = exc.object[exc.start]
>     try:
>         name = unicodedata.name(c)
>     except (KeyError, ValueError):
>         n = ord(c)
>         if n <= 0xFFFF:
>             replace = "\\u%04x"
>         else:
>             assert n <= 0x10FFFF
>             replace = "\\U%08x"
>         replace = replace % n
>     else:
>         replace = "\\N{%s}" % name
>     return replace, exc.start + 1
> ```
I'm planning to build this error handler into 3.4 (see
http://comments.gmane.org/gmane.comp.python.ideas/21296).
Actually, a Python implementation should look like this:
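```python
import unicodedata

def namereplace_errors(exc):
    # Only encoding errors can be handled this way.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    replace = []
    # Replace every unencodable character in the failing range.
    for c in exc.object[exc.start:exc.end]:
        try:
            replace.append(r'\N{%s}' % unicodedata.name(c))
        except ValueError:
            # unicodedata.name() raises ValueError (not KeyError)
            # for characters that have no name.
            n = ord(c)
            if n < 0x100:
                replace.append(r'\x%02x' % n)
            elif n < 0x10000:
                replace.append(r'\u%04x' % n)
            else:
                replace.append(r'\U%08x' % n)
    return ''.join(replace), exc.end
```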
> Now, my question:
>
> Should the module holding the error handlers automatically register them?
This question interests me too.
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-10-04 18:44 -0400 |
| Message-ID | <mailman.733.1380926707.18130.python-list@python.org> |
| In reply to | #55480 |
On 10/4/2013 3:35 PM, Serhiy Storchaka wrote:
> 04.10.13 16:56, Steven D'Aprano wrote:
>> I have some custom Unicode error handlers, and I'm looking for advice on
>> the right API for dealing with them.
>
> I'm planning to build this error handler into 3.4 (see
> http://comments.gmane.org/gmane.comp.python.ideas/21296).
>
>> Should the module holding the error handlers automatically register them?
>
> This question interests me too.

I did not respond on the python-ideas thread, but +1 for 'namereplace' also.

Like others, I would prefer auto-registration unless that creates a problem. If it is a problem, perhaps the registry mechanism needs improvement.

On the other hand, if it is built in, it will be pre-registered.

--
Terry Jan Reedy