Groups > comp.lang.python > #55480 > unrolled thread

API for custom Unicode error handlers

Started by: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
First post: 2013-10-04 13:56 +0000
Last post: 2013-10-04 18:44 -0400
Articles: 6 (5 participants)



Contents

  API for custom Unicode error handlers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-04 13:56 +0000
    Re: API for custom Unicode error handlers Chris Angelico <rosuav@gmail.com> - 2013-10-05 03:22 +1000
    Re: API for custom Unicode error handlers Ethan Furman <ethan@stoneleaf.us> - 2013-10-04 11:05 -0700
    Re: API for custom Unicode error handlers Serhiy Storchaka <storchaka@gmail.com> - 2013-10-04 22:08 +0300
    Re: API for custom Unicode error handlers Serhiy Storchaka <storchaka@gmail.com> - 2013-10-04 22:35 +0300
    Re: API for custom Unicode error handlers Terry Reedy <tjreedy@udel.edu> - 2013-10-04 18:44 -0400

#55480 — API for custom Unicode error handlers

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-10-04 13:56 +0000
Subject: API for custom Unicode error handlers
Message-ID: <524ec8fe$0$29984$c3e8da3$5496439d@news.astraweb.com>

I have some custom Unicode error handlers, and I'm looking for advice on 
the right API for dealing with them.

I have a module containing custom Unicode error handlers. For example:

# Python 3
import unicodedata
def namereplace_errors(exc):
    c = exc.object[exc.start]
    try:
        name = unicodedata.name(c)
    except (KeyError, ValueError):
        n = ord(c)
        if n <= 0xFFFF:
            replace = "\\u%04x"
        else:
            assert n <= 0x10FFFF
            replace = "\\U%08x"
        replace = replace % n
    else:
        replace = "\\N{%s}" % name
    return replace, exc.start + 1


Before I can use the error handler, I need to register it using this:


import codecs
codecs.register_error('namereplace', namereplace_errors)

And now:

py> 'abc\u04F1'.encode('ascii', 'namereplace')
b'abc\\N{CYRILLIC SMALL LETTER U WITH DIAERESIS}'


Now, my question:

Should the module holding the error handlers automatically register them? 
In other words, if I do:

import error_handlers

just importing it will have the side-effect of registering the error 
handlers. Normally, I dislike imports that have side-effects of this 
sort, but I'm not sure that the alternative is better, that is, to put 
responsibility on the caller to register some, or all, of the handlers:

import error_handlers
error_handlers.register(error_handlers.namereplace_errors)
error_handlers.register_all()
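
A minimal sketch of what the explicit-registration variant could look like, for comparison. Everything here is an assumption made for illustration: the handler name demo_namereplace, the rule that strips the "_errors" suffix to derive the public name, and the _HANDLERS tuple are not part of the original module.

```python
import codecs
import unicodedata


def demo_namereplace_errors(exc):
    """Simplified stand-in handler: replace characters with \\N{...} escapes."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    for c in exc.object[exc.start:exc.end]:
        try:
            parts.append("\\N{%s}" % unicodedata.name(c))
        except ValueError:  # the character has no Unicode name
            parts.append("\\U%08x" % ord(c))
    return "".join(parts), exc.end


# Hypothetical list of every handler this module offers.
_HANDLERS = (demo_namereplace_errors,)


def register(handler, name=None):
    """Register one handler; the caller opts in explicitly."""
    if name is None:
        # Assumed naming rule: "foo_errors" is registered as "foo".
        name = handler.__name__
        if name.endswith("_errors"):
            name = name[:-len("_errors")]
    codecs.register_error(name, handler)
    return name


def register_all():
    """Register every handler the module provides."""
    return [register(handler) for handler in _HANDLERS]
```

With this shape, importing the module has no side effects, and nothing is usable until the caller invokes register() or register_all().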


As far as I know, there is no way to find out what error handlers are 
registered, and no way to deregister one after it has been registered.

Which API would you prefer if you were using this module?


-- 
Steven



#55486

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-10-05 03:22 +1000
Message-ID: <mailman.727.1380907377.18130.python-list@python.org>
In reply to: #55480

On Fri, Oct 4, 2013 at 11:56 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Should the module holding the error handlers automatically register them?
> In other words, if I do:
>
> import error_handlers
>
> just importing it will have the side-effect of registering the error
> handlers. Normally, I dislike imports that have side-effects of this
> sort, but I'm not sure that the alternative is better, that is, to put
> responsibility on the caller to register some, or all, of the handlers:
>
> import error_handlers
> error_handlers.register(error_handlers.namereplace_errors)
> error_handlers.register_all()

Caveat: I don't actually use codecs much, so I don't know the specifics.

I'd be quite happy with importing having a side-effect here. If you
import a module that implements a numeric type, it should immediately
register itself with the Numeric ABC, right? This is IMO equivalent to
that.

> As far as I know, there is no way to find out what error handlers are
> registered, and no way to deregister one after it has been registered.

The only risk that I see is of an accidental collision. Having a codec
registered that you don't use can't hurt (afaik). Is there any
mechanism for detecting a name collision? If not, I wouldn't worry
about it.
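
On detecting a collision: the codecs module cannot list registered handlers, but codecs.lookup_error(name) raises LookupError for an unknown name, which is enough to check a single name before registering. A rough sketch, where is_registered and register_if_absent are made-up helper names rather than anything in the standard library:

```python
import codecs


def is_registered(name):
    """Return True if some error handler is registered under this name."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True


def register_if_absent(name, handler):
    """Register handler under name unless the name is already taken.

    Returns True if this call performed the registration.
    """
    if is_registered(name):
        return False
    codecs.register_error(name, handler)
    return True
```

This only guards against accidental reuse of a name. It cannot tell who registered the existing handler, and it does nothing about the lack of a way to deregister.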

ChrisA



#55487

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2013-10-04 11:05 -0700
Message-ID: <mailman.728.1380911272.18130.python-list@python.org>
In reply to: #55480

On 10/04/2013 06:56 AM, Steven D'Aprano wrote:
>
> Should the module holding the error handlers automatically register them?

I think it should.

Registration only needs to happen once, the module is useless without being registered, no threads nor processes are 
being started, and the only reason to import the module is to get the functionality... isn't it?

What about help(), Sphinx, or other introspection tools, which import a module only to inspect it?

This sounds similar to cgitb -- another module which you only import if you want the html'ized traceback, and yet it 
requires a separate cgitb.enable() call...

I change my mind, it shouldn't.

Throw in a .enable() function and call it good.  :)
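
A cgitb-style enable() might look roughly like the sketch below. The handler, its registered name demo_qmark, and the _enabled flag are all invented for illustration. The flag just makes repeated calls harmless.

```python
import codecs

_enabled = False  # hypothetical module-level flag


def _qmark_errors(exc):
    """Toy handler: replace each unencodable character with '?'."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return "?" * (exc.end - exc.start), exc.end


def enable():
    """Register the module's handlers; calling it again is a no-op."""
    global _enabled
    if _enabled:
        return
    codecs.register_error("demo_qmark", _qmark_errors)
    _enabled = True
```

Importing such a module (for help() or documentation tools) then changes nothing globally; only an explicit enable() call does.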

--
~Ethan~



#55488

From: Serhiy Storchaka <storchaka@gmail.com>
Date: 2013-10-04 22:08 +0300
Message-ID: <mailman.729.1380913714.18130.python-list@python.org>
In reply to: #55480

04.10.13 20:22, Chris Angelico wrote:
> I'd be quite happy with importing having a side-effect here. If you
> import a module that implements a numeric type, it should immediately
> register itself with the Numeric ABC, right? This is IMO equivalent to
> that.

There is a difference. You can't use a numeric type without importing its
module, but you can use an error handler that was registered outside of
your module.

This leads to subtle bugs. Suppose module A imports error_handlers and
uses the error handler, while module B uses the error handler but doesn't
import error_handlers. C.py imports A and then B, and everything works.
D.py imports B and then A, and fails.
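
The order dependence can be reproduced in a single file. In this sketch the two functions stand in for modules A and B, and demo_order_handler is a made-up handler name. It assumes B actually encodes something before A's registration has run, which is when the failure shows up.

```python
import codecs

HANDLER_NAME = "demo_order_handler"  # hypothetical handler name


def encode_like_module_b(text):
    """Stands in for module B: uses the handler but never registers it."""
    return text.encode("ascii", HANDLER_NAME)


def import_like_module_a():
    """Stands in for importing module A, which registers as a side effect."""
    codecs.register_error(HANDLER_NAME, codecs.backslashreplace_errors)


# "D.py" order: B runs before A's side effect has happened.
try:
    encode_like_module_b("caf\u00e9")
    failed_before = False
except LookupError:  # unknown error handler name
    failed_before = True

# "C.py" order: A's side effect has happened, so B now works.
import_like_module_a()
result_after = encode_like_module_b("caf\u00e9")
```

B's behaviour depends on global state that B never set up, which is why the failure only shows up under one import order.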



#55491

From: Serhiy Storchaka <storchaka@gmail.com>
Date: 2013-10-04 22:35 +0300
Message-ID: <mailman.730.1380916476.18130.python-list@python.org>
In reply to: #55480

04.10.13 16:56, Steven D'Aprano wrote:
> I have some custom Unicode error handlers, and I'm looking for advice on
> the right API for dealing with them.
>
> I have a module containing custom Unicode error handlers. For example:
>
> # Python 3
> import unicodedata
> def namereplace_errors(exc):
>      c = exc.object[exc.start]
>      try:
>          name = unicodedata.name(c)
>      except (KeyError, ValueError):
>          n = ord(c)
>          if n <= 0xFFFF:
>              replace = "\\u%04x"
>          else:
>              assert n <= 0x10FFFF
>              replace = "\\U%08x"
>          replace = replace % n
>      else:
>          replace = "\\N{%s}" % name
>      return replace, exc.start + 1

I'm planning to build this error handler into 3.4 (see 
http://comments.gmane.org/gmane.comp.python.ideas/21296).

Actually, a Python implementation should look like this:

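import unicodedata

def namereplace_errors(exc):
    # This handler only makes sense for encoding errors.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    replace = []
    # Handle the whole unencodable run, not just its first character.
    for c in exc.object[exc.start:exc.end]:
        try:
            replace.append(r'\N{%s}' % unicodedata.name(c))
        except ValueError:
            # unicodedata.name() raises ValueError for unnamed characters.
            n = ord(c)
            if n < 0x100:
                replace.append(r'\x%02x' % n)
            elif n < 0x10000:
                replace.append(r'\u%04x' % n)
            else:
                replace.append(r'\U%08x' % n)
    # Resume encoding after the end of the run.
    return ''.join(replace), exc.end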
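
For anyone who wants to try this version, here is a self-contained sketch that registers it and exercises both branches. Two things in it are assumptions: it catches ValueError, which is what unicodedata.name() raises for a character without a name, and it registers under the made-up name namereplace_sketch so that it cannot clash with a handler that is already registered as 'namereplace'.

```python
import codecs
import unicodedata


def namereplace_errors(exc):
    """Replace each unencodable character with \\N{NAME} or a hex escape."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    replace = []
    for c in exc.object[exc.start:exc.end]:
        try:
            replace.append(r'\N{%s}' % unicodedata.name(c))
        except ValueError:  # no name: fall back to a numeric escape
            n = ord(c)
            if n < 0x100:
                replace.append(r'\x%02x' % n)
            elif n < 0x10000:
                replace.append(r'\u%04x' % n)
            else:
                replace.append(r'\U%08x' % n)
    return ''.join(replace), exc.end


# Hypothetical name, chosen to avoid replacing any existing registration.
codecs.register_error('namereplace_sketch', namereplace_errors)

named = 'abc\u04f1'.encode('ascii', 'namereplace_sketch')
# named == b'abc\\N{CYRILLIC SMALL LETTER U WITH DIAERESIS}'

unnamed = 'a\x80'.encode('ascii', 'namereplace_sketch')
# U+0080 is a control character without a name, so the \x branch is used.
```

Because the handler walks exc.object[exc.start:exc.end] and returns exc.end, a run of several unencodable characters is fully replaced however the codec reports it.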

> Now, my question:
>
> Should the module holding the error handlers automatically register them?

This question interests me too.



#56165

From: Terry Reedy <tjreedy@udel.edu>
Date: 2013-10-04 18:44 -0400
Message-ID: <mailman.733.1380926707.18130.python-list@python.org>
In reply to: #55480

On 10/4/2013 3:35 PM, Serhiy Storchaka wrote:
> 04.10.13 16:56, Steven D'Aprano wrote:
>> I have some custom Unicode error handlers, and I'm looking for advice on
>> the right API for dealing with them.

> I'm planning to build this error handler into 3.4 (see
> http://comments.gmane.org/gmane.comp.python.ideas/21296).

>> Should the module holding the error handlers automatically register them?
>
> This question interests me too.

I did not respond on the python-ideas thread, but +1 for 'namereplace' 
from me too. Like others, I would prefer auto-registration unless that 
creates a problem. If it is a problem, perhaps the registry mechanism 
needs improvement. On the other hand, if it is built in, it will be 
pre-registered.

-- 
Terry Jan Reedy
