In-Reply-To: <3323513.NxQf30XKqz@xrated>
References: <3323513.NxQf30XKqz@xrated>
Date: Wed, 4 Dec 2013 10:20:31 +1100
Subject: Re: Python 2.7.5: Strange and differing behavior depending on sys.setdefaultencoding being set
From: Chris Angelico <rosuav@gmail.com>
Cc: "python-list@python.org" <python-list@python.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3536.1386112840.18130.python-list@python.org>

On Wed, Dec 4, 2013 at 9:32 AM, Hans-Peter Jansen <hpj@urpla.net> wrote:
> I'm experiencing strange behavior with attached code, that differs depending
> on sys.setdefaultencoding being set or not. If it is set, the code works as
> expected, if not - what should be the usual case - the code fails with some
> non-sensible traceback.

Interesting. You're mixing str and unicode objects a lot here. The
cleanest solution, IMO, would be to either switch to Python 3 or add
this to the top of your code:

from __future__ import unicode_literals

Either way, you'll have all your quoted strings be Unicode, rather
than byte, strings. Then take away the requirement that Unicode
strings contain non-ASCII characters, and let everything go through
that code branch.

Looking at this line in reprstr():

s = "u'%s'" % s.replace("'", "\\'")

Two potential problems with that. Firstly, the representation is
flawed: a backslash in the input string won't be changed, so it's not
a true repr; but if this is just for debugging output, that's not a
big deal. Secondly, this code might produce either a str or a unicode,
depending on the type of s. That may cause messes later; since you
seem to be mostly working with the unicode type after that, it'd
probably be simpler/safer to make that always return one:

s = u"u'%s'" % s.replace("'", "\\'")
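
If you do want the backslash case covered as well, here's a minimal sketch. The name quote_unicode is my own invention, not something from your code, and it only handles backslashes and single quotes (not newlines or other control characters), so it's still short of a true repr:

```python
def quote_unicode(s):
    """Wrap a unicode string as u'...' with backslashes and quotes escaped.

    Hypothetical helper for illustration; it does not escape newlines
    or other control characters.
    """
    # Escape backslashes first, so the backslashes added for the quotes
    # are not themselves doubled.
    escaped = s.replace(u"\\", u"\\\\").replace(u"'", u"\\'")
    return u"u'%s'" % escaped


sample = u"it's a C:\\path"      # the text: it's a C:\path
quoted = quote_unicode(sample)   # always a unicode, never a byte str
```

The order of the two replace() calls matters: swap them and the backslash inserted in front of each quote gets doubled too.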

But the actual problem, I think, is that in Python 2 repr() has to return
a str, and your __repr__ is returning a unicode. Here's an illustration:

# -*- coding: utf-8 -*-
class Foo(object):
    def __repr__(self):
        return u'äöü'

foo = Foo()
print(foo.__repr__())
print(repr(foo))

The first one succeeds, because building up that string isn't at all a
problem. The second one then tries to turn the return value of
__repr__ into a str (a byte string) using the default encoding - which
defaults to 'ascii', hence the problem you're seeing.
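
You can reproduce the failing step by hand, without going through repr() at all. This is only a sketch of the equivalent explicit encode, not the interpreter's actual code path; the variable names are mine:

```python
# -*- coding: utf-8 -*-
text = u'\xe4\xf6\xfc'  # the same three characters as in Foo.__repr__

# What the implicit conversion amounts to when the default is 'ascii'.
try:
    text.encode('ascii')
except UnicodeEncodeError:
    ascii_failed = True
else:
    ascii_failed = False

# An encoding that can represent the characters works fine.
utf8_bytes = text.encode('utf-8')
```

Changing the default encoding just swaps which codec gets used for that hidden step, which is why your code behaves differently with sys.setdefaultencoding set.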

Solution 1: Switch to Python 3, in which this will work fine (because
repr() in Py3 returns a Unicode string, since _everything_ is
Unicode).

Solution 2: Explicitly encode in frec, or at the end of Record.__repr__():

        def __repr__(self):
            s = u'%s(\n%s\n)' % (self.__class__.__name__, frec(self.__dict__))
            return s.encode("utf-8")

(That could be a one-liner, but it's already pushing 80 chars, so if
you have a line-length limit, breaking it helps.)
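
If the same file ever needs to run under both versions, the encode can be made conditional. A rough sketch, assuming a cut-down Record whose constructor just stores keyword arguments (yours will differ) and a simple join in place of frec:

```python
import sys

PY2 = sys.version_info[0] == 2


class Record(object):
    # Simplified stand-in for the real class; the constructor is an assumption.
    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __repr__(self):
        body = u', '.join(u'%s=%s' % item
                          for item in sorted(self.__dict__.items()))
        s = u'%s(%s)' % (self.__class__.__name__, body)
        # Python 2 wants a byte str from __repr__; Python 3 wants text.
        return s.encode('utf-8') if PY2 else s


rec = Record(name=u'\xe4\xf6\xfc', city=u'Bonn')
shown = repr(rec)
```

Either way, repr() gets back the native str type for the interpreter it's running on, so the implicit ASCII encode never happens.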

Solution 3: Don't use __repr__ here, but simply have your frec
function intelligently handle Record types. Effectively, you have your
own method of generating a debug description of a Record, which could
then return a unicode instead of a str.
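
Something along these lines, as a sketch only. The function name describe and this stripped-down Record are placeholders of mine, since I don't know what your frec does with other types:

```python
class Record(object):
    # Simplified stand-in for the real class; the constructor is an assumption.
    def __init__(self, **fields):
        self.__dict__.update(fields)


def describe(value):
    """Return a unicode debug description; never goes through repr()."""
    if isinstance(value, Record):
        fields = u', '.join(
            u'%s=%s' % (key, describe(val))
            for key, val in sorted(vars(value).items())
        )
        return u'%s(%s)' % (type(value).__name__, fields)
    if isinstance(value, bytes):
        # Byte strings are decoded explicitly (assumed UTF-8) instead of
        # relying on the default encoding.
        return value.decode('utf-8', 'replace')
    return u'%s' % (value,)


inner = Record(city=u'K\xf6ln')
outer = Record(name=u'\xe4\xf6\xfc', home=inner, age=3)
description = describe(outer)
```

Nested Records are handled by the recursive call, and the result stays unicode all the way up, so nothing ever needs the default encoding.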

I personally recommend switching to Python 3 :) But presumably that's
not an option, or you'd already have considered it.

ChrisA