Re: Thanks for all responses

Date	2011-06-01 19:29 +0200
From	Wolfgang Meiners <WolfgangMeiners01@web.de>
Newsgroups	comp.lang.python
Subject	Re: Thanks for all responses
References	<4de40ee8$0$6623$9b4e6d93@newsspool2.arcor-online.net> <mailman.2315.1306841548.9059.python-list@python.org> <4de50cfd$0$6538$9b4e6d93@newsspool4.arcor-online.net> <4de546f8$0$6556$9b4e6d93@newsspool4.arcor-online.net> <mailman.2337.1306878988.9059.python-list@python.org>
Message-ID	<4de67709$0$6572$9b4e6d93@newsspool3.arcor-online.net>
Organization	Arcor


On 31.05.11 23:56, Chris Angelico wrote:
> On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
> <WolfgangMeiners01@web.de> wrote:
>> Whenever I 'cross the border' of my program, I have to encode the 'list
>> of bytes' to a Unicode string or decode the Unicode string to a 'list
>> of bytes' which is meaningful to the world outside.
> 
> Most people use "encode" and "decode" the other way around; you encode
> a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
> you're correct.

OK. I think I will adopt the majority's usage on this point.
I think I mixed up
unicodestring=unicode(bytestring,encoding='utf8')
and
bytestring=u'unicodestring'.encode('utf8')
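
To keep the two directions straight, here is a minimal sketch written for Python 3, where the old unicode type is simply str and byte strings are bytes. The variable names are only for illustration.

```python
# Python 3: text is str, raw data is bytes.
text = 'Müller'

# encode: Unicode string -> bytes
raw = text.encode('utf8')
print(raw)                    # b'M\xc3\xbcller'

# decode: bytes -> Unicode string
back = raw.decode('utf8')     # same as str(raw, encoding='utf8')
print(back == text)           # True
```

In Python 2 the same round trip would be u'Müller'.encode('utf8') and unicode(raw, encoding='utf8').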

> 
>> So 'encode early, decode late' means doing it as near to the border as
>> possible, and to encode/decode I need a coding system, for example 'utf8'.
> 

I think I should change this to: decode early, encode late.
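
As I understand it now, the pattern could look roughly like this in Python 3. This is only a sketch: the io.BytesIO objects stand in for "the world outside", and shout() is a made-up function representing the inside of the program.

```python
import io

def shout(name):
    # Inside the program: only Unicode strings (str in Python 3).
    return name.upper()

# Hypothetical outside world: a byte stream in, a byte stream out.
incoming = io.BytesIO('müller'.encode('utf8'))
outgoing = io.BytesIO()

text = incoming.read().decode('utf8')    # decode early, right at the border
result = shout(text)                     # work on text only
outgoing.write(result.encode('utf8'))    # encode late, right at the border

print(outgoing.getvalue())               # b'M\xc3\x9cLLER'
```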

> Correct on both counts.
> 
>> That means there should be a way to encode/decode at every
>> interface I can use: files, stdin, stdout, stderr, GUI (these should be the
>> most important ones).
> 
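> The file objects (as returned by open()) have an encoding in Python 3;
> the default depends on the platform's locale and is often, but not always,
> "utf8", so it is safest to pass it explicitly. GUI work depends on your GUI
> toolkit, which might well accept Unicode strings directly - check the docs.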
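
For files, a small sketch of what this could look like, assuming Python 3, where open() takes an encoding argument. The temporary file path is only a stand-in for a real file.

```python
import os
import tempfile

# Hypothetical scratch file, only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'name.txt')

# Writing: str goes in, the file object encodes it to UTF-8 bytes.
with open(path, 'w', encoding='utf8') as f:
    f.write('Müller')

# Reading in binary mode shows the bytes that really ended up on disk.
with open(path, 'rb') as f:
    on_disk = f.read()
print(on_disk)                  # b'M\xc3\xbcller'

# Reading in text mode with the same encoding gives the str back.
with open(path, encoding='utf8') as f:
    name = f.read()
print(name == 'Müller')         # True
```

Passing the encoding explicitly means the result does not depend on the locale of the machine the program runs on.
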
> 
>>    def __repr__(self):
>>        return u'My name is %s' % self.Name
> 
> This means that repr() will return a Unicode string.
> 
>>    # this does work
>>    print a.__repr__()
>>
>>    # throws an error if default encoding is ascii
>>    # but works if default encoding is utf8
>>    print a
>>
>>    # throws an error because a is not a string
>>    print unicode(a, encoding='utf8')
> 
> The __repr__ function is supposed to return a string object, in Python
> 2. See http://docs.python.org/reference/datamodel.html#object.__repr__
> for that and other advice on writing __repr__. The problems you're
> seeing are a result of the built-in repr() function calling
> a.__repr__() and then treating the return value as an ASCII str, not a
> Unicode string.
> 
> This would work:
>     def __repr__(self):
>         return (u'My name is %s' % self.Name).encode('utf8')
> 
> Alternatively, migrate to Python 3, where the default is Unicode
> strings. I tested this in Python 3.2 on Windows, but it should work on
> anything in the 3.x branch:
> 
> class NoEnc:
> 	def __init__(self,Name=None):
> 		self.Name=Name
> 	def __repr__(self):
> 		return 'My name is %s' % self.Name
> 
> if __name__ == '__main__':
> 
>    a = NoEnc('Müller')
> 
>    # this will still work (print is now a function, not a statement)
>    print(a.__repr__())
> 
>    # this will work in Python 3.x
>    print(a)
> 
>    # 'unicode' has been renamed to 'str', but it's already unicode, so
>    # this makes no sense (it raises TypeError)
>    print(str(a, encoding='utf8'))
> 
>    # to convert it to UTF-8, convert it to a string with str() or
>    # repr() and then encode:
>    print(str(a).encode('utf8'))
> ############################
> 
> Note that the last one will probably not do what you expect. The
> Python 3 'print' function (it's not a statement any more, so you need
> parentheses around its argument) wants a Unicode string, so you don't
> need to encode it. When you encode a Unicode string as in the last
> example, it returns a bytes string (an array of bytes), which looks
> like this: b'My name is M\xc3\xbcller'  The print function wants
> Unicode, though, so it takes this unexpected object and calls str() on
> it, hence the odd display.
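
A short check of that last point, as a sketch for Python 3. The class is the one from the example above; the variable names encoded and shown are only for illustration.

```python
class NoEnc:
    def __init__(self, Name=None):
        self.Name = Name
    def __repr__(self):
        return 'My name is %s' % self.Name

a = NoEnc('Müller')
encoded = str(a).encode('utf8')

print(type(encoded).__name__)             # bytes

# print() calls str() on the bytes object, which gives its repr:
shown = str(encoded)
print(shown)                              # b'My name is M\xc3\xbcller'

# Decoding turns the bytes back into the original text:
print(encoded.decode('utf8') == str(a))   # True
```
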
> 
> Hope that helps!

Yes, it helped a lot. One last question: if I have a free choice and
don't know either Python 2 or Python 3 very well, which would be the
recommended choice?

> 
> Chris Angelico

Wolfgang

Thread

sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-30 23:40 +0200
  Re: sqlalchemy and Unicode strings: errormessage Chris Withers <chris@simplistix.co.uk> - 2011-05-31 10:55 +0100
    Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 17:47 +0200
  Re: sqlalchemy and Unicode strings: errormessage Daniel Kluev <dan.kluev@gmail.com> - 2011-05-31 22:32 +1100
    Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 17:45 +0200
      Re: sqlalchemy and Unicode strings: errormessage Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 18:10 +0200
      Re: sqlalchemy and Unicode strings: errormessage Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-05-31 09:42 -0700
      RE: sqlalchemy and Unicode strings: errormessage "Prasad, Ramit" <ramit.prasad@jpmchase.com> - 2011-05-31 12:31 -0400
      Re: sqlalchemy and Unicode strings: errormessage Chris Angelico <rosuav@gmail.com> - 2011-06-01 03:19 +1000
      Thanks for all responses Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-05-31 21:52 +0200
        Re: Thanks for all responses Chris Angelico <rosuav@gmail.com> - 2011-06-01 07:56 +1000
          Re: Thanks for all responses Wolfgang Meiners <WolfgangMeiners01@web.de> - 2011-06-01 19:29 +0200
            Re: Thanks for all responses Chris Angelico <rosuav@gmail.com> - 2011-06-02 04:38 +1000
