| Date | 2011-06-01 19:29 +0200 |
|---|---|
| From | Wolfgang Meiners <WolfgangMeiners01@web.de> |
| Newsgroups | comp.lang.python |
| Subject | Re: Thanks for all responses |
| References | <4de40ee8$0$6623$9b4e6d93@newsspool2.arcor-online.net> <mailman.2315.1306841548.9059.python-list@python.org> <4de50cfd$0$6538$9b4e6d93@newsspool4.arcor-online.net> <4de546f8$0$6556$9b4e6d93@newsspool4.arcor-online.net> <mailman.2337.1306878988.9059.python-list@python.org> |
| Message-ID | <4de67709$0$6572$9b4e6d93@newsspool3.arcor-online.net> |
| Organization | Arcor |
On 31.05.11 23:56, Chris Angelico wrote:
> On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners
> <WolfgangMeiners01@web.de> wrote:
>> Whenever I 'cross the border' of my program, I have to encode the 'list
>> of bytes' to a Unicode string or decode the Unicode string to a 'list
>> of bytes' which is meaningful to the world outside.
>
> Most people use "encode" and "decode" the other way around; you encode
> a string as UTF-8, and decode UTF-8 into a Unicode string. But yes,
> you're correct.
Ok. I think I will adapt to the majority on this point.
I think I mixed up
unicodestring=unicode(bytestring,encoding='utf8')
and
bytestring=u'unicodestring'.encode('utf8')
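To keep the two directions straight, here is a minimal sketch, assuming Python 3 (where the Python 2 spelling `unicode(bytestring, encoding='utf8')` becomes `bytestring.decode('utf8')`). The variable names are only illustrative:

```python
# Bytes as they might arrive from outside the program (UTF-8 encoded).
raw = b'M\xc3\xbcller'

# bytes -> text is "decode".
text = raw.decode('utf8')

# text -> bytes is "encode".
back = text.encode('utf8')

# One character (ü) takes two bytes in UTF-8, so the lengths differ.
lengths = (len(text), len(raw))
```

The round trip gives back the original bytes, which is a quick way to check that the right codec was used in both directions.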
>
>> So 'encode early, decode late' means doing it as near to the border as
>> possible, and to encode/decode I need a coding system, for example 'utf8'
>
I think I should change this to "decode early, encode late".
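As a rough sketch of what "decode early, encode late" could look like, assuming Python 3: the function `greet` and its parameters are hypothetical, made up only for this illustration.

```python
def greet(raw_name, encoding='utf8'):
    """Hypothetical helper: takes bytes from outside, returns bytes."""
    # Decode early: turn the incoming bytes into text right at the border.
    name = raw_name.decode(encoding)

    # Inside the program, work only with text (Unicode) strings.
    message = 'My name is %s' % name.upper()

    # Encode late: convert back to bytes only when leaving the program.
    return message.encode(encoding)


result = greet(b'M\xc3\xbcller')
```

Everything between the two conversions can ignore encodings completely, which is the point of pushing them out to the border.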
> Correct on both counts.
>
>> That means there should be a way to encode/decode for every
>> interface I can use: files, stdin, stdout, stderr, gui (these should be
>> the most important ones).
>
> The file objects (as returned by open()) have an encoding, which
> (IMHO) defaults to "utf8". GUI work depends on your GUI toolkit, and
> might well accept Unicode strings directly - check the docs.
>
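A note on the file default: as far as I can tell, the plain open() in Python 2 returns byte-oriented files with no encoding at all, and in Python 3 the default depends on the platform's locale settings rather than always being UTF-8. Passing the encoding explicitly avoids relying on either. A small sketch using io.open, which accepts an encoding argument in both Python 2.6+ and Python 3 (the temporary file name is only an example):

```python
import io
import os
import tempfile

# A throwaway file in a fresh temporary directory (illustrative only).
path = os.path.join(tempfile.mkdtemp(), 'name.txt')

# Write text; the file object encodes it to UTF-8 on the way out.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(u'M\xfcller')

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

# Read it as text again; the file object decodes on the way in.
with io.open(path, 'r', encoding='utf8') as f:
    text = f.read()
```

With the encoding spelled out, the file object does the encode/decode step at the border and the rest of the program only sees text.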
>> def __repr__(self):
>>     return u'My name is %s' % self.Name
>
> This means that repr() will return a Unicode string.
>
>> # this does work
>> print a.__repr__()
>>
>> # throws an error if default encoding is ascii
>> # but works if default encoding is utf8
>> print a
>>
>> # throws an error because a is not a string
>> print unicode(a, encoding='utf8')
>
> The __repr__ function is supposed to return a string object, in Python
> 2. See http://docs.python.org/reference/datamodel.html#object.__repr__
> for that and other advice on writing __repr__. The problems you're
> seeing are a result of the built-in repr() function calling
> a.__repr__() and then treating the return value as an ASCII str, not a
> Unicode string.
>
> This would work:
> def __repr__(self):
>     return (u'My name is %s' % self.Name).encode('utf8')
>
> Alternatively, migrate to Python 3, where the default is Unicode
> strings. I tested this in Python 3.2 on Windows, but it should work on
> anything in the 3.x branch:
>
> class NoEnc:
>     def __init__(self, Name=None):
>         self.Name = Name
>     def __repr__(self):
>         return 'My name is %s' % self.Name
>
> if __name__ == '__main__':
>
>     a = NoEnc('Müller')
>
>     # this will still work (print is now a function, not a statement)
>     print(a.__repr__())
>
>     # this will work in Python 3.x
>     print(a)
>
>     # 'unicode' has been renamed to 'str', but it's already unicode so
>     # this makes no sense (it raises a TypeError, hence commented out)
>     # print(str(a, encoding='utf8'))
>
>     # to convert it to UTF-8, convert it to a string with str() or
>     # repr() and then print:
>     print(str(a).encode('utf8'))
> ############################
>
> Note that the last one will probably not do what you expect. The
> Python 3 'print' function (it's not a statement any more, so you need
> parentheses around its argument) wants a Unicode string, so you don't
> need to encode it. When you encode a Unicode string as in the last
> example, it returns a bytes string (an array of bytes), which looks
> like this: b'My name is M\xc3\xbcller' The print function wants
> Unicode, though, so it takes this unexpected object and calls str() on
> it, hence the odd display.
>
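To make the last two cases concrete, here is a short sketch assuming Python 3. It reuses the NoEnc class from the quoted example; the helper variables `encoded`, `shown` and `decode_failed` are made up for illustration.

```python
class NoEnc:
    def __init__(self, Name=None):
        self.Name = Name

    def __repr__(self):
        return 'My name is %s' % self.Name


a = NoEnc('Müller')

# Encoding the text form gives a bytes object, not a string.
encoded = str(a).encode('utf8')

# print() falls back to the bytes repr, which is the "odd display".
shown = repr(encoded)

# str(obj, encoding=...) only decodes bytes-like objects, so passing
# an ordinary object such as `a` raises a TypeError.
try:
    str(a, encoding='utf8')
    decode_failed = False
except TypeError:
    decode_failed = True
```

So in Python 3 it is enough to print(a) directly; an explicit encode is only needed when writing bytes to something outside the program.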
> Hope that helps!
Yes, it helped a lot. One last question: if I have a free choice and
I don't know either Python 2 or Python 3 very well, which would be the
recommended choice?
>
> Chris Angelico
Wolfgang