From: Chris Angelico
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: Thanks for all responses
Date: Wed, 1 Jun 2011 07:56:24 +1000

On Wed, Jun 1, 2011 at 5:52 AM, Wolfgang Meiners wrote:
> Whenever i 'cross the border' of my program, i have to encode the 'list
> of bytes' to an unicode string or decode the unicode string to a 'list
> of bytes' which is meaningful to the world outside.

Most people use "encode" and "decode" the other way around: you encode
a Unicode string as UTF-8, and decode UTF-8 bytes into a Unicode string.
But yes, you're correct.

> So encode early, decode lately means, to do it as near to the border as
> possible and to encode/decode i need a coding system, for example 'utf8'

Correct on both counts (in the usual terminology, that's "decode early,
encode late").

> That means, there should be an encoding/decoding possibility to every
> interface i can use: files, stdin, stdout, stderr, gui (should be the
> most important ones).

The file objects (as returned by open() in Python 3) have an encoding.
The default is platform-dependent (it comes from
locale.getpreferredencoding()), so pass encoding='utf8' explicitly if
you want UTF-8. GUI work depends on your GUI toolkit, and might well
accept Unicode strings directly - check the docs.

>    def __repr__(self):
>        return u'My name is %s' % self.Name

This means that __repr__ returns a Unicode string.

>    # this does work
>    print a.__repr__()
>
>    # throws an error if default encoding is ascii
>    # but works if default encoding is utf8
>    print a
>
>    # throws an error because a is not a string
>    print unicode(a, encoding='utf8')

The __repr__ function is supposed to return a string (str) object in
Python 2. See
http://docs.python.org/reference/datamodel.html#object.__repr__ for that
and other advice on writing __repr__. The problems you're seeing are a
result of the built-in repr() function calling a.__repr__() and then
converting the returned Unicode string to a str using the default
encoding (ASCII), which fails on non-ASCII characters.

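Going back to the encode/decode direction and the file encoding point
above, here is a minimal Python 3 sketch of a round trip across the
"border". The variable names and the scratch file name.txt are
assumptions made up for illustration; they are not from the original
program.

```python
import os
import tempfile

text = 'My name is M\u00fcller'   # a Unicode string (str in Python 3)

# Encode: Unicode string -> bytes, on the way out of the program.
data = text.encode('utf8')

# Decode: bytes -> Unicode string, on the way back in.
round_trip = data.decode('utf8')

# Files: name the encoding explicitly instead of relying on the
# platform-dependent default. The path is a hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), 'name.txt')
with open(path, 'w', encoding='utf8') as f:
    f.write(text)

# Reading in binary mode shows the raw bytes that were written.
with open(path, 'rb') as f:
    raw = f.read()

# Reading in text mode with the same encoding gives the string back.
with open(path, encoding='utf8') as f:
    read_back = f.read()
```

Here raw holds the same bytes as data, and read_back equals text, so
the only places bytes appear are at the edges of the program.
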
For the __repr__ problem, this would work in Python 2:

    def __repr__(self):
        return (u'My name is %s' % self.Name).encode('utf8')

Alternatively, migrate to Python 3, where strings are Unicode by
default. I tested this in Python 3.2 on Windows, but it should work on
anything in the 3.x branch:

class NoEnc:
    def __init__(self, Name=None):
        self.Name = Name

    def __repr__(self):
        return 'My name is %s' % self.Name

if __name__ == '__main__':
    a = NoEnc('Müller')

    # this will still work (print is now a function, not a statement)
    print(a.__repr__())

    # this will work in Python 3.x
    print(a)

    # 'unicode' has been renamed to 'str', but a is not a bytes object,
    # so decoding it makes no sense; this line raises TypeError:
    # print(str(a, encoding='utf8'))

    # to convert it to UTF-8, convert it to a string with str() or
    # repr() and then encode:
    print(str(a).encode('utf8'))
############################

Note that the last one will probably not do what you expect. The Python
3 'print' function (it's not a statement any more, so you need
parentheses around its argument) wants a Unicode string, so you don't
need to encode it. When you encode a Unicode string as in the last
example, it returns a bytes object (an array of bytes), which looks
like this:

b'My name is M\xc3\xbcller'

The print function wants Unicode, though, so it takes this unexpected
object and calls str() on it, hence the odd display.

Hope that helps!

Chris Angelico
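
PS. A minimal sketch of that last point, for Python 3 only. The names
encoded, shown and decoded are just illustrative; NoEnc is the class
from above, repeated so the snippet runs on its own.

```python
class NoEnc:
    def __init__(self, Name=None):
        self.Name = Name

    def __repr__(self):
        return 'My name is %s' % self.Name

a = NoEnc('M\u00fcller')

# Encoding the Unicode string produces a bytes object.
encoded = str(a).encode('utf8')

# print() calls str() on whatever it is given; for bytes that yields
# the b'...' form with escaped bytes, not the original text.
shown = str(encoded)
print(shown)

# Decoding with the same encoding recovers the Unicode string.
decoded = encoded.decode('utf8')
```

So if the goal is just to display the name, print(a) is enough; encode
only when handing bytes to something outside the program.
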