From: Robin Becker
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: doctests compatibility for python 2 & python 3
Date: Fri, 17 Jan 2014 16:17:27 +0000
References: <52d916e4$0$29999$c3e8da3$5496439d@news.astraweb.com> <52d94bea$0$29999$c3e8da3$5496439d@news.astraweb.com>
In-Reply-To: <52d94bea$0$29999$c3e8da3$5496439d@news.astraweb.com>

On 17/01/2014 15:27, Steven D'Aprano wrote:
..........
>>
>> # -*- coding: utf-8 -*-
>> def func(a):
>>     """
>>     >>> print(func(u'aaa\u020b'))
>>     aaa╚ï
>>     """
>>     return a
>
> There seems to be some mojibake in your post, which confuses issues.
>
> You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
> At least, that's what it ought to be. But in your post, it shows up as
> the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
> RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
> your posting software somehow got confused and inserted the two
> characters which you would have got using cp-437 while claiming that they
> are UTF-8. (Your post is correctly labelled as UTF-8.)
>
> I'm confident that the problem isn't with my newsreader, Pan, because it
> is pretty damn good at getting encodings right, but also because your
> post shows the same mojibake in the email archive:
>
> https://mail.python.org/pipermail/python-list/2014-January/664771.html
>
> To clarify: you tried to show \u020B as a literal. As a literal, it ought
> to be the single character ȋ which is a lower case I with curved accent on
> top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code
> page is two characters ╚ ï.

When I edit the file in vim with utf-8 encoding I do see your ȋ as the
literal. However, as you note, I'm on Windows and no amount of cajoling
will get it to work reasonably, so my printouts are broken.

So on Windows (py27)

C:\code\hg-repos>python -c"print(u'aaa\u020b')"
aaa╚ï

on my Linux

$ python2 -c"print(u'aaa\u020b')"
aaaȋ

$ python2 tdt1.py
/usr/lib/python2.7/doctest.py:1531: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
/usr/lib/python2.7/doctest.py:1551: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
**********************************************************************
File "tdt1.py", line 4, in __main__.func
Failed example:
    print(func(u'aaa\u020b'))
Expected:
    aaaȋ
Got:
    aaaȋ
**********************************************************************
1 items had failures:
   1 of   1 in __main__.func
***Test Failed*** 1 failures.
robin@everest ~/tmp:
$ cat tdt1.py
# -*- coding: utf-8 -*-
def func(a):
    """
    >>> print(func(u'aaa\u020b'))
    aaaȋ
    """
    return a

def _doctest():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _doctest()
robin@everest ~/tmp:

so the error persists with or without copying errors. Note that on my
putty terminal I don't see the character properly (I see an unknown glyph
square box), but it copies OK.
-- 
Robin Becker
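
For what it's worth, the cp-437 explanation quoted above can be checked in a few lines. This is only a minimal sketch, not code from the thread: the names ch, utf8_bytes, mojibake and repaired are made up for illustration, and it assumes nothing beyond the standard library codecs. It should run the same under Python 2.7 and Python 3.

```python
from __future__ import print_function, unicode_literals

import unicodedata

# Hypothetical names, for illustration only.
ch = "\u020b"  # LATIN SMALL LETTER I WITH INVERTED BREVE

# Encode the single character as UTF-8: this yields two bytes.
utf8_bytes = ch.encode("utf-8")

# Misread those same two bytes as cp437: each byte becomes its own character.
mojibake = utf8_bytes.decode("cp437")

# Show which characters the misreading produced.
for c in mojibake:
    print("U+%04X %s" % (ord(c), unicodedata.name(c)))

# The damage is reversible as long as nothing else touched the text:
# re-encode as cp437 to recover the bytes, then decode them as UTF-8.
repaired = mojibake.encode("cp437").decode("utf-8")
print(repaired == ch)
```

The loop should report BOX DRAWINGS DOUBLE UP AND RIGHT and LATIN SMALL LETTER I WITH DIAERESIS, i.e. the two characters seen in the Windows printout, and the final comparison should come out True.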