
doctests compatibility for python 2 & python 3

Started by	Robin Becker <robin@reportlab.com>
First post	2014-01-17 11:16 +0000
Last post	2014-01-17 16:17 +0000
Articles	5 — 2 participants


  doctests compatibility for python 2 & python 3 Robin Becker <robin@reportlab.com> - 2014-01-17 11:16 +0000
    Re: doctests compatibility for python 2 & python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-17 11:41 +0000
      Re: doctests compatibility for python 2 & python 3 Robin Becker <robin@reportlab.com> - 2014-01-17 12:12 +0000
        Re: doctests compatibility for python 2 & python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-17 15:27 +0000
          Re: doctests compatibility for python 2 & python 3 Robin Becker <robin@reportlab.com> - 2014-01-17 16:17 +0000

#64154 — doctests compatibility for python 2 & python 3

From	Robin Becker <robin@reportlab.com>
Date	2014-01-17 11:16 +0000
Subject	doctests compatibility for python 2 & python 3
Message-ID	<mailman.5634.1389957389.18130.python-list@python.org>

I have some problems making some doctests for python2 code compatible with
python3. The problem is that as part of our approach we are converting the code
to use unicode internally. So we allow either byte strings or unicode in inputs,
but we are trying to convert to unicode outputs.

That makes doctests quite hard as

def func(a):
     """
     >>> func(u'aaa')
     'aaa'
     """
     return a

fails in python2 whilst

def func(a):
     """
     >>> func(u'aaa')
     u'aaa'
     """
     return a

fails in python3. I could change the tests so they look like
     """
     >>> func(u'aaa')==u'aaa'
     True
     """
but that makes the tests less useful: if a test fails I don't see the actual and
expected outcomes, I only see expected True, got False.

Is there an easy way to make these kinds of tests work in python 2 & 3?
-- 
Robin Becker


#64157

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-01-17 11:41 +0000
Message-ID	<52d916e4$0$29999$c3e8da3$5496439d@news.astraweb.com>
In reply to	#64154

On Fri, 17 Jan 2014 11:16:17 +0000, Robin Becker wrote:

> I have some problems making some doctests for python2 code compatible
> with python3. The problem is that as part of our approach we are
> converting the code to use unicode internally. So we allow either byte
> strings or unicode in inputs, but we are trying to convert to unicode
> outputs.

Alas, I think you've run into one of the weaknesses of doctest. Don't get 
me wrong, I am a huge fan of doctest, but it is hard to write polyglot 
string tests with it, as you have discovered.

However, you may be able to get 95% of the way by using print.

def func(a):
    """
    >>> print(func(u'aaa'))
    aaa
    """
    return a

ought to behave identically in both Python 2 and Python 3.3, provided you 
only print one object at a time. This ought to work with both ASCII and 
non-ASCII (at least in the BMP).
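A minimal sketch of how that claim could be checked on each interpreter without a separate test file. The helper name run_docstring is hypothetical (it is not from this thread); it relies only on doctest.DocTestParser and doctest.DocTestRunner from the standard library.

```python
import doctest


def func(a):
    """
    >>> print(func(u'aaa'))
    aaa
    """
    return a


def run_docstring(obj):
    """Run the examples in obj's docstring and return the doctest results."""
    # Hypothetical helper: parse the docstring directly instead of scanning a module.
    parser = doctest.DocTestParser()
    test = parser.get_doctest(obj.__doc__, {obj.__name__: obj}, obj.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # a (failed, attempted) named tuple


if __name__ == "__main__":
    results = run_docstring(func)
    print(results.failed, results.attempted)
```

Running the same file under both python2 and python3 and comparing the reported failure counts is one way to confirm that the print-based form is accepted by both.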

-- 
Steven


#64158

From	Robin Becker <robin@reportlab.com>
Date	2014-01-17 12:12 +0000
Message-ID	<mailman.5637.1389960773.18130.python-list@python.org>
In reply to	#64157

On 17/01/2014 11:41, Steven D'Aprano wrote:
> def func(a):
>      """
>      >>> print(func(u'aaa'))
>      aaa
>      """
>      return a
I think this approach seems to work if I turn the docstring into unicode

def func(a):
	u"""
	>>> print(func(u'aaa\u020b'))
	aaa\u020b
	"""
	return a
def _doctest():
	import doctest
	doctest.testmod()

if __name__ == "__main__":
	_doctest()

If I leave the u off the docstring it goes wrong in python 2.7. I also tried to 
put an encoding onto the file and use the actual utf8 characters ie

# -*- coding: utf-8 -*-
def func(a):
     """
     >>> print(func(u'aaa\u020b'))
     aaa╚ï
     """
     return a
def _doctest():
     import doctest
     doctest.testmod()

and that works in python3, but fails in python 2 with this
> (py27) C:\code\hg-repos>python tdt1.py
> C:\python\Lib\doctest.py:1531: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>   if got == want:
> C:\python\Lib\doctest.py:1551: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>   if got == want:
> **********************************************************************
> File "tdt1.py", line 4, in __main__.func
> Failed example:
>     print(func(u'aaa\u020b'))
> Expected:
>     aaa╚ï
> Got:
>     aaa╚ï
> **********************************************************************
> 1 items had failures:
>    1 of   1 in __main__.func
> ***Test Failed*** 1 failures.


--
Robin Becker


#64171

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-01-17 15:27 +0000
Message-ID	<52d94bea$0$29999$c3e8da3$5496439d@news.astraweb.com>
In reply to	#64158

On Fri, 17 Jan 2014 12:12:35 +0000, Robin Becker wrote:

> On 17/01/2014 11:41, Steven D'Aprano wrote:
>> def func(a):
>>      """
>>      >>> print(func(u'aaa'))
>>      aaa
>>      """
>>      return a
>
> I think this approach seems to work if I turn the docstring into unicode
> 
> def func(a):
> 	u"""
> 	>>> print(func(u'aaa\u020b'))
> 	aaa\u020b
> 	"""
> 	return a

Good catch! Without the u-prefix, the \u... is not interpreted as an 
escape sequence, but as a literal backslash-u.
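To illustrate the difference in a form that runs on either version, the sketch below uses a raw string as a stand-in for the un-prefixed Python 2 docstring. That substitution is an assumption made for illustration: a plain Python 2 byte-string literal leaves \u untouched, much as a raw string does.

```python
escaped = u'\u020b'   # u-prefix: \u020b is an escape for a single character
literal = r'\u020b'   # raw string: a backslash, 'u', and four hex digits

print(len(escaped))   # length of the one-character string
print(len(literal))   # length of the six-character string
print(literal == u'\\u020b')
```

So the expected-output line of the doctest holds either the one character or the six, depending on the prefix, and only the former can match what print produces.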

> If I leave the u off the docstring it goes wrong in python 2.7. I also
> tried to put an encoding onto the file and use the actual utf8
> characters ie
> 
> # -*- coding: utf-8 -*-
> def func(a):
>      """
>      >>> print(func(u'aaa\u020b'))
>      aaa╚ï
>      """
>      return a

There seems to be some mojibake in your post, which confuses issues.

You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE. 
At least, that's what it ought to be. But in your post, it shows up as 
the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND 
RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that 
your posting software somehow got confused and inserted the two 
characters which you would have got using cp-437 while claiming that they 
are UTF-8. (Your post is correctly labelled as UTF-8.)

I'm confident that the problem isn't with my newsreader, Pan, because it 
is pretty damn good at getting encodings right, but also because your 
post shows the same mojibake in the email archive:

https://mail.python.org/pipermail/python-list/2014-January/664771.html

To clarify: you tried to show \u020B as a literal. As a literal, it ought 
to be the single character ȋ which is a lower case I with curved accent on 
top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code 
page is two characters ╚ ï. 

py> '\u020b'.encode('utf8').decode('cp437')
'╚ï'

Hence, mojibake.
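The same round trip can be run in reverse to recover the intended character from the garbled pair. This is only a sketch: the helper name undo_cp437_mojibake is hypothetical, and it assumes the damage is exactly "UTF-8 bytes decoded as cp437", as described above.

```python
def undo_cp437_mojibake(garbled):
    """Re-encode text that was UTF-8 bytes wrongly decoded as cp437."""
    return garbled.encode('cp437').decode('utf8')


garbled = u'\u255a\xef'  # the two characters seen in the post, written as escapes
repaired = undo_cp437_mojibake(garbled)
print(repaired == u'\u020b')
```

Escapes are used instead of literal characters so the snippet does not depend on the encoding of the file or the terminal.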

> def _doctest():
>      import doctest
>      doctest.testmod()
> 
> and that works in python3, but fails in python 2 with this
>> (py27) C:\code\hg-repos>python tdt1.py
>> C:\python\Lib\doctest.py:1531: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as
>> being unequal
>>   if got == want:
>> C:\python\Lib\doctest.py:1551: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as
>> being unequal

I cannot replicate this specific exception. I think it may be a side-
effect of you being on Windows. (I'm on Linux, and everything is UTF-8.)

>>   if got == want:
>> **********************************************************************
>> File "tdt1.py", line 4, in __main__.func
>> Failed example:
>>     print(func(u'aaa\u020b'))
>> Expected:
>>     aaa╚ï
>> Got:
>>     aaa╚ï

The difficulty here is that it is damn near impossible to sort out which, 
if any, bits are mojibake inserted by your posting software, which by 
your editor, your terminal, which by Python, and which are artifacts of 
the doctest system.

The usual way to debug these sorts of errors is to stick a call to repr() 
just before the print.

print(repr(func(u'aaa\u020b')))
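Note that repr() itself differs between versions: Python 2 shows u'aaa\u020b' while Python 3 shows the character unescaped. As a version-independent alternative, here is a small sketch that lists the code points explicitly. The helper name codepoints is hypothetical, and func is the identity function from the examples above.

```python
def codepoints(text):
    """Return the code points of text as space-separated 'U+XXXX' labels."""
    return ' '.join('U+%04X' % ord(ch) for ch in text)


def func(a):
    return a


print(codepoints(func(u'aaa\u020b')))  # U+0061 U+0061 U+0061 U+020B
```

Because the output is pure ASCII, it survives terminals, editors and mail software that would otherwise mangle the character in transit.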

-- 
Steven


#64173

From	Robin Becker <robin@reportlab.com>
Date	2014-01-17 16:17 +0000
Message-ID	<mailman.5646.1389975458.18130.python-list@python.org>
In reply to	#64171

On 17/01/2014 15:27, Steven D'Aprano wrote:
..........
>>
>> # -*- coding: utf-8 -*-
>> def func(a):
>>       """
>>       >>> print(func(u'aaa\u020b'))
>>       aaa╚ï
>>       """
>>       return a
>
> There seems to be some mojibake in your post, which confuses issues.
>
> You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
> At least, that's what it ought to be. But in your post, it shows up as
> the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
> RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
> your posting software somehow got confused and inserted the two
> characters which you would have got using cp-437 while claiming that they
> are UTF-8. (Your post is correctly labelled as UTF-8.)
>
> I'm confident that the problem isn't with my newsreader, Pan, because it
> is pretty damn good at getting encodings right, but also because your
> post shows the same mojibake in the email archive:
>
> https://mail.python.org/pipermail/python-list/2014-January/664771.html
>
> To clarify: you tried to show \u020B as a literal. As a literal, it ought
> to be the single character ȋ which is a lower case I with curved accent on
> top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code
> page is two characters ╚ ï.

When I edit the file in vim with utf8 encoding I do see your ȋ as the literal.
However, as you note I'm on windows and no amount of cajoling will get it to
work reasonably so my printouts are broken. So on windows

(py27) C:\code\hg-repos>python -c"print(u'aaa\u020b')"
aaa╚ï

on my linux

$ python2 -c"print(u'aaa\u020b')"
aaaȋ

$ python2 tdt1.py
/usr/lib/python2.7/doctest.py:1531: UnicodeWarning: Unicode equal comparison 
failed to convert both arguments to Unicode - interpreting them as being unequal
   if got == want:
/usr/lib/python2.7/doctest.py:1551: UnicodeWarning: Unicode equal comparison 
failed to convert both arguments to Unicode - interpreting them as being unequal
   if got == want:
**********************************************************************
File "tdt1.py", line 4, in __main__.func
Failed example:
     print(func(u'aaa\u020b'))
Expected:
     aaaȋ
Got:
     aaaȋ
**********************************************************************
1 items had failures:
    1 of   1 in __main__.func
***Test Failed*** 1 failures.
robin@everest ~/tmp:
$ cat tdt1.py
# -*- coding: utf-8 -*-
def func(a):
     """
     >>> print(func(u'aaa\u020b'))
     aaaȋ
     """
     return a
def _doctest():
     import doctest
     doctest.testmod()

if __name__ == "__main__":
     _doctest()
robin@everest ~/tmp:

so the error persists with or without copying errors.

Note that on my putty terminal I don't see the character properly (I see unknown 
glyph square box), but it copies OK.
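One possible reading of the UnicodeWarning, offered as an assumption rather than something confirmed in this thread: under Python 2 the un-prefixed docstring is a byte string holding UTF-8 bytes, while the printed output reaches the comparison as unicode, so "got == want" compares text with non-ASCII bytes. The sketch below mimics that comparison; the function names are hypothetical and nothing here is doctest's own code.

```python
def looks_equal(got, want):
    """Compare as-is, the way a plain == would."""
    return got == want


def looks_equal_decoded(got, want_bytes, encoding='utf-8'):
    """Decode the expected bytes first, so text is compared with text."""
    return got == want_bytes.decode(encoding)


got = u'aaa\u020b'                # what the function returns (text)
want_bytes = got.encode('utf-8')  # what a byte-string docstring would hold

print(looks_equal(got, want_bytes))          # text vs bytes
print(looks_equal_decoded(got, want_bytes))  # text vs text
```

If that reading is right, it would also explain why the u""" docstring variant passes: both sides of the comparison are then unicode.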
-- 
Robin Becker

