| Started by | Robin Becker <robin@reportlab.com> |
|---|---|
| First post | 2014-01-17 11:16 +0000 |
| Last post | 2014-01-17 16:17 +0000 |
| Articles | 5 — 2 participants |
doctests compatibility for python 2 & python 3 Robin Becker <robin@reportlab.com> - 2014-01-17 11:16 +0000
Re: doctests compatibility for python 2 & python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-17 11:41 +0000
Re: doctests compatibility for python 2 & python 3 Robin Becker <robin@reportlab.com> - 2014-01-17 12:12 +0000
Re: doctests compatibility for python 2 & python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-17 15:27 +0000
Re: doctests compatibility for python 2 & python 3 Robin Becker <robin@reportlab.com> - 2014-01-17 16:17 +0000
| From | Robin Becker <robin@reportlab.com> |
|---|---|
| Date | 2014-01-17 11:16 +0000 |
| Subject | doctests compatibility for python 2 & python 3 |
| Message-ID | <mailman.5634.1389957389.18130.python-list@python.org> |
I have some problems making some doctests for python2 code compatible with
python3. The problem is that as part of our approach we are converting the code
to use unicode internally. So we allow either byte strings or unicode in inputs,
but we are trying to convert to unicode outputs.
That makes doctests quite hard as
def func(a):
    """
    >>> func(u'aaa')
    'aaa'
    """
    return a
fails in python2 whilst
def func(a):
    """
    >>> func(u'aaa')
    u'aaa'
    """
    return a
fails in python3. I could change the tests so they look like
    """
    >>> func(u'aaa')==u'aaa'
    True
    """
but that makes the tests less useful: if a test fails I don't see the actual and
expected values, only "expected True, got False".
Is there an easy way to make these kinds of tests work in python 2 & 3?
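
To make the drawback of the `==` style concrete, here is a minimal sketch (Python 3) that runs a deliberately failing comparison doctest and captures the report. The helper name `failure_report` and the wrong expected value `u'aab'` are made up for illustration; only standard `doctest` classes are used.

```python
import doctest


def func(a):
    """
    >>> func(u'aaa') == u'aab'
    True
    """
    return a


def failure_report(function):
    """Run the doctest in function's docstring and return the failure text."""
    chunks = []
    # Build a DocTest directly from the docstring; no module lookup is needed.
    test = doctest.DocTestParser().get_doctest(
        function.__doc__, {function.__name__: function}, function.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Failure output is passed to the callable given as `out`.
    runner.run(test, out=chunks.append)
    return "".join(chunks)


report = failure_report(func)
print(report)
```

The captured report only shows `True` under "Expected:" and `False` under "Got:"; the string that `func` actually returned does not appear anywhere, which is the loss of information described above.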
--
Robin Becker
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-01-17 11:41 +0000 |
| Message-ID | <52d916e4$0$29999$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #64154 |
On Fri, 17 Jan 2014 11:16:17 +0000, Robin Becker wrote:
> I have some problems making some doctests for python2 code compatible
> with python3. The problem is that as part of our approach we are
> converting the code to use unicode internally. So we allow either byte
> strings or unicode in inputs, but we are trying to convert to unicode
> outputs.
Alas, I think you've run into one of the weaknesses of doctest. Don't get
me wrong, I am a huge fan of doctest, but it is hard to write polyglot
string tests with it, as you have discovered.
However, you may be able to get 95% of the way by using print.
def func(a):
    """
    >>> print(func(u'aaa'))
    aaa
    """
    return a
ought to behave identically in both Python 2 and Python 3.3, provided you
only print one object at a time. This ought to work with both ASCII and
non-ASCII (at least in the BMP).
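
A rough way to check this under Python 3 is to run both styles of example through the doctest machinery and compare the results. This is only a sketch: the helper `run_example` and the variable names are invented here, and the Python 2 side is as described in the thread rather than exercised by this code.

```python
import doctest


def func(a):
    return a


def run_example(docstring):
    """Run one docstring's examples against func and return the TestResults."""
    test = doctest.DocTestParser().get_doctest(
        docstring, {"func": func}, "example", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


# Expected output written the way the Python 2 repr looks.
repr_style = """
>>> func(u'aaa')
u'aaa'
"""

# Expected output written via print, with no quotes or prefix.
print_style = """
>>> print(func(u'aaa'))
aaa
"""

repr_result = run_example(repr_style)
print_result = run_example(print_style)
print("repr style failures:", repr_result.failed)
print("print style failures:", print_result.failed)
```

On Python 3 the repr-style example fails because the interpreter shows `'aaa'` without the `u` prefix, while the print-style example has no prefix to disagree about.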
--
Steven
| From | Robin Becker <robin@reportlab.com> |
|---|---|
| Date | 2014-01-17 12:12 +0000 |
| Message-ID | <mailman.5637.1389960773.18130.python-list@python.org> |
| In reply to | #64157 |
On 17/01/2014 11:41, Steven D'Aprano wrote:
> def func(a):
>     """
>     >>> print(func(u'aaa'))
>     aaa
>     """
>     return a
I think this approach seems to work if I turn the docstring into unicode
def func(a):
    u"""
    >>> print(func(u'aaa\u020b'))
    aaa\u020b
    """
    return a

def _doctest():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _doctest()
If I leave the u off the docstring it goes wrong in python 2.7. I also tried to
put an encoding onto the file and use the actual utf8 characters ie
# -*- coding: utf-8 -*-
def func(a):
    """
    >>> print(func(u'aaa\u020b'))
    aaaȋ
    """
    return a

def _doctest():
    import doctest
    doctest.testmod()
and that works in python3, but fails in python 2 with this
> (py27) C:\code\hg-repos>python tdt1.py
> C:\python\Lib\doctest.py:1531: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>   if got == want:
> C:\python\Lib\doctest.py:1551: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>   if got == want:
> **********************************************************************
> File "tdt1.py", line 4, in __main__.func
> Failed example:
> print(func(u'aaa\u020b'))
> Expected:
> aaaȋ
> Got:
> aaaȋ
> **********************************************************************
> 1 items had failures:
> 1 of 1 in __main__.func
> ***Test Failed*** 1 failures.
--
Robin Becker
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-01-17 15:27 +0000 |
| Message-ID | <52d94bea$0$29999$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #64158 |
On Fri, 17 Jan 2014 12:12:35 +0000, Robin Becker wrote:
> On 17/01/2014 11:41, Steven D'Aprano wrote:
>> def func(a):
>>     """
>>     >>> print(func(u'aaa'))
>>     aaa
>>     """
>>     return a
>
> I think this approach seems to work if I turn the docstring into unicode
>
> def func(a):
>     u"""
>     >>> print(func(u'aaa\u020b'))
>     aaa\u020b
>     """
>     return a
Good catch! Without the u-prefix, the \u... is not interpreted as an
escape sequence, but as a literal backslash-u.
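
For readers on Python 3, where every ordinary string literal already interprets \u, the difference can be imitated with a raw string. This is only an analogy for the Python 2 byte-string behaviour, and the variable names are illustrative.

```python
# An ordinary Python 3 literal: \u020b is an escape for one character.
escaped = '\u020b'

# A raw literal keeps the backslash, like a Python 2 byte string would:
# six characters, namely backslash, u, 0, 2, 0, b.
literal = r'\u020b'

print(len(escaped), len(literal))

# The literal form can still be turned into the real character afterwards.
decoded = literal.encode('ascii').decode('unicode_escape')
print(decoded == escaped)
```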
> If I leave the u off the docstring it goes wrong in python 2.7. I also
> tried to put an encoding onto the file and use the actual utf8
> characters ie
>
> # -*- coding: utf-8 -*-
> def func(a):
>     """
>     >>> print(func(u'aaa\u020b'))
>     aaaȋ
>     """
>     return a
There seems to be some mojibake in your post, which confuses issues.
You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
At least, that's what it ought to be. But in your post, it shows up as
the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
your posting software somehow got confused and inserted the two
characters which you would have got using cp-437 while claiming that they
are UTF-8. (Your post is correctly labelled as UTF-8.)
I'm confident that the problem isn't with my newsreader, Pan, because it
is pretty damn good at getting encodings right, but also because your
post shows the same mojibake in the email archive:
https://mail.python.org/pipermail/python-list/2014-January/664771.html
To clarify: you tried to show \u020B as a literal. As a literal, it ought
to be the single character ȋ which is a lower case I with curved accent on
top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code
page is two characters ╚ ï.
py> '\u020b'.encode('utf8').decode('cp437')
'╚ï'
Hence, mojibake.
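
The same round trip can be spelled out step by step (Python 3). This is a sketch with made-up variable names; it also shows that the damage is reversible as long as the wrongly decoded text is still intact.

```python
import unicodedata

char = '\u020b'
utf8_bytes = char.encode('utf-8')       # the two bytes 0xC8 0x8B

# Decoding those bytes with the wrong code page yields two characters.
mojibake = utf8_bytes.decode('cp437')
names = [unicodedata.name(c) for c in mojibake]

# Undo the damage: back to the original bytes, then decode correctly.
repaired = mojibake.encode('cp437').decode('utf-8')

print(utf8_bytes)
print(names)
print(repaired == char)
```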
> def _doctest():
>     import doctest
>     doctest.testmod()
>
> and that works in python3, but fails in python 2 with this
>> (py27) C:\code\hg-repos>python tdt1.py
>> C:\python\Lib\doctest.py:1531: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as
>> being unequal
>>   if got == want:
>> C:\python\Lib\doctest.py:1551: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as
>> being unequal
I cannot replicate this specific exception. I think it may be a side-
effect of you being on Windows. (I'm on Linux, and everything is UTF-8.)
>>   if got == want:
>> **********************************************************************
>> File "tdt1.py", line 4, in __main__.func
>> Failed example:
>>     print(func(u'aaa\u020b'))
>> Expected:
>>     aaaȋ
>> Got:
>>     aaaȋ
The difficulty here is that it is damn near impossible to sort out which,
if any, bits are mojibake inserted by your posting software, which by
your editor, your terminal, which by Python, and which are artifacts of
the doctest system.
The usual way to debug these sorts of errors is to stick a call to repr()
just before the print.
print(repr(func(u'aaa\u020b')))
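
One caveat if the same trick is tried on Python 3: there repr() leaves printable non-ASCII characters as they are, so it hides exactly what we want to see. A small sketch of two alternatives that expose the code points regardless of terminal or font (the variable names are illustrative):

```python
value = 'aaa\u020b'

# ascii() escapes every non-ASCII character, much like repr() on a
# Python 2 unicode string does.
escaped_view = ascii(value)
print(escaped_view)

# Listing the code points removes any doubt about what is in the string.
code_points = [hex(ord(c)) for c in value]
print(code_points)
```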
--
Steven
| From | Robin Becker <robin@reportlab.com> |
|---|---|
| Date | 2014-01-17 16:17 +0000 |
| Message-ID | <mailman.5646.1389975458.18130.python-list@python.org> |
| In reply to | #64171 |
On 17/01/2014 15:27, Steven D'Aprano wrote:
..........
>>
>> # -*- coding: utf-8 -*-
>> def func(a):
>>     """
>>     >>> print(func(u'aaa\u020b'))
>>     aaaȋ
>>     """
>>     return a
>
> There seems to be some mojibake in your post, which confuses issues.
>
> You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
> At least, that's what it ought to be. But in your post, it shows up as
> the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
> RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
> your posting software somehow got confused and inserted the two
> characters which you would have got using cp-437 while claiming that they
> are UTF-8. (Your post is correctly labelled as UTF-8.)
>
> I'm confident that the problem isn't with my newsreader, Pan, because it
> is pretty damn good at getting encodings right, but also because your
> post shows the same mojibake in the email archive:
>
> https://mail.python.org/pipermail/python-list/2014-January/664771.html
>
> To clarify: you tried to show \u020B as a literal. As a literal, it ought
> to be the single character ȋ which is a lower case I with curved accent on
> top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code
> page is two characters ╚ ï.
When I edit the file in vim with utf8 encoding I do see your ȋ as the literal.
However, as you note I'm on windows and no amount of cajoling will get it to
work reasonably so my printouts are broken. So on windows
(py27) C:\code\hg-repos>python -c"print(u'aaa\u020b')"
aaaȋ
on my linux
$ python2 -c"print(u'aaa\u020b')"
aaaȋ
$ python2 tdt1.py
/usr/lib/python2.7/doctest.py:1531: UnicodeWarning: Unicode equal comparison
failed to convert both arguments to Unicode - interpreting them as being unequal
if got == want:
/usr/lib/python2.7/doctest.py:1551: UnicodeWarning: Unicode equal comparison
failed to convert both arguments to Unicode - interpreting them as being unequal
if got == want:
**********************************************************************
File "tdt1.py", line 4, in __main__.func
Failed example:
print(func(u'aaa\u020b'))
Expected:
aaaȋ
Got:
aaaȋ
**********************************************************************
1 items had failures:
1 of 1 in __main__.func
***Test Failed*** 1 failures.
robin@everest ~/tmp:
$ cat tdt1.py
# -*- coding: utf-8 -*-
def func(a):
    """
    >>> print(func(u'aaa\u020b'))
    aaaȋ
    """
    return a

def _doctest():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _doctest()
robin@everest ~/tmp:
so the error persists with or without copying errors.
Note that on my putty terminal I don't see the character properly (I see unknown
glyph square box), but it copies OK.
--
Robin Becker