Running HTML doctests
One of the interesting modules in the lxml.html package deals with
doctests. It can be hard to compare two HTML pages for equality, as
whitespace differences aren't meaningful and the structural formatting
can differ. This is even more of a problem in doctests, where output is
tested for exact equality and small differences in whitespace or in the order
of attributes can make a test fail. And given the verbosity of
tag-based languages, it may take more than a quick look to find the
actual differences in the doctest output.
Luckily, lxml provides the lxml.doctestcompare module, which
supports relaxed comparison of XML and HTML pages and prints a
readable diff when a test fails. The HTML comparison is most easily
enabled by importing the lxml.html.usedoctest module in a doctest:
>>> import lxml.html.usedoctest
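Outside a doctest, the same relaxed comparison can also be called directly. The sketch below assumes the `LHTMLOutputChecker` class from `lxml.doctestcompare` and uses the `check_output(want, got, optionflags)` and `output_difference(example, got, optionflags)` methods it inherits from the standard library's `doctest.OutputChecker`. The helper `html_equivalent` and the sample strings are hypothetical and exist only for illustration.

```python
import doctest

from lxml.doctestcompare import LHTMLOutputChecker

# The HTML-aware variant of lxml's doctest output checker.
checker = LHTMLOutputChecker()


def html_equivalent(want: str, got: str) -> bool:
    """Hypothetical helper: relaxed HTML comparison with no option flags."""
    return checker.check_output(want, got, 0)


expected = '<html><body color="white" onload=""><p>Hi !</p></body></html>'
# Same document, but with a different attribute order and extra whitespace.
reordered = '<html>\n  <body onload="" color="white">\n    <p>Hi    !</p>\n  </body>\n</html>\n'
# Different text content, so this should not compare equal.
changed = '<html><body color="white" onload=""><p>Bye !</p></body></html>'

print(html_equivalent(expected, reordered))
print(html_equivalent(expected, changed))

# output_difference() builds the readable report that a failing doctest
# prints; the example source "html" is an arbitrary placeholder.
report = checker.output_difference(doctest.Example("html", expected), changed, 0)
print(report)
```

Here the first comparison should succeed and the second should fail, with the report showing both documents and their differences.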
Now, if you have an HTML document and want to compare it to an expected result
document in a doctest, you can do the following:
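>>> import lxml.html
>>> html = lxml.html.fromstring('''\
...    <html><body onload="" color="white">
...      <p>Hi  !</p>
...    </body></html>
... ''')

>>> print(lxml.html.tostring(html, encoding="unicode"))
<html><body onload="" color="white"><p>Hi !</p></body></html>

>>> print(lxml.html.tostring(html, encoding="unicode"))
<html> <body color="white" onload=""> <p>Hi    !</p> </body> </html>

>>> print(lxml.html.tostring(html, encoding="unicode"))
<html>
  <body color="white" onload="">
    <p>Hi !</p>
  </body>
</html>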
In documentation, you would likely prefer the pretty-printed HTML output, as
it is the most readable. However, the three documents differ only in whitespace
and attribute order, so they are equivalent from the point of view of an HTML
tool, and the doctest will silently accept any of the above. This allows you to
concentrate on readability in your doctests, even if the actual output is an
ugly single-line HTML string.
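To check that a doctest really does accept the pretty-printed form, the following sketch runs one such example through the standard `doctest` machinery twice: once with the default output checker and once with `LHTMLOutputChecker` passed as the `checker` argument of `doctest.DocTestRunner`. Passing the checker explicitly is an assumption made to keep the script self-contained; inside an actual doctest, the `import lxml.html.usedoctest` line is meant to do this job. The helper `run_doctest` is hypothetical.

```python
import doctest

from lxml.doctestcompare import LHTMLOutputChecker

# A doctest whose expected output is pretty-printed with a different
# attribute order, while tostring() actually produces a one-liner.
DOCTEST_SOURCE = '''
>>> import lxml.html
>>> html = lxml.html.fromstring('<html><body onload="" color="white"><p>Hi  !</p></body></html>')
>>> print(lxml.html.tostring(html, encoding="unicode"))
<html>
  <body color="white" onload="">
    <p>Hi !</p>
  </body>
</html>
'''


def run_doctest(source, checker=None):
    """Hypothetical helper: run the examples in `source`, return TestResults."""
    test = doctest.DocTestParser().get_doctest(
        source, globs={}, name="html_example", filename=None, lineno=0
    )
    runner = doctest.DocTestRunner(checker=checker, verbose=False)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


strict = run_doctest(DOCTEST_SOURCE)
relaxed = run_doctest(DOCTEST_SOURCE, checker=LHTMLOutputChecker())
print("default checker failures:", strict.failed)
print("LHTMLOutputChecker failures:", relaxed.failed)
```

The default checker should report one failure (the `print` example), while the lxml checker should report none.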
Note that there is also an lxml.usedoctest module, which you can
import for XML comparisons. Use it when the output is XML rather than HTML:
the HTML parser notably ignores namespaces and some other XML-specific features.
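The XML variant can likewise be tried outside a doctest. Here is a minimal sketch, assuming `LXMLOutputChecker` and the `PARSE_XML` option flag from `lxml.doctestcompare` (the flag forces the XML parser). The `urn:example` and `urn:other` namespace URIs are placeholders. Whitespace and attribute order are still relaxed, but because the XML parser keeps namespaces, a changed namespace URI should make the comparison fail.

```python
from lxml.doctestcompare import PARSE_XML, LXMLOutputChecker

checker = LXMLOutputChecker()

want = '<root xmlns="urn:example"><item a="1" b="2">text</item></root>'
# Same document: different attribute order, extra whitespace, same namespace.
same = '<root xmlns="urn:example">\n  <item b="2" a="1"> text </item>\n</root>'
# Identical markup, but in a different (placeholder) namespace.
other_ns = '<root xmlns="urn:other"><item a="1" b="2">text</item></root>'

same_ok = checker.check_output(want, same, PARSE_XML)
other_ok = checker.check_output(want, other_ns, PARSE_XML)
print("same namespace:", same_ok)
print("different namespace:", other_ok)
```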