Error message:
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: meta line 4 and
Example (index.html contains the HTML from html_str below):
html_str = '''
<html lang="en">
<head>
<meta charset="UTF-8">
<title>The Dormouse's story</title>
</head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!--Elsie--></a>,
<a href="http://example.com/lacie" class="sister" id="link2"><!--Lacie--></a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
</html>
'''
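from lxml import etree
# etree.parse() uses the XML parser by default, so the unclosed <meta> tag raises XMLSyntaxError
html = etree.parse('index.html')
result = etree.tostring(html, pretty_print=True)
print(result.decode('utf-8'))  # tostring() returns bytes on Python 3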
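To reproduce the error without creating index.html, here is a minimal sketch that feeds a trimmed copy of the example HTML straight to etree.fromstring(), which, like etree.parse(), uses the XML parser by default. The trimmed markup (html_bytes) and the error_message variable are illustrative choices, not part of the original example; the exact line and column numbers in the message depend on the input.

```python
from lxml import etree

# Trimmed copy of the example page: <meta charset="UTF-8"> is never closed,
# which is legal HTML but not well-formed XML.
html_bytes = b'''<html lang="en">
<head>
<meta charset="UTF-8">
<title>The Dormouse's story</title>
</head>
<body><p class="title"><b>The Dormouse's story</b></p></body>
</html>'''

try:
    # fromstring() with no parser argument uses the default XML parser
    etree.fromstring(html_bytes)
    error_message = None
except etree.XMLSyntaxError as exc:
    # The XML parser hits </head> while <meta> is still open
    error_message = str(exc)

print(error_message)
```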
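The cause: etree.parse() uses an XML parser by default, and <meta charset="UTF-8"> is an HTML void element written without a closing tag, so the XML parser reports a tag mismatch when it reaches </head>. Change the code to parse with an HTML parser instead: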
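from lxml import etree
# HTMLParser tolerates HTML that is not well-formed XML (e.g. the unclosed <meta>)
parser = etree.HTMLParser(encoding='utf-8')
html = etree.parse('index.html', parser=parser)
result = etree.tostring(html, pretty_print=True)
print(result.decode('utf-8'))  # tostring() returns bytes on Python 3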
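The same fix also works on an in-memory string instead of index.html. A minimal sketch, assuming the markup is already available as bytes (html_bytes here is a hypothetical, trimmed copy of the example page): pass the HTMLParser to etree.fromstring() and query the resulting tree.

```python
from lxml import etree

# Hypothetical trimmed copy of the example page, still with the unclosed <meta>
html_bytes = b'''<html lang="en">
<head>
<meta charset="UTF-8">
<title>The Dormouse's story</title>
</head>
<body>
<p class="story">... <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a> ...</p>
</body>
</html>'''

# Same parser as in the fix; encoding tells libxml2 how to decode the bytes
parser = etree.HTMLParser(encoding='utf-8')
root = etree.fromstring(html_bytes, parser=parser)  # root is the <html> element

title = root.findtext('head/title')              # text of <title>
links = [a.get('href') for a in root.iter('a')]  # href of every <a>

# tostring() returns bytes, so decode before printing on Python 3
print(etree.tostring(root, pretty_print=True).decode('utf-8'))
print(title)
print(links)
```

etree.HTML(html_bytes) is a shorthand that also parses with an HTML parser; passing the parser explicitly keeps the encoding setting visible.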
Reference:
https://blog.csdn.net/qq_38418803/article/details/108630379