Parsing with lxml
from lxml import etree
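text = '''
<html><head><title>The Dormouse's story</title></head>
<body>
<p><b>The Dormouse's story</b></p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie">Elsie</a>,
<a href="http://example.com/lacie">Lacie</a> and
<a href="http://example.com/tillie">Tillie</a>;
and they lived at the bottom of a well.</p>
'''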
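html = etree.HTML(text)
# To read from a file instead:
# html = etree.parse('test.html')
result = etree.tostring(html)
# tostring returns bytes in Python 3, so decode before printing
print(result.decode('utf-8'))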
Output (lxml has filled in the missing html tags):
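<html><head><title>The Dormouse's story</title></head>
<body>
<p><b>The Dormouse's story</b></p>
<p>Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie">Elsie</a>,
<a href="http://example.com/lacie">Lacie</a> and
<a href="http://example.com/tillie">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>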
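The commented-out `etree.parse('test.html')` line reads from a file instead of a string. By default `etree.parse` uses the XML parser, which rejects markup that is not well-formed, so for real-world HTML it is usually given an `etree.HTMLParser()`. A minimal sketch, assuming a hypothetical sample file written to a temporary directory in place of `test.html`:

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample file standing in for test.html; the tags are left unclosed on purpose.
sample = "<html><body><p>Hello <a href='http://example.com/elsie'>Elsie</a>"
path = os.path.join(tempfile.mkdtemp(), "test.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# HTMLParser tolerates broken markup; the default XML parser would raise an error here.
tree = etree.parse(path, etree.HTMLParser())

# encoding="unicode" makes tostring return str instead of bytes.
parsed_output = etree.tostring(tree.getroot(), encoding="unicode", method="html")
print(parsed_output)
```

Unlike `etree.HTML`, which returns the root element, `etree.parse` returns an ElementTree object, so `getroot()` is used here to serialize from the `html` element.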
Get the a tags and their href attributes
print(html.xpath('//a'))
# [<Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>]
print(html.xpath('//a/@href'))
# ['http://example.com/elsie', 'http://example.com/lacie', 'http://example.com/tillie']
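`xpath` returns element objects for `//a` and plain strings for `//a/@href`. When both the link text and the URL are needed, one option is to loop over the elements and read each one with `.get()` and `.text`. A minimal sketch, assuming a small stand-in snippet (the names `snippet`, `doc`, `links`, and `names` are illustrative, not from the original):

```python
from lxml import etree

# Illustrative markup; only the href values come from the example above.
snippet = '''
<p>
<a href="http://example.com/elsie">Elsie</a>,
<a href="http://example.com/lacie">Lacie</a> and
<a href="http://example.com/tillie">Tillie</a>
</p>
'''
doc = etree.HTML(snippet)

# Pair each link's text with its href attribute.
links = [(a.text, a.get("href")) for a in doc.xpath("//a")]
for name, url in links:
    print(name, url)

# text() selects the text nodes directly, without a Python loop.
names = doc.xpath("//a/text()")
```

`a.get("href")` returns `None` when the attribute is absent, which may be easier to handle than an `@href` query that silently skips such elements.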