"""
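# Parse the HTML string; etree.HTML repairs incomplete markup (e.g. adds <html><body>)
html = etree.HTML(wb_data)
print(html)  # prints an Element object, not the markup
# tostring serializes back to bytes, so decode before printing
result = etree.tostring(html)
print(result.decode("utf-8"))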
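Because etree.HTML repairs the markup it is given, an absolute path starting at /html/body works even when the source string is only a fragment. The minimal sketch below assumes lxml is installed and uses a made-up fragment (not the tutorial's wb_data) with no wrapper tags and an unclosed li.

```python
from lxml import etree

# Hypothetical fragment: no <html>/<body> wrapper, first <li> left unclosed
fragment = "<ul><li>first item<li>second item</ul>"

# etree.HTML uses libxml2's HTML parser, which fills in the missing structure
root = etree.HTML(fragment)
print(root.tag)  # the returned Element is the <html> root

# Serializing shows the repaired document; tostring returns bytes
serialized = etree.tostring(root).decode("utf-8")
print(serialized)

# The repaired tree can be queried with an absolute path from /html/body
print(root.xpath("/html/body/ul/li/text()"))
```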
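3. Getting the content of a tag (basic usage). Note: when fetching all the content of the a tags, do not put a trailing slash after a (for example /html/body/div/ul/li/a/), otherwise XPath raises an error.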
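To see the trailing-slash rule in action, here is a small sketch assuming lxml is installed; the HTML string is a hypothetical stand-in for wb_data. A path that ends at the tag name is valid, while the same path with a trailing slash is an incomplete location path and lxml rejects it with an XPath error.

```python
from lxml import etree

# Hypothetical stand-in for wb_data
html = etree.HTML('<div><ul><li><a href="link1.html">first item</a></li></ul></div>')

# Valid: the path ends at the tag name, so it selects the <a> elements
links = html.xpath("/html/body/div/ul/li/a")
texts = [a.text for a in links]
print(texts)

# Invalid: the trailing slash leaves the last location step unfinished
error = None
try:
    html.xpath("/html/body/div/ul/li/a/")
except etree.XPathError as exc:  # lxml raises a subclass of XPathError here
    error = exc
print(type(error).__name__, error)
```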
Approach 1
html = etree.HTML(wb_data)
html_data = html.xpath('/html/body/div/ul/li/a')  # a list of <a> Element objects
print(html)
for i in html_data:
    print(i.text)  # .text is the text directly inside each element
<Element html at 0x12fe4b8>
first item
second item
third item
fourth item
fifth item
Approach 2 (just append /text() to the tag whose content you want)
html = etree.HTML(wb_data)
html_data = html.xpath('/html/body/div/ul/li/a/text()')  # a list of strings
print(html)
for i in html_data:
    print(i)
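<Element html at 0x138e4b8>
first item
second item
third item
fourth item
fifth item
4. Opening and reading an HTML file
# Open the HTML file with parse; passing HTMLParser tolerates markup that is not well-formed XML
html = etree.parse('test.html', etree.HTMLParser())
html_data = html.xpath('//*')
# The result is a list, so iterate over it
print(html_data)
for i in html_data:
    print(i.text)
# Print the whole document, pretty-printed
html = etree.parse('test.html', etree.HTMLParser())
html_data = etree.tostring(html, pretty_print=True)
res = html_data.decode('utf-8')
print(res)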
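etree.parse uses an XML parser unless told otherwise, so an HTML file that is not well-formed XML (for example one with an unclosed li) fails with XMLSyntaxError, while passing etree.HTMLParser() makes it behave like etree.HTML. The sketch below assumes lxml is installed and writes hypothetical file content to a temporary file in place of test.html so that it runs anywhere.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-in for test.html; the second <li> is deliberately left unclosed
sample = """<div><ul>
<li><a href="link1.html">first item</a></li>
<li><a href="link2.html">second item</a>
</ul></div>"""

with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as f:
    f.write(sample)
    path = f.name

try:
    # Default parser: strict XML, so the mismatched </ul> is a syntax error
    try:
        etree.parse(path)
        xml_ok = True
    except etree.XMLSyntaxError:
        xml_ok = False

    # HTML parser: repairs the markup just as etree.HTML does
    tree = etree.parse(path, etree.HTMLParser())
    items = tree.xpath("//li/a/text()")
    print(xml_ok, items)
    print(etree.tostring(tree, pretty_print=True).decode("utf-8"))
finally:
    os.remove(path)  # clean up the temporary file
```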