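Ramblings
When writing small crawlers I use XPath to pull data out of the DOM, since slicing the HTML up directly with re is too much hassle. Jotting it down here.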
import
Anaconda ships with lxml; if you don't have it, install it yourself with pip (pip install lxml).
import requests
from lxml import etree
Usage
# First fetch a web page; Baidu will do
response = requests.get(url='https://www.baidu.com/')
# Parse it into a DOM tree
baidu = etree.HTML(response.text)
# Select nodes with XPath; the result is a list
baidu_div = baidu.xpath('//input')
# Take one element from the list and serialize it
baidu_data = etree.tostring(baidu_div[0], pretty_print=True, encoding='utf-8')
# Decode the bytes and print
print(baidu_data.decode('utf-8'))
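Most of the time you want the attributes rather than the serialized node. Here is a rough sketch of the same steps run offline. SAMPLE_HTML and the two helper functions are made up for illustration and are not the real Baidu markup; only etree.HTML, xpath and attrib come from lxml.

```python
from lxml import etree

# Hypothetical sample markup standing in for a downloaded page
SAMPLE_HTML = """
<html><body>
  <form>
    <input type="text" name="wd" value="lxml">
    <input type="submit" value="Search">
  </form>
</body></html>
"""


def input_attributes(html_text):
    """Return one attribute dict per <input> element in the page."""
    tree = etree.HTML(html_text)
    # xpath() returns a list of elements; attrib behaves like a dict
    return [dict(node.attrib) for node in tree.xpath('//input')]


def input_names(html_text):
    """Return the name attribute of every <input> that has one."""
    tree = etree.HTML(html_text)
    # An XPath ending in @attr yields strings instead of elements
    return tree.xpath('//input/@name')


attrs = input_attributes(SAMPLE_HTML)
names = input_names(SAMPLE_HTML)
print(attrs)
print(names)
```

If the XPath matches nothing, xpath() gives back an empty list, so check the length before indexing with [0].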