How parsing works:
- Instantiate an etree object
- Call its xpath method to locate tags and capture their content
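The two steps above can be sketched roughly as follows. The HTML snippet, the class name "song", and the variable names are made up for illustration and do not come from any real page.

```python
from lxml import etree

# Hypothetical HTML snippet used only for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="song">
    <p>first</p>
    <p>second</p>
    <a href="https://example.com/a" title="link-a">Link A</a>
  </div>
</body></html>
"""

# Step 1: instantiate an etree object from the HTML source.
tree = etree.HTML(SAMPLE_HTML)

# Step 2a: locate tags. xpath() returns a list of matching elements.
paragraphs = tree.xpath('//div[@class="song"]/p')

# Step 2b: capture text content with text().
texts = tree.xpath('//div[@class="song"]/p/text()')

# Step 2c: capture an attribute value with @attribute_name.
hrefs = tree.xpath('//div[@class="song"]/a/@href')

print(len(paragraphs), texts, hrefs)
```

xpath() always returns a list, even for a single match, so index into it (for example texts[0]) when one value is expected.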
Environment setup
- pip install lxml
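Instantiating the etree object
- Load HTML source from a local file into etree:
etree.parse(filePath)
(etree.parse uses the XML parser by default; for HTML that is not well-formed, pass etree.HTMLParser() as the second argument)
- Load HTML source fetched from the internet into etree:
etree.HTML(page_text)
(page_text is a variable holding the response body, not a string literal)
- xpath('xpath expression')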
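As a minimal sketch of the two loading paths, the example below parses the same markup once from a string and once from a local file. The markup, the temporary file name sample.html, and the variable names are assumptions made for the demonstration.

```python
import os
import tempfile

from lxml import etree

# Hypothetical page source; in a real crawler this would be response.text.
page_text = "<html><body><ul><li>one</li><li>two</li></ul></body></html>"

# Path 1: source already in memory, as fetched from the internet.
tree_from_text = etree.HTML(page_text)
items_from_text = tree_from_text.xpath("//li/text()")

# Path 2: source stored in a local file.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "sample.html")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(page_text)
    # HTMLParser makes parse() tolerant of HTML that is not valid XML.
    tree_from_file = etree.parse(file_path, etree.HTMLParser())
    items_from_file = tree_from_file.xpath("//li/text()")

print(items_from_text, items_from_file)
```

Both paths end in an object with the same xpath method, so the extraction code does not need to know where the source came from.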
Scraping a picture site:
# -*- coding: UTF-8 -*-
import os.path
import requests
from lxml import etree
if __name__ == '__main__':
    url = 'https://pic.netbian.com/4kmeinv/'
    headers = {
        'referer': 'https://www.qiushibaike.com/imgrank/',
'User-Agent': 'Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KH