Example: using requests + lxml to scrape the title text from each page of Qiushibaike (糗事百科)
The code is as follows:
import requests
from lxml import etree

for i in range(0, 3):
    # Set the User-Agent header so the request looks like it comes from a browser
    ua = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"}
    # Build the URL for the current page (pages are numbered from 1)
    url = "https://www.qiushibaike.com/8hr/page/" + str(i + 1) + "/"
    print(url)
    # Fetch the page and take the response body as text
    response = requests.get(url=url, headers=ua).text
    # Parse the HTML into an element tree
    tree = etree.HTML(response)
    # Use an XPath expression to extract the title text
    title_lst = tree.xpath('//a[@class="recmd-content"]/text()')
    print('--------------------', len(title_lst))
    # Print each title
    for title in title_lst:
        print(title)
Output: for each page, the script prints the URL, the number of titles found, and then the text of each title.
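
The URL construction and the handling of the extracted strings can be factored into small helpers that are easy to check without network access. This is a minimal sketch and not part of the original code: the names `BASE_URL`, `page_urls`, and `clean_titles` are hypothetical. It assumes the strings returned by the XPath `text()` query may be empty or carry surrounding whitespace.

```python
# Same base URL as in the example above.
BASE_URL = "https://www.qiushibaike.com/8hr/page/"


def page_urls(first_page, last_page):
    """Return the listing URLs for pages first_page..last_page (inclusive)."""
    return [f"{BASE_URL}{n}/" for n in range(first_page, last_page + 1)]


def clean_titles(raw_titles):
    """Strip surrounding whitespace from each title and drop empty strings.

    Assumption: raw_titles is a list of str, such as the result of
    tree.xpath('//a[@class="recmd-content"]/text()').
    """
    return [t.strip() for t in raw_titles if t.strip()]


if __name__ == "__main__":
    # Print the three page URLs that the loop above visits.
    for url in page_urls(1, 3):
        print(url)
    # Example with made-up input strings, not real scraped data.
    print(clean_titles(["  first title\n", "", "second title"]))
```

With these helpers, the loop body could iterate over `page_urls(1, 3)` and print `clean_titles(title_lst)` rather than the raw list.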