A First Try at Web Scraping with Python and XPath

Latest recommended article published on 2024-08-09 08:31:03

青停

Category: Python爬虫 (Python web scraping). Tag: python

Article link: https://blog.csdn.net/qq_34020468/article/details/120042166

Example: use requests + lxml to scrape the post titles from each page of Qiushibaike (糗事百科)

The code is as follows:

import requests
from lxml import etree

# Send a browser-like User-Agent; it is the same for every page, so define it once
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"}

for page in range(1, 4):
    # Build the URL for this page (pages 1 to 3)
    url = f"https://www.qiushibaike.com/8hr/page/{page}/"
    print(url)

    # Fetch the page; fail early on a timeout or an HTTP error status
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()

    # Parse the HTML text into an element tree
    tree = etree.HTML(response.text)

    # Use an XPath expression to extract the title text
    title_lst = tree.xpath('//a[@class="recmd-content"]/text()')
    print('--------------------', len(title_lst))

    # Print each title
    for title in title_lst:
        print(title)

Output: for each page, the script prints the number of titles found, followed by the text of each title.
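To try the XPath step without hitting the network, here is a minimal sketch that runs the same expression against an inline HTML string. The markup in `SAMPLE_HTML` and the helper `extract_titles` are made up for illustration; they are not copied from the real page, whose structure may differ. The sketch also shows one caveat: `@class="recmd-content"` only matches when the attribute is exactly that string, so an element carrying extra classes is skipped unless a token-style match is used.

```python
from lxml import etree

# Hypothetical markup: a stand-in for the real page, not copied from the site.
SAMPLE_HTML = """
<html><body>
  <div class="recommend-article">
    <a class="recmd-content" href="/article/1">First title</a>
    <a class="recmd-content" href="/article/2">Second title</a>
    <a class="recmd-content highlight" href="/article/3">Third title</a>
    <a class="other" href="/article/4">Not a title</a>
  </div>
</body></html>
"""


def extract_titles(html, exact=True):
    """Return the text of every <a> whose class is (or includes) recmd-content."""
    tree = etree.HTML(html)
    if exact:
        # Same expression as the script: the class attribute must equal the string exactly
        expr = '//a[@class="recmd-content"]/text()'
    else:
        # Token match: also accepts elements that carry additional classes
        expr = ('//a[contains(concat(" ", normalize-space(@class), " "), '
                '" recmd-content ")]/text()')
    return [text.strip() for text in tree.xpath(expr)]


print(extract_titles(SAMPLE_HTML))
print(extract_titles(SAMPLE_HTML, exact=False))
```

With this sample, the exact match returns the first two titles, while the token match also picks up the third link, which has a second class. Which form is right for a real page depends on its markup, so it is worth checking in the browser's developer tools first.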
