Scraping Qiushibaike Jokes (XPath)

Latest recommended article published 2021-12-26 03:04:01

qq_43784519

Category: Web Scraping. Tags: xpath, python

Article link: https://blog.csdn.net/qq_43784519/article/details/107408249

import requests
from lxml import etree

# Send a browser-like User-Agent header with the request
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}

url = 'https://www.qiushibaike.com/text/'
res = requests.get(url, headers=headers, timeout=10)
res.raise_for_status()  # stop on an HTTP error instead of parsing an error page
selector = etree.HTML(res.text)

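def first_text(node, path, default=''):
    """Return the first text node matched by path, stripped, or default if nothing matches."""
    matches = node.xpath(path)
    return matches[0].strip() if matches else default

# The exact class match selects only entries whose class attribute is exactly this string
url_infos = selector.xpath('//div[@class="article block untagged mb15 typs_hot"]')
for url_info in url_infos:
    # Paths are relative to the current entry's <div>
    user_name = first_text(url_info, 'div[1]/a[2]/h2/text()')    # user name (not "id", which shadows the built-in)
    age = first_text(url_info, 'div[1]/div/text()')              # age
    content = first_text(url_info, 'a[1]/div/span/text()')       # joke text (first text node only)
    like = first_text(url_info, 'div[2]/span[1]/i/text()')       # likes
    comment = first_text(url_info, 'div[2]/span[2]/a/i/text()')  # comment count

    print("Username: " + user_name)
    print("Age: " + age)
    print(content)
    print("Likes: " + like + '\t' + "Comments: " + comment + '\n')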
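
The relative XPath lookups above can be tried offline against a small inline HTML string, which makes it easier to see how each path maps onto the page structure. This is only a sketch: the markup in SAMPLE_HTML is hypothetical, reconstructed from the XPath expressions in the script rather than copied from the real page, and the helper names first_text and parse_jokes are illustrative. The second entry deliberately has no age element, to show how a missing field can be handled without an IndexError.

```python
from lxml import etree

# Hypothetical markup reconstructed from the XPath expressions in the script;
# the real page structure may differ.
SAMPLE_HTML = """
<html><body>
<div class="article block untagged mb15 typs_hot">
  <div class="author">
    <a href="#"><img alt="avatar"/></a>
    <a href="#"><h2>
alice
</h2></a>
    <div class="age">28</div>
  </div>
  <a href="#">
    <div class="content"><span>
A short joke.
</span></div>
  </a>
  <div class="stats">
    <span><i>120</i> likes</span>
    <span><a href="#"><i>7</i> comments</a></span>
  </div>
</div>
<div class="article block untagged mb15 typs_hot">
  <div class="author">
    <a href="#"><img alt="avatar"/></a>
    <a href="#"><h2>bob</h2></a>
  </div>
  <a href="#">
    <div class="content"><span>Another joke.</span></div>
  </a>
  <div class="stats">
    <span><i>3</i> likes</span>
    <span><a href="#"><i>0</i> comments</a></span>
  </div>
</div>
</body></html>
"""


def first_text(node, path, default=""):
    """Return the first text node matched by path, stripped, or default if none."""
    matches = node.xpath(path)
    return matches[0].strip() if matches else default


def parse_jokes(html):
    """Extract one dict per entry, using paths relative to each entry's <div>."""
    root = etree.HTML(html)
    jokes = []
    for item in root.xpath('//div[@class="article block untagged mb15 typs_hot"]'):
        jokes.append({
            "user_name": first_text(item, "div[1]/a[2]/h2/text()"),
            "age": first_text(item, "div[1]/div/text()"),
            "content": first_text(item, "a[1]/div/span/text()"),
            "likes": first_text(item, "div[2]/span[1]/i/text()"),
            "comments": first_text(item, "div[2]/span[2]/a/i/text()"),
        })
    return jokes


jokes = parse_jokes(SAMPLE_HTML)
for joke in jokes:
    print(joke)
```

Returning a list of dicts instead of printing inside the loop keeps the parsing step separate from the output, so the same function could later feed a CSV or JSON writer.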