Scraping Suning product reviews with Python

Latest recommended article published 2021-11-02 13:30:58

夜的乄第七章


Tags: python, json, web crawler

Article link: https://blog.csdn.net/coffeetogether/article/details/114296930


Examples of scraping reviews from other e-commerce sites:

https://blog.csdn.net/coffeetogether/article/details/114296159
https://blog.csdn.net/coffeetogether/article/details/114274960?spm=1001.2014.3001.5501

The example here uses a Suning home appliance listing.

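1. Find the target URL: the reviews are fetched by a separate AJAX request to review.suning.com (the full URL appears in step 4).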
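The review URL ends in ?callback=reviewList. In the usual JSONP convention, the callback query parameter names the function the server wraps the JSON in, and its value here, reviewList, matches the wrapper that has to be stripped in step 3. A small sketch that reads the name out with the standard library's urllib.parse; the helper name callback_name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Page-1 review URL, copied from the pagination step of this article
URL = ('http://review.suning.com/ajax/cluster_review_lists/'
       'cluster-37502374-000000012031487720-0000000000-total-1-default-10-----reviewList.htm'
       '?callback=reviewList')


def callback_name(url: str) -> str:
    """Return the value of the ?callback= query parameter (the JSONP wrapper name)."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps every key to a list of values; take the first one
    return query['callback'][0]


if __name__ == '__main__':
    print(callback_name(URL))
```

Reading the name from the URL instead of hard-coding it keeps the unwrapping step in sync if a different callback value is requested.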

2. Check the response.


3. Parse the data.

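Note: the response is not plain JSON. It is wrapped in the callback reviewList(...), and this wrapper, including the closing parenthesis at the very end, has to be removed before the JSON can be parsed. The code does this with a regular expression.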
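A slightly more defensive version of that regex step, as a sketch: it anchors the match at the callback name, uses a greedy group so that a ')' inside a review does not end the match early (a non-greedy .*? would stop at the first ')'), and raises an error if the wrapper is missing. The function name strip_jsonp and the sample string are made up for illustration.

```python
import json
import re


def strip_jsonp(text: str, callback: str = 'reviewList'):
    """Remove a JSONP wrapper such as reviewList({...}) and parse the JSON inside.

    The greedy (.*) runs to the LAST ')', so parentheses inside review text
    do not cut the payload short; re.S lets '.' also match newlines.
    An optional trailing ';' after the wrapper is tolerated.
    """
    pattern = r'^\s*' + re.escape(callback) + r'\((.*)\)\s*;?\s*$'
    match = re.match(pattern, text, re.S)
    if match is None:
        raise ValueError(f'response is not wrapped in {callback}(...)')
    return json.loads(match.group(1))


if __name__ == '__main__':
    # Made-up sample in which the review text itself contains ')'
    sample = 'reviewList({"commodityReviews": [{"content": "好用 :)"}]})'
    print(strip_jsonp(sample))
```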


4. Find the pagination pattern:

http://review.suning.com/ajax/cluster_review_lists/cluster-37502374-000000012031487720-0000000000-total-1-default-10-----reviewList.htm?callback=reviewList
http://review.suning.com/ajax/cluster_review_lists/cluster-37502374-000000012031487720-0000000000-total-2-default-10-----reviewList.htm?callback=reviewList
http://review.suning.com/ajax/cluster_review_lists/cluster-37502374-000000012031487720-0000000000-total-3-default-10-----reviewList.htm?callback=reviewList

Comparing the URLs shows that only the number after total changes from page to page: it is the page number (1, 2, 3, ...).
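Because only that number changes, the page URLs can be generated from a single template. A minimal sketch; the names BASE, page_url and page_urls are hypothetical, and every other URL segment is simply copied unchanged from the example URLs.

```python
# Template copied from the example URLs; only the number after "total-" varies
BASE = ('http://review.suning.com/ajax/cluster_review_lists/'
        'cluster-37502374-000000012031487720-0000000000-total-{page}-default-10-----reviewList.htm'
        '?callback=reviewList')


def page_url(page: int) -> str:
    """Return the review URL for a 1-based page number."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    return BASE.format(page=page)


def page_urls(pages: int) -> list[str]:
    """Return the URLs for pages 1..pages, in order."""
    return [page_url(p) for p in range(1, pages + 1)]


if __name__ == '__main__':
    for url in page_urls(3):
        print(url)
```

Validating the page number up front avoids silently requesting a page 0 URL.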

With the analysis done, here is the code:

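import re
import json

import requests
import jsonpath

# Only the page number after "total-" changes from page to page
URL_TEMPLATE = ('http://review.suning.com/ajax/cluster_review_lists/'
                'cluster-37502374-000000012031487720-0000000000-total-{page}-default-10-----reviewList.htm'
                '?callback=reviewList')
# Request headers: identify as a regular browser
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
}

if __name__ == '__main__':
    # Ask for the number of pages to scrape
    pages = int(input('Number of pages to scrape: '))
    # Open the output file once; each review is appended as one line
    with open('翻页苏宁商品评论.json', 'a', encoding='utf-8') as f:
        # Page numbers start at 1
        for page in range(1, pages + 1):
            url_ = URL_TEMPLATE.format(page=page)
            # Send the request and get the response
            response = requests.get(url_, headers=HEADERS, timeout=10)
            response.raise_for_status()
            # Strip the reviewList( ... ) wrapper; the greedy (.*) stops at the LAST ')',
            # so a ')' inside review text cannot cut the JSON short
            match = re.search(r'reviewList\((.*)\)', response.text, re.S)
            if match is None:
                print(f'Page {page}: response is not reviewList(...), skipped')
                continue
            # Convert the JSON string into Python objects
            py_data = json.loads(match.group(1))
            # Extract user nicknames and review text (jsonpath returns False when nothing matches)
            id_list = jsonpath.jsonpath(py_data, '$..nickName') or []
            comment_list = jsonpath.jsonpath(py_data, '$.commodityReviews[*].content') or []
            # Save each nickname/review pair as a one-entry dict, one per line
            for nick_name, comment in zip(id_list, comment_list):
                f.write(json.dumps({nick_name: comment}, ensure_ascii=False) + ',\n')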
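$..nickName searches the whole response, so the i-th nickname only lines up with the i-th review if every review contains exactly one nickName field. If a review ever nested a second one (for example in a reply), the two lists would drift apart. A hedged alternative is to look for nickName inside each review separately. The sketch below uses a small recursive helper instead of the jsonpath package so it runs without dependencies; the helper names are hypothetical, and the sample layout (userInfo, replyList) is invented for illustration, not Suning's documented format.

```python
from typing import Any, Iterator


def find_key(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` at any depth (similar to JSONPath $..key)."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


def review_pairs(page: dict) -> list[tuple[str, str]]:
    """Pair each review's content with the first nickName found inside that same review."""
    pairs = []
    for review in page.get('commodityReviews', []):
        nick = next(find_key(review, 'nickName'), '')  # '' if the review has none
        pairs.append((nick, review.get('content', '')))
    return pairs


if __name__ == '__main__':
    # Hypothetical page: the first review contains a nested reply with its own nickName
    page = {'commodityReviews': [
        {'content': 'A', 'userInfo': {'nickName': 'u1'}, 'replyList': [{'nickName': 'shop'}]},
        {'content': 'B', 'userInfo': {'nickName': 'u2'}},
    ]}
    print(list(find_key(page, 'nickName')))  # three nicknames for two reviews
    print(review_pairs(page))
```

With a whole-response search, the sample yields three nicknames for two reviews, so index-based pairing would attach 'shop' to review B; pairing per review keeps each nickname with its own review.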

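I ran it for three pages. Each review is appended to 翻页苏宁商品评论.json as a one-line {nickname: review} object followed by a comma.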
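Because the script writes each record as a JSON object followed by a comma, the file as a whole is not valid JSON, and json.load on it fails. A minimal sketch for reading it back, assuming the file was produced by the script above (one object per line). The function names parse_reviews and load_reviews are hypothetical.

```python
import json


def parse_reviews(text: str) -> list[dict]:
    """Parse lines of the form {"nickname": "review"}, (trailing comma optional)."""
    records = []
    for line in text.splitlines():
        # Drop surrounding whitespace and the trailing ',' the scraper appends
        line = line.strip().rstrip(',')
        if line:
            records.append(json.loads(line))
    return records


def load_reviews(path: str) -> list[dict]:
    """Read the scraper's output file and parse every record."""
    with open(path, encoding='utf-8') as f:
        return parse_reviews(f.read())


if __name__ == '__main__':
    print(parse_reviews('{"用户A": "很好, 推荐"},\n{"用户B": "一般"},\n'))
```

Line-by-line parsing works here because json.dumps escapes newlines inside review text, so every record stays on one line.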

