Notes on scraping JD product reviews

沐鐸丶

Category: Python Study Diary. Tags: python, web scraping

Article link: https://blog.csdn.net/ahlb_hl/article/details/122271438

Scraping JD product reviews with 快代理 (Kuaidaili) dynamic IPs

The goal is to scrape a product's positive, neutral, and negative reviews, plus follow-up reviews, and store them in MySQL.
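As a minimal sketch of how one request URL per review type and page can be assembled: the endpoint and query parameters are copied from the author's code further down, while the helper name `build_comment_url` and the `SCORE_LABELS` mapping are hypothetical names introduced here for illustration. Whether JD still serves this endpoint in this form has not been verified.

```python
from urllib.parse import urlencode

# Endpoint taken from the author's code; not verified against the live site.
BASE_URL = "https://club.jd.com/comment/skuProductPageComments.action"

# Review types the post wants to collect (codes from the author's comments).
SCORE_LABELS = {1: "negative", 2: "neutral", 3: "positive", 5: "follow-up"}


def build_comment_url(product_id: str, score: int, page: int,
                      sort_type: int = 5, page_size: int = 10) -> str:
    """Build the review URL for one product, review type, and page."""
    params = {
        "callback": "fetchJSON_comment98",
        "productId": product_id,
        "score": score,
        "sortType": sort_type,  # 5 = default order, 6 = by time
        "page": page,
        "pageSize": page_size,
        "isShadowSku": 0,
        "fold": 1,
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # "XX" is the same placeholder product number the post uses.
    for score, label in SCORE_LABELS.items():
        print(label, build_comment_url("XX", score, page=0))
```

Building the query with `urlencode` avoids hand-concatenating strings, which makes it harder to mix up the function argument and a global variable.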

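This is all practical material, so straight to the code. If you just want something to copy, adjust the logic and parameters and it should work.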

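JD checks the client IP, and frequent requests get the IP blocked for a short time, so this code uses 快代理 (Kuaidaili) to obtain dynamic IPs. I use IPs that are valid for one minute. The way the code fetches them is somewhat wasteful, so feel free to change it.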
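One way to reduce that waste, sketched under assumptions: keep using the same proxy until shortly before its one-minute validity runs out, and only then ask the API for a new one. The `ProxyCache` class, `build_proxies` helper, and the 5-second safety margin are hypothetical and not part of the original post; the `fetch` callable stands in for the author's call to the 快代理 API.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class ProxyCache:
    """Reuse one proxy address until shortly before it expires."""

    def __init__(self, fetch: Callable[[], List[str]],
                 ttl_seconds: float = 60.0, safety_margin: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch                      # returns a list of "ip:port" strings
        self._ttl = ttl_seconds - safety_margin  # refresh a little early
        self._clock = clock
        self._proxy: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached proxy, fetching a new one only when needed."""
        now = self._clock()
        if self._proxy is None or now - self._fetched_at >= self._ttl:
            candidates = self._fetch()
            if not candidates:
                raise RuntimeError("proxy API returned no addresses")
            self._proxy = random.choice(candidates)
            self._fetched_at = now
        return self._proxy


def build_proxies(proxy: str, username: str, password: str) -> Dict[str, str]:
    """Build the proxies dict in the same shape the author's code passes to requests."""
    return {"https": f"http://{username}:{password}@{proxy}/"}
```

With the author's variables, `fetch` could be `lambda: requests.get(api_url).json()['data']['proxy_list']`, and each page request would call `build_proxies(cache.get(), username, password)` instead of hitting the proxy API every time.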

import re
import requests
import json
import pymysql
from datetime import datetime
import random

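# JD review types (score parameter)
# score=0 all reviews, score=1 negative, score=2 neutral, score=3 positive, score=4 with photos, score=5 follow-up reviews, score=7 video reviews
# sortType=5 default order, sortType=6 sorted by time

product_id = "XX"  # JD product number
score_list = [1,2,3,5]  # fetch negative, neutral, positive, and follow-up reviews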
number = 0


# Fetch product reviews. Arguments: first page, last page (exclusive), product id, review type
def SaveCommentData(minIndex,maxIndex,productId,i):

    scoreTypeSelect = i
    try:
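        # Use the productId argument rather than the global product_id
        url = "https://club.jd.com/comment/skuProductPageComments.action?callback=fetchJSON_comment98&productId=" + productId + "&score=" + str(scoreTypeSelect) + "&sortType=5&page={}&pageSize=10&isShadowSku=0&fold=1"
        # Proxy IPs returned by the 快代理 API; api_url, username and password must be defined at module level
        proxy_ip = requests.get(api_url).json()['data']['proxy_list']
        print(proxy_ip)
        # Fetch the page with requests, going through a dynamic IP from 快代理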
        proxies = {
            "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {'user': username, 'pwd': password,'proxy': random.choice(proxy_ip)}
        }

        # Lists that collect the extracted fields
        id, guid, nickname, referenceTime, referenceName, productContent, productColor, productSize, score, scoreType, replyCount, usefulVoteCount,userClient, mobileVersion,days,afterDays= [], [], [], [], [], [], [], [], [],[], [], [], [], [], [], []
        afterUserComment, afterUserCreatedTime = [],[]
        # Loop over the review pages
        for index in range(minIndex, maxIndex):
            content = requests.get(url=url.format(index), prox