Crawler Study Diary: Example Code for Scraping Product Reviews from JD.com

S1901

Category: Web crawlers. Tags: web crawler

Article link: https://blog.csdn.net/S1901/article/details/117412044

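Example code for scraping product reviews from the JD website

This post uses scraping product reviews from the JD website as an example. The code is given below, and a line-by-line explanation follows at the end of the post.
Code: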

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
import csv
from lxml import etree

c = open("dytttest.csv", "w", encoding='utf-8', newline='')
writer = csv.writer(c)
writer.writerow(['Star rating', 'Review', 'Review time'])

class JDSpider(object):
    driver_path=r"G:\chromedriver\chromedriver.exe"
    def __init__(self):
        self.driver=webdriver.Chrome(executable_path=JDSpider.driver_path)  # Selenium 3 style; newer Selenium 4 releases expect a Service object instead
        self.url="https://item.jd.com/30532212324.html"
    def run(self):
        self.driver.get(self.url)
        WebDriverWait(self.driver, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, "//div[@id='detail']/div[1]/ul/li[5]")))
        comment_button=self.driver.find_element(
            By.XPATH, "//div[@id='detail']/div[1]/ul/li[5]")
        self.driver.execute_script("arguments[0].click()", comment_button)
        while True:
            WebDriverWait(self.driver, 10).until(
                EC.presence_of_all_elements_located((By.XPATH, "//div[@id='comment-0']/div[12]/div/div/a[last()]")))
            source = self.driver.page_source
            self.parse_detail_page(source)
            next_button=self.driver.find_element(By.XPATH, "//div[@id='comment-0']/div[12]/div/div/a[last()]")
            if "ui-page-curr" in next_button.get_attribute("class"):
                break
            else:
                self.driver.execute_script("arguments[0].click()", next_button)
                time.sleep(2)
        self.driver.quit()  # quit() also shuts down the chromedriver process; close() only closes the window
    def parse_detail_page(self,source):
        html=etree.HTML(source)
        for i in range(0, 9):  # assumes at least 9 reviews on the page; fewer raises IndexError
            templist = []
            stars=html.xpath("//div[@id='comment-0']/div/div[2]/div[1]/@class")[i].replace("comment-star ","")
            templist.append(stars)
            contain=html.xpath("//div[@id='comment-0']/div/div[2]/p")[i]
            contain1="".join(contain.xpath("./text()")).replace(" ","").strip()  # join the text nodes instead of cleaning up str(list)
            templist.append(contain1)
            times=html.xpath("//div[@id='comment-0']/div/div[2]/div/div[1]/span[5]/text()")[i]
            templist.append(times)
            position={"stars":stars,"contain":contain1,"time":times}
            writer.writerow(templist)
            print(position)

if __name__ == '__main__':
    JD=JDSpider()
    JD.run()

c.close()

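Running the code above produces the following output:
[Figure: console output of the run]
and the corresponding CSV file:
[Figure: the generated CSV file]
As the figures show, the star rating, review text and review time of each review were scraped successfully and written to the corresponding CSV file.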
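
The script keeps the CSV file open in a module-level variable and closes it on the last line, so an exception during scraping can leave the file unclosed. Below is a minimal sketch of one alternative, assuming the rows have already been collected into a list. The names `write_reviews`, `read_reviews` and `HEADER` are hypothetical and not part of the original script.

```python
import csv

# Assumed column names, an English rendering of the script's header row.
HEADER = ["Star rating", "Review", "Review time"]


def write_reviews(path, rows):
    """Write the header and all review rows to a CSV file; return the row count."""
    # newline="" lets the csv module control line endings itself;
    # the with-block closes the file even if writing a row raises.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        count = 0
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def read_reviews(path):
    """Read the file back as a list of dicts keyed by the header row."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

Reading the file back with `read_reviews` is a quick way to check that commas inside a review were quoted correctly rather than splitting the row.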
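Code analysis:
(1) Lines 1-4
Import webdriver, the browser automation tool from the selenium package, together with WebDriverWait, expected_conditions and By, which are used for explicit waits and element lookup.
(2) Lines 5-6
Import the time and csv modules.
(3) Line 7
Import etree from lxml. etree.HTML() parses an HTML string into an element tree, and the tree's xpath() method extracts the wanted nodes.
(4) Lines 9-11
Open dytttest.csv for writing and write the header row.
(5) Line 13
Define the JDSpider class, which holds the __init__, run and parse_detail_page methods.
(6) Line 14
Set the path to chromedriver.exe, the driver program that controls the Chrome browser.
(7) Lines 15-17
__init__ starts Chrome through chromedriver and stores the URL of the product page. The page itself is opened at the start of run.
(8) Lines 18-36
Simulate a user's clicks: open the product review tab, then keep clicking the next-page button. On every pass through the loop, parse_detail_page is called to scrape the wanted fields from the current page, and the loop stops when the last page is reached.
(9) Lines 37-50
The parse_detail_page method loops over the reviews on the page, extracts the star rating, review text and review time of each one, and writes them to the CSV file.
(10) Lines 52-56
Create a JDSpider instance and call its run method, then close the CSV file.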
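
The paging loop described in step (8) can be separated from Selenium so that its stopping logic is easy to check. Below is a minimal sketch under stated assumptions: `scrape_all_pages` and its four callables are hypothetical names, and the `max_pages` cap is an added safety limit that the original script does not have.

```python
def scrape_all_pages(get_source, is_last_page, go_next, handle_page, max_pages=200):
    """Process pages until the last one is reached or max_pages is hit.

    Returns the number of pages processed.
    """
    pages = 0
    while pages < max_pages:
        handle_page(get_source())  # parse the current page first
        pages += 1
        if is_last_page():         # stop without clicking on the last page
            break
        go_next()                  # otherwise move on to the next page
    return pages


# Tiny in-memory stand-in for the browser, for illustration only.
fake_pages = ["page-1", "page-2", "page-3"]
state = {"index": 0}
seen = []
processed = scrape_all_pages(
    get_source=lambda: fake_pages[state["index"]],
    is_last_page=lambda: state["index"] == len(fake_pages) - 1,
    go_next=lambda: state.update(index=state["index"] + 1),
    handle_page=seen.append,
)
print(processed, seen)  # 3 ['page-1', 'page-2', 'page-3']
```

In the script, `get_source` would presumably return `driver.page_source`, `is_last_page` would check for `ui-page-curr` in the class of the last pager link, and `go_next` would click that link. The cap keeps the loop from running forever if the site changes its pager markup.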

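Step (9) indexes three XPath result lists with a fixed range, which fails when a page holds fewer reviews than expected. Below is a minimal sketch of pairing the lists with `zip` instead, assuming the three lists have already been extracted. `build_rows` is a hypothetical helper, and the sample values are made up for illustration.

```python
def build_rows(star_classes, review_texts, review_times):
    """Combine parallel lists of review fields into CSV-ready rows.

    star_classes: class attribute strings such as "comment-star star5"
    review_texts: one list of text fragments per review
    review_times: one timestamp string per review
    """
    rows = []
    # zip stops at the shortest list, so a short page cannot raise IndexError.
    for star_class, text_parts, when in zip(star_classes, review_texts, review_times):
        stars = star_class.replace("comment-star ", "")
        text = "".join(text_parts).strip()
        rows.append([stars, text, when.strip()])
    return rows


rows = build_rows(
    ["comment-star star5", "comment-star star3"],
    [["Fast delivery, ", "works well"], ["  Average  "]],
    ["2021-05-30 12:01", "2021-05-29 08:15"],
)
print(rows)
# [['star5', 'Fast delivery, works well', '2021-05-30 12:01'],
#  ['star3', 'Average', '2021-05-29 08:15']]
```

One trade-off: if the three lists differ in length, `zip` silently drops the extras, so comparing the lengths before zipping is a reasonable extra check.
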
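This article is intended for learning and discussion only; please do not use it for any other purpose. It will be removed if it infringes on anyone's rights. Thank you.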
