# Scrape the comments of a JD.com product and save them to CSV (对京东某商品的评论进行爬虫并保存)
import random
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
import time
from bs4 import BeautifulSoup
import pandas as pd
from selenium.webdriver.common.keys import Keys
# Raw string so the Windows backslashes are not treated as escape sequences.
CHROMEDRIVER_PATH = r'C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe'
MAX_PAGES = 100  # upper bound on comment pages to crawl

browser = webdriver.Chrome(executable_path=CHROMEDRIVER_PATH)
browser.get('https://item.jd.com/100007264815.html')
browser.switch_to.window(browser.window_handles[0])
time.sleep(3)  # give the product page time to finish loading
browser.switch_to.window(browser.window_handles[-1])

comments = []  # accumulated comment texts across all pages
for page in range(MAX_PAGES):
    try:
        # Scroll to the very bottom so the lazy-loaded comment section renders.
        browser.execute_script("var q=document.documentElement.scrollTop=1000000")
        comment_panel = browser.find_element_by_xpath('//*[@id="comment-4"]')
        soup = BeautifulSoup(comment_panel.get_attribute('innerHTML'), 'lxml')
        comment_nodes = soup.find_all('p', {'class': 'comment-con'})
        if not comment_nodes:
            print('无评论')  # no comments on this page -> stop crawling
            break
        comments.extend(node.get_text() for node in comment_nodes)
        print(comments)
        # Randomized delay (1-6 s) to look less like a bot.
        time.sleep(random.random() * 5 + 1)
    except Exception as e:
        # Best-effort scraping: log the error and still try to advance the pager.
        print(e)
    try:
        # Click "next page" via JavaScript so overlays cannot intercept the click.
        next_button = browser.find_element_by_css_selector(
            '#comment-4 > div.com-table-footer > div > div > a.ui-pager-next')
        browser.execute_script("arguments[0].click();", next_button)
        time.sleep(1)  # let the next page of comments load
    except Exception as e:
        print(e)

# utf_8_sig writes a BOM so Excel opens the Chinese text correctly.
df = pd.DataFrame(comments)
df.to_csv('狗粮好评.csv', index=False, encoding='utf_8_sig')