Scraping Review Data from Ctrip

LKIDTI数据

Tags: web scraping, selenium, python, travel reviews

Article link: https://blog.csdn.net/weixin_45014634/article/details/103372214

1. Scrape the author of each review
2. Scrape the time each review was posted
3. Scrape the content of each review

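To scrape this Ctrip data, we use selenium to drive a browser automatically: it opens the page through the Chrome driver (chromedriver), and we then read the rendered page source. If you are not sure how to set up the driver, ask in the comments and I will add a section on it. I hope this article helps.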

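from scrapy import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Launch the browser (requires Chrome and a matching chromedriver)
browser = webdriver.Chrome()
browser.get("URL (fill in the Ctrip page address yourself)")


def parse_page():
    # Give the page a moment to render before reading its source
    time.sleep(1)
    sel = Selector(text=browser.page_source)

    # The "user-date" block yields three text nodes per review:
    # the 1st is the author and the 3rd is the posting time
    user_dates = sel.xpath('//div[@class="user-date"]/span/text()').extract()
    comments = sel.xpath('//ul[@class="comments"]/li/p/text()').extract()
    authors = user_dates[::3]
    comment_times = user_dates[2::3]

    # Append one tab-separated line per review to 评论.txt ("reviews.txt");
    # the labels mean "reviewer", "review time" and "review content"
    with open('评论.txt', 'a+', encoding='utf-8') as f:
        for author, comment_time, comment in zip(authors, comment_times, comments):
            f.write("评论人：" + author + '\t' + "评论时间：" + comment_time + '\t'
                    + "评论内容：" + comment.strip('\n') + '\n')

    # Click the last link in the pager to move to the next page
    # (find_element_by_xpath was removed in Selenium 4)
    next_button = browser.find_element(By.XPATH, '//ul[@class="pkg_page"]/a[last()]')
    next_button.click()


if __name__ == '__main__':
    # Scrape 15 pages in a plain loop; calling parse_page() from inside
    # itself, as the earlier version did, never terminates
    for _ in range(15):
        parse_page()
    browser.quit()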
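
The `[::3]` and `[2::3]` slices in the script only work if every review contributes exactly three text nodes under `user-date`. Below is a minimal sketch of that step on made-up data, assuming the author comes first and the posting time third. The sample strings and the `split_user_dates` helper are hypothetical and not taken from Ctrip.

```python
def split_user_dates(text_nodes, group_size=3):
    """Split a flat list of text nodes into (authors, times).

    Assumes each review contributes `group_size` nodes, with the author
    first and the timestamp last, as the slicing in the script implies.
    """
    if len(text_nodes) % group_size != 0:
        raise ValueError("unexpected number of text nodes; page layout may have changed")
    authors = text_nodes[::group_size]
    times = text_nodes[group_size - 1::group_size]
    return authors, times


# Hypothetical sample: two reviews, three text nodes each
sample = ["user_a", " ", "2019-12-01", "user_b", " ", "2019-12-02"]
authors, times = split_user_dates(sample)
comments = ["Great trip\n", "Hotel was fine\n"]

# zip pairs the three lists position by position, one tuple per review
rows = list(zip(authors, times, comments))
lines = ["\t".join((a, t, c.strip("\n"))) for a, t, c in rows]
```

Raising on a length mismatch makes a layout change visible right away, rather than silently pairing authors with the wrong timestamps.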