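Scraping review data from Ctrip
1. Scrape the author of each review
2. Scrape the time each review was posted
3. Scrape the content of each review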
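To scrape the Ctrip data, we use Selenium to automate a real browser and download the page data. The browser is Chrome, driven by the Chrome driver (chromedriver): it opens the page, and we read the rendered page source from it. If you don't know how to set up the driver, ask in the comments and I will add a section on it. I hope this article helps you.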
from scrapy import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Launch Chrome (chromedriver must be installed and reachable)
browser = webdriver.Chrome()
browser.get("URL (fill in the Ctrip page address yourself)")


def parse_page():
    """Save the reviews on the current page, then click through to the next page."""
    time.sleep(1)  # give the page time to render before reading its source
    sel = Selector(text=browser.page_source)
    # Span texts come in groups of three per review: [0] reviewer, [2] post time
    spans = sel.xpath('//div[@class="user-date"]/span/text()').extract()
    comments = sel.xpath('//ul[@class="comments"]/li/p/text()').extract()
    authors = spans[::3]
    time_comments = spans[2::3]
    # Open the output file once per page instead of once per review
    with open('comments.txt', 'a+', encoding='utf-8') as f:
        for author, time_comment, comment in zip(authors, time_comments, comments):
            f.write("Reviewer: " + author + '\t' + "Time: " + time_comment + '\t'
                    + "Review: " + comment.strip('\n') + '\n')
    # The last <a> in the pager is the "next page" link.
    # Selenium 4 removed find_element_by_xpath; use find_element(By.XPATH, ...)
    next_button = browser.find_element(By.XPATH, '//ul[@class="pkg_page"]/a[last()]')
    next_button.click()


if __name__ == '__main__':
    # Scrape 15 pages; each call moves on to the next page
    for i in range(15):
        parse_page()
    browser.quit()
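The least obvious step in parse_page is turning one flat list of span texts into (reviewer, time) pairs. The sketch below isolates that logic on made-up sample data, so it can run without a browser. It assumes what the slicing in parse_page implies: every review contributes exactly three span texts. The middle value "|" and the helper names group_reviews and format_line are hypothetical.

```python
# Hypothetical sample data: assumes each review's <div class="user-date">
# yields exactly three span texts (reviewer, some middle span, post time).
def group_reviews(spans, comments):
    """Pair each reviewer name and post time with its comment text."""
    names = spans[::3]   # 1st span of every group of three
    dates = spans[2::3]  # 3rd span of every group of three
    # zip stops at the shortest list; strip() removes surrounding whitespace
    return [(n, d, c.strip()) for n, d, c in zip(names, dates, comments)]


def format_line(name, date, comment):
    """Build one tab-separated output line, like the one parse_page writes."""
    return "Reviewer: " + name + "\tTime: " + date + "\tReview: " + comment + "\n"


spans = ["alice", "|", "2019-05-01", "bob", "|", "2019-05-03"]
comments = ["\nGreat trip\n", "Too crowded\n"]
rows = group_reviews(spans, comments)
for row in rows:
    print(format_line(*row), end="")
```

Because the pairing depends only on position, a single review that renders a different number of spans would shift every later pair. A more robust variant would select each review's container element and run relative XPath queries inside it.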