Using headless Selenium with Scrapy on a server to render and parse web pages

Latest recommended article published on 2023-03-27 18:27:15

wto882dim

Categories: ubuntu, Python, web scraping. Tags: selenium, scrapy

Article link: https://blog.csdn.net/wto882dim/article/details/101217643

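Using Selenium with Scrapy

The following is the code for middlewares.py. It defines a downloader middleware that loads each request in headless Chrome and hands the rendered HTML back to Scrapy: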

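from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
import time
from scrapy.http.response.html import HtmlResponse


class SeleniumDownloadMiddleware(object):
    def __init__(self):
        chrome_options = webdriver.ChromeOptions()
        # Run Chrome without a visible window
        chrome_options.add_argument('--headless')
        # Write shared-memory files to /tmp instead of /dev/shm, which is often small on servers
        chrome_options.add_argument('--disable-dev-shm-usage')
        # Chrome will not start as the root user unless the sandbox is disabled with --no-sandbox
        chrome_options.add_argument('--no-sandbox')
        # `options=` replaces the deprecated `chrome_options=` keyword
        self.driver = webdriver.Chrome(options=chrome_options)

    def process_request(self, request, spider):
        self.driver.get(request.url)
        time.sleep(1)  # give the page a moment to render
        try:
            # Keep clicking "show more" until the button can no longer be found or clicked
            while True:
                show_more = self.driver.find_element(By.CLASS_NAME, 'show-more')
                show_more.click()
                time.sleep(0.3)
        except WebDriverException:
            # find_element raises NoSuchElementException once the button is gone,
            # which is what actually ends the loop
            pass
        source = self.driver.page_source
        # Returning a response here makes Scrapy skip its own downloader for this request
        response = HtmlResponse(url=self.driver.current_url, body=source, request=request, encoding='utf-8')
        return response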
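
The "click show-more until it disappears" loop above relies on an exception to stop, and it has no upper bound if the button never goes away. A minimal sketch of a bounded variant is shown below. The names `click_until_gone`, `find_buttons`, `max_clicks`, `pause` and the `FakePage` stand-in are hypothetical and not part of the original middleware. With Selenium, `find_buttons` could be `lambda: driver.find_elements(By.CLASS_NAME, 'show-more')`, since `find_elements` returns an empty list rather than raising when nothing matches.

```python
import time


def click_until_gone(find_buttons, max_clicks=50, pause=0.3):
    """Click the first matching element until none is left (hypothetical helper).

    find_buttons: zero-argument callable returning a list of clickable elements.
    max_clicks:   safety cap so a button that never disappears cannot loop forever.
    pause:        seconds to wait after each click so new content can load.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = find_buttons()
        if not buttons:
            break  # nothing left to expand
        buttons[0].click()
        clicks += 1
        time.sleep(pause)
    return clicks


class FakeButton:
    """Stand-in for a web element; clicking it consumes one 'show more' step."""

    def __init__(self, page):
        self.page = page

    def click(self):
        self.page.remaining -= 1


class FakePage:
    """Stand-in for a page whose button vanishes after `remaining` clicks."""

    def __init__(self, remaining):
        self.remaining = remaining

    def find_buttons(self):
        return [FakeButton(self)] if self.remaining > 0 else []


if __name__ == "__main__":
    page = FakePage(remaining=3)
    print(click_until_gone(page.find_buttons, pause=0))
```

The stand-in classes only exist so the loop can be exercised without a browser. In the middleware, the call would take the place of the `try`/`while True` block inside `process_request`, assuming the fixed `time.sleep` pauses are still acceptable for the target site.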
