Using Splash requires Docker. I am using Ubuntu 20.04 here.
1. Install Docker and pull the Splash image
sudo apt install docker.io
sudo docker pull scrapinghub/splash
2. Open the ports
After installation, you need to map the container's ports 8050 and 8051 to the host (8050 is Splash's HTTP API endpoint), otherwise the spider cannot reach Splash.
sudo docker run -p 8050:8050 -p 8051:8051 scrapinghub/splash
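Once the container is running, you can sanity-check the Splash HTTP API by requesting its `render.html` endpoint. The sketch below, using only the Python standard library, builds such a request URL; the `localhost:8050` address matches the `SPLASH_URL` setting used later, and the `wait` value is an assumption:

```python
from urllib.parse import urlencode

# Assumed local Splash instance started by the docker run command above
SPLASH = "http://localhost:8050"

def render_url(target, wait=0.5):
    """Build a Splash render.html URL that renders `target` with JavaScript."""
    query = urlencode({"url": target, "wait": wait})
    return f"{SPLASH}/render.html?{query}"

print(render_url("http://quotes.toscrape.com"))
# http://localhost:8050/render.html?url=http%3A%2F%2Fquotes.toscrape.com&wait=0.5
```

Opening the printed URL in a browser (or with curl) should return the rendered page if Splash is up.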
3. Use Splash in the spider
3.1 Install scrapy-splash
pip3 install scrapy-splash
3.2 Add the following to Scrapy's settings.py
# Splash server address
SPLASH_URL = "http://localhost:8050"
# Enable scrapy-splash's two downloader middlewares and place
# HttpCompressionMiddleware after them
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
# Use the Splash-aware duplicate request filter
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
# Optional: enable this spider middleware to support cache_args
#SPIDER_MIDDLEWARES = {
#    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
#}
3.3 Write the spider
Below are my project files (only settings.py and quotes.py were modified).
3.4 Run the spider
scrapy crawl quotes -o quotes.csv
Inspect quotes.csv
cat -n quotes.csv
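If you prefer to inspect the output programmatically rather than with cat, the rows can be read back with Python's csv module. The sample text below stands in for quotes.csv, and the text/author columns match the hypothetical spider fields above, not necessarily your actual output:

```python
import csv
import io

# Sample rows standing in for the quotes.csv produced by
# `scrapy crawl quotes -o quotes.csv`.
sample = """text,author
"The world as we have created it is a process of our thinking.",Albert Einstein
"It is our choices that show what we truly are.",J.K. Rowling
"""

# DictReader maps each row to a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(sample)))
print(len(rows), rows[0]["author"])
# 2 Albert Einstein
```

To read the real file, replace `io.StringIO(sample)` with `open("quotes.csv", newline="")`.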
4. Results