Using Splash with the Scrapy framework to render JavaScript: environment setup

Environment setup (official installation docs):

http://splash.readthedocs.io/en/stable/install.html

 

pip install scrapy-splash

service docker start

docker pull scrapinghub/splash

docker run -p 8050:8050 scrapinghub/splash
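Once the container is running, Splash exposes an HTTP API on port 8050; its render.html endpoint takes the target page as a url query parameter plus options such as wait. As a minimal sketch of how such a request URL is built (splash_render_url is a hypothetical helper for illustration, not part of scrapy-splash):

```python
from urllib.parse import urlencode

# Hypothetical helper: build the URL that Splash's render.html
# endpoint accepts. scrapy-splash constructs equivalent requests
# internally when you yield a SplashRequest.
def splash_render_url(splash_base, target_url, wait=0.5):
    query = urlencode({'url': target_url, 'wait': wait})
    return f"{splash_base}/render.html?{query}"

print(splash_render_url('http://localhost:8050', 'http://example.com'))
# http://localhost:8050/render.html?url=http%3A%2F%2Fexample.com&wait=0.5
```

Opening such a URL in a browser (with the container up) returns the page's HTML after JavaScript has executed, which is a quick way to verify the Splash service before wiring it into Scrapy.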

----

Add the following to settings.py:

# Splash server address
SPLASH_URL = 'http://localhost:8050'

# Enable the Splash downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Enable the Splash spider middleware
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Use a duplicate filter that takes Splash arguments into account
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

# Use a Splash-aware HTTP cache storage
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
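The SplashAwareDupeFilter setting matters because two requests for the same URL with different Splash arguments (e.g. different wait values) must not be collapsed into one. The sketch below illustrates the idea with a toy fingerprint function; it is not scrapy-splash's actual implementation, just a demonstration of why Splash arguments belong in the fingerprint:

```python
import hashlib
import json

# Illustrative only (not scrapy-splash's real algorithm): a
# Splash-aware fingerprint hashes the URL together with the Splash
# arguments, so requests differing only in those arguments stay distinct.
def fingerprint(url, splash_args=None):
    payload = json.dumps({'url': url, 'splash': splash_args or {}},
                         sort_keys=True)
    return hashlib.sha1(payload.encode('utf-8')).hexdigest()

a = fingerprint('http://example.com', {'wait': 0.5})
b = fingerprint('http://example.com', {'wait': 2.0})
print(a != b)  # True: different Splash args yield distinct fingerprints
```

A URL-only fingerprint (Scrapy's default) would treat both requests as duplicates and drop the second one.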

A minimal spider then yields SplashRequest instead of the usual scrapy.Request:

import scrapy
from scrapy_splash import SplashRequest

class MySpider(scrapy.Spider):
    name = "myspider"  # required by Scrapy
    start_urls = ["http://example.com", "http://example.com/foo"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse, args={'wait': 0.5})

    def parse(self, response):
        # response.body is the result of a render.html call: it
        # contains the HTML after the browser has executed JavaScript.
        pass
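Inside parse you would normally use response.css or response.xpath on the rendered HTML. As a self-contained sketch of the same idea using only the standard library (TitleParser is an illustrative class, not part of Scrapy), here is how the browser-rendered body can be inspected:

```python
from html.parser import HTMLParser

# Illustrative stdlib parser: extracts the <title> text from the
# browser-rendered HTML that Splash returns in response.body.
class TitleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = data

# Simulated rendered body, as Splash would return it (bytes).
rendered = b"<html><head><title>Example Domain</title></head></html>"
parser = TitleParser()
parser.feed(rendered.decode('utf-8'))
print(parser.title)  # Example Domain
```

In a real spider you would use Scrapy's selectors instead; the point is that once Splash has executed the page's JavaScript, the response is ordinary HTML and all the usual extraction tools apply.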
References:
    https://germey.gitbooks.io/python3webspider/content/7.2-Splash%E7%9A%84%E4%BD%BF%E7%94%A8.html
    http://blog.csdn.net/qq_23849183/article/details/51287935
    http://ae.yyuap.com/pages/viewpage.action?pageId=919763

Reposted from: https://www.cnblogs.com/fh-fendou/p/7612119.html
