Splash Study Notes

Splash documentation:
https://splash.readthedocs.io/en/stable/scripting-ref.html?highlight=proxy#splash-on-request
Splash documentation (Chinese):
https://splash-cn-doc.readthedocs.io/zh_CN/latest/at-last.html
Splash user manual:
https://blog.zhangkunzhi.com/2019/04/21/Splash%E4%BD%BF%E7%94%A8%E6%89%8B%E5%86%8C/index.html

Set the viewport size

splash:set_viewport_size(1920,1080)

Disable image loading

splash.images_enabled = false

assert(splash:wait(0)) (a zero-second wait) also seems to work

Mouse click

local next_page = splash:select("#root > div > div.ant-row.c10-Cg > div:nth-child(1) > div > div.ant-col-20.ant-col-push-4.c1z9Ut > div.c3gNPq > div > ul > li.ant-pagination-next")
next_page:mouse_click()

Set a proxy:

function main(splash, args)
  -- Route every outgoing request through the proxy.
  splash:on_request(function(request)
    request:set_proxy{
      host = "196.18.16.128",
      port = 8800
    }
  end)
  assert(splash:go(args.url))
  assert(splash:wait(0))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
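
One way to run a script like the one above from Python is to POST it to Splash's /execute endpoint as JSON. The sketch below is only an illustration under stated assumptions: the Splash address http://localhost:8050, the helper names build_proxy_script, build_payload and run_script, and the choice of the standard library as HTTP client are not part of the original notes.

```python
import json
import urllib.request

# Lua template; %(host)s and %(port)d are filled in by build_proxy_script.
LUA_TEMPLATE = """
function main(splash, args)
  splash:on_request(function(request)
    request:set_proxy{host = "%(host)s", port = %(port)d}
  end)
  assert(splash:go(args.url))
  assert(splash:wait(0))
  return {html = splash:html()}
end
"""


def build_proxy_script(host, port):
    """Return Lua source that routes every request through host:port."""
    return LUA_TEMPLATE % {"host": host, "port": port}


def build_payload(url, host, port):
    """Build the JSON body for /execute; url is exposed to Lua as args.url."""
    return {"url": url, "lua_source": build_proxy_script(host, port)}


def run_script(payload, splash_url="http://localhost:8050"):
    """POST the payload to Splash (assumed address) and decode the JSON reply."""
    req = urllib.request.Request(
        splash_url + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = build_payload("http://httpbin.org/get", "196.18.16.128", 8800)
# run_script(payload) would need a reachable Splash instance, so it is not called here.
```

Keeping payload construction separate from the network call makes it possible to inspect the generated Lua without a running Splash instance.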

Setting a proxy in Scrapy:

settings.py:

SPLASH_URL = 'http://10.12.5.66:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

ROBOTSTXT_OBEY = False

SPIDER_MIDDLEWARES = {
   # 'scrapySplash.middlewares.ScrapysplashSpiderMiddleware': 543,
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DOWNLOADER_MIDDLEWARES = {
   # 'scrapySplash.middlewares.ScrapysplashDownloaderMiddleware': 543,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'scrapySplash.middlewares.FirstsplashSpiderMiddleware': 543,
}

spider.py

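from scrapy_splash import SplashRequest

    # Inside the spider class. `script` presumably holds the Lua source shown above.
    # Note: Splash only runs lua_source on the 'execute' endpoint; SplashRequest defaults to 'render.html'.
    start_urls = ['http://httpbin.org/get']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url=url, callback=self.parse, args={'lua_source': script, 'proxy': 'http://196.18.16.128:8800'})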

Or
middlewares.py

class FirstsplashSpiderMiddleware:
    def process_request(self, request, spider):
        print("Entering proxy middleware")
        request.meta['splash']['args']['proxy'] = "http://196.18.16.128:8800"
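
The middleware above indexes request.meta['splash'] directly, which raises a KeyError for any request that was not created as a SplashRequest. A minimal sketch of a more defensive variant follows. The class name ProxyMiddlewareSketch and the FakeRequest stand-in are hypothetical and exist only so the logic can be tried without Scrapy installed.

```python
class ProxyMiddlewareSketch:
    """Hypothetical variant of the middleware above that skips non-Splash requests."""

    proxy = "http://196.18.16.128:8800"

    def process_request(self, request, spider):
        splash_meta = request.meta.get("splash")
        if splash_meta is None:
            return None  # plain Scrapy request: leave it untouched
        # Create the args dict if it is missing, then set the proxy.
        splash_meta.setdefault("args", {})["proxy"] = self.proxy
        return None


class FakeRequest:
    """Stand-in for scrapy.Request that carries only the meta dict."""

    def __init__(self, meta):
        self.meta = meta


splash_req = FakeRequest({"splash": {"args": {"wait": 0}}})
plain_req = FakeRequest({})
mw = ProxyMiddlewareSketch()
mw.process_request(splash_req, spider=None)
mw.process_request(plain_req, spider=None)
print(splash_req.meta["splash"]["args"])
print(plain_req.meta)
```

Returning None tells Scrapy to continue processing the request through the remaining middlewares.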

Python crawler: a simple splash + requests example

https://blog.csdn.net/mouday/article/details/82843401
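
For a rough idea of what calling Splash over plain HTTP looks like, here is a small sketch against the render.html endpoint. It is not the code from the linked post: it uses the standard library rather than requests, and the Splash address http://localhost:8050 and the helper names build_render_url and fetch_html are assumptions made for illustration.

```python
import urllib.request
from urllib.parse import urlencode


def build_render_url(page_url, splash_url="http://localhost:8050", wait=0.5, proxy=None):
    """Build a render.html URL; url, wait and proxy are Splash query arguments."""
    params = {"url": page_url, "wait": wait}
    if proxy:
        params["proxy"] = proxy
    return splash_url + "/render.html?" + urlencode(params)


def fetch_html(page_url, **kwargs):
    """Fetch the rendered HTML; requires a reachable Splash instance."""
    with urllib.request.urlopen(build_render_url(page_url, **kwargs), timeout=60) as resp:
        return resp.read().decode("utf-8")


render_url = build_render_url("http://httpbin.org/get", proxy="http://196.18.16.128:8800")
print(render_url)
```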

Splash: go to the next page and compare screenshots

function main(splash, args)
  splash:set_viewport_size(1920, 7000)
  splash:on_request(function(request)
    request:set_proxy{
      host = "86.62.56.137",
      port = 8800
    }
  end)
  splash.images_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  -- Screenshot of the first page, taken before clicking "next".
  local png1 = splash:png()
  local next_page = splash:select("#root > div > div.ant-row.c10-Cg > div:nth-child(1) > div > div.ant-col-20.ant-col-push-4.c1z9Ut > div.c3gNPq > div > ul > li.ant-pagination-next")
  next_page:mouse_click()
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png1 = png1,
    png = splash:png()
  }
end
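
When a script like the one above is run through the /execute endpoint, the two screenshots come back as base64 strings inside the JSON reply. One simple way to check whether the page actually changed is to decode both and compare them. The sketch below assumes the reply has already been parsed into a dict; the function name screenshots_differ and the fake_result stand-in data are hypothetical.

```python
import base64
import hashlib


def screenshots_differ(result):
    """Return True if result['png1'] and result['png'] decode to different bytes."""
    before = base64.b64decode(result["png1"])
    after = base64.b64decode(result["png"])
    return hashlib.sha256(before).digest() != hashlib.sha256(after).digest()


# Stand-in for the decoded JSON reply; real values would be base64-encoded PNG data.
fake_result = {
    "png1": base64.b64encode(b"page one").decode("ascii"),
    "png": base64.b64encode(b"page two").decode("ascii"),
}
print(screenshots_differ(fake_result))
```

This is a strict byte-level comparison, so any rendering difference counts as a change; a tolerance-based image diff would be needed to ignore minor variations.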