Splash documentation:
https://splash.readthedocs.io/en/stable/scripting-ref.html?highlight=proxy#splash-on-request
Splash documentation (Chinese):
https://splash-cn-doc.readthedocs.io/zh_CN/latest/at-last.html
Splash user manual
https://blog.zhangkunzhi.com/2019/04/21/Splash%E4%BD%BF%E7%94%A8%E6%89%8B%E5%86%8C/index.html
Set the viewport size
splash:set_viewport_size(1920,1080)
Disable image loading
splash.images_enabled = false
assert(splash:wait(0)) also seems to work
Mouse click
next_page = splash:select("#root > div > div.ant-row.c10-Cg > div:nth-child(1) > div > div.ant-col-20.ant-col-push-4.c1z9Ut > div.c3gNPq > div > ul > li.ant-pagination-next")
next_page:mouse_click()
Set a proxy:
function main(splash, args)
  splash:on_request(function(request)
    request:set_proxy{
      host = "196.18.16.128",
      port = 8800
    }
  end)
  assert(splash:go(args.url))
  assert(splash:wait(0))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
Set a proxy in Scrapy:
settings.py:
SPLASH_URL = 'http://10.12.5.66:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
ROBOTSTXT_OBEY = False
SPIDER_MIDDLEWARES = {
    # 'scrapySplash.middlewares.ScrapysplashSpiderMiddleware': 543,
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DOWNLOADER_MIDDLEWARES = {
    # 'scrapySplash.middlewares.ScrapysplashDownloaderMiddleware': 543,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'scrapySplash.middlewares.FirstsplashSpiderMiddleware': 543,
}
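spider.py
from scrapy_splash import SplashRequest
# Inside the spider class; `script` holds the Lua source and is defined elsewhere.
start_urls = ['http://httpbin.org/get']
def start_requests(self):
    for url in self.start_urls:
        # lua_source is only run by the 'execute' endpoint (the default is 'render.html').
        yield SplashRequest(url=url, callback=self.parse, endpoint='execute',
                            args={'lua_source': script, 'proxy': 'http://196.18.16.128:8800'})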
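Or
middlewares.py
class FirstsplashSpiderMiddleware:
    # Despite its name, this is registered in DOWNLOADER_MIDDLEWARES (priority 543),
    # so it runs before SplashMiddleware (725) rewrites the request.
    def process_request(self, request, spider):
        print("Entering proxy middleware")
        # Only Splash requests carry meta['splash']; leave other requests untouched.
        splash_meta = request.meta.get('splash')
        if splash_meta is not None:
            splash_meta.setdefault('args', {})['proxy'] = "http://196.18.16.128:8800"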
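The Lua script takes the proxy as a separate host and port, while the Scrapy args take a single URL. A minimal sketch for converting one form into the other, using only the standard library; the helper name split_proxy_url is hypothetical and not part of Splash or scrapy-splash:

```python
from urllib.parse import urlsplit


def split_proxy_url(proxy_url):
    """Split a proxy URL into the host and port fields used by request:set_proxy."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy URL must include a host and a port: %r" % proxy_url)
    return {"host": parts.hostname, "port": parts.port}
```

The resulting values could, for instance, be passed as extra entries in args and read from args inside the Lua script, although the notes here hard-code them instead.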
Python crawler: a simple Splash + requests example
https://blog.csdn.net/mouday/article/details/82843401
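Outside Scrapy, a Lua script can be sent straight to Splash's /execute HTTP endpoint. The linked post uses requests; the sketch below uses only the standard library so that it stays self-contained. The function names (build_execute_request, fetch_html) and the short Lua script are illustrative assumptions, and SPLASH_URL is the instance from the Scrapy settings above.

```python
import json
import urllib.request

SPLASH_URL = "http://10.12.5.66:8050"  # same Splash instance as in the Scrapy settings

# Assumed minimal script: load the page, wait briefly, return the HTML.
LUA_SOURCE = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {html = splash:html()}
end
"""


def build_execute_request(target_url, lua_source=LUA_SOURCE, proxy=None,
                          splash_url=SPLASH_URL):
    """Build (but do not send) a JSON POST request for Splash's /execute endpoint."""
    payload = {"lua_source": lua_source, "url": target_url}
    if proxy is not None:
        payload["proxy"] = proxy  # proxy URL, same form as in the Scrapy args
    return urllib.request.Request(
        splash_url.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_html(target_url, proxy=None, timeout=60):
    """Send the request and return the 'html' field of the JSON result."""
    request = build_execute_request(target_url, proxy=proxy)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["html"]
```

With a reachable Splash instance, fetch_html("http://httpbin.org/get", proxy="http://196.18.16.128:8800") would return the rendered page source.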
Go to the next page in Splash and compare screenshots
function main(splash, args)
  splash:set_viewport_size(1920,7000)
  splash:on_request(function(request)
    request:set_proxy{
      host = "86.62.56.137",
      port = 8800
    }
  end)
  splash.images_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  -- Screenshot of the first page, taken before clicking "next".
  local png1 = splash:png()
  local next_page = splash:select("#root > div > div.ant-row.c10-Cg > div:nth-child(1) > div > div.ant-col-20.ant-col-push-4.c1z9Ut > div.c3gNPq > div > ul > li.ant-pagination-next")
  next_page:mouse_click()
  assert(splash:wait(0.5))
  -- png1 is the page before the click, png the page after it.
  return {
    html = splash:html(),
    png1 = png1,
    png = splash:png()
  }
end
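On the Python side, the two screenshots come back in the JSON result as base64 strings under the keys png1 and png. A minimal sketch for checking whether the click actually changed the page; the function name page_changed is hypothetical, and a byte-for-byte comparison is only a rough test, since any dynamic content on the page also makes the images differ:

```python
import base64


def page_changed(result):
    """Return True if the 'png1' and 'png' screenshots in a Splash result differ."""
    before = base64.b64decode(result["png1"])  # screenshot taken before the click
    after = base64.b64decode(result["png"])    # screenshot taken after the click
    return before != after
```

If page_changed returns False, the selector probably did not match the pagination button, or the wait was too short for the next page to render.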