Any one of the following three methods will work.
1. Set the proxy and request headers in the Lua script:
function main(splash, args)
    -- Route every outgoing request through the proxy
    splash:on_request(function(request)
        request:set_proxy{
            host = "27.0.0.1",
            port = 8000,
        }
    end)
    -- Set the User-Agent header
    splash:set_user_agent("Mozilla/5.0")
    -- Set custom request headers
    splash:set_custom_headers({
        ["Accept"] = "application/json, text/plain, */*"
    })
    splash:go("https://www.baidu.com/")
    return splash:html()
end
2. Set the proxy in Scrapy:
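from scrapy_splash import SplashRequest

# Method of the spider class
def start_requests(self):
    for url in self.start_urls:
        yield SplashRequest(
            url,
            endpoint='execute',
            args={
                'wait': 5,
                'lua_source': source,  # Lua script string, defined elsewhere
                'proxy': 'http://proxy_ip:proxy_port',
            },
        )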
Request headers are set the same way in Scrapy, through the request's headers.
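As a minimal sketch of passing the proxy and the headers together, the helper below collects the keyword arguments for a SplashRequest in one place. The helper name `build_splash_kwargs` and its parameters are hypothetical and not part of scrapy-splash; it only builds plain dictionaries, so it runs without Scrapy installed.

```python
def build_splash_kwargs(lua_source, proxy_url, user_agent, wait=5):
    """Collect keyword arguments for a SplashRequest (hypothetical helper).

    lua_source: the Lua script text sent to the 'execute' endpoint.
    proxy_url:  proxy URL, e.g. 'http://proxy_ip:proxy_port'.
    user_agent: value for the User-Agent request header.
    """
    return {
        "endpoint": "execute",
        "args": {
            "wait": wait,
            "lua_source": lua_source,
            "proxy": proxy_url,
        },
        # Ordinary Scrapy request headers
        "headers": {"User-Agent": user_agent},
    }


kwargs = build_splash_kwargs(
    lua_source="function main(splash, args) return splash:html() end",
    proxy_url="http://proxy_ip:proxy_port",
    user_agent="Mozilla/5.0",
)
# In a spider this would be used as: yield SplashRequest(url, **kwargs)
```

With the execute endpoint, the Lua script may also need to forward these headers itself, for example by passing splash.args.headers to splash:go.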
3. Set the proxy in a middleware:
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # proxyServer (proxy URL) and proxyAuth (header value) are defined elsewhere
        request.meta['splash']['args']['proxy'] = proxyServer
        request.headers["Proxy-Authorization"] = proxyAuth
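The middleware above does not show how proxyAuth is built. Assuming the proxy uses HTTP Basic authentication, the value could be produced roughly as follows. The function name `build_proxy_auth` and the sample credentials are hypothetical.

```python
import base64


def build_proxy_auth(username, password):
    """Return a Proxy-Authorization value for HTTP Basic authentication.

    Assumption: the proxy accepts the Basic scheme, i.e. the base64
    encoding of 'username:password' prefixed with 'Basic '.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder credentials for illustration only
proxyAuth = build_proxy_auth("user", "pass")
```

If the proxy uses a different authentication scheme, the header value has to be built according to that scheme instead.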
References: