Connecting Python to Xdaili (讯代理): using Xdaili dynamic forwarding in Scrapy

weixin_39609622

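Locate the file http11.py in the Scrapy source code. Its path relative to the Python installation is:

Lib/site-packages/scrapy/core/downloader/handlers/http11.py

Find the following lines and comment them out:

    if isinstance(agent, self._TunnelingAgent):
        headers.removeHeader(b'Proxy-Authorization')

Otherwise the Proxy-Authorization header is stripped from the request and dynamic forwarding stops working.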
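The exact location of http11.py depends on the environment (virtualenv, operating system, Python version). As a minimal sketch, the standard library can report where an installed module lives. The helper name `locate_module_file` is an assumption made for illustration and is not part of Scrapy.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the source file path of an importable module, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # A parent package (for example scrapy itself) is not installed.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Prints the path of the file to edit, or None if Scrapy is not installed.
    print(locate_module_file("scrapy.core.downloader.handlers.http11"))
```

Run it with the same interpreter that runs the spider, so the reported path belongs to the Scrapy copy that is actually used.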

Define a custom downloader middleware:

    import hashlib
    import time


    class ProxyIPMiddleware(object):
        '''
        Randomly rotate the proxy IP.
        '''

        def __init__(self):
            self.orderno = "xxxxxxxxxxxx"  # order number
            self.secret = "xxxxxxxxxxx"  # secret key

        def process_request(self, request, spider):
            print('====ProxyIPMiddleware====')
            # URL scheme of the request: "http" or "https"
            protocol = request.url.split(':')[0].strip().lower()
            print(request.url, 'protocol:', protocol)
            ip = "forward.xdaili.cn"  # proxy host
            port = "80"  # proxy port
            ip_port = ip + ":" + port
            proxy = {"http": "http://" + ip_port, "https": "https://" + ip_port}
            timestamp = str(int(time.time()))  # Unix timestamp in seconds
            string = "orderno=" + self.orderno + "," + "secret=" + self.secret + "," + "timestamp=" + timestamp
            md5_string = hashlib.md5(string.encode()).hexdigest()  # MD5 hash as a fixed-length hex string
            sign = md5_string.upper()  # convert to uppercase
            # Authentication info
            auth = "sign=" + sign + "&" + "orderno=" + self.orderno + "&" + "timestamp=" + timestamp
            print('auth:', auth)
            request.headers['Proxy-Authorization'] = auth
            # The HTTP proxy only serves HTTP sites; the HTTPS proxy only serves HTTPS sites
            request.meta['proxy'] = proxy[protocol]
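The signing step in the middleware can also be pulled out into a standalone function, which makes it easier to check without starting a crawl. The sketch below follows the same scheme as the code above (MD5 of `orderno=...,secret=...,timestamp=...`, uppercased). The function name `build_proxy_authorization` and the optional `timestamp` parameter are assumptions added for illustration.

```python
import hashlib
import time


def build_proxy_authorization(orderno, secret, timestamp=None):
    """Build the Proxy-Authorization value using the same scheme as the middleware."""
    if timestamp is None:
        # Default to the current Unix time in seconds, as the middleware does.
        timestamp = str(int(time.time()))
    plain = "orderno=" + orderno + ",secret=" + secret + ",timestamp=" + timestamp
    sign = hashlib.md5(plain.encode()).hexdigest().upper()
    return "sign=" + sign + "&orderno=" + orderno + "&timestamp=" + timestamp
```

Inside `process_request`, the header could then be set with `request.headers['Proxy-Authorization'] = build_proxy_authorization(self.orderno, self.secret)`. The middleware only takes effect once it is registered in the project's `DOWNLOADER_MIDDLEWARES` setting, for example under a hypothetical path such as `myproject.middlewares.ProxyIPMiddleware`; the real module path depends on the project layout.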

Original link: https://blog.csdn.net/Kwoky/article/details/105417716
