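1. First Encounter with Fingerprinting
One day I ran into a website that returned 403 to every request, no matter how I changed the User-Agent or the proxy.
Capturing the traffic with Wireshark revealed that the site was, surprisingly, using JA3 fingerprinting.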
How a JA3 fingerprint is created
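As a rough illustration of that process: JA3 takes five fields from the TLS ClientHello (TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats), writes each as decimal values joined by "-", joins the fields with ",", and takes the MD5 of the resulting string. GREASE placeholder values are ignored. The sketch below assumes the fields have already been parsed out of a capture; the function names and the sample values are hypothetical and are not taken from the site discussed here.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA.
# Browsers insert them at random, so JA3 leaves them out.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: fields joined by ',', values within a field by '-'."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join(
        "-".join(str(v) for v in field if v not in GREASE) for field in fields
    )


def ja3_hash(ja3):
    """The JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Illustrative ClientHello values only (hypothetical, not a real browser).
example = ja3_string(
    version=771,                         # 0x0303, TLS 1.2
    ciphers=[0x0A0A, 4865, 4866, 4867],  # the leading GREASE value is dropped
    extensions=[0, 23, 65281, 10, 11],
    curves=[29, 23, 24],
    point_formats=[0],
)
print(example)
print(ja3_hash(example))
```

Because the hash depends only on what the TLS client puts in its ClientHello, changing the User-Agent header or the proxy leaves it unchanged, which is consistent with the 403 responses described above.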
After searching around for a way to get past this, I found a good third-party library, curl_cffi:
pip install curl_cffi
Usage:
from curl_cffi import requests
# Notice the impersonate parameter
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110")
print(r.json())
# output: {..., "ja3n_hash": "aa56c057ad164ec4fdcb7a5a283be9fc", ...}
# the ja3n fingerprint should be the same as the target browser's
# http/socks proxies are supported
proxies = {"https": "http://localhost:3128"}
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110", proxies=proxies)
proxies = {"https": "socks://localhost:3128"}
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110", proxies=proxies)
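curl_cffi also documents an asyncio interface, AsyncSession. The following is a minimal sketch of sending several impersonated requests concurrently. The helper names fetch_json and fetch_many are hypothetical, and the import is deferred into the function only so that the module still loads where curl_cffi is not installed.

```python
import asyncio

URL = "https://tls.browserleaks.com/json"


async def fetch_json(session, url, impersonate="chrome110"):
    """Send one GET with a browser-like TLS fingerprint and decode the JSON body."""
    r = await session.get(url, impersonate=impersonate)
    return r.json()


async def fetch_many(urls, impersonate="chrome110"):
    """Fetch all URLs concurrently over one shared AsyncSession."""
    # Third-party dependency: pip install curl_cffi
    from curl_cffi.requests import AsyncSession

    async with AsyncSession() as session:
        tasks = [fetch_json(session, u, impersonate) for u in urls]
        return await asyncio.gather(*tasks)
```

To try it, call asyncio.run(fetch_many([URL] * 3)) and compare the "ja3n_hash" field of each result; all of them should match, since they share the same impersonation target.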
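2. Installing scrapy-fingerprint
The official documentation shows that curl_cffi also supports asynchronous use, so I wrote a fingerprint middleware for Scrapy and published it as a package.
Installing it is all that is needed: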
pip install scrapy-fingerprint==0.1.3
Enable the downloader middleware in the project's Scrapy settings:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_fingerprint.fingerprintmiddlewares.FingerprintMiddleware': 543,
}
In the spider, replace scrapy.Request with FingerprintRequest:
yield FingerprintRequest(url=url, callback=self.parse)
See the project for details. With these changes, the data can be fetched normally.