A Python 3 Case Study: Getting Past JA3 TLS Fingerprint Detection

Latest recommended article published 2024-03-28 09:23:34

水兵没月

Tags: python

Article link: https://blog.csdn.net/weixin_43124425/article/details/136328626

Note: the XX website example here is for study purposes only. No individual or group may use it for profit.

Scenario

Scraping the page content of a certain site with Python

aHR0cHM6Ly9jcmVkaXRiai5qeGouYmVpamluZy5nb3YuY24vY3JlZGl0LXBvcnRhbC9jcmVkaXRfc2VydmljZS9wdWJsaWNpdHkvcmVjb3JkL2JsYWNr

Error message

requests.exceptions.SSLError: HTTPSConnectionPool(host='creditbj.jxj.beijing.gov.cn', port=443): Max retries exceeded with url: /credit-portal/api/publicity/record/BLACK/0 (Caused by SSLError(SSLError("bad handshake: Error([('elliptic curve routines', 'ecx_key_op', 'invalid encoding'), ('SSL routines', 'tls_process_ske_ecdhe', 'bad ecpoint')],)",),))

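Root cause

Ordinary requests calls always failed with the error above. At first I suspected incomplete headers, an unstable proxy, or external factors such as the network. The error persisted even with all of those in order.

After digging through various references, I learned that this error is a symptom of JA3 TLS fingerprint anti-scraping: the server looks at the TLS ClientHello the client sends, and the one produced by requests does not look like a browser's. My own understanding stops at this surface level, so please research the deeper details yourself.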
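To make the idea of a JA3 fingerprint a little more concrete, here is a minimal sketch of how a JA3 hash is built: five fields from the ClientHello (TLS version, cipher suites, extensions, elliptic curves, point formats) are written as decimal numbers, joined with "-" inside a field and "," between fields, and the resulting string is MD5-hashed. The helper names `ja3_string` and `ja3_hash` and the sample values are illustrative assumptions, not values captured from any real client, and GREASE filtering is left out for brevity.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the comma-separated JA3 string from ClientHello fields."""
    fields = [
        str(tls_version),
        "-".join(str(v) for v in ciphers),
        "-".join(str(v) for v in extensions),
        "-".join(str(v) for v in curves),
        "-".join(str(v) for v in point_formats),
    ]
    return ",".join(fields)


def ja3_hash(ja3):
    """Return the MD5 hex digest of a JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Illustrative values only (hypothetical client, not a real capture).
example = ja3_string(771, [4865, 4866, 4867], [0, 10, 11], [29, 23, 24], [0])
# The same fields with the cipher suites in a different order.
reordered = ja3_string(771, [4866, 4865, 4867], [0, 10, 11], [29, 23, 24], [0])

print(example)
print(ja3_hash(example))
print(ja3_hash(reordered))
```

Because the order of cipher suites and extensions is part of the string, two clients offering the same features in a different order get different hashes. That is why copying browser headers alone does not change the fingerprint.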

Solution

Use the curl_cffi library

curl_cffi: a Python library that natively supports impersonating browser TLS/JA3 fingerprints (Python 3.7 or later recommended)

import json

from curl_cffi import requests as requests1


def get_req(url, headers, proxies, method, data=None):
    # The impersonate argument selects which browser's TLS fingerprint to mimic
    s = requests1.Session()
    if method.lower() == "payload":
        # "payload": send the parameters as a JSON request body
        res = s.post(url=url, headers=headers, data=json.dumps(data), verify=False,
                     proxies=proxies, impersonate="chrome101")
    elif method.lower() == "post":
        # "post": send the parameters as form data
        res = s.post(url=url, headers=headers, data=data, verify=False,
                     proxies=proxies, impersonate="chrome101")
    else:
        res = s.get(url=url, headers=headers, verify=False,
                    proxies=proxies, impersonate="chrome101")

    res.encoding = 'utf-8'
    return res


if __name__ == '__main__':
    url = "https://XXXX/ZXXX"
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Content-Type": "application/json",
        "Referer": "https://XXXXXXXXXXXX",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    }
    proxies = None  # Proxy mapping, e.g. {"https": "..."}; None means no proxy
    method = "GET"  # Request type: GET, POST, or PAYLOAD
    data = {}  # Request parameters, optional
    res = get_req(url, headers, proxies, method, data=data)
    print(res)
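
If a single impersonation target still gets rejected, one option is to try several in turn. The following is only a sketch under assumptions: `fetch_with_fallback` and its `request_fn` parameter are hypothetical names introduced here, and the target list ("chrome101", "chrome110") is an example that should be checked against what the installed curl_cffi version supports. curl_cffi is imported lazily, so the helper can also be exercised with a stand-in function.

```python
def fetch_with_fallback(url, targets=("chrome101", "chrome110"),
                        request_fn=None, **kwargs):
    """Try each impersonation target in order and return the first response.

    request_fn is a hypothetical hook: any callable that accepts
    (url, impersonate=..., **kwargs). By default it is curl_cffi's
    requests.get, imported only when actually needed.
    """
    if request_fn is None:
        from curl_cffi import requests as curl_requests  # lazy import
        request_fn = curl_requests.get

    last_error = None
    for target in targets:
        try:
            return request_fn(url, impersonate=target, **kwargs)
        except Exception as exc:
            # Remember the failure and move on to the next target
            last_error = exc

    if last_error is None:
        raise ValueError("targets must not be empty")
    raise last_error
```

With curl_cffi installed, a call might look like `fetch_with_fallback(url, headers=headers, verify=False)`. Extra keyword arguments are passed through unchanged to the request function.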