How to fix a 407 error when a Python program uses a proxy IP

Author: 亿牛云爬虫专家 (Yiniuyun Crawler Expert)

Article link: https://blog.csdn.net/ip16yun/article/details/128314895

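When a Python crawler built on requests (version 2.23.0) sends traffic through a proxy IP and the proxy server returns a 407 error (Proxy Authentication Required), the cause is usually that the authentication credentials did not reach the proxy in a form it accepts. requests only converts the user:pass part of the proxy URL into a Basic Proxy-Authorization header; it cannot answer a proxy that demands Digest authentication. HTTPS makes this harder to handle: the CONNECT tunnel is opened by httplib (Python 2) / http.client (Python 3), and when the tunnel attempt fails these modules raise an error (OSError in http.client) instead of returning a 407 response that an authentication hook could retry, so requests reports it as a ProxyError. The workaround is to wrap requests to the proxy server in dedicated adapters that compute the authentication header and send it to the proxy themselves.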
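
The OSError behaviour described above can be reproduced without a real proxy. The sketch below starts a throwaway local "proxy" (a hypothetical stand-in, not a real product) that answers every CONNECT with 407, then asks http.client to open a tunnel through it to www.example.com (an arbitrary placeholder target). connect() raises before any response object exists, which is why an authentication hook never gets a chance to retry. In the standard library, the matching fix is to pass headers={"Proxy-Authorization": ...} to set_tunnel(); the proxy_headers() override in the adapters in this article plays the same role for requests.

```python
import http.client
import socket
import threading

# Canned reply from a hypothetical proxy that insists on Digest authentication.
REPLY_407 = (
    b"HTTP/1.1 407 Proxy Authentication Required\r\n"
    b'Proxy-Authenticate: Digest realm="proxy", nonce="abc123", qop="auth"\r\n'
    b"Content-Length: 0\r\n"
    b"\r\n"
)


def fake_proxy(server_sock):
    """Accept one client, read its whole CONNECT request, and answer 407."""
    conn, _ = server_sock.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:  # read until the end of the headers
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        conn.sendall(REPLY_407)


def open_tunnel_through_fake_proxy():
    """Return the error text http.client raises when CONNECT gets a 407."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    server.settimeout(5)
    port = server.getsockname()[1]
    worker = threading.Thread(target=fake_proxy, args=(server,))
    worker.start()

    # A plain HTTPConnection is enough here: HTTPSConnection only starts TLS
    # after the tunnel is up, so it fails at exactly the same point.
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.set_tunnel("www.example.com", 443)
    try:
        conn.connect()
    except OSError as exc:
        return str(exc)
    finally:
        conn.close()
        worker.join()
        server.close()
    return "tunnel opened (unexpected)"


if __name__ == "__main__":
    print(open_tunnel_through_fake_proxy())
```

Running the script prints "Tunnel connection failed: 407 Proxy Authentication Required".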

```python
import re

import requests
from requests.adapters import HTTPAdapter
from requests.auth import HTTPDigestAuth
from requests.cookies import extract_cookies_to_jar
from requests.utils import get_auth_from_url, parse_dict_header
from urllib3.util import parse_url


def get_proxy_authorization_header(proxy, method):
    """Send a request straight to the proxy, answer its 407 Digest challenge,
    and return the Proxy-Authorization value that was finally sent."""
    username, password = get_auth_from_url(proxy)
    auth = HTTPProxyDigestAuth(username, password)
    proxy_url = parse_url(proxy)
    proxy_response = requests.request(method, proxy_url.url, auth=auth)
    # Raises KeyError if the proxy never issued a Digest challenge.
    return proxy_response.request.headers['Proxy-Authorization']


class HTTPSAdapterWithProxyDigestAuth(HTTPAdapter):
    """HTTPS goes through a CONNECT tunnel; the headers returned by
    proxy_headers() are sent with that CONNECT request."""

    def proxy_headers(self, proxy):
        headers = {}
        proxy_auth_header = get_proxy_authorization_header(proxy, 'CONNECT')
        headers['Proxy-Authorization'] = proxy_auth_header
        return headers


class HTTPAdapterWithProxyDigestAuth(HTTPAdapter):
    """Plain HTTP requests are sent to the proxy as-is, so the
    Proxy-Authorization header is added to every request instead."""

    def proxy_headers(self, proxy):
        # Suppress the default Basic Proxy-Authorization built from the URL.
        return {}

    def add_headers(self, request, **kwargs):
        proxies = kwargs.get('proxies') or {}
        proxy = proxies.get('http', '')
        if proxy:
            proxy_auth_header = get_proxy_authorization_header(proxy, request.method)
            request.headers['Proxy-Authorization'] = proxy_auth_header


class HTTPProxyDigestAuth(HTTPDigestAuth):
    """HTTPDigestAuth variant that answers 407 / Proxy-Authenticate challenges
    (instead of 401 / WWW-Authenticate) with a Proxy-Authorization header."""

    def init_per_thread_state(self):
        # Ensure state is initialized just once per thread
        if not hasattr(self._thread_local, 'init'):
            self._thread_local.init = True
            self._thread_local.last_nonce = ''
            self._thread_local.nonce_count = 0
            self._thread_local.chal = {}
            self._thread_local.pos = None
            self._thread_local.num_407_calls = None

    def handle_407(self, r, **kwargs):
        """
        Takes the given response and tries digest-auth, if needed.
        :rtype: requests.Response
        """

        # If response is not 407, do not auth
        if r.status_code != 407:
            self._thread_local.num_407_calls = 1
            return r

        s_auth = r.headers.get('proxy-authenticate', '')

        # Answer a Digest challenge at most once per request
        if 'digest' in s_auth.lower() and self._thread_local.num_407_calls < 2:
            self._thread_local.num_407_calls += 1
            pat = re.compile(r'digest ', flags=re.IGNORECASE)
            self._thread_local.chal = parse_dict_header(pat.sub('', s_auth, count=1))

            # Consume content and release the original connection
            # to allow our new request to reuse the same one.
            r.content
            r.close()
            prep = r.request.copy()
            extract_cookies_to_jar(prep._cookies, r.request, r.raw)
            prep.prepare_cookies(prep._cookies)

            prep.headers['Proxy-Authorization'] = self.build_digest_header(prep.method, prep.url)
            _r = r.connection.send(prep, **kwargs)
            _r.history.append(r)
            _r.request = prep

            return _r

        self._thread_local.num_407_calls = 1
        return r

    def __call__(self, r):
        # Initialize per-thread state, if needed
        self.init_per_thread_state()
        # If we have a saved nonce, skip the 407
        if self._thread_local.last_nonce:
            r.headers['Proxy-Authorization'] = self.build_digest_header(r.method, r.url)

        r.register_hook('response', self.handle_407)
        self._thread_local.num_407_calls = 1

        return r


# Proxy server (product website: www.16yun.cn)
proxyHost = "t.16yun.cn"
proxyPort = "31111"

# Proxy credentials
proxyUser = "username"
proxyPass = "password"

proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
    "host": proxyHost,
    "port": proxyPort,
    "user": proxyUser,
    "pass": proxyPass,
}

session = requests.Session()
session.proxies = {
    'http': proxyMeta,
    'https': proxyMeta,
}
# Ignore proxy/auth settings from the environment (HTTP_PROXY, .netrc, ...)
session.trust_env = False

# Route http:// and https:// URLs through the matching adapter
session.mount('http://', HTTPAdapterWithProxyDigestAuth())
session.mount('https://', HTTPSAdapterWithProxyDigestAuth())

response_http = session.get("http://www.baidu.com/")
print(response_http.status_code)

response_https = session.get("https://www.baidu.com")
print(response_https.status_code)
```

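The code above defines a set of classes for authenticating against an HTTP proxy server that uses Digest authentication. HTTPSAdapterWithProxyDigestAuth handles HTTPS requests: it overrides proxy_headers() so that the CONNECT request that opens the tunnel already carries a Proxy-Authorization header. HTTPAdapterWithProxyDigestAuth handles plain HTTP requests: it adds the header to each request in add_headers(). HTTPProxyDigestAuth inherits from requests.auth.HTTPDigestAuth and extends it to work with 407 responses and the Proxy-Authenticate/Proxy-Authorization headers, so that when the proxy answers 407 it can parse the challenge and retry the request with the computed digest.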
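
The value HTTPProxyDigestAuth places in Proxy-Authorization is an ordinary HTTP Digest response (RFC 2617), computed by the inherited build_digest_header(); only the header names differ from server authentication. The standard-library sketch below parses a Proxy-Authenticate challenge and computes that header by hand, which can help when checking what a proxy expects. It assumes MD5 with qop="auth" (or no qop), and uses a fixed cnonce so the result is reproducible; a real client should generate a random cnonce. The function names parse_digest_challenge and build_proxy_authorization are hypothetical, not part of requests.

```python
import hashlib
from urllib.request import parse_http_list, parse_keqv_list


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def parse_digest_challenge(header_value):
    """Split 'Digest realm="...", nonce="..."' into a dict with quotes removed."""
    scheme, _, params = header_value.strip().partition(" ")
    if scheme.lower() != "digest":
        raise ValueError("not a Digest challenge: %r" % scheme)
    return parse_keqv_list(parse_http_list(params))


def build_proxy_authorization(challenge, username, password, method, uri,
                              nc=1, cnonce="0a4f113b"):
    """Return a Proxy-Authorization value for an MD5 Digest challenge.

    Assumptions: algorithm is MD5 (or absent) and qop is "auth" or absent.
    The fixed default cnonce is only for reproducible examples.
    """
    algorithm = challenge.get("algorithm", "MD5")
    if algorithm.upper() != "MD5":
        raise ValueError("unsupported algorithm: %s" % algorithm)
    qop_options = [q.strip() for q in challenge.get("qop", "").split(",") if q.strip()]

    ha1 = md5_hex("%s:%s:%s" % (username, challenge["realm"], password))
    ha2 = md5_hex("%s:%s" % (method, uri))
    fields = [
        'username="%s"' % username,
        'realm="%s"' % challenge["realm"],
        'nonce="%s"' % challenge["nonce"],
        'uri="%s"' % uri,
    ]
    if "auth" in qop_options:
        nc_value = "%08x" % nc  # nonce count as 8 hex digits
        response = md5_hex(":".join([ha1, challenge["nonce"], nc_value, cnonce, "auth", ha2]))
        fields += ["qop=auth", "nc=%s" % nc_value, 'cnonce="%s"' % cnonce]
    elif not qop_options:
        # RFC 2069 style: no qop, so no nc/cnonce in the hash
        response = md5_hex("%s:%s:%s" % (ha1, challenge["nonce"], ha2))
    else:
        raise ValueError("unsupported qop: %s" % challenge["qop"])
    fields.append('response="%s"' % response)
    if "opaque" in challenge:
        fields.append('opaque="%s"' % challenge["opaque"])
    return "Digest " + ", ".join(fields)


if __name__ == "__main__":
    # Example values from RFC 2617, section 3.5
    chal = parse_digest_challenge(
        'Digest realm="testrealm@host.com", qop="auth,auth-int", '
        'nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093", '
        'opaque="5ccc069c403ebaf9f0171e9517f40e41"'
    )
    print(build_proxy_authorization(chal, "Mufasa", "Circle Of Life", "GET", "/dir/index.html"))
```

With the RFC 2617 example values, the response field comes out as 6629fae49393a05397450978507c4ef1, the value given in that RFC.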