Problems encountered when using a proxy in a web crawler

Latest recommended article published on 2024-05-19 11:12:10

qq_19294857

Category: Crawlers. Tags: crawler, proxy

Article link: https://blog.csdn.net/qq_19294857/article/details/99653889

requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.baidu.com', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 400 Bad Request')))

When I first started learning about crawler proxies, I tested the code below. The IP address is a free proxy I found online.

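import requests

# Free proxy found online (host:port); it may no longer be reachable
proxy = '222.66.94.130:80'
# Keys are the scheme of the target URL; values are the proxy URL to use
proxies = {
    'http': 'http://' + proxy,
    'https': 'https://' + proxy
}
try:
    # httpbin echoes the request back, including the origin IP it saw
    res = requests.get('http://httpbin.org/get', proxies=proxies)
    print(res.text)
except requests.exceptions.ConnectionError as e:
    # ProxyError is a subclass of ConnectionError, so it is caught here too
    print('Error', e.args)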

The output looks normal:

D:\software\python.exe E:/code/pycharm/py_project/python3/9.1-3.py
{
  "args": {}, 
  "headers": {
    "Host": "httpbin.org", 
    "User-Agent": "lua-resty-http/0.10 (Lua) ngx_lua/10007"
  }, 
  "origin": "222.66.127.248, 222.66.127.248", 
  "url": "https://httpbin.org/get"
}
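
A note on the proxies dictionary used above: in requests, each key is matched against the scheme of the URL being fetched, while the scheme inside each value describes how to connect to the proxy itself. Many free proxies only accept plain HTTP connections, so a common arrangement is to point both entries at an http:// proxy URL. The sketch below shows that arrangement; build_proxies is a hypothetical helper name, and whether this particular proxy accepts plain HTTP is an assumption, not something the test above established. It does not by itself make an unreliable proxy work.

```python
def build_proxies(proxy, proxy_scheme="http"):
    """Return a requests-style proxies mapping for one host:port proxy.

    The keys are the scheme of the URL being fetched; the values are the
    proxy URL, whose own scheme says how to talk to the proxy itself.
    """
    # Assumption: the proxy speaks plain HTTP unless the caller says otherwise
    proxy_url = f"{proxy_scheme}://{proxy}"
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("222.66.94.130:80")
print(proxies)
```

The resulting dictionary can be passed as the proxies argument of requests.get in the same way as the hand-written one.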

Next I tried crawling a real site through the proxy IP address. Example:

import requests

url = 'https://www.baidu.com'
# Browser-like User-Agent instead of the default one sent by requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}
proxy = '222.66.94.130:80'
proxies = {
    'http': 'http://' + proxy,
    'https': 'https://' + proxy
}
# An HTTPS target makes requests ask the proxy to open a tunnel (CONNECT)
response = requests.get(url=url, proxies=proxies, headers=headers)
print(response.status_code)

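Result:
requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.baidu.com', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 400 Bad Request')))
At this point I assumed the IP address itself was fine, so I searched online for possible causes of the error:
1. Too many HTTP connections left open without being closed
2. Requests sent too frequently, so access was blocked
After a lot of searching, it turned out the problem was this IP address after all (you really do get what you pay for).
I then tried several high-anonymity proxy IPs, for example proxy = '111.231.92.21:8888', and the error no longer appeared.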
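
Trying several proxies by hand, as described above, can be automated. The following is a minimal sketch, not the original author's code: proxy_works and filter_working are hypothetical names, the test URL and the 5-second timeout are arbitrary choices, and the sketch assumes each proxy accepts plain HTTP connections.

```python
import requests


def proxy_works(proxy, test_url="https://httpbin.org/get", timeout=5):
    """Return True if an HTTPS request through `proxy` (host:port) succeeds."""
    # Assumption: the proxy accepts plain HTTP connections, so both entries use http://
    proxy_url = "http://" + proxy
    proxies = {"http": proxy_url, "https": proxy_url}
    try:
        res = requests.get(test_url, proxies=proxies, timeout=timeout)
        return res.status_code == 200
    except requests.exceptions.RequestException:
        # Covers ProxyError, timeouts and other connection failures
        return False


def filter_working(candidates, check=proxy_works):
    """Keep only the candidates for which `check` returns True."""
    return [proxy for proxy in candidates if check(proxy)]
```

Calling filter_working(['222.66.94.130:80', '111.231.92.21:8888']) would return whichever of the two proxies can currently complete an HTTPS request. Free proxies come and go, so the result will differ from run to run.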

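Here is a quick primer on transparent, ordinary anonymous, and high-anonymity proxies:
Transparent proxy: arguably the least useful kind. The target server knows you are using a proxy and also knows your real IP address.
Ordinary anonymous proxy: slightly better than a transparent proxy. The target server knows you are using a proxy but generally does not see your real IP address (though there is some chance it can be traced).
High-anonymity proxy: the most capable of the three. It hides your IP address and does not reveal that a proxy is in use, so the server takes the proxy's IP address for your real one.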

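If you are interested, you can use the first program above to verify this: with a high-anonymity proxy, the client's real IP address stays hidden.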
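
The verification described above can be sketched as a small function that compares your real IP address with what the test server reports. This is a rough heuristic under stated assumptions rather than a definitive test: classify_proxy is a hypothetical name, the inputs are the "origin" and "headers" fields of an httpbin-style response, and the sketch assumes a proxy reveals itself through the Via or X-Forwarded-For headers, which not every proxy sets and not every test service echoes back.

```python
def classify_proxy(real_ip, origin, headers):
    """Guess a proxy's anonymity level from what the target server saw.

    real_ip -- your own public IP address (looked up without the proxy)
    origin  -- the "origin" field reported by the test server
    headers -- the request headers as echoed back by the test server
    """
    # Header names are compared case-insensitively
    names = {name.lower(): value for name, value in headers.items()}
    # "origin" and X-Forwarded-For may each hold several comma-separated IPs
    seen = [ip.strip() for ip in origin.split(",")]
    seen += [ip.strip() for ip in names.get("x-forwarded-for", "").split(",")]
    if real_ip in seen:
        return "transparent"      # the server can see the real IP
    if "via" in names or "x-forwarded-for" in names:
        return "anonymous"        # proxy is visible, real IP is not
    return "high-anonymity"       # nothing reveals a proxy or the real IP


print(classify_proxy("1.2.3.4", "5.6.7.8", {"Via": "1.1 proxy"}))
```

In the earlier sample output the origin field lists only proxy-side addresses, so as long as your own IP is not among them the first branch would not trigger.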

References:
1. http://www.mamicode.com/info-detail-2109743.html
2. https://blog.csdn.net/wdh315172/article/details/80491668
3. Python 3 Web Crawler Development in Practice (Python 3 网络爬虫开发实战), by Cui Qingcai (崔庆才)
