Common Exceptions in Python Crawlers and How to Handle Them

Latest recommended article published on 2024-07-11 09:09:18

woyaojinqu

Categories: Personal Log, python. Tags: python, exceptions, crawler, stack overflow, exception

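Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.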

Article link: https://blog.csdn.net/woyaojinqu/article/details/78161487

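When writing a Python crawler, you will often hit exceptions that terminate it unexpectedly. Ideally, a crawler should keep running when these exceptions occur. The sections below cover several common exceptions and how to handle them: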

Exception 1: requests.exceptions.ProxyError

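Stack Overflow explains this error as follows:
The ProxyError exception is not actually the requests.exceptions exception; it is an exception with the same name from the embedded urllib3 library, and it is wrapped in a MaxRetryError exception.
In other words, at the time of that answer the exception raised was not the one defined in requests.exceptions but the same-named exception from the urllib3 library bundled with Requests, wrapped in a MaxRetryError. Recent versions of Requests re-raise it as requests.exceptions.ProxyError, a subclass of ConnectionError. This exception typically appears when the proxy server is unreachable.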
Exception 2: requests.exceptions.ConnectionError

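Stack Overflow explains this error as follows:
In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.
That is, ConnectionError signals a network-level failure such as a DNS error or a refused connection, and it is an exception defined by the Requests library itself.
One solution is to catch the base class requests.exceptions.RequestException, which covers every exception that Requests raises explicitly: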
import sys
import requests

try:
    r = requests.get(url, params={'s': thing})  # url and thing are defined elsewhere
except requests.exceptions.RequestException as e:  # base class of all Requests exceptions
    print(e)
    sys.exit(1)
The other solution is to handle the exceptions separately. This example distinguishes three cases:
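try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    pass  # e.g. set up a retry, or continue in a retry loop
except requests.exceptions.TooManyRedirects:
    pass  # e.g. report that the URL was bad and try a different one
except requests.exceptions.RequestException as e:
    # any other Requests error: print it and exit
    print(e)
    sys.exit(1)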
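
Calling sys.exit(1) stops the whole crawler, which runs against the goal of continuing after an error. The following is a minimal sketch of a loop that records each failure and moves on to the next URL. The function name crawl, its fetch and timeout parameters, and the pages/failures dictionaries are illustrative assumptions, not part of the original answer.

```python
import requests


def crawl(urls, fetch=requests.get, timeout=10):
    """Fetch each URL in turn; record failures instead of exiting.

    `fetch` defaults to requests.get and can be swapped for testing.
    Returns (pages, failures): url -> body text, and url -> error message.
    """
    pages, failures = {}, {}
    for url in urls:
        try:
            response = fetch(url, timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
        except requests.exceptions.Timeout as e:
            failures[url] = "timeout: %s" % e
        except requests.exceptions.TooManyRedirects as e:
            failures[url] = "too many redirects: %s" % e
        except requests.exceptions.RequestException as e:
            # Base class: catches every other error raised by Requests.
            failures[url] = "request failed: %s" % e
        else:
            pages[url] = response.text
    return pages, failures
```

After the loop finishes, the failures dictionary can be inspected to decide which URLs to retry or discard.
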
Exception 3: requests.exceptions.ChunkedEncodingError

Stack Overflow explains this error as follows:
The link you included in your question is simply a wrapper that executes urllib's read() function, which catches any incomplete read exceptions for you. If you don't want to implement this entire patch, you could always just throw in a try/catch loop where you read your links.
That is, the patch linked in the question merely wraps urllib's read() function and catches the exception raised when the data is read incompletely. If you do not want to apply the whole patch, wrap the read in a try/except block and fall back to the partial data:
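# Python 3: urllib.request and http.client replace urllib2 and httplib
import http.client
import urllib.request

try:
    page = urllib.request.urlopen(urls).read()
except http.client.IncompleteRead as e:
    page = e.partial  # keep whatever was received before the read broke off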
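
To see what the partial attribute holds without needing a flaky server, the sketch below simulates a truncated read. The helper read_tolerant and the class TruncatedResponse are hypothetical names introduced only for this illustration; http.client.IncompleteRead and its partial attribute come from the standard library.

```python
import http.client


def read_tolerant(response):
    """Read a response body, falling back to the partial data on IncompleteRead.

    Returns (body, complete), where complete is False if the read was cut short.
    """
    try:
        return response.read(), True
    except http.client.IncompleteRead as e:
        return e.partial, False


class TruncatedResponse:
    """Hypothetical stand-in for a response whose connection drops mid-body."""

    def read(self):
        raise http.client.IncompleteRead(b"<html>partial", expected=100)


body, complete = read_tolerant(TruncatedResponse())
print(body, complete)
```

Returning the complete flag alongside the body lets the caller decide whether a truncated page is good enough to parse or should be fetched again.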

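For the exceptions above there is also a simpler approach: call the function again from inside the exception handler. The function then keeps retrying after each caught exception until a call completes without one. Note that if the error never clears, the recursion continues until Python's recursion limit is reached. The implementation looks like this: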

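def myfunc(para):
    try:
        ...  # your code goes here
    except Exception as e:  # replace Exception with the exception you expect
        print(e)
        return myfunc(para)  # call the function again to retry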
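
Retrying through recursion never gives up, so a permanent failure (a dead URL, for example) ends in a RecursionError. As an alternative, here is a minimal sketch of a bounded retry loop. The helper name retry and its attempts, delay, and exceptions parameters are assumptions made for this example, not something from the original post.

```python
import time


def retry(func, *args, attempts=3, delay=0.0, exceptions=(Exception,)):
    """Call func(*args), retrying up to `attempts` times on the given exceptions.

    Sleeps `delay` seconds between attempts and re-raises the last error
    if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except exceptions as e:
            last_error = e
            print("attempt %d/%d failed: %s" % (attempt, attempts, e))
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

A call such as retry(myfunc, para, attempts=5, delay=1.0) would then replace the recursive call, and passing a narrower tuple as exceptions avoids retrying on errors that a retry cannot fix.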
