urllib.error
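- URLError is raised when:
    - the network is down
    - the connection to the server fails
    - the specified server cannot be found
    - URLError is a subclass of OSError
- HTTPError: a subclass of URLError
- Difference between the two:
    - HTTPError corresponds to an error status code returned for an HTTP request
    - If the returned status code is 400 or above, HTTPError is raised
    - URLError generally corresponds to a network problem, including a problem with the URL itself
- Hierarchy: OSError > URLError > HTTPError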
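The hierarchy above can be checked directly in the interpreter. The following is a minimal sketch that is not part of the original notes; it only uses `issubclass` and `isinstance` on the exception classes exported by `urllib.error` and makes no network request. `classify` is a hypothetical helper name chosen for illustration.

```python
from urllib import error

# Each class derives from the one before it: OSError > URLError > HTTPError
print(issubclass(error.URLError, OSError))
print(issubclass(error.HTTPError, error.URLError))


def classify(exc):
    """Return the name of the most specific matching class (hypothetical helper)."""
    # Test the subclass first; testing URLError first would also match HTTPError.
    if isinstance(exc, error.HTTPError):
        return "HTTPError"
    if isinstance(exc, error.URLError):
        return "URLError"
    if isinstance(exc, OSError):
        return "OSError"
    return "other"
```

Because an HTTPError is also a URLError, an `except error.URLError` clause placed first would catch both, which is why the more specific clause should come first.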
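from urllib import request, error

if __name__ == "__main__":
    # A host that does not exist leads to URLError
    url = "https://www.bbbbbbbbbaidu.com"
    # A page the server rejects leads to HTTPError.
    # This second assignment overrides the first; comment it out to see URLError.
    url = "http://www.sipo.gov.cn/hehehaha"
    try:
        req = request.Request(url)
        rsp = request.urlopen(req)
        html = rsp.read().decode()
        print(html)
    # HTTPError is a subclass of URLError, so it has to be caught first
    except error.HTTPError as e:
        print("HTTPError:{}".format(e.reason))
        print("HTTPError:{}".format(e))
    except error.URLError as e:
        print("URLError:{}".format(e.reason))
        print("URLError:{}".format(e))
    except Exception as e:
        print(e)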
HTTPError:Precondition Failed
HTTPError:HTTP Error 412: Precondition Failed
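The output above shows that `e.reason` holds only the reason phrase, while `str(e)` also includes the status code. The code is also available on its own as `e.code`. The sketch below, which is an illustration and not part of the original notes, builds an `HTTPError` by hand so that no server has to be contacted; `summarize` is a hypothetical helper name, and the URL is the one used in the example above.

```python
import io
from email.message import Message
from urllib import error


def summarize(exc):
    """Turn a urllib exception into a short description (hypothetical helper)."""
    if isinstance(exc, error.HTTPError):
        # The server answered, but with an error status code
        if 400 <= exc.code < 500:
            kind = "client error"
        elif exc.code >= 500:
            kind = "server error"
        else:
            kind = "other"
        return "{} {} ({})".format(exc.code, exc.reason, kind)
    if isinstance(exc, error.URLError):
        # No HTTP response at all, e.g. DNS failure or refused connection
        return "no response: {}".format(exc.reason)
    return "unexpected: {}".format(exc)


# Build the exceptions by hand so that no network access is needed
http_err = error.HTTPError("http://www.sipo.gov.cn/hehehaha", 412,
                           "Precondition Failed", Message(), io.BytesIO())
url_err = error.URLError("name resolution failed")
print(summarize(http_err))
print(summarize(url_err))
```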
UserAgent
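- User-Agent: abbreviated UA, it is part of the request headers; the server uses the UA to identify who is visiting
- Common UA values can be copied and pasted directly, or captured from the requests your own browser sends
- The UA can be set in two ways: through the headers argument of Request, or with add_header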
'''
Visit a URL
while changing our own UA
'''
from urllib import request, error

if __name__ == "__main__":
    url = "https://www.baidu.com/"
    try:
        # Method 1: pass the UA through the headers argument of Request
        # headers = {}
        # headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0"
        # req = request.Request(url, headers=headers)

        # Method 2: add the UA with add_header.
        # The header name is "User-Agent" with a hyphen, not an underscore.
        req = request.Request(url)
        req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0")
        rsp = request.urlopen(req)
        html = rsp.read().decode()
        print(html)
    except error.HTTPError as e:
        print(e)
    except error.URLError as e:
        print(e)
    except Exception as e:
        print(e)
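Both ways of setting the UA can be compared without sending a request, because a `Request` object lets you read its headers back. The following is a minimal sketch under that assumption; `UA` is the Firefox string from the example above and the variable names are illustrative only. Note that `Request` stores header names capitalized, so the key to look up is `User-agent`.

```python
from urllib import request

UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0"
url = "https://www.baidu.com/"

# Way 1: pass a headers dict when creating the Request
req1 = request.Request(url, headers={"User-Agent": UA})

# Way 2: create the Request first, then call add_header
req2 = request.Request(url)
req2.add_header("User-Agent", UA)

# Request normalizes header names with str.capitalize(), hence "User-agent"
print(req1.get_header("User-agent") == UA)
print(req2.get_header("User-agent") == UA)

# A name with an underscore is a different header, so the UA is not replaced
req3 = request.Request(url)
req3.add_header("User_Agent", UA)
print(req3.has_header("User-agent"))
```

The last check is a reminder that a misspelled header name does not raise an error; the request is simply sent with the default Python UA.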
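ProxyHandler (proxy servers)
- Hiding your own IP address behind a proxy is a common crawler technique
- Where to find proxy server addresses:
    - www.xicidaili.com
    - www.goubanjia.com
- Even through a proxy, visiting the same website too frequently is not tolerated in real use, so you need several proxies and must switch between them
- Basic steps:
    - Set the proxy address
    - Create a ProxyHandler
    - Create an opener
    - Install the opener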
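from urllib import request, error

if __name__ == '__main__':
    url = "http://www.baidu.com"
    # 1. Set the proxy address (scheme -> "host:port")
    proxy = {"http": "120.194.18.90:81"}
    # 2. Create the ProxyHandler
    proxy_handler = request.ProxyHandler(proxy)
    # 3. Create the opener
    opener = request.build_opener(proxy_handler)
    # 4. Install the opener so that urlopen() uses it from now on
    request.install_opener(opener)
    try:
        rsp = request.urlopen(url)
        html = rsp.read().decode()
        print(html)
    # HTTPError is a subclass of URLError, so it has to be caught first
    except error.HTTPError as e:
        print(e)
    except error.URLError as e:
        print(e)
    except Exception as e:
        print(e)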
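Since a single proxy should not visit the same site too often, one possible way to switch between several proxies is to build one opener per proxy and cycle through them, instead of installing a single global opener. The sketch below is only one such approach under stated assumptions: the extra addresses are placeholders from the documentation-only 203.0.113.0/24 range, not working proxies, and `make_opener` and `fetch` are hypothetical helper names.

```python
from itertools import cycle
from urllib import request, error

# Only the first address comes from the notes; the others are placeholders
PROXIES = ["120.194.18.90:81", "203.0.113.10:8080", "203.0.113.11:8080"]


def make_opener(address):
    """Build an opener that sends http requests through one proxy."""
    handler = request.ProxyHandler({"http": address})
    return request.build_opener(handler)


# cycle() hands out the proxies round-robin, endlessly
proxy_pool = cycle(PROXIES)


def fetch(url, timeout=10):
    """Fetch url through the next proxy in the pool; return the text or None."""
    address = next(proxy_pool)
    opener = make_opener(address)
    try:
        # opener.open() uses this opener only, so install_opener() is not needed
        with opener.open(url, timeout=timeout) as rsp:
            return rsp.read().decode()
    except error.HTTPError as e:
        print("HTTPError via {}: {}".format(address, e))
    except error.URLError as e:
        print("URLError via {}: {}".format(address, e))
    return None
```

Calling `fetch("http://www.baidu.com")` repeatedly would send each request through the next proxy in turn. Nothing is fetched when the block is loaded, so it can be tried out without a working proxy.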