I want to fetch a page by its URL in Python 3.
If I do the following,
from urllib.request import urlopen
html = urlopen("http://google.com/")
html.read()
I get the HTML I want. However, if I pick a different URL, as follows,
from urllib.request import urlopen
html = urlopen("http://www.stackoverflow.com/")
html.read()
I get the following error after the second line:
Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 461, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 574, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 499, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 582, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Any ideas why this happens and how to fix it?
Solution
If you look closely at the error message, you will see that it is an HTTP error, and a specific one at that:
HTTP Error 403: Forbidden
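So you did reach the server and got a reply, but the status line alone does not tell you why you were refused.
You can get a more detailed message from the HTML the server sent back, like this: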
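from urllib.request import urlopen
from urllib.error import HTTPError

try:
    html = urlopen("http://www.stackoverflow.com/")
except HTTPError as e:
    # The HTTPError object is file-like: its body holds the server's explanation.
    print(e.read().decode('utf-8'))
else:
    # Read the page only when the request succeeded; otherwise html is undefined.
    print(html.read())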
For me, it says:
What happened?
The owner of this website (www.stackoverflow.com) has banned your access based on your browser's signature (213702c58d2116a6-ua48).
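The "browser's signature" in that message most likely refers to the User-Agent header, which urllib sets to a value of the form Python-urllib/3.x by default. One possible workaround, assuming the ban really is keyed on that default value, is to send an explicit User-Agent. The sketch below uses the standard Request class; the helper names build_request and fetch and the "Mozilla/5.0" string are illustrative choices, not something the server is known to require.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries an explicit User-Agent header.

    The default user_agent value is only an example; whether a given
    server accepts it is an assumption.
    """
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla/5.0"):
    """Open the URL with the custom header and return the raw body bytes."""
    with urlopen(build_request(url, user_agent)) as response:
        return response.read()
```

Calling fetch("http://www.stackoverflow.com/") would then send the request with the replacement header. If the server still answers 403, the ban is based on something else, and the error body is again the place to look.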
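Although it is an exception (a subclass of URLError), an HTTPError can also function as a non-exceptional file-like return value (the same kind of thing that urlopen() returns). This is useful when handling exotic HTTP errors, such as requests for authentication.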
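To make the file-like behaviour concrete, here is a small sketch that builds an HTTPError by hand, so it runs without any network access. The helper names describe_http_error and make_fake_403, the placeholder URL, and the response body are all made up for illustration; a real error object would come from urlopen().

```python
import io
from email.message import Message
from urllib.error import HTTPError, URLError


def describe_http_error(err):
    """Summarise an HTTPError through its file-like interface."""
    body = err.read().decode("utf-8", errors="replace")
    return {"code": err.code, "reason": err.reason, "body": body}


def make_fake_403():
    """Construct an HTTPError manually (hypothetical stand-in for a real reply)."""
    return HTTPError(
        "http://example.invalid/",                   # placeholder URL, never contacted
        403,
        "Forbidden",
        Message(),                                   # empty header set
        io.BytesIO(b"<html>access banned</html>"),   # made-up response body
    )


err = make_fake_403()
print(isinstance(err, URLError))   # HTTPError is a subclass of URLError
print(describe_http_error(err))    # status code, reason phrase, and decoded body
```

The same describe_http_error call works on the exception caught in an except HTTPError block, since both expose code, reason, and read().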