Python urlretrieve failure: how to catch a 404 error in urllib.urlretrieve

Latest recommended article published 2022-05-04 17:00:00

咯嗯


Tags: python urlretrieve failure

Background: I am using urllib.urlretrieve, rather than any other function in the urllib* modules, because of its hook function support (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.

>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]])
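For illustration, here is a minimal sketch of what such a hook might look like. urlretrieve calls the hook with three arguments: the number of blocks transferred so far, the block size in bytes, and the total file size (which can be -1 when the server sends no Content-Length). The helper name format_progress and the bar layout are assumptions made for this example, not part of urllib.

```python
import sys


def format_progress(block_num, block_size, total_size, width=30):
    """Build one line of a textual progress bar (hypothetical helper)."""
    done = block_num * block_size
    if total_size <= 0:
        # Total size unknown (e.g. no Content-Length header): show bytes only.
        return "%d bytes" % done
    # The last block may overshoot the real size, so cap the fraction at 1.0.
    fraction = min(done / float(total_size), 1.0)
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return "[%s] %3d%%" % (bar, int(fraction * 100))


def my_report_hook(block_num, block_size, total_size):
    """Hook with the three-argument signature that urlretrieve calls."""
    # "\r" returns to the start of the line so the bar redraws in place.
    sys.stdout.write("\r" + format_progress(block_num, block_size, total_size))
    sys.stdout.flush()
```

Keeping the formatting in a separate pure function makes the bar easy to check without any network access; the hook itself only writes the result to the terminal.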

However, urlretrieve is so dumb that it leaves no way to detect the status of the HTTP request (e.g. was it 404 or 200?).

>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items()
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
 ('expires', '-1'),
 ('content-type', 'text/html; charset=ISO-8859-1'),
 ('server', 'gws'),
 ('cache-control', 'private, max-age=0')]
>>> h.status
>>>

What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

Solution

Check out urllib.urlretrieve's complete code:

def urlretrieve(url, filename=None, reporthook=None, data=None):
    global _urlopener
    if not _urlopener:
        _urlopener = FancyURLopener()
    return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it's part of the public urllib API). You can override http_error_default to detect 404s:

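import urllib

class MyURLopener(urllib.FancyURLopener):
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # handle errors the way you'd like to; raising here (as the base
        # URLopener does) keeps a 404 page from being saved as if it were the file
        fp.close()
        raise IOError('http error', errcode, errmsg, headers)

# url and my_report_hook are defined by the caller
fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)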
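The answer above targets Python 2's urllib. For readers on Python 3, here is a rough sketch of the same idea: there the function lives at urllib.request.urlretrieve and goes through urlopen, so a 404 surfaces as urllib.error.HTTPError rather than being silently saved to disk. The wrapper name download and the choice to return None on a 404 are assumptions made for this example only.

```python
import urllib.error
import urllib.request


def download(url, filename=None, reporthook=None):
    """Fetch url; return (path, headers), or None if the server answers 404.

    Hypothetical wrapper: it only shows where the HTTP status becomes visible.
    """
    try:
        # reporthook is still called as hook(block_num, block_size, total_size).
        return urllib.request.urlretrieve(url, filename, reporthook)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # let other HTTP errors (403, 500, ...) propagate to the caller
```

A caller would check the result, for example `if download(url, reporthook=my_report_hook) is None:` to report the missing file, and can catch HTTPError separately for other status codes.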
