python urlretrieve fails: how to catch a 404 error in urllib.urlretrieve

Background: I am using urllib.urlretrieve, as opposed to any other function in the urllib* modules, because of its hook function support (see reporthook below), which I use to display a textual progress bar. This is Python >=2.6.

>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]])
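For context, the reporthook is any callable taking (block_count, block_size, total_size). A minimal textual progress callback might look like the following sketch (the body is only illustrative; my_report_hook is the name used again in the solution below):

import sys

def my_report_hook(block_count, block_size, total_size):
    # urlretrieve calls this once per block read; total_size is -1 when unknown.
    if total_size > 0:
        percent = min(100, block_count * block_size * 100 // total_size)
        sys.stdout.write("\r%3d%%" % percent)
        sys.stdout.flush()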

However, urlretrieve is so dumb that it leaves no way to detect the status of the HTTP request (e.g. was it a 404 or a 200?).

>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items()
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
 ('expires', '-1'),
 ('content-type', 'text/html; charset=ISO-8859-1'),
 ('server', 'gws'),
 ('cache-control', 'private, max-age=0')]
>>> h.status
''
>>>

What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

Solution

Check out urllib.urlretrieve's complete code:

def urlretrieve(url, filename=None, reporthook=None, data=None):
    global _urlopener
    if not _urlopener:
        _urlopener = FancyURLopener()
    return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it's part of the public urllib API). You can override http_error_default to detect 404s:

import urllib

class MyURLopener(urllib.FancyURLopener):
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # handle errors the way you'd like to
        pass

fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)
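Putting the pieces together, here is a minimal sketch (my own illustration, assuming Python 2.6+; raising IOError is one possible choice, not mandated by the original answer) that turns a 404 into an exception instead of silently saving the error page, reusing the my_report_hook progress callback from above:

import urllib

class MyURLopener(urllib.FancyURLopener):
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # Raise instead of returning the error page, so retrieve() fails loudly.
        fp.close()
        raise IOError(errcode, errmsg, url)

try:
    fn, h = MyURLopener().retrieve('http://google.com/foo/bar',
                                   reporthook=my_report_hook)
except IOError, e:
    print 'download failed:', e

Because FancyURLopener's default behaviour is to hand the error page back as if it were a normal response, raising here is what lets the caller distinguish a 404 from a 200.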
