爬虫系列6下载一个网页（异常处理，用户代理，重试次数）

最新推荐文章于 2023-12-12 19:35:25 发布

豆豆orz

最新推荐文章于 2023-12-12 19:35:25 发布

阅读量377

点赞数

分类专栏：爬虫系列文章标签：爬虫

本文链接：https://blog.csdn.net/runnoob_1115/article/details/78510641

版权

爬虫系列专栏收录该内容

20 篇文章 0 订阅

订阅专栏

# -*- coding: utf-8 -*-
import urllib2
# 下载网页
def download(url):
    return urllib2.urlopen(url).read()

# 可能会遇到一些无法遇见的错误，可能会抛出异常

# 捕捉异常版
def download(url):
    print 'Downloading:', url
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print 'Download Error:', e.reason
        html = None
    return html

# 重试下载版（有些错误是临时的，我们可以尝试重新下载，5xx服务器端问题）
def download(url, num_retries=2):
    print 'Downloading:',url
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print 'Download Error:', e.reason
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                return download(url, num_retries-1)
    return html

# 设置用户代理,重试次数
def download(url, user_aget='wswp', num_retries=2):
    print 'Downloading:',url
    headers = {'User-agent':user_aget}
    request = urllib2.Request(url, headers=headers)
    try:
        html = urllib2.Request(request).read()
    except urllib2.URLError as e:
        print 'Download Error:', e.reason
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                return download(url, num_retries-1)
    return html

豆豆orz

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
打赏
0
评论
爬虫系列6下载一个网页（异常处理，用户代理，重试次数）

# -*- coding: utf-8 -*-import urllib2# 下载网页def download(url): return urllib2.urlopen(url).read()# 可能会遇到一些无法遇见的错误，可能会抛出异常# 捕捉异常版def download(url): print 'Downloading:', url try:
复制链接

扫一扫