Python Web Crawler (Downloading Data)

Latest recommended article published on 2024-08-03 19:27:22

liouyi250

Category: Crawlers. Tags: python

Article link: https://blog.csdn.net/liouyi250/article/details/104969185


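import urllib.request
import urllib.error
import chardet

def download(url):
    """Download a page and return its decoded text, or None on failure."""
    print('Downloading:', url)
    try:
        raw = urllib.request.urlopen(url).read()
    except urllib.error.URLError as e:
        print('Download error:', e.reason)
        return None  # nothing to decode if the request failed
    # Detect the page encoding; fall back to UTF-8 if detection returns None
    encoding = chardet.detect(raw)['encoding'] or 'utf-8'
    return raw.decode(encoding, errors='replace')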

Downloading a web page: retry on failure

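Requests for web pages often fail with 4XX or 5XX status codes. A 4XX code means the request itself is at fault, so sending it again will not help. A 5XX code means the server ran into a problem that may be temporary, so the version below retries the download for 5XX errors only.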

def download(url, retries_time=3):
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
        encode = chardet.detect(html)
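
The listing above breaks off before the retry logic. As a rough sketch of what the description suggests (retry only on 5XX), something like the following could work. It is not the author's code: the opener parameter is an assumption added here so the function can be tried without a network connection, and the sketch returns raw bytes, leaving decoding to the chardet step from the first listing.

```python
import urllib.request
import urllib.error


def download(url, retries_time=3, opener=urllib.request.urlopen):
    """Return the raw bytes of url, or None if the download fails.

    `opener` is a hypothetical hook (not in the original article) that
    lets the function be exercised without network access.
    """
    print('Downloading:', url)
    try:
        return opener(url).read()
    except urllib.error.HTTPError as e:
        # HTTPError must be caught before URLError, its parent class.
        print('Download error:', e.code, e.reason)
        # 5XX: server-side problem that may be transient, so try again.
        if retries_time > 0 and 500 <= e.code < 600:
            return download(url, retries_time - 1, opener)
        # 4XX, or retries used up: give up.
        return None
    except urllib.error.URLError as e:
        # No HTTP status available (DNS failure, refused connection, ...).
        print('Download error:', e.reason)
        return None
```

With the default retries_time=3 a page is requested at most four times: the first attempt plus three retries. A production crawler would probably also wait between attempts, which this sketch leaves out.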


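Outline of the full article: downloading a web page (direct download); downloading a web page (retry on failure); setting request headers; link crawler; download throttling; crawler traps; setting a proxy.

A Python crawler can be written with either the urllib or the requests module; the requests documentation and the official urllib documentation are the references to consult. The direct-download example requires pip and chardet to be installed first (pip install chardet).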