The Most Complete Python Multithreaded Crawler Framework for 2024

Latest recommended article published on 2024-06-19 18:03:00

2401_84584552

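Category: Programmers. Article tags: python, crawler, programming language

Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/2401_84584552/article/details/138355299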

        file.close()
        '''
        if self.cache:
            # If a cache is configured, store the downloaded page in it
            self.cache[url] = result
        print(url, "page downloaded")
        return result["html"]

    def download(self, url, headers, proxy, num_retries, data=None):
        '''
        Download a single page and return the page together with its status code
        '''
        # Build the request
        request = urllib.request.Request(url, data, headers or {})
        request.add_header("Cookie", "finger=7360d3c2; UM_distinctid=15c59703db998-0f42b4b61afaa1-5393662-100200-15c59703dbcc1d; pgv_pvi=653650944; fts=1496149148; sid=bgsv74pg; buvid3=56812A21-4322-4C70-BF18-E6D646EA78694004infoc; CNZZDATA2724999=cnzz_eid%3D214248390-1496147515-https%253A%252F%252Fwww.baidu.com%252F%26ntime%3D1496805293")
        request.add_header("Upgrade-Insecure-Requests", "1")
        opener = self.opener or urllib.request.build_opener()
        if proxy:
            # If a proxy IP is given, use it
            # (ProxyHandler expects a dict that maps a scheme to a proxy URL)
            opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy))
        try:
            # Download the page
            response = opener.open(request)
            print("status code:", response.code)
            html = response.read().decode()
            code = response.code
        except Exception as e:
            print("download error:", str(e))
            html = ''
            if hasattr(e, "code"):
                code = e.code
                if num_retries > 0 and 500 <= code < 600:
                    # Retry only on 5xx server errors (not on "page not found"),
                    # at most num_retries times
                    return self.download(url, headers, proxy, num_retries - 1, data)
            else:
                # No HTTP status available (for example, a connection error)
                code = None
        # Debug output: print the downloaded HTML
        print(html)
        return {"html": html, "code": code}
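The cache-then-download flow with retries on 5xx errors can be illustrated in isolation. The following is only a minimal sketch, not the author's implementation: `download_with_retries`, `cached_download`, and the `fetch` callable are hypothetical names introduced here so that the logic can run without network access. It assumes `fetch(url)` returns an `(html, code)` tuple and that the cache is any dict-like object. Unlike the `if self.cache:` check in the code above, the sketch tests `cache is not None`, so an empty dict still counts as a cache.

```python
def download_with_retries(fetch, url, num_retries=2):
    """Call fetch(url) and retry while the status is a 5xx server error.

    `fetch` is a hypothetical callable returning (html, code); code is
    None when no HTTP response was received at all.
    """
    html, code = fetch(url)
    while code is not None and 500 <= code < 600 and num_retries > 0:
        num_retries -= 1
        html, code = fetch(url)
    return {"html": html, "code": code}


def cached_download(fetch, url, cache=None, num_retries=2):
    """Return the HTML for url, consulting a dict-like cache first."""
    if cache is not None and url in cache:
        # Cache hit: no request is made
        return cache[url]["html"]
    result = download_with_retries(fetch, url, num_retries)
    if cache is not None:
        # Store the whole result so the status code is kept as well
        cache[url] = result
    return result["html"]
```

A client error such as 404 is returned immediately, because repeating the request is unlikely to change the outcome, while a 5xx response may be temporary and is retried until `num_retries` is used up.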

class Throttle:
    '''
    Class that downloads pages according to the delay, request, proxy IP, etc., and processes the links in each page
    '''
    def __init__(self, delay):
        self
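The source breaks off inside `Throttle.__init__`, so the rest of the class is not available. As an illustration only, here is one common way to implement per-domain throttling in a multithreaded crawler. The class name `DomainThrottle`, the `wait` method, and the lock-based design are assumptions made for this sketch and are not taken from the original article.

```python
import threading
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Hypothetical sketch: keep requests to one domain `delay` seconds apart."""

    def __init__(self, delay):
        self.delay = delay            # minimum seconds between requests per domain
        self.last_access = {}         # domain -> scheduled time of the latest request
        self.lock = threading.Lock()  # the dict is shared between crawler threads

    def wait(self, url):
        """Sleep if this URL's domain was accessed too recently; return seconds slept."""
        domain = urlparse(url).netloc
        with self.lock:
            now = time.monotonic()
            last = self.last_access.get(domain)
            sleep_secs = 0.0
            if self.delay > 0 and last is not None:
                sleep_secs = max(0.0, self.delay - (now - last))
            # Reserve this request's slot before releasing the lock, so that
            # other threads queue up behind it instead of waking together
            self.last_access[domain] = now + sleep_secs
        if sleep_secs > 0:
            time.sleep(sleep_secs)
        return sleep_secs
```

A worker thread would call `wait(url)` right before each download. Requests to different domains do not delay one another, and the sleep happens outside the lock so that threads working on other domains are not blocked.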
