Scraper still gets 403 even after adding headers and proxy IPs? Cloudflare may be the cause


Author: 随手笔记_000333


Tags: web scraping, python

Original link: https://blog.csdn.net/SuperYR_210/article/details/120674405


When a scraper gets a 403 (Forbidden) response, the main possible causes are:

1. Your User-Agent gives you away. Solution: add a header that mimics a real browser.

import requests

# Send a browser-like User-Agent instead of the default
# "python-requests/<version>" string, which many sites block.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36"
}

# requests needs a full URL including the scheme; a bare "www.baidu.com"
# raises requests.exceptions.MissingSchema.
target_url = "https://www.baidu.com"
resp = requests.get(target_url, headers=headers, timeout=10)
print(resp)
print(resp.status_code)
print(resp.text)
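To see why the default header gives you away, and to tell a plain User-Agent block apart from the Cloudflare case in the title, the sketch below may help. It shows the User-Agent requests sends by default, builds a Session that sends the browser string on every request, and applies a rough heuristic for spotting a 403 served by Cloudflare. The helper names `make_session` and `looks_like_cloudflare_block` are hypothetical, and the heuristic assumes Cloudflare responses carry a `Server: cloudflare` and/or `CF-RAY` header; it is a hint, not a reliable detector.

```python
import requests

# requests identifies itself as "python-requests/<version>" unless you
# override the User-Agent header.
print(requests.utils.default_user_agent())

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36"
)


def make_session(user_agent: str = BROWSER_UA) -> requests.Session:
    """Hypothetical helper: a Session that sends user_agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def looks_like_cloudflare_block(resp: requests.Response) -> bool:
    """Hypothetical heuristic: True for a 403 whose headers suggest
    Cloudflare served it (Server: cloudflare, or a CF-RAY header)."""
    served_by_cloudflare = (
        resp.headers.get("Server", "").lower() == "cloudflare"
        or "CF-RAY" in resp.headers  # headers are case-insensitive
    )
    return resp.status_code == 403 and served_by_cloudflare
```

Usage: `resp = make_session().get(url, timeout=10)`, then call `looks_like_cloudflare_block(resp)`. If it returns True, changing the User-Agent or the proxy IP alone is probably not what is blocking you.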

How to get a User-Agent string:

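1. Automatically: use an existing library, such as fake-useragent (https://github.com/hellysmile/fake-useragent).

2. Manually: open the page you want to scrape, right-click and choose Inspect, reload the page, click any request under the Network tab, and copy the User-Agent value from Request Headers.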
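If you collect several strings by hand as in step 2, you can rotate between them so that requests do not all carry the same User-Agent. The sketch below is a stdlib-only illustration, not the fake-useragent library itself (with that library installed, `UserAgent().random` plays a similar role). The pool `USER_AGENTS` and the helper `random_headers` are hypothetical; replace the example strings with ones copied from your own browser.

```python
import random
from typing import Dict, Optional, Sequence

# Hypothetical pool: paste real strings copied from DevTools
# (Network tab -> Request Headers -> User-Agent).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/15.0 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 "
    "Firefox/93.0",
]


def random_headers(
    pool: Sequence[str] = USER_AGENTS,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Return a headers dict whose User-Agent is picked at random from pool."""
    if rng is None:
        rng = random.Random()
    return {"User-Agent": rng.choice(pool)}
```

Usage: `requests.get(url, headers=random_headers(), timeout=10)`. Passing a seeded `random.Random` makes the choice reproducible, which helps when debugging which string a site rejects.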
