Python Baidu Cloud captcha recognition: a solution for Python crawlers that hit 【安全检查! | 百度云加速】 ("Security check! | Baidu Yunjiasu")... - CSDN blog

This post was last edited by aiai on 2020-5-25 22:31.

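Suppose we do not yet know that the site sits behind a Baidu Yunjiasu (百度云加速) check, and simply fetch the page directly. The URL is masked for certain reasons, but that does not affect the overall approach.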

[Python]
import httpx

shareurl = 'https://************/**************************'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
# Plain GET with nothing but a browser-like User-Agent
response = httpx.get(url=shareurl, headers=headers)
print(response.text)

The printed result contains 【安全检查! | 百度云加速】 ("Security check! | Baidu Yunjiasu"):

[JavaScript]
安全检查! | 百度云加速
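A crawler can make the same observation in code rather than by eye. The following is only a minimal sketch: the helper name `is_security_check` and both sample strings are made up for illustration, and it assumes the marker text 百度云加速 appears only on the check page.

```python
# Marker text taken from the printed check page above.
CHALLENGE_MARKER = '百度云加速'


def is_security_check(page_text: str) -> bool:
    """Return True if the page text looks like the security check page."""
    return CHALLENGE_MARKER in page_text


# Hypothetical sample pages, for illustration only.
sample_challenge = '<title>安全检查! | 百度云加速</title>'
sample_normal = '<title>Shared file</title>'

print(is_security_check(sample_challenge))
print(is_security_check(sample_normal))
```

In the real flow you would pass `response.text` to this check and only run the captcha steps when it returns True.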

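From the printed result, what needs to be done is fairly clear: collect a few parameters plus a captcha, then send them all together in one request.

First, the parameters in the response headers.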

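[Python]
# Keep only the first name=value pair of the Set-Cookie header
cookie = response.headers['set-cookie'].split(';')[0]
# The part of cf-ray before the first '-' is used as the id later
ray = response.headers['cf-ray'].split('-')[0]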
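To make the two `split` calls concrete, here is a small offline sketch. The header values are invented samples (the real ones come from the server), and `parse_challenge_headers` is a hypothetical helper name, not something from the original post.

```python
def parse_challenge_headers(set_cookie: str, cf_ray: str) -> tuple[str, str]:
    """Extract the cookie pair and the ray id the same way as above."""
    cookie = set_cookie.split(';')[0]  # drop attributes such as path, expires
    ray = cf_ray.split('-')[0]         # drop the suffix after the first '-'
    return cookie, ray


# Invented sample header values, for illustration only.
sample_set_cookie = 'sessid=abc123; path=/; HttpOnly'
sample_cf_ray = '5991a2b3c4d5e6f7-HKG'

cookie, ray = parse_challenge_headers(sample_set_cookie, sample_cf_ray)
print(cookie)  # sessid=abc123
print(ray)     # 5991a2b3c4d5e6f7
```

Note that this keeps only the first cookie pair. If the server sets several cookies, the rest are dropped, which matches the behavior of the original snippet.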

Next, the parameters in the response body.

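[Python]
import html
import re

# scheme://host of shareurl + the form's action attribute (HTML entities decoded)
posturl = '/'.join(shareurl.split('/')[:3])+html.unescape(re.findall('(?<=action=").+?(?=")', response.text)[0])
# First value="..." attribute found in the page
r = re.findall('(?<=value=").+?(?=")', response.text)[0]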
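The two regular expressions are easier to follow against a concrete page. The sketch below runs the same idea on a made-up form. The sample HTML, the URL `https://example.com/s/xyz` and the helper name `parse_challenge_form` are all assumptions for illustration, and the real check page may be laid out differently.

```python
import html
import re
from urllib.parse import urlsplit


def parse_challenge_form(page_url: str, page_text: str) -> tuple[str, str]:
    """Return (post URL, r value) from the first action/value attributes."""
    action = re.search(r'action="(.+?)"', page_text)
    value = re.search(r'value="(.+?)"', page_text)
    if action is None or value is None:
        raise ValueError('form action or value not found in page')
    parts = urlsplit(page_url)
    origin = f'{parts.scheme}://{parts.netloc}'
    # html.unescape turns &amp; in the attribute back into &
    return origin + html.unescape(action.group(1)), value.group(1)


# Made-up sample page, for illustration only.
sample_page = (
    '<form action="/check?a=1&amp;b=2" method="get">'
    '<input type="hidden" name="r" value="tok123"></form>'
)

posturl, r = parse_challenge_form('https://example.com/s/xyz', sample_page)
print(posturl)  # https://example.com/check?a=1&b=2
print(r)        # tok123
```

Like the original, this takes the first `value="..."` on the page, so it only works if the `r` field is the first element carrying a `value` attribute.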

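Finally, a captcha is needed. The pub parameter stayed the same across several packet captures, so it is simply hard-coded here.

First, obtain session, a parameter used to fetch the captcha image.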

[Python]
url = 'https://captcha.su.baidu.com/session_cb?pub=377e4907e1a3b419708dbd00df9e8f79'
headers = {
    'Host': 'captcha.su.baidu.com',
    'Referer': shareurl,
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
response = httpx.get(url, headers=headers).text
# session is the last double-quoted string in the response text
session = response.split('"')[-2]
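The `split('"')[-2]` trick simply picks the last double-quoted string in the response. Here is a slightly more defensive sketch of the same idea. The sample response text is a guess at the general shape, not the real format returned by the server, and `extract_session` is a hypothetical helper name.

```python
import re


def extract_session(callback_text: str) -> str:
    """Return the last double-quoted string in the response text."""
    quoted = re.findall(r'"([^"]*)"', callback_text)
    if not quoted:
        raise ValueError('no quoted value found in session response')
    return quoted[-1]


# Guessed sample response, for illustration only.
sample_response = 'callback({"session": "abc123"})'
print(extract_session(sample_response))  # abc123
```

The explicit error makes it obvious when the endpoint returns something unexpected, instead of failing later with an IndexError.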

With session and the pub value from above, the captcha image can now be fetched, saved locally, and then entered by hand.

[Python]
url = 'https://captcha.su.baidu.com/image?session='+session+'&pub=377e4907e1a3b419708dbd00df9e8f79'
response = httpx.get(url, headers=headers).content
# Save the image next to the script so it can be opened and read
with open('captcha.jpg', 'wb') as f:
    f.write(response)
yanzhengma = input('Enter the captcha saved in the current directory: ')
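The image URL can also be built with `urllib.parse.urlencode` instead of string concatenation, which takes care of escaping if session ever contains special characters. This is a sketch only: `build_image_url` is a hypothetical helper, and the sample session value is invented.

```python
from urllib.parse import urlencode

# pub value hard-coded in the post (observed to be constant across captures)
PUB = '377e4907e1a3b419708dbd00df9e8f79'


def build_image_url(session: str, pub: str = PUB) -> str:
    """Build the captcha image URL from session and pub."""
    query = urlencode({'session': session, 'pub': pub})
    return 'https://captcha.su.baidu.com/image?' + query


# Invented session value, for illustration only.
print(build_image_url('abc123'))
```

For a plain alphanumeric session this should yield the same URL as the concatenation above.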

Finally, build the request headers and body and send the request to get the target page's data.

[Python]
headers = {
    'content-type': 'application/x-www-form-urlencoded',
    'cookie': cookie,
    'referer': shareurl,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}
data = {
    'r': r,                                        # value from the response body
    'id': ray,                                     # first part of the cf-ray header
    'captcha_challenge_field': session,            # captcha session
    'manual_captcha_challenge_field': yanzhengma,  # captcha typed in by hand
}
response = httpx.post(posturl, headers=headers, data=data)
print(response.text)
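To see what actually goes over the wire, the form fields can be URL-encoded by hand. This is a sketch with invented values: `build_captcha_form` is a hypothetical helper, and the encoded string only approximates the body that a form POST would carry.

```python
from urllib.parse import urlencode


def build_captcha_form(r: str, ray: str, session: str, answer: str) -> dict:
    """Assemble the four form fields used in the final POST."""
    return {
        'r': r,
        'id': ray,
        'captcha_challenge_field': session,
        'manual_captcha_challenge_field': answer,
    }


# Invented values, for illustration only.
form = build_captcha_form('tok123', '5991a2b3c4d5e6f7', 'abc123', 'x7k9')
body = urlencode(form)
print(body)
```

Printing the body like this is a quick way to compare your request against what the browser sends in a packet capture.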

Checking the printed output again, the content is now retrieved correctly.

The complete code is attached below.

[Python]
import html
import re

import httpx

shareurl = 'https://************/**************************'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}

# Step 1: fetch the page and collect parameters from the check page
response = httpx.get(shareurl, headers=headers)
cookie = response.headers['set-cookie'].split(';')[0]
ray = response.headers['cf-ray'].split('-')[0]
posturl = '/'.join(shareurl.split('/')[:3])+html.unescape(re.findall('(?<=action=").+?(?=")', response.text)[0])
r = re.findall('(?<=value=").+?(?=")', response.text)[0]

# Step 2: get the captcha session (pub is hard-coded)
url = 'https://captcha.su.baidu.com/session_cb?pub=377e4907e1a3b419708dbd00df9e8f79'
headers = {
    'Host': 'captcha.su.baidu.com',
    'Referer': shareurl,
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
response = httpx.get(url, headers=headers).text
session = response.split('"')[-2]

# Step 3: download the captcha image and enter it by hand
url = 'https://captcha.su.baidu.com/image?session='+session+'&pub=377e4907e1a3b419708dbd00df9e8f79'
response = httpx.get(url, headers=headers).content
with open('captcha.jpg', 'wb') as f:
    f.write(response)
yanzhengma = input('Enter the captcha saved in the current directory: ')

# Step 4: submit everything in one POST
headers = {
    'content-type': 'application/x-www-form-urlencoded',
    'cookie': cookie,
    'referer': shareurl,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}
data = {
    'r': r,
    'id': ray,
    'captcha_challenge_field': session,
    'manual_captcha_challenge_field': yanzhengma,
}
response = httpx.post(posturl, headers=headers, data=data)
print(response.text)
