Web Scraping Anti-Bot Measures: CAPTCHAs

听说这有个小白

Categories: web scraping, python. Tags: web scraping

Article link: https://blog.csdn.net/weixin_46466247/article/details/108939419

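This article uses Chaojiying (超级鹰) as the CAPTCHA recognition service.

Download the Chaojiying Python demo from its developer documentation page.
Unzip the archive, take the chaojiying.py file, and place it in the project folder.
Extract the main program from chaojiying.py: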

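if __name__ == '__main__':
    chaojiying = Chaojiying_Client('your Chaojiying username', 'password for that username', '96001')  # User Center >> Software ID: generate one and use it in place of 96001
    im = open('a.jpg', 'rb').read()  # replace a.jpg with the path of a local image file; on Windows the path sometimes needs //
    print(chaojiying.PostPic(im, 1902))  # 1902 is the CAPTCHA type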

Move the main code into the crawler file and wrap it in a function:

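from chaojiying import Chaojiying_Client  # the module name must match the file name chaojiying.py


def get_text(imgPath, imgType):
    chaojiying = Chaojiying_Client('your Chaojiying username', 'your Chaojiying password', 'software ID')  # User Center >> Software ID: generate one and use it in place of 96001
    with open(imgPath, 'rb') as f:  # imgPath is the CAPTCHA image saved locally
        im = f.read()
    return chaojiying.PostPic(im, imgType)['pic_str']  # imgType is the CAPTCHA type, e.g. 1902 for letters and digits, 2001 for Chinese characters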
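
The wrapper above indexes `['pic_str']` directly, so a failed recognition surfaces only as a `KeyError` or an empty string. A minimal sketch of a more defensive variant is shown below. It assumes that `PostPic` returns a dict carrying `err_no` (0 on success) and `err_str` next to `pic_str`; check this against the response you actually receive. `CaptchaError` and `extract_captcha_text` are hypothetical names introduced here, not part of chaojiying.py.

```python
class CaptchaError(Exception):
    """Raised when the recognition service reports a failure (hypothetical helper)."""


def extract_captcha_text(response):
    """Return the recognized text from a PostPic-style response dict.

    Assumption: the dict has 'err_no' (0 on success), 'err_str' and 'pic_str'.
    """
    err_no = response.get('err_no', 0)
    if err_no != 0:
        # Surface the service's own error code and message to the caller.
        raise CaptchaError(f"recognition failed: {err_no} {response.get('err_str', '')}")
    text = response.get('pic_str', '')
    if not text:
        raise CaptchaError('recognition returned an empty result')
    return text
```

With this helper, the last line of `get_text` could become `return extract_captcha_text(chaojiying.PostPic(im, imgType))`.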

Crawler code

from chaojiying import Chaojiying_Client  # the module name must match the file name chaojiying.py
import requests
from lxml import etree
s = requests.Session()
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}
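# Recognize the CAPTCHA with Chaojiying
def get_text(imgPath, imgType):
    chaojiying = Chaojiying_Client('your Chaojiying username', 'your Chaojiying password', 'software ID')  # User Center >> Software ID: generate one and use it in place of 96001
    with open(imgPath, 'rb') as f:  # imgPath is the CAPTCHA image saved locally
        im = f.read()
    return chaojiying.PostPic(im, imgType)['pic_str']  # imgType is the CAPTCHA type, e.g. 1902 for letters and digits, 2001 for Chinese characters

# print(get_text('./a.jpg', 1902))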

# Login page of gushiwen.cn
url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx'
page_text = s.get(url, headers=headers).text
tree = etree.HTML(page_text)
# Build the URL of the CAPTCHA image
img_src = 'https://so.gushiwen.cn' + tree.xpath('//*[@id="imgCode"]/@src')[0]
# Fetch the image through the same session so the CAPTCHA stays tied to our cookies
img_data = s.get(img_src, headers=headers).content
# Save the CAPTCHA image
with open('./code.jpg', 'wb') as fp:
    fp.write(img_data)
# Extract the request parameters that change on every page load
__VIEWSTATE = tree.xpath('//*[@id="__VIEWSTATE"]/@value')[0]
__VIEWSTATEGENERATOR = tree.xpath('//*[@id="__VIEWSTATEGENERATOR"]/@value')[0]
# Recognize the CAPTCHA with Chaojiying
result = get_text('./code.jpg', 1902)
print(result)
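# URL the login form is submitted to
login_url = 'https://so.gushiwen.cn/user/login.aspx?from=http%3a%2f%2fso.gushiwen.cn%2fuser%2fcollect.aspx'
data = {
    '__VIEWSTATE': __VIEWSTATE,
    '__VIEWSTATEGENERATOR': __VIEWSTATEGENERATOR,
    'from': 'http://so.gushiwen.cn/user/collect.aspx',
    'email': 'your account email',  # replace with your own gushiwen.cn account
    'pwd': 'your account password',
    'code': result,
    'denglu': '登录',  # value of the submit button, sent as-is
}
# Post through the same session so the login shares cookies with the CAPTCHA request
page_text = s.post(url=login_url, headers=headers, data=data).text
print(page_text)
# Save the response so the login result can be inspected in a browser
with open('./login.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)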
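
The parsing step of the script can be tried offline, without hitting the site. The sketch below pulls the CAPTCHA image URL and the hidden form fields out of a login page using only the standard library (`html.parser` and `urllib.parse.urljoin`) in place of lxml, so it runs without third-party packages. `LoginPageParser`, `parse_login_page`, and the sample HTML (including the `/RandCode.ashx` path and the field values) are made up for illustration; the real page may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginPageParser(HTMLParser):
    """Collect hidden <input> values and the src of <img id="imgCode">."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}
        self.captcha_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden' and 'name' in attrs:
            self.hidden_fields[attrs['name']] = attrs.get('value') or ''
        elif tag == 'img' and attrs.get('id') == 'imgCode':
            self.captcha_src = attrs.get('src')


def parse_login_page(page_url, html):
    """Return (absolute CAPTCHA URL or None, dict of hidden form fields)."""
    parser = LoginPageParser()
    parser.feed(html)
    # urljoin resolves relative and root-relative paths against the page URL.
    captcha_url = urljoin(page_url, parser.captcha_src) if parser.captcha_src else None
    return captcha_url, parser.hidden_fields


# Made-up sample page, only for demonstration.
SAMPLE_HTML = '''
<form method="post">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="C93BE1AE" />
  <img id="imgCode" src="/RandCode.ashx" />
</form>
'''

captcha_url, hidden = parse_login_page('https://so.gushiwen.cn/user/login.aspx', SAMPLE_HTML)
print(captcha_url)
print(hidden)
```

Collecting every hidden field by name, instead of looking up two fixed ids, means the POST data can be built with `data.update(hidden)` and would pick up any extra hidden field the form happens to contain.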
