Python Crawler, Day 5: Simulated Login

Latest recommended article published 2022-05-09 16:26:48

传说中的懿痕

Category: Python crawler series

Article link: https://blog.csdn.net/yihen0214/article/details/119652648

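Simulated login: crawling information that is tied to a specific user and is only served after that user has logged in.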

Goal: perform a simulated login on 古诗文网 (so.gushiwen.cn).

Analysis

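Analyze the login flow: log in once with the browser's developer tools open and note the request that is sent and its URL (in the request details, the URL is shown above the POST method).
Inspect the parameters carried by the request (shown at the bottom of the request details).
Apart from the captcha, the parameters can be hard-coded; the captcha text is obtained by sending the image to a captcha-recognition platform.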
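Hard-coding the form parameters works as long as the server keeps accepting the copied values. If a copied hidden value such as `__VIEWSTATE` ever stops being accepted, one option is to read the hidden inputs from the login page each time. The following is a minimal sketch of that idea using only the standard library; the helper names and the sample HTML are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(page_text):
    """Return a dict of hidden form fields found in an HTML string."""
    collector = HiddenInputCollector()
    collector.feed(page_text)
    return collector.fields


# Hypothetical stand-in for the login page HTML (not the real markup).
sample_page = """
<form method="post" action="./login.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123=" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="C93BE1AE" />
  <input type="text" name="email" />
</form>
"""
hidden_fields = extract_hidden_fields(sample_page)
print(hidden_fields)
```

With lxml, which the script in this post already imports, an XPath such as `//input[@type="hidden"]` should give the same elements.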

Coding workflow

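1. Recognize the captcha: obtain the text shown in the captcha image.
2. Send the POST request (after assembling the request parameters).
3. Persist the response data.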
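Steps 1 and 2 each hide a small detail: the captcha image address found in the page is usually a relative path that has to be resolved against the login page URL, and the POST body is a plain dictionary of form fields. The sketch below illustrates both with two hypothetical helpers; the captcha path `/RandCode.ashx` and the sample credentials are placeholders, not values confirmed by this post.

```python
from urllib.parse import urljoin

LOGIN_PAGE = "https://so.gushiwen.cn/user/login.aspx"


def resolve_captcha_url(page_url, img_src):
    """Turn the (possibly relative) src of the captcha <img> into an absolute URL."""
    return urljoin(page_url, img_src)


def build_login_form(email, password, captcha_text, hidden_fields):
    """Merge hidden form fields with the visible login fields into one POST body."""
    form = dict(hidden_fields)  # copy so the caller's dict is not modified
    form.update({
        "from": "http://so.gushiwen.cn/user/collect.aspx",
        "email": email,
        "pwd": password,
        "code": captcha_text,
        "denglu": "登录",
    })
    return form


# Placeholder values for illustration only.
captcha_url = resolve_captcha_url(LOGIN_PAGE, "/RandCode.ashx")
form = build_login_form("user@example.com", "secret", "ab12", {"__VIEWSTATE": "x"})
print(captcha_url)
print(form)
```

Resolving against the page URL keeps the captcha request on the same host as the login request, which matters because the session cookie is bound to that host.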

Note

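Instantiate a session object: it stores the cookies returned by the server and sends them back on later requests, so cookies do not have to be added by hand.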
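The effect of the session's cookie jar can be seen without touching the network. In this small sketch a cookie is placed in the jar by hand (standing in for one the server would set), and the request is only prepared, not sent. The cookie name and value are made up for illustration.

```python
import requests

session = requests.Session()

# Stand-in for a cookie the server would normally set in a response.
session.cookies.set("demo_session_id", "demo-value", domain="so.gushiwen.cn", path="/")

# Prepare (but do not send) a request to the same host: the cookie is attached.
same_host = session.prepare_request(
    requests.Request("GET", "https://so.gushiwen.cn/user/collect.aspx")
)
same_host_cookie = same_host.headers.get("Cookie")

# A request to a different host does not receive that cookie.
other_host = session.prepare_request(
    requests.Request("GET", "https://example.com/")
)
other_host_cookie = other_host.headers.get("Cookie")

print(same_host_cookie)
print(other_host_cookie)
```

Because cookies are matched by host, the captcha image and the login POST should go to the same domain, otherwise the server cannot tie the captcha to the login attempt.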

Code

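import requests
from lxml import etree
from ChaoJiYin import Chaojiying_Client

# 1. Wrap captcha recognition in a function
def getCodeText(imgPath, codeType):
    # Account name, password, and software ID; generate your own ID under
    # User Center >> Software ID to replace the default 96001
    chaojiying = Chaojiying_Client('1257965244', '123445', '919873')
    # Read the local captcha image as bytes (on Windows the path may need //)
    with open(imgPath, 'rb') as f:
        im = f.read()
    # Call the platform once and reuse the result; calling PostPic twice submits the image twice
    result = chaojiying.PostPic(im, codeType)
    print(result)
    return result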

# 2. Specify the URL of the login page
url='https://so.gushiwen.cn/user/login.aspx'

headers={
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.67',
}

# 3. Instantiate a session object and use it instead of requests to send requests
# Requests sent through the session capture and reuse cookies
session=requests.Session()

# 4. Request the login page, which contains the captcha image
page_text=session.get(url=url,headers=headers).text

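# 5. Locate the captcha image in the page and download it
# Use the same host as the login page so the captcha is tied to this session's cookies
tree=etree.HTML(page_text)
code_img_src='https://so.gushiwen.cn'+tree.xpath('//*[@id="imgCode"]/@src')[0]
code_img=session.get(url=code_img_src,headers=headers).content
# 6. Save the captcha image locally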
with open('./code.jpg' ,'wb') as fp:
    fp.write(code_img)

# 7. Recognize the captcha with the captcha-recognition platform
result=getCodeText("./code.jpg",1902)["pic_str"]

# 8. Assemble the POST request parameters
data={
    '__VIEWSTATE': 'H1kOgNOrVD1NjN3Ge0EVkVx2j0VCESjdSCZ+xpysVRuatfxlO0c1vMFhDNYz9cGKJmR075LTXg7FkSBG6T4Q7YnAXcYhe9M4YPHwW1FRz8ZslaybqkUWEBFBSvk=',
    '__VIEWSTATEGENERATOR': 'C93BE1AE',
    'from': 'http://so.gushiwen.cn/user/collect.aspx',
    'email': '19118415578',  # fill in your own account
    'pwd': '1234567',  # fill in your own password
    'code': result,  # recognized captcha text
    'denglu': '登录',
}

# 9. Specify the login URL
login_url="https://so.gushiwen.cn/user/login.aspx?from=http%3a%2f%2fso.gushiwen.cn%2fuser%2fcollect.aspx"

# 10. Send the POST request to simulate the login
login_page_text=session.post(url=login_url,data=data,headers=headers).text

# 11. Persist the response to a local file
with open('./古诗.html', 'w', encoding='utf-8') as fp:
    fp.write(login_page_text)
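Step 7 indexes the platform's response with `["pic_str"]` directly, so a failed recognition would surface only as a wrong captcha at login. A small guard can make that failure explicit. The sketch below assumes the response is a dict that carries an `err_no` field (0 on success) and an `err_str` message next to `pic_str`; only `pic_str` is confirmed by the code above, the other two field names are assumptions to check against the platform's documentation.

```python
def extract_captcha_text(result):
    """Return the recognized captcha text, or raise if recognition failed.

    Assumes `result` is a dict with `pic_str` and (assumed) `err_no` / `err_str`.
    """
    if not isinstance(result, dict):
        raise ValueError("unexpected response type: %r" % type(result))
    if result.get("err_no", 0) != 0:
        raise RuntimeError("captcha recognition failed: %s" % result.get("err_str"))
    text = result.get("pic_str", "")
    if not text:
        raise RuntimeError("captcha recognition returned no text")
    return text


# Hypothetical responses for illustration only.
ok_response = {"err_no": 0, "err_str": "OK", "pic_str": "7x9k"}
captcha_text = extract_captcha_text(ok_response)
print(captcha_text)

try:
    extract_captcha_text({"err_no": -1, "err_str": "demo failure", "pic_str": ""})
except RuntimeError as error:
    failure_message = str(error)
    print(failure_message)
```

In the script above, this would replace the direct `["pic_str"]` lookup, for example `result = extract_captcha_text(getCodeText("./code.jpg", 1902))`.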