Simulating Login with a Python Crawler

It's possible

Category: Crawlers. Tags: python

Article link: https://blog.csdn.net/weixin_45691962/article/details/106195487

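This article uses the Gushiwen site (古诗文网) as an example to show how a Python crawler can simulate login on a website that requires it. It first sets up a captcha-solving platform such as Chaojiying (超级鹰) to recognize the captcha, which involves creating a software entry on the platform and obtaining its software ID. It then analyzes the login request parameters, especially the ones that change dynamically, which are the hard part of the login. Finally, it carries the cookie along with the request so that the login succeeds.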


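When scraping data with a Python crawler, some pages can only be accessed after a successful login, so the login request has to carry the account name, the password, and the captcha text. This article uses Gushiwen (古诗文网) as the example site and recognizes the captcha with the Chaojiying (超级鹰) captcha-solving platform.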
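The flow described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual login: the URLs and the form-field names (`email`, `pwd`, `code`) are hypothetical placeholders, `session` is assumed to behave like `requests.Session`, and `recognize` stands for any function that turns captcha image bytes into text.

```python
from typing import Callable

# Hypothetical endpoints, for illustration only. The real ones must be
# read from the browser's network panel while logging in by hand.
LOGIN_PAGE_URL = "https://example.com/login"
CAPTCHA_URL = "https://example.com/captcha.jpg"
LOGIN_POST_URL = "https://example.com/login"


def build_login_payload(username: str, password: str, captcha_text: str) -> dict:
    """Assemble the login form fields (the field names are assumed)."""
    return {"email": username, "pwd": password, "code": captcha_text}


def login(session, recognize: Callable[[bytes], str], username: str, password: str):
    """Log in through a single session so every request shares the same cookies.

    session:   object with get/post methods, e.g. a requests.Session
    recognize: function mapping captcha image bytes to the captcha text
    """
    # Visit the login page first so the server can set its initial cookies.
    session.get(LOGIN_PAGE_URL)
    # Fetch the captcha through the same session, so the image that gets
    # recognized is the one the server associates with this session.
    captcha_bytes = session.get(CAPTCHA_URL).content
    payload = build_login_payload(username, password, recognize(captcha_bytes))
    # Submit the account, password, and captcha text together.
    return session.post(LOGIN_POST_URL, data=payload)
```

With `requests` installed, a call would look like `login(requests.Session(), my_recognizer, "user", "secret")`, where `my_recognizer` is a placeholder for whatever wraps the captcha-solving service. Passing the session in, rather than creating it inside the function, keeps the cookies available to the caller for later requests.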

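Captcha recognition
Use an online captcha-solving platform to recognize the captcha
Yundama (云打码): http://www.yundama.com/about.html
Chaojiying (超级鹰, used in this article): http://www.chaojiying.com/about.html
Damatu (打码兔)
Chaojiying
Register: choose the "User Center" (用户中心) identity
Log in: choose the "User Center" (用户中心) identity
Create a software entry: Software ID -> generate a software ID (899370)
Download the sample code: Developer Docs -> python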
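The `PostPic` method in the sample code returns the service's JSON reply as a dict. A small helper for pulling the recognized text out of that reply might look like the sketch below. The key names `err_no`, `err_str`, and `pic_str` are assumptions about the reply format and should be checked against the platform's developer documentation.

```python
def extract_captcha_text(result: dict) -> str:
    """Return the recognized captcha text, or raise if the service reported an error.

    The keys 'err_no', 'err_str' and 'pic_str' are assumed, not confirmed here.
    """
    # Treat any non-zero (or missing) error number as a failed recognition.
    if result.get("err_no") != 0:
        message = result.get("err_str", "unknown error")
        raise RuntimeError(f"captcha recognition failed: {message}")
    return result["pic_str"]
```

Failing loudly on an error reply avoids submitting an empty or stale captcha string with the login request.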
Sample code provided by Chaojiying

#!/usr/bin/env python
# coding:utf-8

import requests
from hashlib import md5


class Chaojiying_Client(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        # The password is sent as its MD5 hex digest (the 'pass2' field), not as plain text.
        password = password.encode('utf8')
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        # Credentials included with every request.
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: captcha type, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        # Upload the image as a multipart form field named 'userfile'.
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: ID of the image whose recognition result is being reported as wrong
        """
        params = {
            'id': im_id,
        }
        params