Python Web Scraping Basics: Automatic Login (Part 2)

磊-

Category: Python Web Scraping (Complete)

Article link: https://blog.csdn.net/duanlei123456/article/details/100285196

I. The BeautifulSoup module in detail

II. Automatic login to GitHub

I. The BeautifulSoup module in detail

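BeautifulSoup is a module that takes an HTML or XML string and parses it into a structured tree. You can then use the methods it provides to quickly locate specific elements, which makes finding elements in HTML or XML simple.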
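As a minimal sketch of that workflow, the snippet below parses a small hand-written HTML string and looks up two elements. The HTML content and the value `abc123` are made up for illustration; the `find` calls mirror the ones used in the GitHub example later in this article. It uses the `html.parser` backend that ships with Python, so nothing beyond `beautifulsoup4` needs to be installed.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page standing in for a real response body (illustrative only).
html = """
<html><body>
  <form action="/session" method="post">
    <input type="hidden" name="authenticity_token" value="abc123">
    <input type="text" name="login">
  </form>
  <a class="mr-1" href="/user/repo">repo</a>
</body></html>
"""

# 'html.parser' is built into Python; 'lxml' is an alternative backend
# that has to be installed separately.
soup = BeautifulSoup(html, features="html.parser")

# Find the first <input> whose name attribute is "authenticity_token".
token_tag = soup.find(name="input", attrs={"name": "authenticity_token"})
token = token_tag.get("value")

# Find the first <a> with the CSS class "mr-1".
link = soup.find(name="a", class_="mr-1")

print(token)
print(link.get("href"), link.string)
```

`find` returns `None` when nothing matches, so real scraping code should check the result before calling `.get()` on it.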

1. Installation

pip3 install beautifulsoup4

Full documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html

II. Automatic login to GitHub

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import requests
from bs4 import BeautifulSoup

# ############## Method 1 ##############

# 1. Request the login page and extract the authenticity_token
#    (features='lxml' requires the lxml package: pip3 install lxml)
i1 = requests.get('https://github.com/login')
soup1 = BeautifulSoup(i1.text, features='lxml')
tag = soup1.find(name='input', attrs={'name': 'authenticity_token'})
authenticity_token = tag.get('value')
# Save the cookies returned by the first request
c1 = i1.cookies.get_dict()
i1.close()

# 2. Send the authenticity_token together with the username and password to authenticate
form_data = {
    "authenticity_token": authenticity_token,
    "utf8": "",
    "commit": "Sign in",
    "login": "duanlei123",
    "password": "xxxx"
}

i2 = requests.post('https://github.com/session', data=form_data, cookies=c1)
# Get the cookies returned by the login request
c2 = i2.cookies.get_dict()
# Merge them into the cookies from the first request
c1.update(c2)

# 3. Request a page that requires login, sending the merged cookies
i3 = requests.get('https://github.com/settings/repositories', cookies=c1)
soup3 = BeautifulSoup(i3.text, features='lxml')
list_group = soup3.find(name='div', class_='listgroup')

from bs4.element import Tag

# .children also yields text nodes (such as whitespace), so keep only Tag elements
for child in list_group.children:
    if isinstance(child, Tag):
        project_tag = child.find(name='a', class_='mr-1')
        size_tag = child.find(class_='text-small')
        # The link text is the project name; its href is the project path
        temp = "Project: %s (%s); project path: %s" % (project_tag.string, size_tag.string, project_tag.get('href'))
        print(temp)
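The manual cookie handling in Method 1 (collect the cookies from each response, merge them, and pass them along) can also be delegated to `requests.Session`, which keeps a cookie jar and sends it with every later request. The following is a sketch under assumptions rather than the author's code: the helper names `extract_token`, `login`, and `fetch_repositories_page` are hypothetical, the form fields are copied from the listing above, and GitHub's real login flow may require more than this (changed form fields or two-factor authentication, for example).

```python
import requests
from bs4 import BeautifulSoup

LOGIN_URL = "https://github.com/login"
SESSION_URL = "https://github.com/session"
REPOS_URL = "https://github.com/settings/repositories"


def extract_token(html):
    """Return the value of the hidden authenticity_token input, or None if absent."""
    soup = BeautifulSoup(html, features="html.parser")
    tag = soup.find(name="input", attrs={"name": "authenticity_token"})
    return tag.get("value") if tag is not None else None


def login(session, username, password):
    """Log in through `session`; the session stores the cookies from each response."""
    login_page = session.get(LOGIN_URL)
    token = extract_token(login_page.text)
    if token is None:
        raise RuntimeError("authenticity_token not found; the login page may have changed")
    # Same form fields as in Method 1 above.
    form_data = {
        "authenticity_token": token,
        "utf8": "",
        "commit": "Sign in",
        "login": username,
        "password": password,
    }
    return session.post(SESSION_URL, data=form_data)


def fetch_repositories_page(username, password):
    """Log in and return the HTML of the repositories settings page (hypothetical helper)."""
    with requests.Session() as session:
        login(session, username, password)
        # No cookies= argument needed: the session sends its stored cookies.
        return session.get(REPOS_URL).text
```

Calling `fetch_repositories_page("your-login", "your-password")` performs real network requests. The returned HTML can be parsed the same way as `i3.text` in Method 1.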
