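Preface
Hi everyone, long time no see, and happy new year! I was pretty busy over the Spring Festival and had no time for writing articles or "coding". Our school also started the new term on the ninth day of the lunar new year, so I have even less time to write now 🤣
Back to the topic: this article is meant as a small personal summary.
It walks through a hands-on Python scraping project built around Bilibili's QR-code login. The HTTP library is httpx, which supports both synchronous and asynchronous requests (the code below uses its synchronous Client). I have only been learning web scraping for a short while, so if you spot any mistakes, corrections and advice are very welcome.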
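Getting started
1. Preparation
You will need:
- httpx
- qrcode
- a browser
- a pair of hardworking hands
The first two are third-party Python libraries: httpx fetches the data and qrcode generates the QR code. qrcode needs Pillow to create and show images, so install everything with `pip install httpx "qrcode[pil]"`. A quick heads-up: this article assumes some basic scraping experience. If you are a complete beginner, learn the fundamentals first and you will get more out of it.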
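Before writing any code, you can confirm that the environment is ready by checking whether the modules can be found. This is a minimal sketch that uses only the standard library. `check_deps` is a hypothetical helper name, and `PIL` is the import name of Pillow, which qrcode uses to build and show the image.

```python
import importlib.util


def check_deps(modules=("httpx", "qrcode", "PIL")) -> dict:
    """Map each module name to True if it can be imported (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_deps().items():
        print(f"{name}: {'OK' if ok else 'missing, install it with pip'}")
```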
2. Working out the approach
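First, let's look at how Bilibili's QR-code login works. In outline, the web page asks the server for a login link and a qrcode_key and renders the link as a QR code. The page then keeps polling the server with that qrcode_key. Once you scan the code with the Bilibili app and confirm, the poll response sets the login cookies, and those cookies can be reused for later requests.
That covers the overall flow.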
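To see how the pieces fit together, the flow can be simulated offline. In this sketch, `FakeBiliServer` is a hypothetical stand-in for Bilibili's two endpoints and does not reflect real API behaviour. The status codes 0 (success), 86101 (not scanned) and 86090 (scanned, awaiting confirmation) are assumptions based on community documentation of the web QR login, and the cookie value is made up.

```python
import secrets


class FakeBiliServer:
    """Hypothetical stand-in for the generate/poll endpoints (not real API behaviour)."""

    def __init__(self, script):
        self.script = list(script)  # status codes the poll endpoint will return, in order
        self.key = None

    def generate(self) -> dict:
        # Step 1: hand out a login URL plus a qrcode_key identifying this login attempt
        self.key = secrets.token_hex(16)
        return {"url": f"https://example.invalid/scan?qrcode_key={self.key}", "qrcode_key": self.key}

    def poll(self, qrcode_key: str):
        # Step 3: report the QR status; on success, also hand out login cookies
        assert qrcode_key == self.key, "unknown qrcode_key"
        code = self.script.pop(0)
        cookies = {"SESSDATA": "fake-session"} if code == 0 else {}
        return code, cookies


def login(server: FakeBiliServer, max_polls: int = 10):
    """Run the whole flow: generate -> (show QR) -> poll until done -> return cookies."""
    info = server.generate()
    # Step 2 would render info["url"] as a QR code for the phone app to scan
    for _ in range(max_polls):
        code, cookies = server.poll(info["qrcode_key"])
        if code == 0:
            return cookies  # Step 4: these would be saved and reused for later requests
        if code not in (86101, 86090):
            return None  # expired or some other failure
    return None


if __name__ == "__main__":
    print(login(FakeBiliServer([86101, 86090, 0])))
```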
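Next, open the Bilibili homepage, press F12 and browse the requests in the developer tools. The one we want is https://passport.bilibili.com/x/passport-login/web/qrcode/generate, the endpoint that returns the login verification link and the qrcode_key.
Its JSON response carries the login link in data.url and the token in data.qrcode_key.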
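Below is a minimal sketch of parsing such a response defensively. It assumes the usual Bilibili envelope, where a top-level `code` of 0 means success. The sample payload is made up and only mirrors the `data.url` / `data.qrcode_key` fields the code reads. `parse_generate` is a hypothetical helper.

```python
import json

# Made-up sample in the shape the generate endpoint is assumed to return
SAMPLE = '''{"code": 0, "message": "0", "ttl": 1,
  "data": {"url": "https://example.invalid/scan?qrcode_key=abc123", "qrcode_key": "abc123"}}'''


def parse_generate(text: str) -> dict:
    """Extract url and qrcode_key, failing loudly if the top-level code is not 0."""
    payload = json.loads(text)
    if payload.get("code") != 0:
        raise RuntimeError(f"request failed: code={payload.get('code')} message={payload.get('message')}")
    data = payload["data"]
    return {"url": data["url"], "qrcode_key": data["qrcode_key"]}


if __name__ == "__main__":
    print(parse_generate(SAMPLE))
```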
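So we have the approach and the endpoint. Time to get to work.
3. Let's do it!
First, create a .py file (any name will do) and put the following code in it: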
```python
import httpx


def get_qrurl() -> dict:
    """Return the QR-code login link and its qrcode_key (token)."""
    with httpx.Client() as client:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
        }
        url = 'https://passport.bilibili.com/x/passport-login/web/qrcode/generate?source=main-fe-header'
        resp = client.get(url=url, headers=headers)
        resp.raise_for_status()  # fail early on HTTP errors
        total_data = resp.json()
        return {
            'url': total_data['data']['url'],                # login link to encode in the QR code
            'qrcode_key': total_data['data']['qrcode_key'],  # token used to poll the login status
        }


if __name__ == "__main__":
    print(get_qrurl())
```
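Run it and print the return value. It is exactly the verification link and qrcode_key we saw in the browser.
At this point we can send requests to Bilibili's server. Next we generate the QR code itself with the help of the qrcode library.
Without further ado, here is the code: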
```python
import httpx
import qrcode


def get_qrurl() -> dict:
    """Return the QR-code login link and its qrcode_key (token)."""
    with httpx.Client() as client:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
        }
        url = 'https://passport.bilibili.com/x/passport-login/web/qrcode/generate?source=main-fe-header'
        resp = client.get(url=url, headers=headers)
        resp.raise_for_status()
        total_data = resp.json()
        return {
            'url': total_data['data']['url'],
            'qrcode_key': total_data['data']['qrcode_key'],
        }


def make_qrcode():
    """Generate the QR code and display it."""
    data = get_qrurl()
    qr = qrcode.QRCode(
        version=5,  # starting size; fit=True below enlarges it if the URL does not fit
        error_correction=qrcode.constants.ERROR_CORRECT_L,  # lowest error-correction level (~7%)
        box_size=10,  # pixels per module
        border=4,  # quiet-zone width, in modules
    )
    qr.add_data(data['url'])
    qr.make(fit=True)
    # fill_color and back_color set the foreground and background colours and accept RGB values;
    # note that changing them may make the QR code unscannable
    img = qr.make_image(fill_color="black", back_color="white")
    img.show()  # open the image in the system's default viewer


if __name__ == "__main__":
    make_qrcode()
```
Here we define a make_qrcode function that generates the QR code. The show method called on its last line displays the QR code.
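To see what the QRCode parameters mean in practice: a QR code of version v is 4v + 17 modules per side, and qrcode draws each module as box_size × box_size pixels, with a quiet zone of `border` modules on every side. The hypothetical helper below (not part of the qrcode library) computes the resulting image width. Because `fit=True` lets the library move to a larger version when the URL does not fit in version 5, the real image can be bigger.

```python
def qr_image_size(version: int, box_size: int = 10, border: int = 4) -> int:
    """Side length in pixels of a QR image: (modules + 2 * border) * box_size."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions range from 1 to 40")
    modules = 4 * version + 17  # modules per side for this version
    return (modules + 2 * border) * box_size


print(qr_image_size(5))  # 450 with the settings used in make_qrcode
```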
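Note! The function called under `if __name__ == "__main__":` has changed to make_qrcode(), so remember to update it.
Running it opens the generated QR code in your system's default image viewer.
If you scan the generated QR code, you will see that it logs in through the web (browser) protocol.
With that, we are more than halfway there.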
If you want to use this for a QQ bot, you also need to check the QR code's status. I put together a lookup table of the status codes I found and what each one means.
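A lookup like that can live in code as a dictionary. The four codes below are the `data.code` values commonly documented for the web QR-login poll endpoint (https://passport.bilibili.com/x/passport-login/web/qrcode/poll). Treat them as assumptions that Bilibili may change. `QR_STATUS`, `describe_status` and `should_keep_polling` are hypothetical names.

```python
# data.code values reported by the web QR-login poll endpoint (assumed; may change)
QR_STATUS = {
    0: "login succeeded",
    86038: "QR code expired",
    86090: "scanned, waiting for confirmation on the phone",
    86101: "not scanned yet",
}


def describe_status(code: int) -> str:
    """Human-readable text for a poll status code, e.g. for a QQ bot reply."""
    return QR_STATUS.get(code, f"unknown status {code}")


def should_keep_polling(code: int) -> bool:
    """Only the two 'still waiting' states are worth polling again."""
    return code in (86101, 86090)
```

With this split, a bot can send `describe_status(code)` to the user on every poll and stop as soon as `should_keep_polling(code)` is False.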
The QR code is done. What comes next?
We need to save the cookies once the scan has been confirmed, like this:
```python
import json
import os
import time

import httpx
import qrcode

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
}
COOKIE_DIR = './bilibili_login/cookie'


def get_qrurl() -> dict:
    """Return the QR-code login link and its qrcode_key (token)."""
    with httpx.Client() as client:
        url = 'https://passport.bilibili.com/x/passport-login/web/qrcode/generate?source=main-fe-header'
        resp = client.get(url=url, headers=HEADERS)
        resp.raise_for_status()
        total_data = resp.json()
        return {
            'url': total_data['data']['url'],
            'qrcode_key': total_data['data']['qrcode_key'],
        }


def make_qrcode(data):
    """Generate the QR code and display it."""
    qr = qrcode.QRCode(
        version=5,
        error_correction=qrcode.constants.ERROR_CORRECT_L,
        box_size=10,
        border=4,
    )
    qr.add_data(data['url'])
    qr.make(fit=True)
    # fill_color and back_color set the foreground and background colours and accept RGB values;
    # note that changing them may make the QR code unscannable
    img = qr.make_image(fill_color="black", back_color="white")
    img.show()


def sav_cookie(data, user_id):
    """Save cookies to ./bilibili_login/cookie/<user_id>.json."""
    # makedirs also creates the parent bilibili_login folder; os.mkdir would fail if it is missing
    os.makedirs(COOKIE_DIR, exist_ok=True)
    with open(f'{COOKIE_DIR}/{user_id}.json', 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False)


def main_run():
    """Main function: show the QR code, poll until the login is decided, save the cookies."""
    data = get_qrurl()
    token = data['qrcode_key']
    make_qrcode(data)
    url = f"https://passport.bilibili.com/x/passport-login/web/qrcode/poll?qrcode_key={token}&source=main-fe-header"
    with httpx.Client() as client:
        # A single request right after showing the QR code would just report "not scanned",
        # so keep polling until the status changes
        while True:
            data_login = client.get(url=url, headers=HEADERS).json()  # request the QR status
            code = int(data_login['data']['code'])
            if code == 0:  # login confirmed; the response set the login cookies on the client
                sav_cookie(dict(client.cookies), 'test')
                print('Login succeeded, cookies saved')
                return
            if code in (86101, 86090):  # 86101: not scanned yet; 86090: scanned, not confirmed
                time.sleep(2)
                continue
            print(f"Login failed, status {code}: {data_login['data'].get('message', '')}")
            return


if __name__ == "__main__":
    main_run()
```
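When it finishes, a bilibili_login folder appears in the directory you ran the script from (the relative path ./bilibili_login is resolved against the current working directory). Inside is a cookie folder holding test.json, which stores the user's cookies. That completes fetching and storing the cookies. Next, we use them to access Bilibili and fetch the account's personal information.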
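Before relying on the saved file, you can check that it actually holds login cookies. This sketch assumes the cookie names commonly seen after a Bilibili web login: `SESSDATA` as the session cookie, `bili_jct` as the CSRF token and `DedeUserID` as the user id. `missing_login_cookies` is a hypothetical helper.

```python
# Cookie names assumed to be set by a successful web login
REQUIRED = ("SESSDATA", "bili_jct", "DedeUserID")


def missing_login_cookies(cookies: dict) -> list:
    """Return the required cookie names that are absent or empty in a saved cookie dict."""
    return [name for name in REQUIRED if not cookies.get(name)]
```

To use it, pass in the dict loaded from test.json with `json.load`. An empty list means the file looks like a logged-in session.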
First, we define a function that reads the cookies back, like this:
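```python
def load_cookie(user_id='test') -> dict:
    """Load the cookies saved for user_id; return an empty dict if the file is missing."""
    try:
        with open(f'./bilibili_login/cookie/{user_id}.json', 'r', encoding='utf-8') as file:
            return dict(json.load(file))
    except FileNotFoundError:
        # An empty dict (rather than the string 'null') can still be passed to httpx safely
        print('User file not found, please make sure the resources are complete (run main_run first)')
        return {}
```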
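You can also write the save and load steps as a matching pair with pathlib, which creates missing parent folders in one call and uses the same path rule in both directions. This is a minimal sketch. `save_cookies`, `load_cookies` and the `base` parameter are hypothetical names, with `base` defaulting to the article's ./bilibili_login/cookie.

```python
import json
from pathlib import Path

DEFAULT_DIR = Path("./bilibili_login/cookie")


def save_cookies(cookies: dict, user_id: str, base: Path = DEFAULT_DIR) -> Path:
    """Write cookies to <base>/<user_id>.json, creating folders as needed."""
    base.mkdir(parents=True, exist_ok=True)
    path = base / f"{user_id}.json"
    path.write_text(json.dumps(cookies, ensure_ascii=False), encoding="utf-8")
    return path


def load_cookies(user_id: str, base: Path = DEFAULT_DIR) -> dict:
    """Read <base>/<user_id>.json; an empty dict means 'not logged in yet'."""
    path = base / f"{user_id}.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```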
Next, we define a person function that fetches and prints the personal information, like this:
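```python
def person():
    """Fetch the logged-in user's profile."""
    url = 'https://api.bilibili.com/x/web-interface/nav'
    cookie = load_cookie()
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
    }
    # Cookies are set on the client; per-request cookies= is deprecated in recent httpx versions
    with httpx.Client(headers=headers, cookies=cookie) as client:
        data = client.get(url).json()
    person_data = data['data']  # personal info
    if not person_data.get('isLogin'):
        # Without valid cookies, uname, money etc. are missing from the response
        print('Not logged in: cookies are missing or expired, run main_run first')
        return None
    user_name = person_data['uname']  # username
    coin_num = str(person_data['money'])  # number of coins
    level = str(person_data['level_info']['current_level'])  # account level
    face = str(person_data['face'])  # avatar URL
    print(f'{user_name} | Lv{level} | coins: {coin_num} | avatar: {face}')
    return person_data
```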
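Because the profile fields are only present when the cookies are valid, it helps to keep the parsing separate from the request so it can be tested on its own. The sketch below works on an already-decoded nav response. It assumes the response's `data.isLogin` flag and reads the same fields as person(). `summarize_nav` is a hypothetical helper, and the sample dict is made up.

```python
def summarize_nav(payload: dict):
    """Pull the fields used in person() out of a decoded nav response, or None if logged out."""
    info = payload.get("data") or {}
    if not info.get("isLogin"):
        return None  # cookies missing or expired
    return {
        "user_name": info["uname"],
        "coins": info["money"],
        "level": info["level_info"]["current_level"],
        "face": info["face"],
    }


sample = {"code": 0, "data": {"isLogin": True, "uname": "demo", "money": 12.5,
                              "level_info": {"current_level": 4},
                              "face": "https://example.invalid/face.jpg"}}
print(summarize_nav(sample))
```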
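Everything is ready! Run it and you will see that the personal data comes back as JSON. I won't show it here, so please try it yourselves.
Alright, here is the complete code: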
```python
import json
import os
import time

import httpx
import qrcode

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
}
COOKIE_DIR = './bilibili_login/cookie'


def get_qrurl() -> dict:
    """Return the QR-code login link and its qrcode_key (token)."""
    with httpx.Client() as client:
        url = 'https://passport.bilibili.com/x/passport-login/web/qrcode/generate?source=main-fe-header'
        resp = client.get(url=url, headers=HEADERS)
        resp.raise_for_status()
        total_data = resp.json()
        return {
            'url': total_data['data']['url'],
            'qrcode_key': total_data['data']['qrcode_key'],
        }


def make_qrcode(data):
    """Generate the QR code and display it."""
    qr = qrcode.QRCode(
        version=5,
        error_correction=qrcode.constants.ERROR_CORRECT_L,
        box_size=10,
        border=4,
    )
    qr.add_data(data['url'])
    qr.make(fit=True)
    # fill_color and back_color set the foreground and background colours and accept RGB values;
    # note that changing them may make the QR code unscannable
    img = qr.make_image(fill_color="black", back_color="white")
    img.show()


def sav_cookie(data, user_id):
    """Save cookies to ./bilibili_login/cookie/<user_id>.json."""
    os.makedirs(COOKIE_DIR, exist_ok=True)  # also creates the parent bilibili_login folder
    with open(f'{COOKIE_DIR}/{user_id}.json', 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False)


def main_run():
    """Main function: show the QR code, poll until the login is decided, save the cookies."""
    data = get_qrurl()
    token = data['qrcode_key']
    make_qrcode(data)
    url = f"https://passport.bilibili.com/x/passport-login/web/qrcode/poll?qrcode_key={token}&source=main-fe-header"
    with httpx.Client() as client:
        while True:
            data_login = client.get(url=url, headers=HEADERS).json()  # request the QR status
            code = int(data_login['data']['code'])
            if code == 0:  # login confirmed; the response set the login cookies on the client
                sav_cookie(dict(client.cookies), 'test')
                print('Login succeeded, cookies saved')
                return
            if code in (86101, 86090):  # 86101: not scanned yet; 86090: scanned, not confirmed
                time.sleep(2)
                continue
            print(f"Login failed, status {code}: {data_login['data'].get('message', '')}")
            return


def load_cookie(user_id='test') -> dict:
    """Load the cookies saved for user_id; return an empty dict if the file is missing."""
    try:
        with open(f'{COOKIE_DIR}/{user_id}.json', 'r', encoding='utf-8') as file:
            return dict(json.load(file))
    except FileNotFoundError:
        print('User file not found, please make sure the resources are complete (run main_run first)')
        return {}


def person():
    """Fetch the logged-in user's profile."""
    url = 'https://api.bilibili.com/x/web-interface/nav'
    cookie = load_cookie()
    with httpx.Client(headers=HEADERS, cookies=cookie) as client:
        data = client.get(url).json()
    person_data = data['data']  # personal info
    if not person_data.get('isLogin'):
        print('Not logged in: cookies are missing or expired, run main_run first')
        return None
    user_name = person_data['uname']  # username
    coin_num = str(person_data['money'])  # number of coins
    level = str(person_data['level_info']['current_level'])  # account level
    face = str(person_data['face'])  # avatar URL
    print(f'{user_name} | Lv{level} | coins: {coin_num} | avatar: {face}')
    return person_data


if __name__ == "__main__":
    main_run()  # step 1: scan the QR code to log in and save the cookies
    person()    # step 2: read the saved cookies and fetch the profile
```
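One thing to note: main_run must run first to obtain and save the cookies, and only then should person run to fetch the personal information. Don't mix up the order, or person will find no cookie file (or an invalid one) and fail.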
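To avoid getting the order wrong by hand, the entry point can decide for itself: log in only when no cookie file exists yet, then fetch the profile. This is a minimal sketch in which the two steps are passed in as functions, so it can be tried without network access. `run` and its parameters are hypothetical. In the real script you would pass `main_run` and `person`.

```python
import os


def run(cookie_path: str, login, fetch_profile):
    """Call login() only if cookie_path is missing, then always call fetch_profile()."""
    if not os.path.exists(cookie_path):
        login()  # e.g. main_run: show the QR code and save the cookies
    return fetch_profile()  # e.g. person: read the cookies and fetch the profile


if __name__ == "__main__":
    calls = []
    run("./bilibili_login/cookie/test.json",
        lambda: calls.append("login"), lambda: calls.append("profile"))
    print(calls)
```

One limitation: an existing but expired cookie file still skips the login. In that case, delete the file and run again.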
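Wrapping up
Congratulations ヾ(≧▽≦*)o, we have now fetched our personal information. This tutorial is for learning purposes only. I hope it helps everyone who is learning, or wants to learn, web scraping. Finally, I want to say: Code changes the world!
The end, confetti (o゜▽゜)o☆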