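1. Background
I had never really tried scraping data over WebSocket before, so I recently looked into it and wrote this article. What WebSocket is will not be covered in detail here; please look it up yourself (e.g. on Baidu).
This article is intended for learning and exchange only; please do not use it for improper purposes!
If this article infringes on your company's privacy, please contact me and I will delete it immediately!
I bear no responsibility for any risk arising from misuse of these decryption techniques!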
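2. Main Text
2.1 Debugging
Open Chrome DevTools, switch to the Network panel, and find the request whose status is 101:
Note: status 101 means "Switching Protocols"; a 101 response indicates that the server has agreed to switch from HTTP to the WebSocket protocol.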
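For reference, the 101 response completes the opening handshake defined in RFC 6455: the client sends a random `Sec-WebSocket-Key`, and the server confirms the upgrade by returning `Sec-WebSocket-Accept`, which is the Base64-encoded SHA-1 of that key concatenated with a fixed GUID. A minimal sketch of that calculation (the function name `accept_key` is just for illustration):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a server returns with status 101."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Example key/accept pair from RFC 6455, section 1.3
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Comparing the result with the `Sec-WebSocket-Accept` response header shown in DevTools is a quick way to confirm you are looking at the right handshake.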
Open the Messages tab to see how the messages are exchanged:
After the handshake succeeds, the client sends the server three messages:
'{"UserInfo": {"Url": "live.611.com", "Version": "[1625308217000]{\\"chrome\\":true,\\"version\\":\\"86.0.4240.183\\",\\"webkit\\":true}"}, "action": "Web", "command": "RegisterInfo", "ids": []}'
'{"action": "SoccerLiveOdd", "command": "JoinGroup", "ids": []}'
'{"action": "SoccerLive", "command": "JoinGroup", "ids": []}'
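Rather than concatenating these strings by hand, they can be produced with `json.dumps`. The sketch below assumes, based only on the captured message, that the `Version` field is a bracketed millisecond timestamp followed by a compact JSON description of the browser; the helper names `build_register` and `build_join` are hypothetical. With Python's default separators the output matches the captured text exactly.

```python
import json
import time


def build_register(ts_ms=None):
    """Build the RegisterInfo message; ts_ms defaults to the current time in ms."""
    if ts_ms is None:
        ts_ms = int(time.time()) * 1000  # whole seconds scaled to ms, like int(time.time()) * 1000
    # Browser info is serialized compactly (no spaces), as in the captured frame
    browser = json.dumps({"chrome": True, "version": "86.0.4240.183", "webkit": True},
                         separators=(",", ":"))
    message = {
        "UserInfo": {"Url": "live.611.com", "Version": f"[{ts_ms}]{browser}"},
        "action": "Web",
        "command": "RegisterInfo",
        "ids": [],
    }
    return json.dumps(message)  # default separators produce ", " and ": "


def build_join(action):
    """Build a JoinGroup message such as SoccerLiveOdd or SoccerLive."""
    return json.dumps({"action": action, "command": "JoinGroup", "ids": []})


if __name__ == "__main__":
    for msg in (build_register(), build_join("SoccerLiveOdd"), build_join("SoccerLive")):
        print(msg)
```

Letting `json.dumps` handle the nested quoting avoids the error-prone `\\"` escapes in the hand-written strings.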
The server then starts pushing messages to the client continuously!
Notes:
- Up arrow: client -> server
- Down arrow: server -> client
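The format of the pushed messages is not shown here, so the following is only a hedged sketch: assuming each frame is UTF-8 JSON text, and that the client library may return it as `bytes` rather than `str`, a small helper (the hypothetical `parse_message`) can turn frames into Python objects and fall back to the raw text when parsing fails.

```python
import json


def parse_message(frame):
    """Decode a received frame and parse it as JSON if possible."""
    # Some WebSocket clients return bytes, others str; normalize to str
    if isinstance(frame, (bytes, bytearray)):
        text = frame.decode("utf-8", errors="replace")
    else:
        text = frame
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: keep the raw text so it can still be inspected
        return text
```

Inside the receive loop of the code in 2.2, `print(parse_message(mes))` would then show structured data instead of raw frames.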
2.2 Simulating the Rules
Logic: handshake -> send the three messages -> receive data
Code:
import asyncio
import logging
import time
from datetime import datetime

from aiowebsocket.converses import AioWebSocket


async def startup(uri):
    async with AioWebSocket(uri) as aws:
        converse = aws.manipulator
        # Registration message; the Version field starts with a millisecond timestamp
        data1 = '{"UserInfo": {"Url": "live.611.com", "Version": "[' + str(int(time.time()) * 1000) + ']{\\"chrome\\":true,\\"version\\":\\"86.0.4240.183\\",\\"webkit\\":true}"}, "action": "Web", "command": "RegisterInfo", "ids": []}'
        # Join the odds group and the live-score group
        data2 = '{"action": "SoccerLiveOdd", "command": "JoinGroup", "ids": []}'
        data3 = '{"action": "SoccerLive", "command": "JoinGroup", "ids": []}'
        # Send the three messages in the same order as the browser
        await converse.send(data1)
        await converse.send(data2)
        await converse.send(data3)
        # Print every message the server pushes, with a timestamp
        while True:
            mes = await converse.receive()
            print('{time}-Client receive: {rec}'.format(time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'), rec=mes))


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)  # otherwise logging.info() output is hidden
    remote = 'wss://push.611.com:6119/794c1c6893b74e5d8d2d5970b2a58422'
    try:
        asyncio.run(startup(remote))
    except KeyboardInterrupt:
        logging.info('Quit.')
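Running it produces the following result:
This clearly shows that the WebSocket endpoint has an anti-scraping measure, so let's look at the request logic more carefully. The URL ends with a long string of characters that looks like a key.
Next, check whether any earlier request returns this key. Sure enough, a request made before the handshake fetches it:
In other words, before the handshake we first need to call this API to get the key and append it to the URI, and only then continue. The logic is as follows: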
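Because the key copied from DevTools did not work when reused, it is presumably short-lived and should be fetched right before each connection. The parsing step can be isolated so it is testable without the network. This sketch assumes, like the code in this section, that the GetToken response is JSON whose `Data` field holds the key; `build_ws_uri` is a hypothetical helper name.

```python
import json

WS_BASE = "wss://push.611.com:6119/"


def build_ws_uri(token_response_text):
    """Turn the GetToken response body into the WebSocket URI."""
    key = json.loads(token_response_text).get("Data")
    if not key:
        # Without a key the handshake is expected to fail, so stop early
        raise ValueError("token response has no 'Data' value")
    return WS_BASE + key
```

Failing early on a missing key gives a clearer error than a rejected handshake.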
Logic: get key -> build URI -> handshake -> send the three messages -> receive data
The code is as follows:
import asyncio
import logging
import time
from datetime import datetime

import requests
from aiowebsocket.converses import AioWebSocket


async def startup(uri):
    async with AioWebSocket(uri) as aws:
        converse = aws.manipulator
        # Same three messages as before: register, then join the two groups
        data1 = '{"UserInfo": {"Url": "live.611.com", "Version": "[' + str(int(time.time()) * 1000) + ']{\\"chrome\\":true,\\"version\\":\\"86.0.4240.183\\",\\"webkit\\":true}"}, "action": "Web", "command": "RegisterInfo", "ids": []}'
        data2 = '{"action": "SoccerLiveOdd", "command": "JoinGroup", "ids": []}'
        data3 = '{"action": "SoccerLive", "command": "JoinGroup", "ids": []}'
        await converse.send(data1)
        await converse.send(data2)
        await converse.send(data3)
        while True:
            mes = await converse.receive()
            print('{time}-Client receive: {rec}'.format(time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'), rec=mes))


def get_token():
    # Fetch the key that must be appended to the WebSocket URI
    url = 'https://live.611.com/Live/GetToken'
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()['Data']


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    remote = 'wss://push.611.com:6119/{key}'.format(key=get_token())
    try:
        asyncio.run(startup(remote))
    except KeyboardInterrupt:
        logging.info('Quit.')
After running it, the data is received successfully:
3. Extensions
Besides the measure above, I also came across other WebSocket anti-scraping techniques, for example Ping-based anti-scraping!
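The article does not detail how Ping anti-scraping works. A common pattern, assumed here, is that the server drops clients that stop sending a periodic heartbeat, so the client must keep pinging while it reads data. A minimal asyncio sketch: `send` and `receive` are any coroutine functions (for example `converse.send` and `converse.receive` from the code above), and the payload `"ping"` and the 10-second interval are placeholders, not the site's real values.

```python
import asyncio


async def heartbeat(send, interval=10.0, payload="ping"):
    """Send `payload` every `interval` seconds until cancelled."""
    while True:
        await send(payload)
        await asyncio.sleep(interval)


async def run_with_heartbeat(send, receive, handle, interval=10.0, payload="ping"):
    """Read messages in a loop while a background task sends heartbeats."""
    task = asyncio.create_task(heartbeat(send, interval, payload))
    try:
        while True:
            handle(await receive())
    finally:
        # Stop the heartbeat when the receive loop ends (e.g. connection closed)
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
```

Running the heartbeat as a separate task keeps the pings on schedule even while `receive()` is blocked waiting for the next message; inside the `async with` block this would be `await run_with_heartbeat(converse.send, converse.receive, print)`.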
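The figure above comes from the web! For more details, see the article 《Python如何爬取实时变化的WebSocket数据》 ("How to Scrape Real-Time WebSocket Data with Python").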