py-wuhao/ks_barrage (github.com)
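- Establish the WebSocket connection
Connection URL: wss://live-ws-pg-group3.kuaishou.com/websocket
First, the client sends one message telling the server which room's danmaku (bullet comments) it wants to receive.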
Captured frame, rendered as text (non-printable bytes do not display correctly):
È Xoobhv8gqySwoX93lhC+54lnNGE82yNFqH0BIy+Qe/HMMwettAiCOFwLEwkHQzv/KaRiK/WsePupD6T+5Nh29BQ== QwNPlKE3svQ:7efOCCI9eZ1Yx5ct_1567477628900
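The frame turns out to carry three meaningful values:
token: oobhv8gqySwoX93lhC+54lnNGE82yNFqH0BIy+Qe/HMMwettAiCOFwLEwkHQzv/KaRiK/WsePupD6T+5Nh29BQ==
stream_id: QwNPlKE3svQ
page_id: 7efOCCI9eZ1Yx5ct_1567477628900
stream_id and page_id are stored in the browser's Session Storage.
token can be found in the source of the room page.
Once the server knows which room's danmaku you want, it starts sending the danmaku data back. The client must send a heartbeat to the server about every two seconds.
In other words, replacing these three fields is enough to crawl the danmaku of any given room.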
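import threading
import time

import websocket  # provided by the websocket-client package


def on_open(ws):
    part1 = [0x08, 0xC8, 0x01, 0x1A, 0x87, 0x01, 0x0A, 0x58]  # fixed header; 0x58 = 88 = len(token)
    token = "oobhv8gqySwoX93lhC+54lnNGE82yNFqH0BIy+Qe/HMMwettAiCOFwLEwkHQzv/Khhxtm5MNOpR0syxixhAyag=="
    part2 = [ord(c) for c in token]
    part3 = [0x12, 0x0B]  # 0x0B = 11 = len(stream_id)
    stream_id = "XW35rOrWOs8"
    part4 = [ord(c) for c in stream_id]
    part5 = [0x3A, 0x1E]  # 0x1E = 30 = len(page_id)
    page_id = "bGV3QBMQuDiH2efc_1567476653678"
    part6 = [ord(c) for c in page_id]
    d = part1 + part2 + part3 + part4 + part5 + part6
    # send() expects bytes for a binary frame, so convert the list of ints.
    ws.send(bytes(d), websocket.ABNF.OPCODE_BINARY)

    def run():
        # Repeat the heartbeat about every two seconds, as described above.
        while True:
            time.sleep(2)
            heartbeat = bytes([8, 1, 26, 7, 8, 184, 232, 190, 199, 220, 44])
            ws.send(heartbeat, websocket.ABNF.OPCODE_BINARY)

    # Run the heartbeat in a background thread so it does not block the socket.
    threading.Thread(target=run, daemon=True).start()


def on_message(ws, message):
    data = list(message)  # byte values of the binary frame
    print(data)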
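The hard-coded bytes in on_open line up with the field lengths (0x58 = 88 for the token, 0x0B = 11 for stream_id, 0x1E = 30 for page_id, and 0x87 0x01 = 135 for everything after the header), which suggests the frames use protobuf wire encoding. That is an inference from the byte patterns, not something the source states. Under that assumption, here is a minimal sketch that computes the length bytes instead of hard-coding them, so the three fields can be swapped freely. The helper names (encode_varint, length_delimited, build_subscribe_frame, build_heartbeat) are hypothetical, and treating the heartbeat's varint as a millisecond timestamp is a guess.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def length_delimited(field_number, data):
    """Tag byte(s) with wire type 2, then the length, then the raw bytes."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def build_subscribe_frame(token, stream_id, page_id):
    """Rebuild the first frame sent in on_open for arbitrary field values."""
    inner = (
        length_delimited(1, token.encode("ascii"))
        + length_delimited(2, stream_id.encode("ascii"))
        + length_delimited(7, page_id.encode("ascii"))
    )
    # Header seen in the capture: field 1 = varint 200, field 3 = nested message.
    return encode_varint(1 << 3) + encode_varint(200) + length_delimited(3, inner)


def build_heartbeat(timestamp_ms):
    """Rebuild the heartbeat frame; the timestamp meaning is an assumption."""
    inner = encode_varint(1 << 3) + encode_varint(timestamp_ms)
    return encode_varint(1 << 3) + encode_varint(1) + length_delimited(3, inner)
```

With these helpers, the send in on_open could become ws.send(build_subscribe_frame(token, stream_id, page_id), websocket.ABNF.OPCODE_BINARY). For the values used above, the output is byte-for-byte identical to part1 through part6.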
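The message received in on_message is an array of raw bytes, which has to be decoded according to the protocol. The analysis below works backwards from the site's JavaScript.
- In common..........js
The data is decoded there; step into that call.
Python implementation: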
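Before looking at the JavaScript, it can help to see what "decoding according to the protocol" looks like on a known frame. The sketch below assumes the frames use protobuf wire encoding (an inference, not confirmed by the source) and splits a buffer into (field number, wire type, value) tuples. It is demonstrated on the heartbeat bytes from on_open; read_varint and split_fields are hypothetical helper names.

```python
def read_varint(buf, pos):
    """Return (value, next_pos) for the varint that starts at buf[pos]."""
    value = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def split_fields(buf):
    """List (field_number, wire_type, value) for varint and length-delimited fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited bytes
            size, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + size])
            pos += size
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


heartbeat = bytes([8, 1, 26, 7, 8, 184, 232, 190, 199, 220, 44])
outer = split_fields(heartbeat)   # field 1 (varint) and field 3 (nested bytes)
inner = split_fields(outer[1][2])  # the nested message holds a single varint
print(outer)
print(inner)
```

The same tag arithmetic (t >> 3 for the field number, t & 7 for the wire type) is what the decode routine relies on.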
def decode(self):
    """Handle danmaku messages only."""
    length = len(self)
    while self.pos < length:
        t = self.int()  # tag: field number in the upper bits, wire type in the low 3 bits
        tt = t >> 3
        if tt == 1:
            self.message['payloadType'] = self.int()
            if self.message['payloadType'] != 310:  # not a danmaku message
                return False
        elif tt == 2:
            self.message['compressionType'] = self.int()
        elif tt == 3:
            self.message['payload'] = self.bytes()
        else:
            self.skipType(t & 7)  # skip fields we do not care about
    return True
Line 8 reads the value at the current position.
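The decode method depends on a reader object that tracks a position and exposes int(), bytes(), and skipType(), none of which are shown here. The following is a minimal sketch of such a reader, assuming protobuf-style varints and wire types. The class name Reader and its internals are hypothetical and only modeled on how decode uses them; decode itself is copied from above so the example is self-contained.

```python
class Reader:
    """Hypothetical cursor over a byte buffer, shaped to fit decode()."""

    def __init__(self, buf):
        self.buf = bytes(buf)
        self.pos = 0
        self.message = {}

    def __len__(self):
        return len(self.buf)

    def int(self):
        """Read the varint at the current position and advance past it."""
        value = 0
        shift = 0
        while True:
            b = self.buf[self.pos]
            self.pos += 1
            value |= (b & 0x7F) << shift
            if not b & 0x80:
                return value
            shift += 7

    def bytes(self):
        """Read a length prefix, then return that many raw bytes."""
        size = self.int()
        chunk = self.buf[self.pos:self.pos + size]
        self.pos += size
        return chunk

    def skipType(self, wire_type):
        """Advance past one value of the given wire type without storing it."""
        if wire_type == 0:
            self.int()
        elif wire_type == 1:
            self.pos += 8
        elif wire_type == 2:
            self.pos += self.int()
        elif wire_type == 5:
            self.pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")

    def decode(self):
        """Handle danmaku messages only (same logic as above)."""
        length = len(self)
        while self.pos < length:
            t = self.int()
            tt = t >> 3
            if tt == 1:
                self.message['payloadType'] = self.int()
                if self.message['payloadType'] != 310:
                    return False
            elif tt == 2:
                self.message['compressionType'] = self.int()
            elif tt == 3:
                self.message['payload'] = self.bytes()
            else:
                self.skipType(t & 7)
        return True


# Hand-built frame: payloadType=310, compressionType=1, payload=b"hi".
frame = bytes([0x08, 0xB6, 0x02, 0x10, 0x01, 0x1A, 0x02, 0x68, 0x69])
reader = Reader(frame)
ok = reader.decode()
```

The test frame is constructed by hand rather than captured from the site, so it only checks the reader's mechanics, not the real payload layout.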