Scraping Kuaishou Videos with Python 3

  • Background

We have some Kuaishou video page URLs, for example:

https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq

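Opening it in a browser brings up the video page.

  • Goal

Obtain the video's play URL.

  • Approach

First, with the browser's developer tools open (F12), the returned page turns out to have a JSON string at the very end.

When the same page is requested from code, however, that JSON is missing. Meanwhile, the URL in the address bar had changed to: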

https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq?did=web_975948772fda54ca569800162f04e530
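The only difference between the two URLs is the did query parameter. As a small illustration, the sketch below pulls that parameter out of a URL with the standard library. The function name extract_did is hypothetical and not part of the original script.

```python
from urllib.parse import urlsplit, parse_qs


def extract_did(url):
    """Return the value of the did query parameter, or None if it is absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("did")
    return values[0] if values else None


# The redirected URL quoted in this article.
redirected = (
    "https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq"
    "?did=web_975948772fda54ca569800162f04e530"
)
print(extract_did(redirected))
```

A URL without the parameter simply yields None, which makes it easy to tell whether a given response URL has already been through the redirect.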

Suspecting a redirect, I cleared the cookies and cached files and requested the page again, which revealed what was going on.

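The returned result also contains the actual MP4 play URL.

On closer inspection, the first page request is redirected, and its response carries a Set-Cookie header.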
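To make the Set-Cookie step concrete, here is a minimal offline sketch using the standard library's http.cookies module. The sample header is an assumption: only the cookie name did and the shape of its value come from this article, while the Path and Domain attributes are placeholders, and it is not confirmed that the cookie value equals the did seen in the URL. The helper names are hypothetical.

```python
from http.cookies import SimpleCookie


def cookies_from_set_cookie(header_value):
    """Parse one Set-Cookie header value into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


def to_cookie_header(cookies):
    """Format a dict as a Cookie request header: name=value pairs joined by '; '."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical header value; attributes other than did are placeholders.
sample = "did=web_975948772fda54ca569800162f04e530; Path=/; Domain=kuaishou.com"
cookies = cookies_from_set_cookie(sample)
print(to_cookie_header(cookies))
```

Note that a Cookie request header uses = between name and value. When working with requests, response.cookies.get_dict() returns the same kind of dict without any manual parsing.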

What to do next needs little explanation, so here is the code:

# coding: utf-8

import time

import requests


def get_play_url(url, author_id, video_id):
    """Return the MP4 play URL for a Kuaishou video page, or None on failure."""
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
        'Host': 'live.kuaishou.com',
        'content-type': 'application/json',
    }

    print(f"Requesting {url}")

    # The first request makes the server hand out the did cookie via Set-Cookie.
    response = requests.get(url, headers=headers)

    cookie = response.cookies.get_dict()
    did = cookie.get('did')

    if not did:
        print(f"did cookie not received for {url}")
        return None

    time.sleep(3)

    # A Cookie header is a list of name=value pairs separated by "; ".
    headers['cookie'] = '; '.join(f"{key}={value}" for key, value in cookie.items())
    headers['Referer'] = url + '?csr=true'

    params = {
        "operationName": "FeedQuery",
        "query": "query FeedQuery($principalId: String, $photoId: String) {\n  feedById(principalId: $principalId, photoId: $photoId) {\n    currentWork {\n      id\n      thumbnailUrl\n      poster\n      workType\n      type\n      useVideoPlayer\n      imgUrls\n      imgSizes\n      magicFace\n      musicName\n      caption\n      location\n      liked\n      onlyFollowerCanComment\n      relativeHeight\n      timestamp\n      width\n      height\n      counts {\n        displayView\n        displayLike\n        displayComment\n        __typename\n      }\n      user {\n        id\n        eid\n        name\n        avatar\n        __typename\n      }\n      expTag\n      playUrl\n      __typename\n    }\n    status\n    errMsg\n    __typename\n  }\n}\n",
        "variables": {
            "principalId": author_id,
            "photoId": video_id
        }
    }

    # Second request: ask the GraphQL endpoint for the video details.
    response = requests.post('https://live.kuaishou.com/m_graphql', headers=headers, json=params)

    try:
        play_url = response.json()['data']['feedById']['currentWork']['playUrl']
    except (ValueError, KeyError, TypeError):
        # Not JSON, or one of the nested fields is missing or null.
        play_url = None

    if not play_url:
        print(f"Play URL not found for {url}")
        return None

    return play_url


if __name__ == '__main__':
    print(get_play_url(
        'https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq',
        'shengxue1111',
        '3xwgehu7uyudyeq',
    ))
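
The last step of the script walks the response along data, feedById, currentWork, playUrl, as named in the query. The sketch below isolates that lookup so it can be tried offline. Both sample payloads are hand-written placeholders, not real Kuaishou output, and extract_play_url is a hypothetical helper name.

```python
def extract_play_url(payload):
    """Walk data -> feedById -> currentWork -> playUrl; return None on any gap."""
    node = payload
    for key in ("data", "feedById", "currentWork", "playUrl"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node or None


# Placeholder payloads that only mirror the field names used in the query.
sample_ok = {
    "data": {"feedById": {"currentWork": {"playUrl": "https://example.com/video.mp4"}}}
}
sample_missing = {"data": {"feedById": {"currentWork": None}}}

print(extract_play_url(sample_ok))
print(extract_play_url(sample_missing))
```

Checking each level explicitly means a null currentWork or an unexpected error body results in None instead of an exception.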

In short: request the video page, take the cookie from the response, then request the JSON data. Note how a Kuaishou link is structured:

https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq

shengxue1111 is the user ID

3xwgehu7uyudyeq is the video ID
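
Since both IDs sit in the URL path, they can be derived from the link rather than typed by hand. A minimal sketch, assuming every video link follows the /u/<user ID>/<video ID> layout shown above; split_video_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def split_video_url(url):
    """Return (user_id, video_id) for a /u/<user>/<video> URL, or None."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) != 3 or parts[0] != "u":
        return None
    return parts[1], parts[2]


ids = split_video_url("https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq")
print(ids)
```

The two values map onto the principalId and photoId variables of the GraphQL request. Because only the path is inspected, a trailing ?did=... query string does not affect the result.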
