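Recently I was watching something on Tencent Video (v.qq.com) and wanted to download it. The <video> element's src, however, turned out to be a blob: URL rather than the plain .mp4 URL you could find in the past. A blob: URL is not an encrypted address. It is an object URL that the player's script creates inside the page, typically for a MediaSource that it feeds with video data: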
<video preload="auto" src="blob:https://v.qq.com/de0d7ebd-0cdb-4bdc-af52-5968cc0703ae"></video>
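So I wrote a Python script to download it:
1) In Chrome, right-click the video and choose "复制调试信息" (Copy debug info) from the context menu.
2) In the debug info, find the vurl field: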
"vurl": "https%3A%2F%2Fapd-f7b9eff46dec7f445b83ad7ce66881e7.v.smtcdns.com%2Fnewsts.tc.qq.com%2FAxX2IVBFSKZKljuQlR16Q4Rtu1YM8yt4MCVVBpAU2qyc%2F5QSx4cOPVnrsEZoXel9kLPQqwzotq_YqO9mgVXJarwwHJ_lx_OfPdNcpa7aV3wnQiA1hoGX6T_rNTqBgcLjf-863H4xe50swNpbEPHPuaQ8vVuK7H-u6wNx8DTUocvYSMJDlKChZyKQF_zszSylZSZDfM235OXV0%2Fq0023x6yiss.321002.ts.m3u8%3Fver%3D4",
After decoding it with decodeURIComponent, the actual vurl is:
https://apd-f7b9eff46dec7f445b83ad7ce66881e7.v.smtcdns.com/newsts.tc.qq.com/AxX2IVBFSKZKljuQlR16Q4Rtu1YM8yt4MCVVBpAU2qyc/5QSx4cOPVnrsEZoXel9kLPQqwzotq_YqO9mgVXJarwwHJ_lx_OfPdNcpa7aV3wnQiA1hoGX6T_rNTqBgcLjf-863H4xe50swNpbEPHPuaQ8vVuK7H-u6wNx8DTUocvYSMJDlKChZyKQF_zszSylZSZDfM235OXV0/q0023x6yiss.321002.ts.m3u8?ver=4
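In Python, urllib.parse.unquote percent-decodes this string the same way decodeURIComponent does in the browser. The following is a minimal sketch that assumes the vurl value has already been copied out of the debug info as a string. The names raw_vurl and decode_vurl are only for illustration.

```python
from urllib.parse import unquote


def decode_vurl(raw_vurl: str) -> str:
    """Percent-decode the vurl value (%3A -> ':', %2F -> '/', ...), like decodeURIComponent."""
    return unquote(raw_vurl)


# The "vurl" value exactly as it appears in the copied debug info.
raw_vurl = (
    "https%3A%2F%2Fapd-f7b9eff46dec7f445b83ad7ce66881e7.v.smtcdns.com%2Fnewsts.tc.qq.com"
    "%2FAxX2IVBFSKZKljuQlR16Q4Rtu1YM8yt4MCVVBpAU2qyc%2F5QSx4cOPVnrsEZoXel9kLPQqwzotq_YqO9mgVXJarwwHJ_lx_"
    "OfPdNcpa7aV3wnQiA1hoGX6T_rNTqBgcLjf-863H4xe50swNpbEPHPuaQ8vVuK7H-u6wNx8DTUocvYSMJDlKChZyKQF_"
    "zszSylZSZDfM235OXV0%2Fq0023x6yiss.321002.ts.m3u8%3Fver%3D4"
)

m3u8_url = decode_vurl(raw_vurl)
print(m3u8_url)
```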
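3) How the script works
[Note] Video concepts such as m3u8, MP4 boxes and streams are not explained here. Please look them up yourself.
The file downloaded from the vurl above is an m3u8 playlist with the following content: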
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:18
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:8.040,
00_q0023x6yiss.321002.1.ts?index=0&start=0&end=8040&brs=0&bre=246655&ver=4
#EXTINF:12.000,
01_q0023x6yiss.321002.1.ts?index=1&start=8040&end=20040&brs=246656&bre=709887&ver=4
#EXTINF:12.000,
02_q0023x6yiss.321002.1.ts?index=2&start=20040&end=32040&brs=709888&bre=1416015&ver=4
#EXTINF:12.000,
03_q0023x6yiss.321002.1.ts?index=3&start=32040&end=44040&brs=1416016&bre=2027391&ver=4
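Each non-comment line is the (relative) URL of a short segment cut from the full video. The job is to download these segments in order and join them into one large file, which is the video we want. Strictly speaking, byte-concatenated .ts segments form an MPEG-TS stream rather than a true MP4 container, even if the output file is named .mp4.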
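A playlist this simple can be parsed by hand. The following minimal sketch collects each segment's #EXTINF duration and resolves its relative URI against the playlist URL with urllib.parse.urljoin, which is the usual HLS rule. It also splits out the query parameters. From the sample, start/end look like milliseconds that match #EXTINF (0 to 8040 for 8.040 s), and brs/bre look like contiguous byte offsets into one file (bre=246655 is followed by brs=246656). That reading is an inference from these four entries and is not documented by Tencent. The base URL below is a hypothetical placeholder standing in for the decoded vurl.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def parse_m3u8(text: str, playlist_url: str) -> list:
    """Parse a simple media playlist into (duration_s, absolute_url, query_dict) tuples."""
    segments = []
    duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXTINF:"):
            # "#EXTINF:8.040," -> 8.04; anything after the comma is an optional title
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif not line.startswith("#"):
            # URI line: a relative path resolves against the playlist's own URL
            url = urljoin(playlist_url, line)
            query = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
            segments.append((duration, url, query))
            duration = None
    return segments


# The playlist shown above; BASE is a hypothetical stand-in for the decoded vurl.
PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:18
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:8.040,
00_q0023x6yiss.321002.1.ts?index=0&start=0&end=8040&brs=0&bre=246655&ver=4
#EXTINF:12.000,
01_q0023x6yiss.321002.1.ts?index=1&start=8040&end=20040&brs=246656&bre=709887&ver=4
#EXTINF:12.000,
02_q0023x6yiss.321002.1.ts?index=2&start=20040&end=32040&brs=709888&bre=1416015&ver=4
#EXTINF:12.000,
03_q0023x6yiss.321002.1.ts?index=3&start=32040&end=44040&brs=1416016&bre=2027391&ver=4
"""
BASE = "https://example.com/newsts.tc.qq.com/q0023x6yiss.321002.ts.m3u8?ver=4"

segments = parse_m3u8(PLAYLIST, BASE)
for duration, url, q in segments:
    print(f"{duration:.3f}s  time {q['start']}-{q['end']} ms  bytes {q['brs']}-{q['bre']}")
```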
Here is the script. For now it only covers what I needed for this one download, so it does not take command-line options.
#!/usr/bin/python3
import re

import requests

ua = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'

# Step 1: fetch the m3u8 playlist from the decoded vurl
response = requests.get('https://apd-f7b9eff46dec7f445b83ad7ce66881e7.v.smtcdns.com/newsts.tc.qq.com/AxX2IVBFSKZKljuQlR16Q4Rtu1YM8yt4MCVVBpAU2qyc/5QSx4cOPVnrsEZoXel9kLPQqwzotq_YqO9mgVXJarwwHJ_lx_OfPdNcpa7aV3wnQiA1hoGX6T_rNTqBgcLjf-863H4xe50swNpbEPHPuaQ8vVuK7H-u6wNx8DTUocvYSMJDlKChZyKQF_zszSylZSZDfM235OXV0/q0023x6yiss.321002.ts.m3u8?ver=4', headers={'User-Agent': ua})
response.raise_for_status()

# Step 2: request headers for the segment downloads, copied from the browser's own requests
headers = {
    'Pragma': 'no-cache',
    'Origin': 'https://v.qq.com',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
    'User-Agent': ua,
    'Accept': '*/*',
    'Referer': 'https://v.qq.com/x/page/d0019qdukl5.html',
    'Connection': 'keep-alive',
    'Cache-Control': 'no-cache',
}

# Step 3: keep only the segment lines, e.g. "00_q0023x6yiss.321002.1.ts?index=0&..."
result = re.findall(r'^\d+.*index=\d+.*$', response.text, re.M)

# Step 4: the segment paths are relative, so prefix them with this base URL
# (note: a different CDN host from the one serving the m3u8)
base_url = 'https://apd-vliveachy.apdcdn.tc.qq.com/newsts.tc.qq.com/AxX2IVBFSKZKljuQlR16Q4Rtu1YM8yt4MCVVBpAU2qyc/5QSx4cOPVnrsEZoXel9kLPQqwzotq_YqO9mgVXJarwwHJ_lx_OfPdNcpa7aV3wnQiA1hoGX6T_rNTqBgcLjf-863H4xe50swNpbEPHPuaQ8vVuK7H-u6wNx8DTUocvYSMJDlKChZyKQF_zszSylZSZDfM235OXV0/'

# Step 5: download every segment in playlist order. Each one is appended to 0.mp4
# (the whole video) and also saved on its own as 00000000.ts, 00000001.ts, ...
with open('0.mp4', 'wb') as fmp4:
    for i, r in enumerate(result):
        print(i, end=', ', flush=True)  # progress
        rsp = requests.get(base_url + r, headers=headers)
        rsp.raise_for_status()  # abort instead of writing an error page into the video
        fmp4.write(rsp.content)
        with open('{0:0>8}.ts'.format(i), 'wb') as f:
            f.write(rsp.content)
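If the script is ever reused, two parts of the loop are worth changing. Each requests.get call opens its own connection, and the segment paths are glued onto a hard-coded base on a different CDN host. The following is a minimal sketch, not the script above. It keeps the download loop separate from the HTTP fetch so the loop can be tested without a network. It also reuses one requests.Session, sets a timeout, and writes to a temporary .part file that is renamed only after every segment has arrived. The function names download_segments and make_requests_fetch are hypothetical. It is also an untested assumption that the playlist's own host serves the segments, as urljoin would resolve them, rather than apd-vliveachy.apdcdn.tc.qq.com.

```python
import os
from typing import Callable, Dict, Iterable


def make_requests_fetch(headers: Dict[str, str]) -> Callable[[str], bytes]:
    """Build a fetch function backed by one requests.Session (connection reuse)."""
    import requests  # imported lazily so download_segments works without requests installed

    session = requests.Session()
    session.headers.update(headers)

    def fetch(url: str) -> bytes:
        rsp = session.get(url, timeout=30)
        rsp.raise_for_status()  # stop instead of writing an error page into the video
        return rsp.content

    return fetch


def download_segments(urls: Iterable[str], out_path: str,
                      fetch: Callable[[str], bytes]) -> int:
    """Fetch each segment in order, append it to out_path, and return the total byte count.

    Data goes to out_path + '.part' first and is renamed only after every segment
    succeeded, so an interrupted run never leaves a truncated file under the final name.
    """
    part_path = out_path + ".part"
    total = 0
    with open(part_path, "wb") as out:
        for i, url in enumerate(urls):
            data = fetch(url)
            out.write(data)
            total += len(data)
            print(i, end=", ", flush=True)  # progress, as in the original script
    os.replace(part_path, out_path)
    return total


# Hypothetical wiring, assuming `result` and `headers` from the script above and
# PLAYLIST_URL holding the decoded vurl:
#   urls = [urljoin(PLAYLIST_URL, r) for r in result]  # from urllib.parse import urljoin
#   download_segments(urls, "all.ts", make_requests_fetch(headers))
```

MPEG-TS is designed to be concatenated, so the joined file usually plays as it is. If a real MP4 container is needed, the file can be remuxed without re-encoding, for example with ffmpeg -i all.ts -c copy out.mp4. Older ffmpeg builds may also need -bsf:a aac_adtstoasc for AAC audio.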