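Crawler approach
- Find the media addresses. On this site the audio and the video are served as separate streams, so both addresses have to be extracted.
- Download the audio and the video from those addresses
- Merge the audio and the video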
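Implementation
Finding the audio and video addresses
In the browser's network panel, requests like this one, whose response is binary data and shows up as garbled text, are the audio or video streams.
Copy a fragment of the URL from the request headers and run a global search for it.
Alternatively, search the page itself, pick the first match, and look at what precedes it.
Scroll back a little to find the tag that the match is embedded in.
Then fetch the original page directly and cut out that segment with a regular expression.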
import json
import re
import requests

headers = {
    # Anti-hotlinking check: without a referer the request is rejected with 403
    "referer": "https://www.bilibili.com/",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/103.0.0.0 Safari/537.36"
}
content = requests.get(url, headers=headers).text
result = re.findall("<script>window.__playinfo__=(.*?)</script>", content)[0]
temp = json.loads(result)
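The extraction step can be tried offline. The sketch below runs the same regular expression against a small hand-written HTML string and raises a clear error when the script tag is missing, instead of failing with an IndexError on `[0]`. The sample page, the example.com URLs, and the helper name `extract_playinfo` are made up for illustration.

```python
import json
import re

# Same pattern as above; re.S lets "." span line breaks in case the JSON is wrapped.
PLAYINFO_RE = re.compile(r"<script>window\.__playinfo__=(.*?)</script>", re.S)


def extract_playinfo(html):
    """Return the parsed window.__playinfo__ object embedded in a page."""
    match = PLAYINFO_RE.search(html)
    if match is None:
        raise ValueError("window.__playinfo__ not found; the page layout may have changed")
    return json.loads(match.group(1))


# Hand-written stand-in for a real page, only for demonstration.
sample_html = (
    "<html><head><script>window.__playinfo__="
    '{"data": {"dash": {"audio": [{"baseUrl": "https://example.com/a.m4s"}],'
    ' "video": [{"baseUrl": "https://example.com/v.m4s"}]}}}'
    "</script></head></html>"
)

playinfo = extract_playinfo(sample_html)
print(playinfo["data"]["dash"]["audio"][0]["baseUrl"])
```

On a real page, `extract_playinfo(content)` would take the place of the `re.findall(...)[0]` and `json.loads` pair.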
Printed as-is, temp is hard to read. pprint lays it out neatly; it requires importing the pprint module.
pprint.pprint(temp)
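When the full dump is too long to scan, the `depth` argument of pprint can give an outline first. The sketch below uses a hand-written stand-in for `temp`; the field values are invented and only the nesting mirrors the keys used in this post.

```python
import pprint

# Hand-written stand-in for the parsed playinfo object.
temp = {
    "code": 0,
    "data": {
        "dash": {
            "audio": [{"id": 30280, "baseUrl": "https://example.com/a.m4s"}],
            "video": [{"id": 80, "baseUrl": "https://example.com/v.m4s"}],
        }
    },
}

# depth=2 prints two levels of nesting and replaces anything deeper with "..."
outline = pprint.pformat(temp, depth=2)
print(outline)
```

Raising `depth` one level at a time is a quick way to walk down to the part of the structure that holds the URLs.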
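Pretty-printed, the data structure is easy to make out, and the audio and video URLs we need can be read straight out of the JSON. It contains many URLs; when it is unclear which one to use, fetch them all and compare. Many are duplicates, and the rest differ in quality, so choose the URL with the highest quality.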
autio_urls = temp['data']['dash']['audio'][0]['baseUrl']
video_urls = temp['data']['dash']['video'][0]['baseUrl']
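Taking index `[0]` relies on the first entry being the best one. A minimal sketch of choosing explicitly is shown below, assuming each entry carries a numeric `id` (quality code) and a `bandwidth` field; those field names are an assumption and should be checked against the pprint output. The helper `pick_best` and the sample entries are hypothetical.

```python
def pick_best(streams):
    """Return the entry with the highest (id, bandwidth) pair."""
    if not streams:
        raise ValueError("no streams available")
    # Assumed fields: "id" is the quality code, "bandwidth" breaks ties.
    return max(streams, key=lambda s: (s.get("id", 0), s.get("bandwidth", 0)))


# Invented entries, only for demonstration.
sample_videos = [
    {"id": 32, "bandwidth": 300_000, "baseUrl": "https://example.com/480p.m4s"},
    {"id": 80, "bandwidth": 1_200_000, "baseUrl": "https://example.com/1080p-a.m4s"},
    {"id": 80, "bandwidth": 900_000, "baseUrl": "https://example.com/1080p-b.m4s"},
    {"id": 64, "bandwidth": 700_000, "baseUrl": "https://example.com/720p.m4s"},
]

best = pick_best(sample_videos)
print(best["baseUrl"])
```

Under these assumptions, `pick_best(temp['data']['dash']['video'])['baseUrl']` would replace the `[0]` lookup. Bandwidth is only a rough tie-breaker, since a more efficient codec can deliver the same quality at a lower bitrate.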
With the addresses in hand, request each one once more and write the response straight to a file, giving the file an .mp3 or .mp4 name.
audio_content = requests.get(autio_urls, headers=headers).content
video_content = requests.get(video_urls, headers=headers).content
with open('video.mp4', mode='wb') as f:
    f.write(video_content)
with open('audio.mp3', mode='wb') as f:
    f.write(audio_content)
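Reading `.content` loads the whole file into memory, which can be a lot for a long video. A possible alternative, sketched below, streams the response to disk in chunks using the `stream=True` and `iter_content` features of requests. The helper names `write_chunks` and `download`, the chunk size, and the timeout are choices made for this example.

```python
import os
import tempfile


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the byte count."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def download(url, path, headers, chunk_size=1 << 20):
    """Stream url to path without holding the whole body in memory."""
    import requests  # third-party; imported here so write_chunks has no dependency

    with requests.get(url, headers=headers, stream=True, timeout=30) as resp:
        resp.raise_for_status()  # fail loudly on 403 and similar errors
        return write_chunks(resp.iter_content(chunk_size=chunk_size), path)


# Offline check of the writing step with fake chunks.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
written = write_chunks([b"abc", b"", b"defg"], demo_path)
print(written)
```

With these helpers, `download(video_urls, 'video.mp4', headers)` and `download(autio_urls, 'audio.mp3', headers)` would stand in for the two `requests.get(...).content` calls and the `with open` blocks.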
Finally, all that is left is to merge the audio and the video.
os.system("ffmpeg.exe -i audio.mp3 -i video.mp4 -acodec copy -vcodec copy output2.mp4")
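`os.system` ignores failures and needs the command as a single shell string. A minimal sketch of the same merge through `subprocess.run` with an argument list follows; it checks that ffmpeg is on PATH and raises if ffmpeg exits with an error. The helper names `build_merge_command` and `merge` are hypothetical, and `-c copy` is used here as the short form of copying both streams without re-encoding.

```python
import shutil
import subprocess


def build_merge_command(audio_path, video_path, output_path, ffmpeg="ffmpeg"):
    """Build the ffmpeg argument list that copies both streams into one file."""
    return [
        ffmpeg,
        "-y",              # overwrite output_path if it already exists
        "-i", audio_path,
        "-i", video_path,
        "-c", "copy",      # copy both streams without re-encoding
        output_path,
    ]


def merge(audio_path, video_path, output_path):
    """Run ffmpeg and raise if it is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(build_merge_command(audio_path, video_path, output_path), check=True)


command = build_merge_command("audio.mp3", "video.mp4", "output.mp4")
print(" ".join(command))
```

Passing a list avoids shell quoting problems when file names contain spaces, and `check=True` turns a failed merge into an exception instead of a silent non-zero exit code.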
Full code
import json
import requests
import re
import pprint
import os
import subprocess


def GetUrl(url):
    headers = {
        # Anti-hotlinking check: without a referer the request is rejected with 403
        "referer": "https://www.bilibili.com/",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/103.0.0.0 Safari/537.36"
    }
    # Fetch the page and pull the embedded playinfo JSON out of its script tag
    content = requests.get(url, headers=headers).text
    result = re.findall("<script>window.__playinfo__=(.*?)</script>", content)[0]
    temp = json.loads(result)
    # pprint.pprint(temp)
    # Take the first audio and video entries from the DASH section
    autio_urls = temp['data']['dash']['audio'][0]['baseUrl']
    video_urls = temp['data']['dash']['video'][0]['baseUrl']
    # Download both streams and save them to disk
    audio_content = requests.get(autio_urls, headers=headers).content
    video_content = requests.get(video_urls, headers=headers).content
    with open('video.mp4', mode='wb') as f:
        f.write(video_content)
    with open('audio.mp3', mode='wb') as f:
        f.write(audio_content)
    # Merge the two files: copy the video stream, re-encode the audio as AAC
    command = "ffmpeg -i audio.mp3 -i video.mp4 -c:v copy -c:a aac -strict experimental output.mp4"
    subprocess.run(command, shell=True)
    # os.system("ffmpeg.exe -i audio.mp3 -i video.mp4 -acodec copy -vcodec copy output2.mp4")


if __name__ == '__main__':
    GetUrl("https://www.bilibili.com/video/BV1ca41157v7?spm_id_from=333.1007.tianma.2-1-3.click&vd_source"
           "=cb66ed1375a6f769d8f52fb7b44e90c9")