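Tip: a scraper essential, downloading m3u8, one of the most common video formats! 🙊
Source: 麦当的日常笔记 👑
Preface
Scraping articles tend not to pass review because of the external links they contain, so I have not gone into much detail here. This is roughly the template; if you have any questions, just ask and I will reply as soon as I see them.
Required modules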
from tqdm import tqdm
import re
import requests
import os
Remaining code
# Site root and the uploader page whose video list will be scraped
url_0 = 'https://www.acfun.cn'
url_basic = 'https://www.acfun.cn/u/53405503'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.46'
}
res = requests.get(url=url_basic, headers=header).text
# print(res)
# Relative links to every video listed on the uploader page
a_id_list = re.findall(r'<a href="(.*?)" target="_blank" class="ac-space-video', res)
for a_id in a_id_list:
    a_url = url_0 + a_id
    a_html = requests.get(url=a_url, headers=header).text
    # Keep the part of the page title before the first ' - '
    a_m3u8_title = re.findall(r'<title >(.*?)</title>', a_html)[0].split(' - ')[0]
    # One folder per video
    filename = f'D:\\a站批量爬取\\{a_m3u8_title}\\'
    # makedirs also creates the parent folder and tolerates an existing one
    os.makedirs(filename, exist_ok=True)
    # Extract the m3u8 playlist URL from the "backupUrl" field in the page source
    a_m3u8_url = re.findall(r'"backupUrl(.*?)\"]', a_html)[0].replace('"', '').split('\\')[-2]
    m3u8_res = requests.get(url=a_m3u8_url, headers=header).text
    # Strip the playlist tags so that only the segment names remain
    m3u8_res = re.sub(r'#EXTM3U', '', m3u8_res)
    m3u8_res = re.sub(r'#EXT-X-VERSION:\d+', '', m3u8_res)
    m3u8_res = re.sub(r'#EXT-X-TARGETDURATION:\d+', '', m3u8_res)
    m3u8_res = re.sub(r'#EXT-X-MEDIA-SEQUENCE:\d+', '', m3u8_res)
    m3u8_res = re.sub(r'#EXT-X-ENDLIST', '', m3u8_res)
    m3u8_res = re.sub(r'#EXTINF:[\d.]+,', '', m3u8_res).split()
    # Open the output once in 'wb' so that a rerun does not append to an old file
    with open(filename + a_m3u8_title + '.mp4', mode='wb') as f:
        for i in tqdm(m3u8_res):
            m3u8_url_f = 'https://ali-safety-video.acfun.cn/mediacloud/acfun/acfun_video/' + i
            m3u8_content = requests.get(url=m3u8_url_f, headers=header).content
            # Segments are written back to back in playlist order
            f.write(m3u8_content)
    print(f'{a_m3u8_title} video saved successfully')
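The script strips the known playlist tags one at a time with re.sub and hardcodes the segment host. A more general alternative, sketched below, keeps every line that does not start with '#' and resolves it against the playlist's own URL. The helper name parse_segments, the sample playlist and the example.com URL are all made up for illustration; this is not taken from the original script and has not been checked against AcFun's real playlists.

```python
from urllib.parse import urljoin


def parse_segments(playlist_text, playlist_url):
    """Return absolute segment URLs from the text of an m3u8 media playlist."""
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines and tag/comment lines (starting with '#') carry no segment
        if not line or line.startswith('#'):
            continue
        # Relative segment names are resolved against the playlist URL
        segments.append(urljoin(playlist_url, line))
    return segments


# Hypothetical playlist and URL, for illustration only
sample = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:9.5,
seg_0.ts?sign=abc
#EXTINF:4.0,
seg_1.ts?sign=def
#EXT-X-ENDLIST
"""
urls = parse_segments(sample, 'https://example.com/video/hls/index.m3u8')
print(urls)
```

In the script, something like parse_segments(m3u8_res, a_m3u8_url) could stand in for the re.sub chain and the hardcoded prefix, assuming the segment names are relative to the playlist location.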
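This scrapes one page of funny videos from an AcFun (A站) uploader; to scrape others, a small change is all it takes. The code is short, but it is all fundamentals and covers many important scraping concepts, so it makes good practice!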
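One thing to watch when pointing the script at other uploaders: the video title is used directly as the folder and file name, and Windows does not allow the characters \ / : * ? " < > | in names. A minimal sketch of a cleanup step follows; safe_name is a hypothetical helper, not part of the original script, and the '_' replacement and the 'untitled' fallback are arbitrary choices.

```python
import re


def safe_name(title, replacement='_'):
    """Replace characters that Windows does not allow in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, title)
    # Names ending in a dot or space also cause trouble on Windows
    cleaned = cleaned.strip().rstrip('. ')
    # Fall back to a placeholder if nothing usable is left
    return cleaned or 'untitled'


print(safe_name('What? A "test": part 1/2'))
```

Applying it to a_m3u8_title before building the path would be enough, for example a_m3u8_title = safe_name(a_m3u8_title).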