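Building an automated YouTube downloader with Python: the code
Per savefrom's terms
This example and tutorial are for learning and discussion only; all rights belong to savefrom.net.
The final code plus comments comes to about 100 lines. Treat the code on GitHub as authoritative (bugs may be fixed there); this article only walks through the details.
Project link
Approach
Workflow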
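1. POST
Following the first step of the approach, we first need to send a POST request to fetch the encrypted JS payload. I use the third-party requests library for this; for background on web scraping, see my earlier article.
i. First, set up the headers for the POST request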
# set the headers or the website will not return information
# the cookies in here you may need to change
headers = {
    "cache-Control": "no-cache",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
              "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",
    "content-type": "application/x-www-form-urlencoded",
    "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "
              "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "
              "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "
              "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "
              "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "
              "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",
    "origin": "https://en.savefrom.net",
    "pragma": "no-cache",
    "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",
    "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",
    "sec-ch-ua-mobile": "?0",
    "sec-fetch-dest": "iframe",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/87.0.4280.88 Safari/537.36"}
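The cookie value may need to be changed, and it is best to copy it from your own browser. What each header means is outside the scope of this article; a search engine will turn up the details.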
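Since the cookie is the part most likely to need updating, one convenient option is to paste the raw cookie string from the browser and split it into a dictionary. The sketch below uses a hypothetical helper, parse_cookie_header, which is not part of the original project. The resulting dict could be passed to requests through its cookies= argument instead of hard-coding the cookie header.

```python
def parse_cookie_header(raw):
    """Split a raw "name=value; name=value" cookie string into a dict.

    Hypothetical helper for illustration; not part of the original project.
    """
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty fragments such as a trailing ";"
        # partition splits on the first "=" only, so "=" inside a value survives
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


# Shortened sample taken from the cookie string above
sample = "lang=en; country=CN; x-requested-with=; sfHelperDist=72"
cookies = parse_cookie_header(sample)
```

Keeping the cookies in a dict makes it easier to replace a single value, such as PHPSESSID, when the site stops responding.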
ii. Then set up the form parameters
# set the form parameters; they can be copied from the request shown in Chrome
kv = {
    "sf_url": url,
    "sf_submit": "",
    "new": "1",
    "lang": "en",
    "app": "",
    "country": "cn",
    "os": "Windows",
    "browser": "Chrome"}
The sf_url field is the URL of the YouTube video we want to download; the other parameters stay the same.
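Because only sf_url changes between videos, the fields can be produced by a small factory function. This is a minimal sketch: build_form is a hypothetical name, the empty-URL check is an added assumption, and the video URL is a placeholder.

```python
def build_form(video_url):
    """Return the form fields for one video URL (hypothetical helper)."""
    if not video_url:
        raise ValueError("video_url must not be empty")
    return {
        "sf_url": video_url,  # the only field that changes per video
        "sf_submit": "",
        "new": "1",
        "lang": "en",
        "app": "",
        "country": "cn",
        "os": "Windows",
        "browser": "Chrome",
    }


# Placeholder URL, used only to show the shape of the result
kv = build_form("https://www.youtube.com/watch?v=VIDEO_ID")
```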
iii. Finally, send the POST request with the requests library
# do the POST request
r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers,
                  data=kv)
r.raise_for_status()
Note that the form fields are passed with data=kv, so they are sent in the request body.
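To see why data=kv matters, the sketch below uses only the standard library's urllib.parse to show roughly what each option produces. It is an illustration, not the code requests runs internally. With data=, requests form-encodes the dictionary into the request body, which matches the content-type header set earlier; params= would instead place the same fields in the URL query string. The field list is shortened and the video URL is a placeholder.

```python
from urllib.parse import urlencode, parse_qs

endpoint = "https://en.savefrom.net/savefrom.php"
# Shortened form fields; the video URL is a placeholder
kv = {"sf_url": "https://www.youtube.com/watch?v=VIDEO_ID", "sf_submit": "", "lang": "en"}

# data=kv: the fields are form-encoded and sent as the request body
body = urlencode(kv)

# params=kv: the same encoding would instead be appended to the URL
url_with_query = endpoint + "?" + urlencode(kv)

# Decoding the body recovers the original fields, including empty ones
decoded = parse_qs(body, keep_blank_values=True)
```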
iv. Wrap it in a function
import requests

def gethtml(url):
    # set the headers or the website will not return information
    # the cookies in here you may need to change
    headers = {
        "cache-Control": "no-cache",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"
                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-encoding":