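The website you want to download from requires cookies to verify who you are, so you need to export the cookie file for your currently logged-in account from the browser.
Steps:
In Edge or Chrome, install the cookie-export extension: Get Cookies.txt Locally
After installing it, open the site while logged in, click the extensions icon, find the extension, and choose to export all cookies for the current page to a file (export all cookies)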
Verify that the cookies are valid: make sure the exported cookie file contains key fields such as SID, HSID, and LOGIN_INFO
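The check above can also be scripted. The following is a minimal sketch, assuming the extension writes the usual Netscape cookies.txt layout (seven tab-separated columns, with the cookie name in the sixth) and that HttpOnly cookies may carry a "#HttpOnly_" prefix. The helper names cookie_names and missing_cookies are made up for this illustration.

```python
from pathlib import Path

# Field names taken from the verification step above.
REQUIRED = {"SID", "HSID", "LOGIN_INFO"}


def cookie_names(text):
    """Return the set of cookie names found in Netscape-format cookie text."""
    names = set()
    for line in text.splitlines():
        if line.startswith("#HttpOnly_"):
            # Assumed convention: HttpOnly cookies are marked with this prefix.
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) == 7:
            # Columns: domain, subdomain flag, path, secure, expiry, name, value
            names.add(fields[5])
    return names


def missing_cookies(text, required=REQUIRED):
    """Return the required cookie names that are absent, sorted."""
    return sorted(set(required) - cookie_names(text))


def check_file(path="cookies.txt"):
    """Print a short report for the cookie file at the given path."""
    missing = missing_cookies(Path(path).read_text(encoding="utf-8"))
    if missing:
        print("Missing cookies:", ", ".join(missing))
    else:
        print("All key cookies are present.")
```

Calling check_file("cookies.txt") from the folder that holds the exported file prints which of the key fields, if any, are missing. A file that passes this check can still be rejected by the site if the cookies have expired.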
The rest of this post gives a fix for the following error, which I ran into while downloading the EMTD dataset in the echomimic_v2 project:
(echomimic):~/echomimic_v2/EMTD_dataset$ python download.py
Fail to download https://www.…….com/watch?app=desktop&v=R7jm0-R9N_o, error info:
ERROR: [……] R7jm0-R9N_o: Sign in to confirm you’re not a bot. Use --cookies-from-browser or --cookies for the authentication. See https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp for how to manually pass cookies. Also see https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies for tips on effectively exporting …… cookies
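The cause is that download.py does not handle the site's authentication, so the request trips the bot detection (sign-in required). The fix is to add a cookie authentication parameter. In my case the exported cookie file is named cookies.txt; I put it in the ~/echomimic_v2/EMTD_dataset folder and then added the --cookies parameter to the command list in the code, pointing it at the cookie file path: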
command = [
    'yt-dlp',
    '--cookies', 'cookies.txt',  # new parameter; export the cookie file beforehand
    '-f', 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]',
    '--merge-output-format', 'mp4',
    '--output', output_path,
    video_url
]
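For context, here is a self-contained sketch of how such a command list might be built and run. It is not the project's actual download.py: the function names build_command and download are hypothetical, and the printed messages only imitate the log format shown in this post. The yt-dlp flags are the ones used in the list above.

```python
import subprocess
from pathlib import Path


def build_command(video_url, output_path, cookies_file="cookies.txt"):
    """Build the yt-dlp argument list, including the cookie file."""
    return [
        "yt-dlp",
        "--cookies", str(cookies_file),
        "-f", "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]",
        "--merge-output-format", "mp4",
        "--output", str(output_path),
        video_url,
    ]


def download(video_url, output_path, cookies_file="cookies.txt"):
    """Run yt-dlp once and report whether it succeeded."""
    # Fail early with a clear message instead of a yt-dlp error later on.
    if not Path(cookies_file).is_file():
        raise FileNotFoundError(f"cookie file not found: {cookies_file}")
    result = subprocess.run(
        build_command(video_url, output_path, cookies_file),
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        print(f"Download {video_url} successfully!")
        return True
    print(f"Fail to download {video_url}, error info:\n{result.stderr}")
    return False
```

Because 'cookies.txt' is a relative path, it is resolved against the directory the script is run from, which is why the file goes in the same folder as download.py in this setup.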
Run python download.py again, and the bot check is no longer triggered:
(echomimic) :~/echomimic_v2/EMTD_dataset$ python download.py
Download https://www.…….com/watch?v=FpiWSFcL3-c successfully!
Download https://www.…….com/watch?v=rVNb53lkBuc successfully!
Download https://www.…….com/watch?v=U-BHz_UIOfs successfully!
Download https://www.…….com/watch?v=Z3HJCQJ2Lmo successfully!
……