Update: maddeningly, this does not get past anti-scraping. Once the batch gets large the network IP gets banned, and after that even re-running a small sample fails. The fix is to add an anti-scraping step; I could not work it out myself and had someone more experienced adjust it for me. So the code in this post is only good for simple downloads, not for bulk-scraping videos, and I can't think of many scenarios where that is useful.
Goal: given the real URLs of the videos, download them in bulk and name each file with its own sequence number.
Main code
pip install requests
pip install pandas
import os

import pandas as pd
import requests

# Read the Excel file
excel_path = r'C:\Users\26534\Desktop\downloaded_videos.xlsx'  # path to the Excel file
df = pd.read_excel(excel_path)

# Create the folder the videos will be saved in
if not os.path.exists(r'C:\Users\26534\Desktop\downloaded_videos'):
    os.makedirs(r'C:\Users\26534\Desktop\downloaded_videos')

# Download each video and save it locally
for index, row in df.iterrows():
    video_number = row['id']
    video_url = row['lianjie']
    file_path = os.path.join(r'C:\Users\26534\Desktop\downloaded_videos', f'{video_number}.mp4')
    try:
        response = requests.get(video_url, stream=True)
        response.raise_for_status()
        with open(file_path, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
        print(f'Video {video_number} downloaded')
    except requests.exceptions.RequestException as e:
        print(f'Video {video_number} failed to download: {e}')

print('All videos downloaded')
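The update at the top mentions that an anti-scraping step was needed but not what it was. A common first-line mitigation, which I am adding here as my own sketch rather than the fix the original author used, is to send a browser-like User-Agent and pause randomly between requests; note this alone will not defeat IP bans or signed URLs.

```python
import random
import time

import requests

# Basic anti-scraping checks often reject the default "python-requests/x.y"
# agent, so present a browser-style one instead. The exact string below is
# illustrative.
HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0 Safari/537.36'),
}

def polite_get(url, min_delay=1.0, max_delay=3.0):
    """GET with browser-like headers and a random pre-request pause."""
    time.sleep(random.uniform(min_delay, max_delay))  # space requests out
    return requests.get(url, headers=HEADERS, stream=True, timeout=30)
```

To use it in the loop above, replace `requests.get(video_url, stream=True)` with `polite_get(video_url)`.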
Note: make sure your network is stable while downloading, otherwise some downloads may fail.
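One way to absorb these transient network hiccups without manual re-runs is to retry each download a few times with a growing wait. A minimal sketch; the function name and parameters are my own, not from the original post:

```python
import os
import time

import requests

def download_with_retry(url, file_path, attempts=3, backoff=2.0):
    """Download url to file_path, retrying on network errors.

    The wait grows after each failure (backoff * attempt seconds:
    2 s, then 4 s, ...). Returns True on success, False otherwise.
    """
    for attempt in range(1, attempts + 1):
        try:
            response = requests.get(url, stream=True, timeout=30)
            response.raise_for_status()
            with open(file_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)
            return True
        except requests.exceptions.RequestException:
            # Drop any partial file before retrying or giving up
            if os.path.exists(file_path):
                os.remove(file_path)
            if attempt < attempts:
                time.sleep(backoff * attempt)
    return False
```

Inside the loop, the whole try/except block then collapses to a single `download_with_retry(video_url, file_path)` call.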
Re-downloading all the videos that failed
Drawback: it still has to iterate over every video again, which takes a while when the batch is large.
import os

import pandas as pd
import requests

# Read the Excel file
excel_path = r'path to your own Excel file'
df = pd.read_excel(excel_path)

# Create the folder the videos will be saved in
output_folder = r'path to your own output folder'
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Download each video, skipping files that already exist on disk
for index, row in df.iterrows():
    video_number = row['id']
    video_url = row['lianjie']
    file_path = os.path.join(output_folder, f'{video_number}.mp4')

    # A file already on disk means an earlier run downloaded it, so skip it.
    # (Checking the filesystem works across runs; an in-memory set would
    # start out empty every time the script is launched.)
    if os.path.exists(file_path):
        print(f'Video {video_number} already downloaded, skipping')
        continue

    try:
        response = requests.get(video_url, stream=True)
        response.raise_for_status()
        with open(file_path, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
        print(f'Video {video_number} downloaded')
    except requests.exceptions.RequestException as e:
        # Remove any partial file so the next run retries this video
        if os.path.exists(file_path):
            os.remove(file_path)
        print(f'Video {video_number} failed to download: {e}')

print('All videos downloaded')
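The drawback above, that a retry pass walks over every video again, can be softened by filtering the DataFrame down to the missing files before the loop even starts. A small helper to sketch the idea; the function name is mine, and it assumes the sheet has the same 'id' column as the scripts above:

```python
import os

import pandas as pd

def pending_rows(df, output_folder):
    """Return only the rows whose <id>.mp4 is not in output_folder yet.

    Filtering first means a retry pass only requests the videos that
    actually failed, instead of iterating over the whole sheet.
    """
    done = {os.path.splitext(name)[0] for name in os.listdir(output_folder)}
    return df[~df['id'].astype(str).isin(done)]
```

Then `for index, row in pending_rows(df, output_folder).iterrows():` replaces the original loop header, and no per-row existence check is needed.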