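Implementation pipeline, in brief:
- Collect sample videos
- Slice each video into frames (every frame, or every few frames)
- Conv3D neural network (embeds the video content)
- Fully connected layer with sigmoid + binary cross-entropy (binary CE) for multi-label classification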
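As a rough illustration of the slicing step, the sketch below works out which source frames would be kept when sampling every few frames, and how many padding slots are needed to reach a fixed clip length. The helper `frame_plan` and its `step` parameter are hypothetical names introduced for this example, and the default values are assumptions for illustration.

```python
def frame_plan(total_frames, num_frame=450, step=1):
    """Return the source frame indices to keep for one video.

    total_frames -- number of frames the video actually contains
    num_frame    -- fixed number of frames wanted per video (assumed value)
    step         -- keep every step-th frame; 1 keeps every frame
    A None entry marks a slot to be filled with a black padding frame.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # Sample every step-th index, then truncate to the fixed length.
    kept = list(range(0, total_frames, step))[:num_frame]
    # Pad short videos so every video yields exactly num_frame slots.
    return kept + [None] * (num_frame - len(kept))


plan = frame_plan(total_frames=10, num_frame=6, step=3)
print(plan)
```

Fixing the length up front means every video produces a clip of the same shape, which is what a Conv3D input layer expects.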
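Advantage: prediction is end to end, so the output can be used directly in downstream tasks such as classification and tagging.
Drawback: the temporal order of frames is not taken into account, the classification results tend to be fairly general, and a large number of samples is required.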
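The final pipeline step, sigmoid plus binary cross-entropy, treats each label as an independent yes/no decision, which is what allows one video to carry several tags at once. Below is a minimal pure-Python sketch of that loss for a single sample. The function names and the 0.5 decision threshold are assumptions made for illustration, and a real model would use the equivalent loss from its deep learning framework.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function for a single logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def multilabel_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over all labels of one sample.

    logits  -- raw scores from the fully connected layer, one per label
    targets -- 0/1 ground-truth flags, same length as logits
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        # Clamp the probability so log() never receives 0.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)


def predict_labels(logits, threshold=0.5):
    """Decide each label independently, so several can be active at once."""
    return [int(sigmoid(z) >= threshold) for z in logits]


print(predict_labels([2.0, -1.0, 0.5]))
print(multilabel_bce([2.0, -1.0, 0.5], [1, 0, 1]))
```

Unlike softmax with categorical cross-entropy, the per-label probabilities here do not have to sum to 1.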
1. Frame extraction
import os

import cv2
import numpy as np


def mkdir(path):
    # Create the output folder if it does not exist yet.
    os.makedirs(path, exist_ok=True)


def v2frame(videoPath, svPath, num_frame=450, size=120):
    # Keep every frame (no skipping) and write exactly num_frame images per
    # video; videos with fewer frames are padded with black frames.
    cap = cv2.VideoCapture(videoPath)
    suc, frame = cap.read()
    frame_count = 0
    while frame_count < num_frame:
        if suc:
            frame = cv2.resize(frame, (size, size), interpolation=cv2.INTER_AREA)
        else:
            # The video has ended (or could not be read): use a black frame.
            frame = np.zeros((size, size, 3), np.uint8)
        cv2.imwrite(os.path.join(svPath, '%d.jpg' % frame_count), frame)
        if suc:
            suc, frame = cap.read()
        frame_count += 1
    cap.release()
# Collect the base name (without extension) of every file in the "videos" folder.
filenames = [os.path.splitext(x)[0] for x in os.listdir("videos")]
for filename in filenames: