## Environment
- python 3.7
- pytorch 1.1.0
- torchvision 0.3.0
- CUDA 9.0 or above
## Project Structure
- Audio-and-video-demo
  - bgm (background voice announcement files)
  - images
    - ffempeg-img
    - rec-img
  - model (saved self-trained models)
  - video (input and output video files)
  - bgm.py
  - combination.py
  - ffempeg-img-recognition.py
  - gesture-recognition.py
  - main.py
  - putlabel.py
## Modules
### ffempeg-img-recognition.py

Splits a gesture video into individual frames and saves them as images.

```python
import av

def ffmpeg_img_extract(videopath):
    # Decode only keyframes and save each one as a JPEG
    container = av.open(videopath)
    stream = container.streams.video[0]
    stream.codec_context.skip_frame = 'NONKEY'
    for frame in container.decode(stream):
        savepath = 'images/ffmpeg_img/%d.jpg' % frame.index
        frame.to_image().save(savepath, quality=80)

def img_to_video(videopath):
    # Decode every frame and save each one as a JPEG
    container = av.open(videopath)
    for frame in container.decode(video=0):
        savepath = 'images/ffmpeg_img/%d.jpg' % frame.index
        frame.to_image().save(savepath)
```
### gesture-recognition.py

Recognizes gesture images with the trained model and records their labels in the label_flag matrix. We fine-tuned a pretrained GoogLeNet on our gesture dataset, training over multiple iterations with a decaying learning rate; the resulting model recognizes gesture images with over 95% accuracy.
```python
import os
import torch
from PIL import Image
from torchvision import transforms

def gesture_recognition(filepath):
    # Number of frame images to recognize
    count = len(os.listdir(filepath))
    # Background-music labels, one entry per recognized frame
    bgm_label = []
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    for i in range(count):
        filename = filepath + str(i) + '.jpg'
        # Load the test image
        input_image = Image.open(filename)
        preprocess = transforms.Compose([
            transforms.Resize(256),
            transforms.RandomRotation(20),
            transforms.ToTensor()
        ])
```