TensorFlow classification in practice: splitting a video by scene

Memory_and_Dream

Article link: https://blog.csdn.net/Memory_and_Dream/article/details/114642273

These are a beginner's learning notes; experts can skip this one.

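Background

Two years ago I skimmed an introduction to TensorFlow, but my impression back then was that it was a pain to install and I had no idea how to actually use it. Recently I looked at the official TensorFlow documentation again and was pleasantly surprised at how much easier it has become to get started. So I decided to learn by doing, following the official docs while building something I had been thinking about lately.
I am basically learning from scratch, so experts, please go easy on me.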

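Requirements

A video usually consists of several scenes joined together, and there is plenty of software that automatically cuts a video into short clips at the scene changes. The usual algorithms detect a scene change from a sudden change in color and brightness between two consecutive frames. I wanted instead to train a neural network model to do this, rather than setting the criteria by my own subjective judgment.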
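For comparison, the conventional frame-difference approach mentioned above can be sketched roughly as follows. This is only a minimal illustration, not code from this project: the function name `is_scene_cut` and the `threshold` value are hypothetical, and real tools typically use more elaborate measures than a mean pixel difference.

```python
import numpy as np


def is_scene_cut(prev_frame, frame, threshold=0.3):
    """Return True when the mean absolute pixel change exceeds `threshold`.

    Both frames are uint8 arrays of shape (H, W, 3). `threshold` is a
    hand-picked fraction of the full 0-255 range (an assumption).
    """
    prev = prev_frame.astype(np.float32) / 255.0
    cur = frame.astype(np.float32) / 255.0
    return float(np.mean(np.abs(cur - prev))) > threshold


# Two synthetic 90x160 frames: a dark one and a bright one.
dark = np.full((90, 160, 3), 20, dtype=np.uint8)
bright = np.full((90, 160, 3), 220, dtype=np.uint8)
cut_detected = is_scene_cut(dark, bright)
no_cut = is_scene_cut(dark, dark)
```

The hand-picked threshold is exactly the kind of subjective criterion that the learned model is meant to replace.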

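Data preparation

First I need a batch of labeled data. Collecting a pile of videos and cutting them by hand is clearly too much work, so instead I take some single-scene videos and extract frames from them as images. Every pair of images is then either a different-scene pair or a same-scene pair.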

def get_imgs():
    videos = get_videos()
    gen_video_frames(videos)


def get_videos():
    video_list = []
    file_path_list = walkFile(VIDEO_DIR)
    for file_path in file_path_list:
        if os.path.splitext(file_path)[1] in VIDEO_SUFFIXS:
            # path_md5= get_md5(file_path)
            video_list.append(file_path)
    return video_list
    
def gen_video_frames(video_path_list):
    for video_path in video_path_list:
        print(video_path)
        video_md5 = get_md5(video_path)
        clip = VideoFileClip(video_path)

        duration = clip.duration
        start = int(clip.duration / 3)  # skip the beginning of the video

        seg_num = 0
        while start < duration:
            # frame = clip.get_frame(start)
            img_name = '{}-{}.jpg'.format(video_md5, int(start * 1000))
            img_path = os.path.join(RESULT_IMG_DIR, img_name)
            clip.save_frame(img_path, start)
            seg_num += 1
            start += SPLIT_DURATION
            if seg_num > MAX_SEG_NUM:
                break

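For the actual training, each sample is two images stacked vertically, and the label says whether the two images come from the same source video.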
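The pairing step can be sketched in isolation like this. It is a simplified illustration using synthetic arrays; `make_pair` and the video identifiers are hypothetical names, and the full script further down uses `np.r_` for the same vertical stacking.

```python
import numpy as np


def make_pair(frame_a, frame_b, video_id_a, video_id_b):
    """Stack two (90, 160, 3) frames vertically and label the pair."""
    pair = np.concatenate([frame_a, frame_b], axis=0)  # shape (180, 160, 3)
    label = 1 if video_id_a == video_id_b else 0       # 1 = same video, 0 = different
    return pair, label


frame_a = np.zeros((90, 160, 3), dtype=np.uint8)
frame_b = np.ones((90, 160, 3), dtype=np.uint8)
pair, label = make_pair(frame_a, frame_b, "video1", "video2")
```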

Training the model

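As a beginner, this was the first time I trained a model on my own data, so I simply followed the official documentation (although, after repeated tests gave poor results, I later changed the number of neurons). The network has three layers. I shrink each image to 160×90, so two stacked images are 160×180; with 3 RGB channels, the input shape is (180, 160, 3). The goal is a binary classifier, so the last layer has 2 outputs. The middle layer has 128 neurons in the documentation, but that performed poorly. My input is much more complex than the example in the documentation, so I raised it to 288 neurons, which worked much better.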
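To get a feel for the size of this network, the parameter count can be worked out by hand. The arithmetic below assumes the standard dense-layer formula (inputs × units weights plus one bias per unit); the variable names are only illustrative.

```python
height, width, channels = 180, 160, 3
hidden_units, num_classes = 288, 2

inputs = height * width * channels                        # values after Flatten
hidden_params = inputs * hidden_units + hidden_units      # weights + biases
output_params = hidden_units * num_classes + num_classes  # weights + biases
total_params = hidden_params + output_params

print(inputs, hidden_params, output_params, total_params)
# prints: 86400 24883488 578 24884066
```

Almost all of the roughly 25 million parameters sit in the first dense layer, because every hidden neuron connects to every one of the 86,400 input values.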

Also, for a binary classifier, balancing the number of samples in the two classes seems to improve the results considerably.
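One simple way to balance the classes is to downsample the larger one. The sketch below is a generic illustration rather than the approach used in the script later in this post (which caps each class at a fixed count while sampling); `balance_by_downsampling` is a hypothetical helper.

```python
import random


def balance_by_downsampling(samples, labels, seed=0):
    """Keep an equal number of samples per label by random downsampling."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    n = min(len(group) for group in by_label.values())  # size of the smallest class
    balanced = []
    for y, group in by_label.items():
        balanced.extend((s, y) for s in rng.sample(group, n))
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


samples = list(range(10))
labels = [1, 1, 1] + [0] * 7   # 3 "same" pairs, 7 "diff" pairs
bal_samples, bal_labels = balance_by_downsampling(samples, labels)
```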

    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(180, 160, 3)),
        # Position information is still captured by the next layer: every neuron has the same number of
        # connections but different weights, so some neuron may specialize in, say, the relation between input 1 and input 29
        keras.layers.Dense(288, activation='relu'),
        keras.layers.Dense(2)  # 2 classes
    ])

    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])  # track accuracy as the metric

    model.fit(train_images, train_labels, epochs=10)  # train the model
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

    print('\nTest accuracy:', test_acc)  # report accuracy on the test set
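
Because the last layer has no activation and the loss is created with `from_logits=True`, the model outputs raw scores (logits) rather than probabilities. The prediction script later appends a Softmax layer for this reason. What that layer computes can be sketched in plain NumPy; the logits below are made-up numbers, not real model output.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


logits = np.array([[2.0, -1.0]])   # made-up raw outputs for one sample
probs = softmax(logits)            # each row sums to 1
predicted_class = int(np.argmax(probs, axis=-1)[0])  # 0 = diff, 1 = same
```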

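Cutting the video

Lately I have been using the moviepy library for all my video editing; it is far nicer than wrapping ffmpeg commands myself. Since I only wanted a rough test of the splitting, my strategy is to drop the one second in which the scene change occurs and keep only the video between two scene changes. The results turned out acceptable.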
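The cutting strategy can be sketched independently of moviepy as a function that turns detected cut times (in seconds) into segments. `segments_from_cuts` is a hypothetical helper; keeping the trailing segment after the last cut is an assumption of this sketch, while the script below writes only the segments that end at a detected cut.

```python
def segments_from_cuts(cut_times, duration):
    """Return (start, end) segments, dropping the second before each cut."""
    segments = []
    seg_start = 0
    for t in sorted(cut_times):
        if t - 1 > seg_start:            # skip empty or one-second leftovers
            segments.append((seg_start, t - 1))
        seg_start = t
    if duration > seg_start:
        segments.append((seg_start, duration))  # trailing segment (assumption)
    return segments


print(segments_from_cuts([5, 6, 12], 20))
# prints: [(0, 4), (6, 11), (12, 20)]
```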

Main code

Model training

# -*- coding: utf-8 -*-
# @Time    : 2021/3/3 9:53 AM
# @Author  : meng_zhihao
# @Email   : 312141830@qq.com
# @File    : data_prepare.py

import os  # imported explicitly instead of relying on the star import below
from moviepy.editor import *
import hashlib
from settings import *
import re
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from random import randint, sample
from tensorflow import keras
import tensorflow as tf


def get_md5(x):
    # Accept both str and bytes; only str needs encoding first
    if isinstance(x, str):
        x = x.encode('utf-8')
    return hashlib.md5(x).hexdigest()


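# Walk a directory tree and collect all file paths
def walkFile(file):
    file_path_list = []
    for root, dirs, files in os.walk(file):

        # root is the path of the directory currently being visited
        # dirs is the list of subdirectory names in that directory
        # files is the list of file names in that directory

        # Iterate over the files
        for f in files:
            file_path_list.append(os.path.join(root, f))
    return file_path_list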


def get_videos():
    video_list = []
    file_path_list = walkFile(VIDEO_DIR)
    for file_path in file_path_list:
        if os.path.splitext(file_path)[1] in VIDEO_SUFFIXS:
            # path_md5= get_md5(file_path)
            video_list.append(file_path)
    return video_list


def gen_video_frames(video_path_list):
    for video_path in video_path_list:
        print(video_path)
        video_md5 = get_md5(video_path)
        clip = VideoFileClip(video_path)

        duration = clip.duration
        start = int(clip.duration / 3)  # skip the beginning of the video

        seg_num = 0
        while start < duration:
            # frame = clip.get_frame(start)
            img_name = '{}-{}.jpg'.format(video_md5, int(start * 1000))
            img_path = os.path.join(RESULT_IMG_DIR, img_name)
            clip.save_frame(img_path, start)
            seg_num += 1
            start += SPLIT_DURATION
            if seg_num > MAX_SEG_NUM:
                break


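def getRegex(regex, content):
    # Return the first capture group of the match, or '' when nothing matches
    if isinstance(content, bytes):
        content = content.decode('utf8')
    rs = re.search(regex, content)
    if rs:
        return rs.group(1)
    else:
        return ''


def get_md5_by_img(img_path):
    # File names look like '<video_md5>-<milliseconds>.jpg'; match on the base name
    # (raw string avoids invalid-escape warnings and works with any path separator)
    return getRegex(r'^(\w+)-\d+\.jpg$', os.path.basename(img_path))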


def get_imgs():
    videos = get_videos()
    gen_video_frames(videos)


def read_imgs():
    img_file_list = walkFile(RESULT_IMG_DIR)
    result = []
    for img_path in img_file_list:
        if '.jpg' not in img_path:
            continue
        img_md5 = get_md5_by_img(img_path)
        print(img_md5)
        # Load the image
        image = Image.open(img_path)
        # Frames are usually 16:9, hence 160x90 (OpenCV would resize faster)
        image = image.resize((160, 90), resample=Image.BILINEAR)
        image = np.array(image)
        # The array has shape [H, W, 3]:
        # H is the height, W the width, and 3 the RGB channels
        result.append([img_md5, image])
    return result


def gen_test_data(img_arrays, test_sample_num=100, train_sample_num=1000):
    # Each class is capped at the same count, but the two classes can still end up
    # with different sizes within 10000 random draws, which is not ideal
    train_images, train_labels, test_images, test_labels = [], [], [], []

    diff, same = 0, 0
    for i in range(10000):
        a, b = sample(img_arrays, 2)
        label = 1 if a[0] == b[0] else 0
        if label == 1 and same < train_sample_num:
            same += 1
        elif label == 0 and diff < train_sample_num:
            diff += 1
        else:
            continue  # this class has already reached its cap
        contact_img = np.r_[a[1], b[1]]  # stack the two frames along the row axis
        train_images.append(contact_img)
        train_labels.append(label)

    # Reset the counters; otherwise the caps reached above keep the test set (almost) empty
    diff, same = 0, 0
    for i in range(10000):
        a, b = sample(img_arrays, 2)
        label = 1 if a[0] == b[0] else 0
        if label == 1 and same < test_sample_num:
            same += 1
        elif label == 0 and diff < test_sample_num:
            diff += 1
        else:
            continue  # this class has already reached its cap
        contact_img = np.r_[a[1], b[1]]  # stack the two frames along the row axis
        test_images.append(contact_img)
        test_labels.append(label)

    return np.asarray(train_images), np.asarray(train_labels), np.asarray(test_images), np.asarray(test_labels)


def show_image(img):
    # Display an image with a colorbar to inspect its pixel value range
    plt.figure()
    plt.imshow(img)
    plt.colorbar()
    plt.grid(False)
    plt.show()


def train_classfi(train_images, train_labels, test_images, test_labels):
    show_image(train_images[0])
    show_image(test_images[0])

    # Normalize pixel values to 0-1
    train_images = train_images / 255.0
    test_images = test_images / 255.0

    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(180, 160, 3)),
        # Position information is still captured by the next layer: every neuron has the same number of
        # connections but different weights, so some neuron may specialize in, say, the relation between input 1 and input 29
        keras.layers.Dense(288, activation='relu'),
        keras.layers.Dense(2)  # 2 classes
    ])

    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])  # track accuracy as the metric

    model.fit(train_images, train_labels, epochs=10)  # train the model
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

    print('\nTest accuracy:', test_acc)  # report accuracy on the test set
    model.save('saved_model/my_model')


if __name__ == '__main__':
    # get_imgs()
    img_arrays = read_imgs()
    train_images, train_labels, test_images, test_labels = gen_test_data(img_arrays)
    train_classfi(train_images, train_labels, test_images, test_labels)

Using the model to split a video

# -*- coding: utf-8 -*-
# @Time    : 2021/3/9 5:54 PM
# @Author  : meng_zhihao
# @Email   : 312141830@qq.com
# @File    : use_model_to_split_scene.py
import os  # imported explicitly instead of relying on the star imports below
from moviepy.editor import *
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from random import randint, sample
from tensorflow import keras
import tensorflow as tf
from img_classfi import *
from settings import *

class_names = ['diff', 'same']

new_model = tf.keras.models.load_model('saved_model/my_model')

# Inspect the model architecture
new_model.summary()

# Prediction model: append Softmax so the logits become probabilities
probability_model = tf.keras.Sequential([new_model,
                                         tf.keras.layers.Softmax()])


def do_test(test_images):
    predictions = probability_model.predict(test_images)
    print(predictions[0])
    print(class_names[np.argmax(predictions[0])])
    return np.argmax(predictions[0])


def cut_video(video_path, start, end):
    video_name = 'test-{}-{}.mp4'.format(start, end)
    video_output = os.path.join(VIDEO_CUT_DIR, video_name)
    video = VideoFileClip(video_path).subclip(start, end)
    video.write_videofile(video_output, fps=24)


def test_video(video_path):
    video = VideoFileClip(video_path).resize((160, 90))

    duration = video.duration
    start = 1
    cut_start = 0
    last_frame = None
    while start < duration - 1:
        # Sample one frame per second and scale it to 0-1, as in training
        np_frame = video.get_frame(start)
        np_frame = np_frame / 255.0
        print(start)
        if last_frame is not None:
            contact_img = np.r_[np_frame, last_frame]
            # show_image(contact_img)
            test_array = np.asarray([contact_img])
            if not do_test(test_array):  # class 0 ('diff') means a scene change
                print('cut video ', cut_start, start)
                if start - 1 > cut_start:
                    # Drop the second that contains the scene change
                    cut_video(video_path, cut_start, start - 1)
                cut_start = start

        last_frame = np_frame
        start += 1
    # Note: the segment after the last detected scene change is not written out


if __name__ == '__main__':
    video_path = '/Users/mengzhihao/videotest/toSpace/resource/video/1.mp4'
    test_video(video_path)

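Lessons learned

1. The most important part of training is getting labeled material. As programmers we do not have the energy to label data one item at a time, so from the start you should either obtain data that is already classified or generate the data yourself.
2. Neural networks really are remarkable. They are not as intuitive as ordinary algorithms, but with a suitable model and enough data they can "guess" their way to the real algorithm on their own.
3. If training results are poor, first ask yourself whether there is too little data, whether the classes are unbalanced, and whether the network is not wide enough or not deep enough.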