Video Prediction Papers (GAN) (2) - Generating Videos with Scene Dynamics

ygh1872774470

Generating Videos with Scene Dynamics

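This paper comes from a strong MIT group. It is mainly about video generation, with video prediction as an additional application, and like the previous paper it is built on a GAN.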

First, some background on 3D convolutional neural networks:
(http://blog.csdn.net/sinat_24143931/article/details/78892362)
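To make the idea concrete, here is a minimal pure-Python sketch of a single-channel 3D convolution (stride 1, no padding). Unlike a 2D convolution, the kernel also slides along the time axis, so a single filter can respond to motion between frames. Like most deep-learning libraries, it computes cross-correlation (no kernel flip). The function name `conv3d_valid` and the toy clip are illustrative assumptions, not taken from the linked article.

```python
def conv3d_valid(video, kernel):
    """Single-channel 3D cross-correlation, stride 1, no padding.

    video:  nested list indexed video[t][y][x]
    kernel: nested list indexed kernel[dt][dy][dx]
    Returns a nested list of shape (T-kT+1, H-kH+1, W-kW+1).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                # Sum over a kT x kH x kW window that spans several frames.
                s = 0.0
                for dt in range(kT):
                    for dy in range(kH):
                        for dx in range(kW):
                            s += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip: 3 frames of 3x3 pixels; a bright pixel moves one step right per frame.
clip = [
    [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
]
# Temporal-difference kernel (2 x 1 x 1): responds to changes between frames.
diff = [[[-1.0]], [[1.0]]]
result = conv3d_valid(clip, diff)
print(len(result), len(result[0]), len(result[0][0]))  # 2 3 3
print(result[0][1])  # [-1.0, 1.0, 0.0]
```

The output is -1 where the pixel left and +1 where it arrived, which is the kind of motion signal a purely spatial 2D kernel cannot see within a single frame.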

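The core idea is to split a video into a foreground and a background and to use a mask to choose between them (somewhat like the forget gate in an LSTM). The background stream produces a single static image that is repeated across all frames, while the foreground stream produces a moving spatio-temporal volume together with the mask.
z denotes the latent code fed to the generator, sampled from a Gaussian distribution.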
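Written out, the two-stream generator blends the streams pixel by pixel. The equation below is a reconstruction of the paper's two-stream formulation, with ⊙ denoting element-wise multiplication and the static background b(z) repeated over every frame before blending:

```latex
G(z) = m(z) \odot f(z) + \bigl(1 - m(z)\bigr) \odot b(z),
\qquad 0 \le m(z) \le 1, \qquad z \sim \mathcal{N}(0, I)
```

Where m(z) is close to 1 the output pixel comes from the moving foreground; where it is close to 0 it comes from the static background.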
Generator
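The generator combines a foreground stream f(z) and a background stream b(z) through a spatio-temporal mask m(z), with 0 ≤ m(z) ≤ 1, that selects either the foreground model f(z) or the background model b(z) at each pixel location and timestep.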
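A minimal sketch of that per-pixel selection, using tiny hand-written nested lists in place of real network outputs. The helper name `compose_video` is hypothetical; it reuses the single background frame for every timestep and blends it with the foreground using the mask. In a real network the mask layer has to keep its values in [0, 1] (a sigmoid output is the usual way to do that); here the mask is simply given and checked.

```python
def compose_video(mask, foreground, background):
    """Blend a moving foreground with a static background, pixel by pixel.

    mask, foreground: nested lists indexed [t][y][x], mask values in [0, 1]
    background:       a single frame indexed [y][x], reused for every t
    Returns the composed video indexed [t][y][x].
    """
    video = []
    for t in range(len(mask)):
        frame = []
        for y in range(len(background)):
            row = []
            for x in range(len(background[0])):
                m = mask[t][y][x]
                if not 0.0 <= m <= 1.0:
                    raise ValueError("mask values must lie in [0, 1]")
                # m weights the foreground, (1 - m) weights the background.
                row.append(m * foreground[t][y][x] + (1.0 - m) * background[y][x])
            frame.append(row)
        video.append(frame)
    return video


# Two frames of 1x3 pixels; the mask marks where the "object" is in each frame.
background = [[0.2, 0.2, 0.2]]
foreground = [[[0.9, 0.9, 0.9]], [[0.9, 0.9, 0.9]]]
mask = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]]
video = compose_video(mask, foreground, background)
print(video)  # [[[0.9, 0.2, 0.2]], [[0.2, 0.9, 0.2]]]
```

The bright object appears to move one pixel to the right while the background stays fixed, which mirrors what the two-stream design is meant to capture: only the masked region has to model motion.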

Discriminator
The discriminator needs to be able to solve two problems: firstly, it must be able to classify realistic scenes from synthetically generated scenes, and secondly, it must be able to recognize realistic motion between frames. We chose to design the discriminator to be able to solve both of these tasks with the same model. We use a five-layer spatio-temporal convolutional network with kernels 4 × 4 × 4 so that the hidden layers can learn both visual models and motion models. We design the architecture to be reverse of the foreground stream in the generator, replacing fractionally strided convolutions with strided convolutions (to down-sample instead of up-sample), and replacing the last layer to output a binary classification (real or not).
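The down-sampling path described above can be checked with standard convolution shape arithmetic. This sketch assumes an input clip of 32 frames at 64×64 pixels and assumes every 4×4×4 layer uses stride 2 and padding 1 (a common DCGAN-style choice; the paragraph above only fixes the kernel size), so each layer halves time, height and width. It traces the first four down-sampling layers only; the last layer then collapses what remains into a single real/fake score, and its exact stride and padding are not stated here. The helper names are hypothetical.

```python
def conv_out(n, k=4, s=2, p=1):
    """Output length of a strided convolution along one axis."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k=4, s=2, p=1):
    """Output length of a transposed (fractionally strided) convolution."""
    return (n - 1) * s - 2 * p + k


def trace(shape, layer, n_layers):
    """Apply `layer` to every axis (time, height, width) n_layers times."""
    shapes = [shape]
    for _ in range(n_layers):
        shape = tuple(layer(n) for n in shape)
        shapes.append(shape)
    return shapes


# Assumed discriminator input: 32 frames of 64x64 pixels, written as (T, H, W).
down = trace((32, 64, 64), conv_out, 4)
print(down)  # [(32, 64, 64), (16, 32, 32), (8, 16, 16), (4, 8, 8), (2, 4, 4)]

# A generator foreground stream that starts from a 2x4x4 volume (assumed)
# runs the same shapes in reverse with fractionally strided convolutions.
up = trace((2, 4, 4), deconv_out, 4)
print(up)  # [(2, 4, 4), (4, 8, 8), (8, 16, 16), (16, 32, 32), (32, 64, 64)]
```

With these settings the two formulas are exact inverses of each other, which is why the discriminator can be built as the mirror image of the foreground stream.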

Results: