Preface: an ICCV 2017 paper on kernel-based video frame interpolation, an improved version of AdaConv.
Paper link: 【here】
Video Frame Interpolation via Adaptive Separable Convolution
Introduction
Kernel-based methods cope better with occlusion, blur, and brightness changes than flow-based methods. However, they estimate one kernel per output pixel, and the kernel has to be large in order to handle large displacements, so the memory needed to store all the kernels becomes enormous.
As the paper puts it:
The convolution kernels jointly account for the two separate steps of motion estimation and re-sampling involved in traditional frame interpolation methods. In order to handle large motion, large kernels are required. For example, Niklaus et al. employ a neural network to output two 41×41 kernels for each output pixel. To generate the kernels for all pixels in a 1080p video frame, the output kernels alone will require 26 GB of memory. The memory demand increases quadratically with the kernel size and thus limits the maximal motion to be handled.
Network Architecture
The key idea of the paper is to split a 2D kernel into two 1D kernels; the 2D kernel is then recovered as the outer product of the two 1D kernels.
Since a 2D kernel is needed for each of the two input frames, four 1D kernels are estimated in total.
From the paper:
Our method addresses this problem by estimating a pair of 1D kernels that approximate a 2D kernel. That is, we estimate ⟨k1,v, k1,h⟩ and ⟨k2,v, k2,h⟩ to approximate K1 as k1,v ∗ k1,h and K2 as k2,v ∗ k2,h. Thus, our method reduces the number of kernel parameters from n² to 2n for each kernel. This enables the synthesis of a high-resolution video frame in one pass and the incorporation of perceptual loss to further improve the visual quality of the interpolation results, as detailed in the following subsections.
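A minimal NumPy sketch of the idea (the variable names are illustrative, not from the authors' code): the 2D kernel K[y, x] = k_v[y] · k_h[x] is the outer product of a vertical and a horizontal 1D kernel, so each kernel needs 2n instead of n² coefficients, and the local convolution can be applied as two 1D passes.

```python
import numpy as np

n = 41  # kernel size, as in the paper

# Stand-in for the per-pixel 1D kernels a network would predict
# (random here, just to demonstrate the reconstruction).
k_v = np.random.rand(n)
k_h = np.random.rand(n)

# The 2D kernel is the outer product: K[y, x] = k_v[y] * k_h[x].
K = np.outer(k_v, k_h)

# Applying K to an n x n patch around a pixel ...
patch = np.random.rand(n, n)
out_2d = np.sum(K * patch)

# ... equals a vertical 1D pass followed by a horizontal one.
out_sep = k_v @ patch @ k_h
print(np.allclose(out_2d, out_sep))  # True

# Parameters per kernel: n**2 = 1681 for the 2D form vs 2*n = 82.
```

In the full method the network predicts such a kernel pair for each of the two input frames, and the interpolated pixel is the sum of the two separable local convolutions over the corresponding patches.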
Experiments
Quantitative experiments
Qualitative experiments
The paper also compares training with the L1 loss against training with a perceptual (feature) loss.
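For reference, a perceptual loss measures the distance between deep features of the prediction and the ground truth rather than raw pixel differences. A minimal PyTorch sketch follows; the specific choice of the relu4_4 layer of VGG-19 is an assumption for illustration, not something stated in this post:

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen VGG-19 feature extractor; features[:27] ends at relu4_4
# (this layer choice is an assumption for illustration).
vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features[:27].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # pred, target: (B, 3, H, W) tensors, ImageNet-normalized.
    # Compare deep features instead of raw pixels.
    return F.mse_loss(vgg(pred), vgg(target))
```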
Summary
This paper's main contribution is approximating a 2D kernel with two 1D kernels, which greatly reduces memory consumption. It is largely an extension of the original AdaConv, with relatively modest changes.