Deepfake Video Detection Using Recurrent Neural Networks: Reading Notes
D. Güera and E. J. Delp, “Deepfake Video Detection Using Recurrent Neural Networks,” 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2018, pp. 1-6, doi: 10.1109/AVSS.2018.8639163.
Introduction
Uses a convolutional neural network (CNN) to extract frame-level features.
These features are then used to train a recurrent neural network (RNN) that learns to classify whether a video has been subject to manipulation.
The main contributions of this work are summarized as follows. First, we propose a two-stage analysis composed of a CNN to extract features at the frame level, followed by a temporally-aware RNN to capture temporal inconsistencies between frames introduced by the face-swapping process. Second, we have used a collection of 600 videos to evaluate the proposed method, with half of the videos being deepfakes collected from multiple video hosting websites. Third, we show experimentally the effectiveness of the described approach, which allows us to detect whether a suspect video is a deepfake manipulation with 94% more accuracy than a random detector baseline in a balanced setting.
Related Work
- Digital Media Forensics
two pre-trained deep CNNs
two different face swapping manipulations using a two-stream network
- Face-based Video Manipulation Methods
Face2Face: a real-time facial reenactment system, capable of altering facial movements in different types of video streams.
Generative adversarial networks (GANs): GANs show remarkable results in altering face attributes such as age, facial hair, or mouth expressions.
- Recurrent Neural Networks
LSTM networks
When a deep learning architecture is equipped with a CNN combined with an LSTM, it is typically considered "deep in space" and "deep in time" respectively, which can be seen as two distinct system modalities.
Training procedure (the U-Net network is trained the same way)
Two sets of training images are required:
- the original face
- the desired face
Generation procedure
pass a latent representation of a face generated from the original subject present in the video to the decoder network trained on faces of the subject we want to insert in the video
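The routing described above (encode the original subject's face, decode with the target subject's decoder) can be sketched as follows. All functions here are hypothetical stand-ins; the real face-swapping tools use convolutional autoencoders, and this toy version only illustrates how a shared encoder and per-identity decoders are wired together:

```python
# Toy sketch of deepfake autoencoder routing. The encoder/decoders are
# hypothetical stand-ins, not the actual convolutional networks.

def shared_encoder(face_pixels):
    # Both identities are trained against ONE shared encoder, so latent
    # codes live in a common space. Here we just summarise the input.
    return [sum(face_pixels) / len(face_pixels), min(face_pixels), max(face_pixels)]

def decoder_for_target(latent):
    # Trained only on the target subject's faces: it reconstructs a
    # target-styled face from any latent code in the shared space.
    mean, lo, hi = latent
    return {"style": "target_subject", "mean": mean, "range": hi - lo}

def swap_face(original_face_pixels):
    # The core trick: encode the ORIGINAL subject's face, then decode it
    # with the TARGET subject's decoder, so the target's appearance is
    # rendered with the original's pose/expression in the latent code.
    latent = shared_encoder(original_face_pixels)
    return decoder_for_target(latent)

fake = swap_face([0.1, 0.5, 0.9, 0.5])
print(fake["style"])  # prints "target_subject"
```

Because the encoder is shared, a latent code from one identity is meaningful to the other identity's decoder; this is what makes the cross-decoding swap possible.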
Flaws:
Boundary effects
Because the encoder is not aware of the skin or other scene information, it is very common to have boundary effects due to the seamed fusion between the new face and the rest of the frame.
Inherent to the generation process of the final video itself
Because the autoencoder is used frame-by-frame, it is completely unaware of any previously generated face that it may have created.
What the CNN captures:
The most prominent is an inconsistent choice of illuminants between frames, which leads to a flickering phenomenon in the face region common to the majority of fake videos. Although this phenomenon can be hard to perceive with the naked eye in the best manually-tuned deepfake manipulations, it is easily captured by a pixel-level CNN feature extractor.
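As a crude illustration of the flicker cue (not the paper's method; the paper lets the CNN learn it from pixels), one could track the mean brightness of the face region across frames and look for abrupt frame-to-frame jumps caused by inconsistent illuminants:

```python
def brightness_flicker(frame_means):
    # frame_means: mean brightness of the face region in each frame.
    # Returns the largest absolute frame-to-frame jump; a large value
    # hints at the illumination flicker common in spliced deepfake faces.
    return max(abs(b - a) for a, b in zip(frame_means, frame_means[1:]))

# A hypothetical brightness trace with one inconsistent frame:
print(round(brightness_flicker([0.50, 0.51, 0.49, 0.72, 0.50]), 2))  # 0.23
```

A learned pixel-level CNN feature extractor picks up this kind of inconsistency far more reliably than a single hand-crafted statistic like this one.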
Recurrent Network for Deepfake Detection
Convolutional LSTM
- CNN for frame feature extraction.
The final classification layers are removed to directly output a deep representation of each frame using the ImageNet pre-trained model.
The 2048-dimensional feature vectors after the last pooling layers are then used as the sequential LSTM input.
- LSTM for temporal sequence analysis.
A 2048-wide LSTM takes a sequence of 2048-dimensional ImageNet feature vectors,
followed by a 512-unit fully-connected layer,
a softmax layer to compute the probabilities of the frame sequence being either pristine or deepfake
without the need of auxiliary loss functions.
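Putting the pieces together, the detection pipeline reduces to: per-frame 2048-d CNN features, a 2048-wide LSTM, a 512-unit fully-connected layer, and a 2-way softmax. The shape-level sketch below uses hypothetical stand-in functions (random vectors instead of real network outputs) purely to make the tensor shapes at each stage concrete:

```python
import math
import random

# Sequence length is an assumption for illustration; 2048/512/2 are the
# layer widths described in the paper.
FRAMES, FEAT, HIDDEN, CLASSES = 20, 2048, 512, 2

def cnn_features(frame):
    # Stand-in for the ImageNet-pretrained CNN with its classifier
    # removed: each frame maps to a 2048-dimensional feature vector.
    return [random.random() for _ in range(FEAT)]

def lstm(feature_sequence):
    # Stand-in for the 2048-wide LSTM: consumes the whole feature
    # sequence and yields a final hidden state (here, 512 random values).
    return [random.random() for _ in range(HIDDEN)]

def dense(hidden):
    # Stand-in for the 512-unit fully-connected layer projecting down
    # to 2 class logits (pristine vs. deepfake).
    return [sum(hidden) * 0.01, -sum(hidden) * 0.01]

def softmax(logits):
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

frames = [None] * FRAMES                      # placeholder video frames
features = [cnn_features(f) for f in frames]  # shape (FRAMES, 2048)
probs = softmax(dense(lstm(features)))        # [P(pristine), P(deepfake)]
assert len(probs) == CLASSES and abs(sum(probs) - 1.0) < 1e-9
```

Note that the whole sequence is classified with a single softmax at the end, which is consistent with the paper's remark that no auxiliary loss functions are needed.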