Deepfake detection paper · Combining EfficientNet and Vision Transformers for Video Deepfake Detection

Core method

The paper proposes two models with a hybrid convolutional-Transformer architecture:

  • Efficient ViT
  • Convolutional Cross ViT

Predictions are aggregated over time and across multiple faces to decide whether a video clip is real or fake.


Efficient ViT


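The model consists of two modules:

  • Convolutional module (feature extractor): EfficientNet B0
    • Extracts visual features for each 7 × 7 input image patch, embedding important low-level and local information
    • Fine-tuned so that it extracts more suitable features
  • Transformer encoder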
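To make the patch-to-token bookkeeping concrete, here is a minimal sketch of cutting a 2-D grid into non-overlapping 7 × 7 tiles, each of which would become one token for the Transformer encoder. The 224 × 224 grid size, the helper name `split_into_patches`, and the synthetic single-channel image are illustrative assumptions; the notes above do not state the input size, and the real model feeds EfficientNet B0 features, not raw pixels, to the encoder.

```python
def split_into_patches(grid, patch):
    """Cut a 2-D grid (list of rows) into non-overlapping patch x patch tiles."""
    height, width = len(grid), len(grid[0])
    if height % patch or width % patch:
        raise ValueError("grid size must be a multiple of the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # One tile = `patch` rows, each sliced to `patch` columns.
            tiles.append([row[left:left + patch] for row in grid[top:top + patch]])
    return tiles


# Hypothetical 224 x 224 single-channel face crop (the size is an assumption).
SIDE, PATCH = 224, 7
image = [[(r * SIDE + c) % 256 for c in range(SIDE)] for r in range(SIDE)]

patches = split_into_patches(image, PATCH)
num_tokens = len(patches)  # one token per tile
print(num_tokens)
```

Under these assumptions the token count is simply (224 / 7)², so a smaller patch size lengthens the sequence the Transformer encoder has to attend over.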

Convolutional Cross ViT


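Artifacts produced by deepfake generation can appear globally or locally, so applying EfficientNet to small image patches alone is not ideal.

Two branches process image patches of different sizes:

  • S-branch: handles small 7 × 7 patches
  • L-branch: handles large 64 × 64 patches, giving a larger receptive field

Cross-attention combines the outputs of the two branches so that they interact directly.

Finally, the outputs of the two branches are summed to give the model's prediction.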
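The interaction between the branches can be sketched roughly as follows. This is a toy, pure-Python illustration and not the paper's implementation: it assumes a CrossViT-style scheme in which each branch's class token queries the other branch's patch tokens, it omits the learned query/key/value projections, and all vectors and the head weights (`s_head`, `l_head`) are made-up values.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, tokens):
    """Single-head scaled dot-product attention of one query over `tokens`.

    Learned projections are omitted (treated as identity) to keep this short.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]


# Toy class tokens and patch tokens for the S and L branches (made-up values).
s_cls, l_cls = [0.2, -0.1, 0.4], [0.0, 0.3, -0.2]
s_patches = [[0.1, 0.0, 0.5], [0.3, -0.2, 0.1]]
l_patches = [[-0.4, 0.2, 0.0], [0.2, 0.2, 0.2]]

# Each class token attends to the other branch's patches (residual add).
s_fused = [a + b for a, b in zip(s_cls, cross_attend(s_cls, l_patches))]
l_fused = [a + b for a, b in zip(l_cls, cross_attend(l_cls, s_patches))]

# Hypothetical linear heads turn each fused token into a logit.
s_head, l_head = [0.5, -0.5, 1.0], [1.0, 0.5, -1.0]
logit = dot(s_fused, s_head) + dot(l_fused, l_head)  # sum of branch outputs
prob_fake = 1.0 / (1.0 + math.exp(-logit))
print(prob_fake)
```

Summing the two logits lets either branch push the decision, which fits the motivation that artifacts may be local (small patches) or global (large patches).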


Experiments

Experimental setup

Face forgery methods covered:

  • DeepFakes
  • Face2Face
  • FaceShifter
  • FaceSwap
  • NeuralTextures

Two popular datasets:

  • FaceForensics++
  • DFDC

Compared against several SOTA methods:

  • Convolutional ViT (Deepfake Video Detection Using Convolutional Vision Transformer)
  • ViT with distillation (Deepfake Detection Scheme Based on Vision Transformer and Distillation)
  • Selim EfficientNet B7 (DFDC winner 👑)

Evaluation metrics:

  • AUC
  • F1-Score
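For readers who want to see what the two metrics measure, here is a small self-contained sketch. The labels and scores are invented toy values, and the 0.5 decision threshold used for F1 is an assumption, not a setting taken from the paper.

```python
def f1_score(labels, preds):
    """Harmonic mean of precision and recall for the positive (fake) class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, scores):
    """Probability that a random fake outscores a random real (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: label 1 = fake, 0 = real; scores are predicted fake probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # threshold is an assumption

print(auc(labels, scores), f1_score(labels, preds))
```

AUC is threshold-free and depends only on how the scores rank fakes against reals, while F1 depends on the chosen threshold, which is why the two are commonly reported together.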

Training

Training data: 220444 faces (DFDC + FF++)

  • real: 116950
  • fake: 103494

Validation data: 8070 faces (DFDC)
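A quick arithmetic check of the counts above shows that the real and fake figures add up to the stated total and that the training set is roughly balanced. The variable names are illustrative only.

```python
# Face counts as listed in the notes above.
train_real = 116950
train_fake = 103494
val_total = 8070

train_total = train_real + train_fake
fake_fraction = train_fake / train_total   # share of fake faces in training
val_ratio = val_total / train_total        # validation size relative to training

print(train_total, round(fake_fraction, 3), round(val_ratio, 3))
```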

Feature extractors: EfficientNet B0 (fine-tuned) and the Wodajo CNN (fine-tuned)


Inference


The paper proposes a slightly more elaborate voting procedure:

Whether a video is fake is decided on a per-actor basis.
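One way to read the per-actor vote is sketched below. The exact rule is an assumption here: face scores are grouped by actor, averaged over time, and the video is flagged as fake if any single actor's mean exceeds a threshold. The function name `classify_video`, the 0.5 threshold, and the toy detections are all hypothetical.

```python
from collections import defaultdict


def classify_video(face_scores, threshold=0.5):
    """Aggregate per-face fake probabilities into one video-level decision.

    face_scores: iterable of (actor_id, fake_probability), one per detected face.
    Returns (is_fake, per_actor_mean).
    """
    grouped = defaultdict(list)
    for actor_id, score in face_scores:
        grouped[actor_id].append(score)
    if not grouped:
        raise ValueError("no faces were detected in the video")
    # Average each actor's scores over all frames where that actor appears.
    per_actor = {actor: sum(s) / len(s) for actor, s in grouped.items()}
    # Assumed rule: one manipulated actor is enough to call the video fake.
    is_fake = any(mean > threshold for mean in per_actor.values())
    return is_fake, per_actor


# Toy detections: actor_a looks real, actor_b looks manipulated.
detections = [
    ("actor_a", 0.10), ("actor_b", 0.80),
    ("actor_a", 0.20), ("actor_b", 0.90),
    ("actor_a", 0.15),
]
is_fake, per_actor = classify_video(detections)
print(is_fake, per_actor)
```

In this toy data the mean over all five faces is 0.43, below the threshold, while actor_b alone averages 0.85. Grouping by actor keeps one manipulated face from being diluted by the untouched faces around it.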


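Results

Results on the DFDC test set

Using neither distillation nor model ensembling, the model reaches comparable performance with only 1/3 of the parameters.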


Generalization on FF++ subsets


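All models perform relatively poorly on NeuralTextures.


Summary

The paper uses EfficientNet as a front-end feature extractor for the image patches fed to the ViT, and proposes a voting method (not very novel, mostly there to pad the paper). At a small parameter cost it achieves performance described as comparable to SOTA methods (in practice, still well short of them).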
