First Order Motion Model for Image Animation: Paper Notes

SGAKonata

Category: Deep Learning | Tags: Deep Learning

Article link: https://blog.csdn.net/weixin_41431170/article/details/109642494

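This post discusses a self-supervised training approach that needs no annotated data for pretraining and uses the First Order Motion Model for image animation. The model consists of a motion estimation module and an image generation module, and works by estimating a dense motion field and an occlusion mask. The keypoint detector uses an auto-encoder structure and outputs both keypoints and affine transformations. The image generation module renders the new image from this information. The post also covers how the occlusion mask is generated to handle occlusions and how loss functions are used to optimize the result.

(Summary generated automatically by CSDN.)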

Model structure
- motion estimation module
- image generation module
Details
Results

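The paper¹ aims to do away with pretraining on annotated data and to rely on self-supervised training instead. Rather than generating the whole image directly, it generates the result in separate steps, from keypoints and from an occlusion mask.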

Model structure

The model consists of two main parts: the motion estimation module and the image generation module.

motion estimation module

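The motion estimation module outputs both a dense motion field $\hat{\mathcal{T}}_{\mathbf{S} \leftarrow \mathbf{D}}$ and an occlusion mask $\hat{\mathcal{O}}_{\mathbf{S} \leftarrow \mathbf{D}}$. The dense motion field maps every pixel of a driving video frame $D$ to its corresponding location in the source image $S$; that is, it is a backward optical flow. The occlusion mask indicates which parts of the output can be obtained by warping the source image and which parts have to be generated. For the computation, an abstract intermediate reference frame $R$ is assumed: $\mathcal{T}_{\mathbf{S} \leftarrow \mathbf{R}}$ and $\mathcal{T}_{\mathbf{D} \leftarrow \mathbf{R}}$ are estimated separately and then combined into $\mathcal{T}_{\mathbf{S} \leftarrow \mathbf{D}} = \mathcal{T}_{\mathbf{S} \leftarrow \mathbf{R}} \circ \mathcal{T}_{\mathbf{D} \leftarrow \mathbf{R}}^{-1}$.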
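To make the role of the backward flow concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as nested lists and uses nearest-neighbour lookup for brevity; practical implementations typically use differentiable bilinear sampling instead. The names `backward_warp`, `source` and `flow` are hypothetical and chosen only for this illustration.

```python
def backward_warp(source, flow):
    """For every driving-frame pixel (y, x), look up the source pixel
    that flow[y][x] = (src_y, src_x) points to."""
    height, width = len(flow), len(flow[0])
    src_h, src_w = len(source), len(source[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            src_y, src_x = flow[y][x]
            # Nearest-neighbour lookup, clamped to the image border.
            sy = min(max(int(round(src_y)), 0), src_h - 1)
            sx = min(max(int(round(src_x)), 0), src_w - 1)
            row.append(source[sy][sx])
        warped.append(row)
    return warped


source = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
# Toy motion field: each driving pixel reads the source pixel one column to its right.
flow = [[(y, x + 1) for x in range(3)] for y in range(3)]
warped = backward_warp(source, flow)
```

Because the field is indexed by driving-frame coordinates, every output pixel receives exactly one value, which is the practical advantage of backward over forward flow.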

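$\mathcal{T}_{\mathbf{S} \leftarrow \mathbf{D}}$ is computed with the help of a keypoint detector. The detector has an auto-encoder structure; it extracts keypoints as features and additionally fits a local affine transformation around each of them, so it outputs the keypoints together with their affine transformations. Adding the affine transformations lets the model handle more complex motion. The two sets of transformations $\mathcal{T}$, together with the source image, are then fed into the Dense Motion network, which outputs $\hat{\mathcal{T}}_{\mathbf{S} \leftarrow \mathbf{D}}$ and $\hat{\mathcal{O}}_{\mathbf{S} \leftarrow \mathbf{D}}$.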

image generation module

The image generation module renders the output image from the source image and the information provided by the motion estimation module.
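As a rough illustration of how the occlusion mask can enter the rendering step, the sketch below multiplies an already warped feature map element-wise by the mask, so that regions the mask marks as not recoverable by warping are suppressed and left for the generator to fill in. This is a simplified assumption about a single feature channel stored as nested lists; `apply_occlusion_mask` and the toy values are hypothetical.

```python
def apply_occlusion_mask(warped_features, occlusion_mask):
    """Element-wise product of a warped feature map and an occlusion mask.
    Mask values near 1 keep the warped feature, values near 0 suppress it."""
    return [
        [feature * keep for feature, keep in zip(feature_row, mask_row)]
        for feature_row, mask_row in zip(warped_features, occlusion_mask)
    ]


warped_features = [[2.0, 4.0],
                   [6.0, 8.0]]
occlusion_mask = [[1.0, 0.5],
                  [0.0, 1.0]]
masked_features = apply_occlusion_mask(warped_features, occlusion_mask)
```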

Details

${\mathcal{T}}_{\mathbf{S} \leftarrow \mathbf{D}}$

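$\mathcal{T}_{\mathbf{S} \leftarrow \mathbf{D}}$ is decomposed into $\mathcal{T}_{\mathbf{S} \leftarrow \mathbf{R}}$ and $\mathcal{T}_{\mathbf{D} \leftarrow \mathbf{R}}$, which reduces the problem to estimating $\mathcal{T}_{\mathbf{X} \leftarrow \mathbf{R}}$, where $X$ is a given image. $\mathcal{T}_{\mathbf{X} \leftarrow \mathbf{R}}$ is then approximated by its first-order Taylor expansion in the neighborhood of each of the keypoints $p_1, \dots, p_K$ of $R$. Keypoints in the images $X, S, D$ are denoted by $z$. The first-order expansion of $\mathcal{T}_{\mathbf{X} \leftarrow \mathbf{R}}$ at $p_k$ is: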
$\mathcal{T}_{\mathbf{X} \leftarrow \mathbf{R}}(p)=\mathcal{T}_{\mathbf{X} \leftarrow \mathbf{R}}\left(p_{k}\right)+\left(\left.\frac{d}{d p} \mathcal{T}_{\mathbf{X} \leftarrow \mathbf{R}}(p)\right|_{p=p_{k}}\right)\left(p-p_{k}\right)+o\left(\left\|p-p_{k}\right\|\right)$
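Applying this expansion to both $X = S$ and $X = D$ and eliminating $p$ gives the local approximation $\mathcal{T}_{\mathbf{S} \leftarrow \mathbf{D}}(z) \approx \mathcal{T}_{\mathbf{S} \leftarrow \mathbf{R}}(p_k) + J_k \left(z - \mathcal{T}_{\mathbf{D} \leftarrow \mathbf{R}}(p_k)\right)$, where $J_k$ is the Jacobian of $\mathcal{T}_{\mathbf{S} \leftarrow \mathbf{R}}$ at $p_k$ multiplied by the inverse Jacobian of $\mathcal{T}_{\mathbf{D} \leftarrow \mathbf{R}}$ at $p_k$. The following is a minimal sketch of that formula for a single keypoint in plain Python; the helper names (`inv2`, `matmul2`, `matvec2`, `local_motion`) and the toy numbers are hypothetical and only serve as illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("Jacobian is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec2(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def local_motion(z, kp_source, jac_source, kp_driving, jac_driving):
    """First-order estimate of T_{S<-D}(z) near one keypoint.

    kp_source / kp_driving play the role of T_{S<-R}(p_k) / T_{D<-R}(p_k);
    jac_source / jac_driving are the corresponding 2x2 Jacobians at p_k.
    """
    jac = matmul2(jac_source, inv2(jac_driving))  # J_k
    offset = [z[0] - kp_driving[0], z[1] - kp_driving[1]]
    moved = matvec2(jac, offset)
    return [kp_source[0] + moved[0], kp_source[1] + moved[1]]


# Toy example: the source is scaled by 2 around its keypoint, the driving frame is not.
kp_source, jac_source = [0.5, 0.5], [[2.0, 0.0], [0.0, 2.0]]
kp_driving, jac_driving = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
mapped = local_motion([0.1, -0.2], kp_source, jac_source, kp_driving, jac_driving)
```

Note that the driving keypoint itself is always mapped onto the source keypoint; the Jacobians only affect how the neighborhood around it is transformed.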