2021-06-03 [Paper Notes] Cross-domain Correspondence Learning for Exemplar-based Image Translation

kk_罗特

Tags: deep learning

Original link: https://panzhang0212.github.io/CoCosNet/

Paper title: Cross-domain Correspondence Learning for Exemplar-based Image Translation

Project page: https://panzhang0212.github.io/CoCosNet/

Paper link: https://arxiv.org/abs/2004.05571

Code: https://github.com/microsoft/CoCosNet

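Abstract

This paper presents a general framework for image translation that synthesizes a photo-realistic image from an input semantic image. Unlike conventional approaches, the framework takes an additional exemplar image and renders the output in the style of that exemplar. The exemplar places more constraints on the output and also supplies more information.
The framework has two main parts: a cross-domain correspondence network, which establishes semantic correspondence across domains, and a translation network, which synthesizes the output image. Traditional correspondence methods only handle relations between natural images and cannot deal with images from different domains, whereas this framework can.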
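Cross-domain correspondence network:
The network first establishes the correspondence between the input and the exemplar image, which lie in different domains, and warps the exemplar accordingly so that its semantics align with the input. Concretely, images from both domains are mapped into an intermediate domain, the correspondence is found there, and the exemplar image is warped using it.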
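The input image $x_A$ belongs to domain A and the exemplar image $y_B$ belongs to domain B. The authors feed $x_A$ and $y_B$ into a feature pyramid network (FPN) to extract features and map them to $x_S$ and $y_S$ in the intermediate domain S, that is, $x_{S}=\mathcal{F}_{A \rightarrow S}\left(x_{A} ; \theta_{\mathcal{F}, A \rightarrow S}\right)$ and $y_{S}=\mathcal{F}_{B \rightarrow S}\left(y_{B} ; \theta_{\mathcal{F}, B \rightarrow S}\right)$,
where $\theta_{\mathcal{F}}$ denotes the parameters to be learned.
The loss for this step is: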

$\mathcal{L}_{\text {domain }}^{\ell_{1}}=\left\|\mathcal{F}_{A \rightarrow S}\left(x_{A}\right)-\mathcal{F}_{B \rightarrow S}\left(x_{B}\right)\right\|_{1}$

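Here $x_B$ is the domain-B image paired with $x_A$. Because $x_A$ and $x_B$ come from different domains but carry the same semantics, their representations should be aligned as closely as possible once mapped into S. The loss therefore measures the difference between the two mappings in S, and training minimizes it.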
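As a minimal sketch of the domain alignment loss above, assuming the two feature maps have already been extracted and have the same shape, the L1 distance can be computed directly with NumPy. The function name `domain_alignment_loss` is hypothetical and not taken from the official code.

```python
import numpy as np

def domain_alignment_loss(feat_a, feat_b):
    """L1 norm of the difference between two feature maps in the shared domain S.

    feat_a stands in for F_{A->S}(x_A) and feat_b for F_{B->S}(x_B);
    both are assumed to be precomputed arrays of identical shape.
    """
    feat_a = np.asarray(feat_a, dtype=np.float64)
    feat_b = np.asarray(feat_b, dtype=np.float64)
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    # Sum of absolute differences over all channels and positions.
    return float(np.abs(feat_a - feat_b).sum())
```

In practice the sum is often replaced by a mean so that the value does not depend on the feature map size; that is a design choice, not something stated in these notes.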

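Once $x_A$ and $y_B$ are both mapped to domain S, a correlation matrix is computed between them in S, and a softmax-weighted sum then selects the most correlated pixels of $y_B$. Here $\hat{x}_{S}(u)$ and $\hat{y}_{S}(v)$ denote the channel-wise centered features of $x_S$ and $y_S$ at positions $u$ and $v$, and $\alpha$ controls the sharpness of the softmax.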
$\mathcal{M}(u, v)=\frac{\hat{x}_{S}(u)^{T} \hat{y}_{S}(v)}{\left\|\hat{x}_{S}(u)\right\|\left\|\hat{y}_{S}(v)\right\|}$

$r_{y \rightarrow x}(u)=\sum_{v} \operatorname{softmax}_{v}(\alpha \mathcal{M}(u, v)) \cdot y_{B}(v)$

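The correspondence regularization loss is: $\mathcal{L}_{\text {reg }}=\left\|r_{y \rightarrow x \rightarrow y}-y_{B}\right\|_{1}$, where $r_{y \rightarrow x \rightarrow y}$ is the exemplar warped to the input and then warped back, so it should reproduce $y_B$.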
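The correlation, warping, and regularization steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the official implementation: features are flattened to shape `(positions, channels)`, the backward warp reuses the same correlation matrix with the softmax taken over `u`, `alpha=100.0` is an illustrative sharpness value, and all function names are hypothetical.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def correlation_matrix(x_s, y_s, eps=1e-8):
    """x_s: (N_x, C), y_s: (N_y, C). Returns M of shape (N_x, N_y) with cosine similarities."""
    x_hat = x_s - x_s.mean(axis=1, keepdims=True)  # channel-wise centering per position
    y_hat = y_s - y_s.mean(axis=1, keepdims=True)
    x_hat = x_hat / (np.linalg.norm(x_hat, axis=1, keepdims=True) + eps)
    y_hat = y_hat / (np.linalg.norm(y_hat, axis=1, keepdims=True) + eps)
    return x_hat @ y_hat.T

def warp_exemplar(M, y_b, alpha=100.0):
    """r_{y->x}(u) = sum_v softmax_v(alpha * M(u, v)) * y_B(v); y_b has shape (N_y, D)."""
    return softmax(alpha * M, axis=1) @ y_b

def regularization_loss(M, y_b, alpha=100.0):
    """L1 distance between the forward-backward warped exemplar and the exemplar itself."""
    r_yx = warp_exemplar(M, y_b, alpha)
    # Backward warp (assumed form): for each exemplar position v, softmax over input positions u.
    r_yxy = softmax(alpha * M, axis=0).T @ r_yx
    return float(np.abs(r_yxy - y_b).sum())
```

With a large `alpha` the softmax approaches a hard argmax, so each output position copies the single best-matching exemplar pixel; a smaller `alpha` blends several candidates.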

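Translation network:
This network synthesizes the output image from the warped exemplar. Starting from a fixed constant $z$, it progressively injects the style information of the warped exemplar through a sequence of convolutional layers.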

$\alpha_{h, w}^{i}\left(r_{y \rightarrow x}\right) \times \frac{F_{c, h, w}^{i}-\mu_{h, w}^{i}}{\sigma_{h, w}^{i}}+\beta_{h, w}^{i}\left(r_{y \rightarrow x}\right)$

$\alpha^{i}, \beta^{i}=\mathcal{T}_{i}\left(r_{y \rightarrow x} ; \theta_{\mathcal{T}}\right)$
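The normalization formula above can be sketched as follows: the activation is normalized across channels at each spatial position, then rescaled and shifted by maps predicted from the warped exemplar. In this sketch `alpha` and `beta` are simply passed in as arrays; how $\mathcal{T}_i$ produces them is not modeled, and the function names are hypothetical.

```python
import numpy as np

def positional_norm(feat, eps=1e-5):
    """feat: (C, H, W). Normalize across the channel axis at every position (h, w)."""
    mu = feat.mean(axis=0, keepdims=True)    # mu_{h,w}
    sigma = feat.std(axis=0, keepdims=True)  # sigma_{h,w}
    return (feat - mu) / (sigma + eps)

def spatially_adaptive_denorm(feat, alpha, beta, eps=1e-5):
    """Apply alpha * normalized(feat) + beta.

    alpha and beta are (H, W) maps (or anything broadcastable to feat's shape),
    assumed here to have been predicted from the warped exemplar r_{y->x}.
    """
    return alpha * positional_norm(feat, eps) + beta
```

Because `alpha` and `beta` vary per position, the style injected at each location follows the warped exemplar rather than a single global statistic.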

The final generated image is:

$\hat{x}_{B}=\mathcal{G}\left(z, \mathcal{T}_{i}\left(r_{y \rightarrow x} ; \theta_{\mathcal{T}}\right) ; \theta_{\mathcal{G}}\right)$
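The final network has seven such layers and produces the output image.
Additional loss functions:
The first is the loss for pseudo exemplar pairs. $x_B$ serves as the ground truth, and $x_B'$ is a transformed version of $x_B$ that preserves the image content, for example a flip. If $x_B'$ is used as the exemplar and $x_A$ as the input, the generated image should be close to $x_B$. The loss is therefore: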

$\mathcal{L}_{\text {feat }}=\sum_{l} \lambda_{l}\left\|\phi_{l}\left(\mathcal{G}\left(x_{A}, x_{B}^{\prime}\right)\right)-\phi_{l}\left(x_{B}\right)\right\|_{1}$

The second is the exemplar translation loss, which has two terms: a perceptual loss and a contextual loss.
perceptual loss:

$\mathcal{L}_{\text {perc }}=\left\|\phi_{l}\left(\hat{x}_{B}\right)-\phi_{l}\left(x_{B}\right)\right\|_{1}$

contextual loss:

$\mathcal{L}_{\text {context }}=\sum_{l} \omega_{l}\left[-\log \left(\frac{1}{n_{l}} \sum_{i} \max _{j} A^{l}\left(\phi_{i}^{l}\left(\hat{x}_{B}\right), \phi_{j}^{l}\left(y_{B}\right)\right)\right)\right]$
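The notes do not define the affinity $A^l$, so the following single-layer sketch assumes the usual contextual-loss definition: cosine distances are normalized by each row's minimum, passed through an exponential with a bandwidth, and normalized to sum to one. The bandwidth value, the centering by the exemplar feature mean, and the function names are all assumptions for illustration.

```python
import numpy as np

def contextual_affinity(fx, fy, bandwidth=0.1, eps=1e-5):
    """fx: (N, C) features of the output, fy: (M, C) features of the exemplar.

    Returns A of shape (N, M) whose rows sum to 1 (assumed contextual-loss affinity).
    """
    mu = fy.mean(axis=0, keepdims=True)  # center both sets by the exemplar mean
    fx_c, fy_c = fx - mu, fy - mu
    fx_c = fx_c / (np.linalg.norm(fx_c, axis=1, keepdims=True) + 1e-8)
    fy_c = fy_c / (np.linalg.norm(fy_c, axis=1, keepdims=True) + 1e-8)
    d = np.maximum(1.0 - fx_c @ fy_c.T, 0.0)            # cosine distance
    d_tilde = d / (d.min(axis=1, keepdims=True) + eps)  # relative distance per row
    w = np.exp((1.0 - d_tilde) / bandwidth)
    return w / w.sum(axis=1, keepdims=True)

def contextual_loss_layer(fx, fy, bandwidth=0.1):
    """-log of the mean, over output features i, of the best affinity max_j A(i, j)."""
    A = contextual_affinity(fx, fy, bandwidth)
    return float(-np.log(A.max(axis=1).mean()))
```

The full loss in the formula is the weighted sum of this quantity over layers $l$ with weights $\omega_l$.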

Finally, the adversarial loss, where $h(t)=\min (0,-1+t)$ is the hinge function:

$\mathcal{L}_{a d v}^{\mathcal{D}}=-\mathbb{E}\left[h\left(\mathcal{D}\left(y_{B}\right)\right)\right]-\mathbb{E}\left[h\left(-\mathcal{D}\left(\mathcal{G}\left(x_{A}, y_{B}\right)\right)\right)\right]$

$\mathcal{L}_{a d v}^{\mathcal{G}}=-\mathbb{E}\left[\mathcal{D}\left(\mathcal{G}\left(x_{A}, y_{B}\right)\right)\right]$
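A minimal sketch of the adversarial losses, assuming the standard hinge formulation with $h(t)=\min(0,-1+t)$ and with the fake term evaluated at $-\mathcal{D}(\mathcal{G}(x_A, y_B))$. The discriminator scores are passed in as plain arrays, and the function names are hypothetical.

```python
import numpy as np

def hinge(t):
    """h(t) = min(0, -1 + t)."""
    return np.minimum(0.0, -1.0 + t)

def discriminator_loss(d_real, d_fake):
    """d_real: scores D(y_B) on real images; d_fake: scores D(G(x_A, y_B)) on generated ones."""
    return float(-hinge(d_real).mean() - hinge(-d_fake).mean())

def generator_loss(d_fake):
    """The generator tries to raise the discriminator score of its outputs."""
    return float(-d_fake.mean())

# Real scores above +1 and fake scores below -1 contribute nothing to the discriminator loss.
print(discriminator_loss(np.array([2.0, 0.5]), np.array([-2.0, 0.0])))
```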

The overall objective is:

$\begin{aligned} \mathcal{L}_{\theta}=\min _{\mathcal{F}, \mathcal{T}, \mathcal{G}} & \max _{\mathcal{D}} \psi_{1} \mathcal{L}_{\text {feat }}+\psi_{2} \mathcal{L}_{\text {perc }}+\psi_{3} \mathcal{L}_{\text {context }} \\ &+\psi_{4} \mathcal{L}_{a d v}^{\mathcal{G}}+\psi_{5} \mathcal{L}_{\text {domain }}^{\ell_{1}}+\psi_{6} \mathcal{L}_{\text {reg }}\end{aligned}$

Experiments
[Figures: comparison of generated images]

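Cross-domain correlation
The correlation matrix can be used to find which points in the exemplar (style reference) image correspond to given points in the input semantic image.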
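As a small illustration of this lookup, assuming the correlation matrix is stored with rows indexed by flattened input positions and columns by flattened exemplar positions, the best match for a query point is the argmax of its row. The function name and the row-major flattening are assumptions.

```python
import numpy as np

def best_match(M, query_rc, input_hw, exemplar_hw):
    """Return the (row, col) in the exemplar most correlated with query_rc in the input.

    M: (H_x * W_x, H_y * W_y) correlation matrix, row-major flattening assumed.
    """
    u = np.ravel_multi_index(query_rc, input_hw)  # flatten the query position
    v = int(np.argmax(M[u]))                      # most correlated exemplar position
    return tuple(int(i) for i in np.unravel_index(v, exemplar_hw))
```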

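Image editing
Given an image and its corresponding mask, the semantic mask is edited and the original image is then used as the exemplar (style reference) image.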
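Limitations

When two cars of different colors in the exemplar both correspond to the single car in the input, the method may produce mixed-color artifacts that do not look realistic. In addition, in the many-to-one mapping case (second row of the figure), multiple instances (the pillows in the figure) may end up with the same style.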