ELEGANT: Exchanging Latent Encodings with GAN for Transferring Multiple Face Attributes

Article link: https://blog.csdn.net/qq_36356761/article/details/81025718

Taihong Xiao, Jiapeng Hong, and Jinwen Ma

Abstract

task: face attribute transfer
existing methods: image-to-image translation
limitations: (1) cannot generate images by exemplars; (2) cannot handle multiple face attributes simultaneously; (3) low-quality generated images
solution: a novel model that receives two images with different attributes as inputs. All attributes are encoded in the latent space in a disentangled manner (somewhat similar to NICE and Glow?). The model learns residual images, which facilitates training on higher-resolution images and yields high-quality images with finer details and fewer artifacts

Introduction

face attribute transfer $\subset$ conditional image generation. A source face image is modified to contain the target attribute, while the person's identity is preserved during the transfer.

existing methods:

method	principle	drawbacks
Deep Manifold Traversal	approximates the natural image manifold and computes the attribute vector from the source domain to the target domain using maximum mean discrepancy (MMD)	prohibitive time and memory cost
Visual Analogy-Making	uses a pair of reference images of the same person in different states to specify the attribute vector. Under the linear feature space assumption, image transfer can be formulated as $I_2 = f^{-1}(f(I_1) + v)$, where $f$ is the encoding/feature-extraction function and $v$ is the attribute vector	the attribute vector can differ between classes
GAN-based image-to-image translation	dual learning	by the Invariance of Domain theorem, a dual mapping requires both image domains to have the same intrinsic dimension, which does not always hold
conditional image generation	receives image labels as the condition for generating images with the desired attributes	cannot generate images by exemplars
BicycleGAN	introduces a noise term to increase diversity	cannot generate images with specific attributes

Purpose and Intuition of Our Work

purpose	method
image generation by exemplars	take a reference image as the condition for generation, through its latent encoding
handle multiple face attributes simultaneously	disentangle multiple attributes in the latent space
satisfactory image quality	residual learning

Our Method

The ELEGANT Model

$A \in$ positive set: has the $i$-th attribute
$B \in$ negative set: does not have the $i$-th attribute
$A, B$ are not paired (they need not show the same person)
use an encoder Enc to obtain the latent encodings of images $A$ and $B$:

$z = \text{Enc}(x), \quad x = A, B, \quad z \in \mathbb{R}^n$

$z[i]$ encodes the information of the $i$-th attribute of the image. Split the tensor $z_A$ into $n$ parts along its channel dimension
problem: Such disentangled representation has to be learned
solution: an iterative training strategy (train the model with respect to a single attribute at a time and cycle through all attributes). Given that $A$ and $B$ differ in the $i$-th attribute (whatever their other attributes are), exchange the $i$-th part of their latent encodings $z_A, z_B$: $z_C, z_D = \text{swap}(z_A, z_B, i)$
decode: learn the residual images rather than the original image.

$$
\begin{aligned}
A' &= A + \text{Dec}(\text{concat}(z_A, z_A)) \\
C &= A + \text{Dec}(\text{concat}(z_C, z_A)) \\
B' &= B + \text{Dec}(\text{concat}(z_B, z_B)) \\
D &= B + \text{Dec}(\text{concat}(z_D, z_B))
\end{aligned}
$$
generator = encoder + decoder
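
The swap and residual-decoding steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: latent encodings are arrays of shape `(channels, H, W)` whose channel count is divisible by `n`, and `toy_dec` is a hypothetical stand-in for the trained decoder Dec (it is not the network from the paper). The toy decoder is deliberately built so that identical encodings give a zero residual; a trained Dec would only approximate that through the reconstruction loss.

```python
import numpy as np


def swap(z_a, z_b, i, n):
    """Exchange the i-th of n channel groups between two latent encodings."""
    parts_a = np.split(z_a, n, axis=0)  # n equal parts along the channel axis
    parts_b = np.split(z_b, n, axis=0)
    parts_c, parts_d = list(parts_a), list(parts_b)
    parts_c[i], parts_d[i] = parts_b[i], parts_a[i]
    z_c = np.concatenate(parts_c, axis=0)  # z_A with B's i-th part
    z_d = np.concatenate(parts_d, axis=0)  # z_B with A's i-th part
    return z_c, z_d


def toy_dec(z_pair, image_shape):
    """Hypothetical decoder: turns concat(z_1, z_2) into a residual image.

    Toy assumption: the residual is a constant image equal to the mean
    difference between the two halves of the concatenated encoding.
    """
    half = z_pair.shape[0] // 2
    diff = z_pair[:half] - z_pair[half:]
    return np.full(image_shape, diff.mean())


def generate(img, z_first, z_second):
    """Residual learning: output = input image + Dec(concat(z_first, z_second))."""
    z_pair = np.concatenate([z_first, z_second], axis=0)
    return img + toy_dec(z_pair, img.shape)


# Demo with made-up encodings: n = 4 attributes, one channel per attribute.
z_a, z_b = np.zeros((4, 2, 2)), np.ones((4, 2, 2))
z_c, z_d = swap(z_a, z_b, i=1, n=4)

img_a = np.ones((3, 4, 4))
img_b = np.ones((3, 4, 4))
a_rec = generate(img_a, z_a, z_a)  # A': reconstruction of A
c_img = generate(img_a, z_c, z_a)  # C: A with B's i-th attribute
b_rec = generate(img_b, z_b, z_b)  # B': reconstruction of B
d_img = generate(img_b, z_d, z_b)  # D: B with A's i-th attribute
```

Because the decoder only outputs a residual, an unchanged image corresponds to an all-zero output, which is the intuition behind why residual learning helps at higher resolutions.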

discriminator: multi-scale, i.e. two discriminators that have identical network structure but operate at different image scales. $D_1$ works at the larger scale, guiding Enc and Dec to produce finer details; $D_2$ works at the smaller scale, handling the overall image content so as to avoid generating grimaces.
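
As a rough illustration of what "different image scales" means, the sketch below prepares the two inputs: the full-resolution image for $D_1$ and a downsampled copy for $D_2$. The factor of 2 and the use of average pooling are assumptions made for this example, and `avg_pool2x` and `multi_scale_inputs` are hypothetical helper names; the notes above do not specify how the smaller scale is produced.

```python
import numpy as np


def avg_pool2x(img):
    """Downsample a (C, H, W) image by 2 using average pooling (H, W even)."""
    c, h, w = img.shape
    # Group pixels into 2x2 blocks, then average within each block.
    return img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def multi_scale_inputs(img):
    """Return (input for D_1, input for D_2): full and half resolution."""
    return img, avg_pool2x(img)


img = np.arange(16, dtype=float).reshape(1, 4, 4)
full_res, half_res = multi_scale_inputs(img)
```

Both discriminators would then share one architecture and differ only in the resolution of what they see.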

Loss Functions

$$
\begin{aligned}
L_{D_i} &= \sum_{X = A,B} E\left(-\log D_i(X \mid Y^X)\right) + \sum_{X = C,D} E\left(-\log\left(1 - D_i(X \mid Y^X)\right)\right), \quad i = 1, 2 \\
L_{reconstruct} &= \sum_{X = A,B} \lVert X - X' \rVert \\
L_{adv} &= \sum_{i = 1,2} \sum_{X = C,D} E\left(-\log D_i(X \mid Y^X)\right) \\
L_G &= L_{reconstruct} + L_{adv}
\end{aligned}
$$
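
The loss terms can be checked numerically with a small sketch. The assumptions here are mine rather than the notes': discriminator outputs are given directly as probabilities in $(0, 1)$, the expectation is replaced by a single sample, the conditioning on the labels $Y^X$ is left out, and the unspecified norm in $L_{reconstruct}$ is taken to be the L1 norm. All function names are hypothetical.

```python
import numpy as np


def discriminator_loss(p_real, p_fake):
    """L_{D_i}: -log D_i(X) for X = A, B plus -log(1 - D_i(X)) for X = C, D."""
    p_real = np.asarray(p_real, dtype=float)
    p_fake = np.asarray(p_fake, dtype=float)
    return float(-np.log(p_real).sum() - np.log(1.0 - p_fake).sum())


def adversarial_loss(p_fake_by_scale):
    """L_adv: sum over both discriminators of -log D_i(X) for X = C, D."""
    return float(sum(-np.log(np.asarray(p, dtype=float)).sum()
                     for p in p_fake_by_scale))


def reconstruction_loss(originals, reconstructions):
    """L_reconstruct: sum of ||X - X'|| for X = A, B (L1 norm assumed)."""
    return float(sum(np.abs(x - x_rec).sum()
                     for x, x_rec in zip(originals, reconstructions)))


def generator_loss(originals, reconstructions, p_fake_by_scale):
    """L_G = L_reconstruct + L_adv."""
    return (reconstruction_loss(originals, reconstructions)
            + adversarial_loss(p_fake_by_scale))


# Made-up values: both discriminators are undecided (0.5) about C and D.
a, a_rec = np.zeros((1, 2, 2)), np.full((1, 2, 2), 0.1)
b, b_rec = np.ones((1, 2, 2)), np.ones((1, 2, 2))
l_d1 = discriminator_loss(p_real=[0.5, 0.5], p_fake=[0.5, 0.5])
l_g = generator_loss([a, b], [a_rec, b_rec], [[0.5, 0.5], [0.5, 0.5]])
```

Note that the generator's adversarial term reuses the discriminator's view of $C$ and $D$ with the opposite goal: the discriminators push $D_i(C), D_i(D)$ toward 0, while the generator pushes them toward 1.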
