[Paper Quick Read] ICCV 2019: Few-Shot Unsupervised Image-to-Image Translation

Few-Shot Unsupervised Image-to-Image Translation

[paper] [github]

Fig. 1 Training. The training set consists of images of various object classes (source classes). We train a model to translate images between these source object classes. Deployment. We show our trained model very few images of the target class, which is sufficient to translate images of source classes to analogous images of the target class even though the model has never seen a single image from the target class during training. Note that the FUNIT generator takes two inputs: 1) a content image and 2) a set of target class images. It aims to generate a translation of the input image that resembles images of the target class.

Contents

Few-Shot Unsupervised Image-to-Image Translation

Abstract

Overview of the FUNIT

Few-shot Image Translator

Multi-task Adversarial Discriminator

Loss Functions


Abstract

Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images.

While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use.

Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images.

Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design.

Through extensive experimental validation and comparisons to several baseline methods on benchmark datasets, we verify the effectiveness of the proposed framework. Our implementation and datasets are available at https://github.com/NVlabs/FUNIT.

 

Sentence 1, background: what unsupervised image-to-image translation is (nothing new here).

Sentences 2-3, the problem: although current methods are remarkably successful, they require access to many images of both the source and destination classes at training time, which greatly limits their use.

Sentence 4, motivation: inspired by the human ability to grasp the essence of a novel object from a few examples and generalize from there, the paper proposes a few-shot unsupervised method that translates images to previously unseen target classes specified at test time by only a few example images (put simply: at test time only a handful of target-class images are provided, and those classes never appeared during training).

Sentence 5, the algorithm in brief: an adversarial training scheme is coupled with a novel network design; the resulting model is called FUNIT.

Last two sentences, experiments and conclusion: the framework is validated through extensive experiments and comparisons against several baselines on benchmark datasets, and the implementation and datasets are released.

 

Overview of the FUNIT

The proposed FUNIT framework aims at mapping an image of a source class to an analogous image of an unseen target class by leveraging a few target class images that are made available at test time.

To train FUNIT, we use images from a set of object classes (e.g. images of various animal species), called the source classes. We do not assume existence of paired images between any two classes (i.e. no two animals of different species are at exactly the same pose). We use the source class images to train a multi-class unsupervised image-to-image translation model.

During testing, we provide the model few images from a novel object class, called the target class. The model has to leverage the few target images to translate any source class image to analogous images of the target class. When we provide the same model few images from a different novel object class, it has to translate any source class images to analogous images of the different novel object class.

Goal of FUNIT: at test time, translate an input image from a source class into a given target class, where the target classes never appeared during training.

During training: the training samples are unpaired; the source class images are used to train a multi-class unsupervised image-to-image translation model.

During testing: the provided class images come from classes that were never seen during training.

 

 

Our framework consists of a conditional image generator G and a multi-task adversarial discriminator D. Unlike the conditional image generators in existing unsupervised image-to-image translation frameworks [55, 29], which take one image as input, our generator G simultaneously takes a content image x and a set of K class images {y1, ..., yK} as input and produces the output image \bar{x} via

\bar{x} = G(x, \{y_1, \ldots, y_K\})     (1)

We assume the content image belongs to object class cx while each of the K class images belongs to object class cy. In general, K is a small number and cx is different from cy. We will refer to G as the few-shot image translator.

Conventional unsupervised conditional generators: a single input image;

This paper's generator: a content image x plus a set of K class images {y1, ..., yK}.

The class cx of x is different from the class cy of the K class images.

 

As shown in Figure 1, G maps an input content image x to an output image \bar{x}, such that \bar{x} looks like an image belonging to object class cy, and \bar{x} and x share structural similarity. Let \mathbb{S} and \mathbb{T} denote the set of source classes and the set of target classes, respectively. During training, G learns to translate images between two randomly sampled source classes cx, cy ∈ \mathbb{S} with cx \neq cy. At test time, G takes a few images from an unseen target class c ∈ \mathbb{T} as the class images, and maps an image sampled from any of the source classes to an analogous image of the target class c.
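To make the training procedure concrete, below is a minimal sketch (not the official FUNIT code) of how one training iteration might sample a content image from class cx and K class images from a different source class cy; images_by_class is a hypothetical dict mapping class names to lists of image tensors.

    import random

    def sample_training_inputs(images_by_class, K=1):
        # Pick two distinct source classes cx != cy from the source set S.
        cx, cy = random.sample(list(images_by_class.keys()), 2)
        # One content image from cx, K class images from cy.
        content_image = random.choice(images_by_class[cx])
        class_images = random.sample(images_by_class[cy], K)
        return content_image, class_images, cx, cy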

 

 

Few-shot Image Translator

The few-shot image translator G consists of a content encoder Ex, a class encoder Ey, and a decoder Fx. The content encoder is made of several 2D convolutional layers followed by several residual blocks [16, 22]. It maps the input content image x to a content latent code zx, which is a spatial feature map. The class encoder consists of several 2D convolutional layers followed by a mean operation along the sample axis. Specifically, it first maps each of the K individual class images {y1, ..., yK} to an intermediate latent vector and then computes the mean of the intermediate latent vectors to obtain the final class latent code zy.

Not much to translate here; this paragraph describes the network architecture, shown in the figure below (a minimal code sketch of the class encoder follows the figure caption).

 

Figure 6. Visualization of the generator architecture. To generate a translation output \bar{x}, the translator combines the class latent code zy extracted from the class images y1, ..., yK with the content latent code zx extracted from the input content image. Note that nonlinearity and normalization operations are not included in the visualization.
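As a rough illustration of the class encoder Ey described above, here is a minimal PyTorch sketch (channel sizes and layer counts are illustrative, not the paper's exact configuration): each of the K class images is mapped to an intermediate latent vector, and the vectors are averaged along the sample axis to give the class code zy.

    import torch
    import torch.nn as nn

    class ClassEncoder(nn.Module):
        """Sketch of E_y: conv layers, global pooling, then a mean over the K class images."""
        def __init__(self, latent_dim=64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(128, latent_dim)

        def forward(self, class_images):              # class_images: (K, 3, H, W)
            feats = self.conv(class_images)           # (K, 128, 1, 1)
            feats = self.fc(feats.flatten(1))         # (K, latent_dim) intermediate vectors
            return feats.mean(dim=0, keepdim=True)    # mean along the sample axis -> z_y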

The decoder consists of several adaptive instance normalization (AdaIN) residual blocks [19] followed by a couple of upscale convolutional layers. The AdaIN residual block is a residual block using the AdaIN [18] as the normalization layer. For each sample, AdaIN first normalizes the activations of a sample in each channel to have a zero mean and unit variance. It then scales the activations using a learned affine transformation consisting of a set of scalars and biases. Note that the affine transformation is spatially invariant and hence can only be used to obtain global appearance information. The affine transformation parameters are adaptively computed using zy via a two-layer fully connected network.

The first blocks of the decoder use AdaIN; reference: (ICCV 2017) Arbitrary style transfer in real-time with adaptive instance normalization.

Note that the affine transformation is spatially invariant, so it can only be used to obtain global appearance information. Its parameters are computed adaptively from zy via a two-layer fully connected network.
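A minimal sketch of this AdaIN step, assuming a per-channel scale and bias predicted from zy by a hypothetical two-layer MLP (dimensions are illustrative):

    import torch
    import torch.nn as nn

    class AdaIN(nn.Module):
        """Normalize each channel to zero mean / unit variance, then apply a
        spatially invariant scale and bias predicted from the class code z_y."""
        def __init__(self, num_channels, class_dim, hidden_dim=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(class_dim, hidden_dim), nn.ReLU(inplace=True),
                nn.Linear(hidden_dim, 2 * num_channels),  # one (scale, bias) pair per channel
            )

        def forward(self, h, z_y):                        # h: (N, C, H, W), z_y: (N, class_dim)
            mean = h.mean(dim=(2, 3), keepdim=True)
            std = h.std(dim=(2, 3), keepdim=True) + 1e-5
            h = (h - mean) / std                          # per-channel normalization
            scale, bias = self.mlp(z_y).chunk(2, dim=1)   # (N, C) each
            return h * scale[:, :, None, None] + bias[:, :, None, None]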

With Ex, Ey, and Fx, (1) becomes

\bar{x} = F_x(z_x, z_y) = F_x\big(E_x(x),\, E_y(\{y_1, \ldots, y_K\})\big)     (2)

By using this translator design, we aim at extracting class-invariant latent representation (e.g., object pose) using the content encoder and extracting class-specific latent representation (e.g., object appearance) using the class encoder. By feeding the class latent code to the decoder via the AdaIN layers, we let the class images control the global look (e.g., object appearance), while the content image determines the local structure (e.g., locations of eyes).

In short: with this translator design, the content encoder extracts a class-invariant latent representation (e.g. object pose) and the class encoder extracts a class-specific latent representation (e.g. object appearance). Feeding the class latent code into the decoder through the AdaIN layers lets the class images control the global look (object appearance), while the content image determines the local structure (e.g. the locations of the eyes).
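Putting the pieces together, a minimal sketch of the translation in (2) might look as follows; content_encoder, class_encoder, and decoder are hypothetical stand-ins for Ex, Ey, and Fx (with the AdaIN parameters computed inside the decoder from z_y):

    def translate(content_encoder, class_encoder, decoder, x, class_images):
        z_x = content_encoder(x)            # spatial content code: pose / local structure
        z_y = class_encoder(class_images)   # class code averaged over the K class images
        return decoder(z_x, z_y)            # x_bar: structure of x, appearance of class c_y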

 

Multi-task Adversarial Discriminator

Our discriminator D is trained by solving multiple adversarial classification tasks simultaneously. Each of the tasks is a binary classification task determining whether an input image is a real image of the source class or a translation output coming from G. As there are |\mathbb{S}| source classes, D produces |\mathbb{S}| outputs. When updating D for a real image of source class cx, we penalize D if its cx-th output is false. For a translation output yielding a fake image of source class cx, we penalize D if its cx-th output is positive. We do not penalize D for not predicting false for images of other classes (\mathbb{S} \setminus \{cx\}). When updating G, we only penalize G if the cx-th output of D is false. We empirically find this discriminator works better than a discriminator trained by solving a much harder |\mathbb{S}|-class classification problem.

 

The multi-task discriminator is trained by solving multiple adversarial binary classification tasks simultaneously. Each task determines whether an input image is a real image of one source class or a translation output from the generator. Since there are |\mathbb{S}| source classes, D produces |\mathbb{S}| outputs, and only the output corresponding to the class of the current image is penalized.
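A minimal sketch of this per-class selection, assuming the discriminator returns one logit map per source class (shape (N, |S|, H, W)) and using a hinge-style GAN loss for illustration:

    import torch.nn.functional as F

    def d_loss(d_real_logits, d_fake_logits, class_idx):
        # Only the c_x-th output is penalized; the other |S|-1 outputs are ignored.
        real = d_real_logits[:, class_idx]   # real images of class c_x should score high
        fake = d_fake_logits[:, class_idx]   # translations into class c_x should score low
        return F.relu(1.0 - real).mean() + F.relu(1.0 + fake).mean()

    def g_adv_loss(d_fake_logits, class_idx):
        # G is penalized only when the c_x-th output of D calls its translation fake.
        return -d_fake_logits[:, class_idx].mean()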

 

Loss Functions

We train the proposed FUNIT framework by solving a minimax optimization problem given by

\min_D \max_G \; \mathcal{L}_{\text{GAN}}(D, G) + \lambda_R \mathcal{L}_R(G) + \lambda_F \mathcal{L}_F(G)     (3)

where \mathcal{L}_{\text{GAN}}, \mathcal{L}_R, and \mathcal{L}_F are the GAN loss, the content image reconstruction loss, and the feature matching loss, respectively.

Three loss terms: the GAN loss, an L1 (reconstruction) loss, and the feature matching loss.

Note that the L1 term is not computed against some ground-truth output; there is no ground truth here. So how can an L1 loss be used?

The content reconstruction loss helps G learn a translation model. Specifically, when using the same image for both the input content image and the input class image (in this case K = 1), the loss encourages G to generate an output image identical to the input.

Here, it suffices to let the class image be the same as the content image.
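A minimal sketch of this content reconstruction term, assuming the generator takes (content_image, list_of_class_images) as in (1):

    import torch

    def content_reconstruction_loss(generator, x):
        # K = 1 and the class image is the content image itself,
        # so the translation should reproduce the input.
        x_rec = generator(x, [x])
        return torch.mean(torch.abs(x_rec - x))   # L1 distance to the input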

 

 

 

 
