Starting to Train GANs

Author: yywxl

Article link: https://blog.csdn.net/yywxl/article/details/106838567

~~It seems I still haven't trained a good network for a real problem (embarrassingly bad), so this time I'm starting my GAN training from CycleGAN.~~

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

In this paper, we present a method that can learn to do the same: capturing special characteristics of one image collection and figuring out how these characteristics could be translated into the other image collection, all in the absence of any paired training examples

cycle consistency
$G : X \to Y$
$F : Y \to X$
What about unpaired data? What constrains the mappings? With only the GAN loss, training is very hard.
So a cycle consistency loss is used:
$F(G(x)) \approx x$ and $G(F(y)) \approx y$

we also tried replacing the L1 norm in this loss with an adversarial loss between F (G(x)) and x, and between G(F (y)) and y, but did not observe improved performance.
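A minimal sketch of the L1 cycle-consistency loss mentioned in the quote, assuming images are flat Python lists of pixel values and that `G` and `F` are ordinary callables. The toy brighten/darken mappings are hypothetical stand-ins for the two generator networks:

```python
from typing import Callable, List

Image = List[float]


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Callable[[Image], Image], F: Callable[[Image], Image],
                           xs: List[Image], ys: List[Image]) -> float:
    """L_cyc = E_x ||F(G(x)) - x||_1 + E_y ||G(F(y)) - y||_1, averaged over the batches."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)    # X -> Y -> X
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)   # Y -> X -> Y
    return forward + backward


# Hypothetical toy "generators": G brightens, F darkens.
def G(img: Image) -> Image:
    return [p + 0.5 for p in img]


def F(img: Image) -> Image:
    return [p - 0.5 for p in img]      # exact inverse of G


def F_bad(img: Image) -> Image:
    return [p - 0.3 for p in img]      # not an inverse of G


xs = [[0.0, 1.0], [2.0, 3.0]]
ys = [[1.0, 1.0]]
print(cycle_consistency_loss(G, F, xs, ys))      # 0.0: the cycle closes exactly
print(cycle_consistency_loss(G, F_bad, xs, ys))  # about 0.4: each direction is off by 0.2
```

The loss is zero only when F undoes G (and vice versa) on the samples, which is exactly the extra constraint that the GAN loss alone does not provide for unpaired data.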

Full Objective

$\lambda=10$
To also constrain color, add an Identity Loss weighted by $0.5\lambda$.
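For reference, the full objective as written in the CycleGAN paper, using the $G$, $F$, $D_X$, $D_Y$ notation above, together with the identity loss mentioned in the note (weighted by $0.5\lambda$ with $\lambda = 10$ when it is used):

```latex
\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{cyc}(G, F)

\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y}\big[\lVert G(y) - y \rVert_1\big] + \mathbb{E}_{x}\big[\lVert F(x) - x \rVert_1\big]

G^{*}, F^{*} = \arg\min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)
```

The identity term asks each generator to leave an image that is already in its output domain unchanged, which is why it helps preserve color.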

NiceGAN

No-IndependentComponent-for-Encoding GAN (NICE-GAN)

abstract
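The paper opens by pointing out that current frameworks discard D after training, and proposes a new method, NICE-GAN, that reuses D to encode images of the target domain. Compared with previous approaches it has two advantages: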

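No independent encoding component is needed; equivalently, the encoder is shared by D and G, which makes the model more compact.
The encoder is trained directly by the adversarial loss, so information is used more effectively and training is more efficient when a multi-scale discriminator is used.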
? I still have a question about the training procedure here: is the encoder updated when D is trained?
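The abstract addresses this: coupling the encoder into D makes it take part in both sides of the GAN min-max game, which causes inconsistency. So the encoder is trained only when maximizing the GAN loss (the D step) and is kept frozen otherwise (the G step).
This still puzzled me: the paper repeatedly says "the encoder is directly trained through the discriminative loss". Rather than saying the encoder reuses the encoding part of D, isn't it more accurate to say D reuses the encoder? The paper explains this later.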
? What is a multi-scale discriminator?
This is also explained later.

The paper also reports that NICE-GAN achieves state-of-the-art FID, KID, and human preference scores.

Let's look at these evaluation metrics.
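Both metrics are simple formulas on feature vectors. A minimal NumPy sketch of FID (Fréchet distance between Gaussians fitted to two feature sets) and KID (unbiased squared MMD with the cubic polynomial kernel), assuming the features have already been extracted. The real metrics use Inception-v3 activations; the random Gaussian arrays below are only stand-ins:

```python
import numpy as np


def fid(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    cov_a, cov_b = np.cov(feat_a, rowvar=False), np.cov(feat_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) = sum of square roots of the eigenvalues of cov_a @ cov_b,
    # which are real and non-negative for PSD matrices (up to round-off, hence .real/clip).
    eig = np.linalg.eigvals(cov_a @ cov_b).real
    tr_sqrt = np.sqrt(np.clip(eig, 0.0, None)).sum()
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def poly_kernel(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3 used by KID."""
    return (u @ v.T / u.shape[1] + 1.0) ** 3


def kid(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Unbiased estimate of the squared MMD between two (N, D) feature sets."""
    m, n = len(feat_a), len(feat_b)
    k_aa, k_bb = poly_kernel(feat_a, feat_a), poly_kernel(feat_b, feat_b)
    k_ab = poly_kernel(feat_a, feat_b)
    # Drop the diagonal k(x_i, x_i) terms so the within-set averages are unbiased.
    term_aa = (k_aa.sum() - np.trace(k_aa)) / (m * (m - 1))
    term_bb = (k_bb.sum() - np.trace(k_bb)) / (n * (n - 1))
    return float(term_aa + term_bb - 2.0 * k_ab.mean())


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))               # stand-in for features of real images
same = rng.normal(size=(500, 8))               # another sample from the same distribution
shifted = rng.normal(loc=1.0, size=(500, 8))   # a clearly different distribution
print(fid(real, same), fid(real, shifted))
print(kid(real, same), kid(real, shifted))
```

Lower is better for both. FID assumes Gaussian features and is biased for small sample sizes, which is one reason KID, with its unbiased estimator, is reported alongside it.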

Main components of existing translation frameworks

consist of three components for each domain: an encoder to embed the input image to a low-dimension hidden space, a generator to translate hidden vectors to images of the other domain, and a discriminator for domain alignment by using GAN training

Encoder
Generator
Discriminator

motivation
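The authors consider what each of these components does, how they relate to one another, and whether they can be rearranged into a more robust framework.
They start from the relationship between the encoder and the discriminator. The discriminator distinguishes images translated from the source domain from real images of the target domain, so before judging an image true or false, D must encode it semantically. D therefore plays two roles: encoding and classifying.
Building on this, the paper reuses the encoding part of D as the encoder for image translation.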

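Encoder training
In conventional methods, the encoder minimizes the GAN loss along with the generator, while the discriminator maximizes the objective.
In NICE-GAN, the encoder is trained only with the discriminator, independently of the generator. This also decouples the encoder from the generator's translation and makes the encoder more general.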

related work

The first kind of development is to enable multimodal generations: MUNIT [12] and DRIT [21] decompose the latent space of images into a domain-invariant content space and a domain-specific style space to get diverse outputs

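Multimodal generation: the image latent space is decomposed into a domain-invariant content space and a domain-specific style space to produce diverse outputs.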

Another enhancement of CycleGAN is to perform translation across multiple (more than two) domains simultaneously, such as StarGAN [5]

General Formulation

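For the supervised problem above, the conditional distribution is uniquely determined by the joint distribution. For the unsupervised problem, however, inferring the conditional distribution from the marginals has infinitely many solutions: the problem is ill-posed. Extra constraints are needed, such as weight coupling, cycle consistency, and identity mapping.
NICE-GAN splits D into $E_{y}^{D}$ and $C_{y}$. The idea is reasonable, but it makes training harder...
The paper says it will address this issue later.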

Architecture

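The framework above seems usable only in cyclic structures like CycleGAN, because $G_{x\to y}$ uses part of the encoder of $D_x$, whereas a single one-directional GAN only has $D_y$.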
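To make the wiring concrete, a toy NumPy sketch of how NICE-GAN composes its modules: $D_x = C_x \circ E_x$ and $G_{x\to y} = G_y \circ E_x$, so the same encoder object serves both D and G. The single linear layers and the class names are hypothetical simplifications, not the paper's convolutional architecture:

```python
import numpy as np


class ToyDiscriminator:
    """D = C o E: an encoder E followed by a classifier head C (one linear layer each)."""

    def __init__(self, dim_in: int, dim_hidden: int, rng: np.random.Generator):
        self.enc_w = rng.normal(size=(dim_in, dim_hidden)) / np.sqrt(dim_in)  # E, shared with G
        self.cls_w = rng.normal(size=(dim_hidden, 1))                       # C, used only by D

    def encode(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(x @ self.enc_w)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return self.encode(x) @ self.cls_w  # one real/fake logit per sample


class ToyDecoder:
    """G_y: maps hidden codes back to flattened images of domain Y."""

    def __init__(self, dim_hidden: int, dim_out: int, rng: np.random.Generator):
        self.w = rng.normal(size=(dim_hidden, dim_out))

    def __call__(self, h: np.ndarray) -> np.ndarray:
        return h @ self.w


def translate_x_to_y(x: np.ndarray, d_x: ToyDiscriminator, g_y: ToyDecoder) -> np.ndarray:
    """G_{x->y} = G_y o E_x: the encoder is borrowed from D_x; there is no separate encoder."""
    return g_y(d_x.encode(x))


rng = np.random.default_rng(0)
d_x, d_y = ToyDiscriminator(16, 8, rng), ToyDiscriminator(16, 8, rng)
g_y = ToyDecoder(8, 16, rng)
x = rng.normal(size=(4, 16))            # a batch of 4 flattened "images" from domain X
y_fake = translate_x_to_y(x, d_x, g_y)
score = d_y(y_fake)                     # D_y judges the translation
print(y_fake.shape, score.shape)        # (4, 16) (4, 1)
```

The reverse direction would be $G_{y\to x} = G_x \circ E_y$ with $E_y$ taken from $D_y$, which is why the design assumes both $D_x$ and $D_y$ exist.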

Multi-Scale Discriminators

Multi-scale D
Residual attention mechanism [U-GAT-IT]

UGATIT

I haven't read U-GAT-IT yet; this link explains its approach.

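spectral normalization
An explanation on Zhihu
An explanation on CSDN
It makes D satisfy a Lipschitz constraint, which makes training more stable; I'll check the implementation when I read the code.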
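A minimal NumPy sketch of the idea: estimate the largest singular value σ(W) of a layer's weight by power iteration and divide W by it, so the layer is (approximately) 1-Lipschitz. The function name and signature are mine for illustration; PyTorch provides this as a built-in utility:

```python
import numpy as np


def spectral_normalize(w: np.ndarray, u=None, n_iters: int = 1, eps: float = 1e-12):
    """Return (w / sigma, sigma, u), with sigma estimated by power iteration.

    A conv kernel of shape (out, in, kh, kw) is viewed as an (out, in*kh*kw) matrix.
    `u` is the running left-singular-vector estimate; keeping it between training
    steps is why a single iteration per step is usually enough in practice.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    w_mat = w.reshape(w.shape[0], -1)
    if u is None:
        u = np.random.default_rng(0).normal(size=w_mat.shape[0])
    for _ in range(n_iters):
        v = w_mat.T @ u
        v /= np.linalg.norm(v) + eps
        u = w_mat @ v
        u /= np.linalg.norm(u) + eps
    sigma = float(u @ w_mat @ v)  # estimate of the top singular value
    return w / sigma, sigma, u


rng = np.random.default_rng(1)
w = rng.normal(size=(8, 4, 3, 3))
w_sn, sigma, u = spectral_normalize(w, n_iters=200)
true_sigma = np.linalg.svd(w.reshape(8, -1), compute_uv=False)[0]
print(sigma, true_sigma)  # the two values should agree closely
```

After normalization the largest singular value of the reshaped weight is about 1, which bounds how much D's output can change for a given change in its input.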

Decoupled Training

adversarial loss
least-square adversarial loss

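$E_x$ is updated when training $D_x$ and $E_y$ when training $D_y$; at all other times, i.e., when training G, E is not updated. This decouples the training of E and G.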
identity reconstruction loss
cycle-consistency loss

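Loss weights: 1 for the adversarial loss and 10 each for the identity reconstruction and cycle-consistency losses.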
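A minimal sketch of the least-squares adversarial losses and of the decoupled schedule, assuming D outputs raw scores. The parameter-group names are hypothetical labels for the modules discussed above, and some implementations scale the least-squares terms by 1/2:

```python
import numpy as np


def lsgan_d_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """D step: push scores on real target images toward 1 and on translations toward 0."""
    return float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))


def lsgan_g_loss(d_fake: np.ndarray) -> float:
    """G step: push D's scores on translations toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))


def generator_objective(adv: float, idt: float, cyc: float,
                        w_adv: float = 1.0, w_idt: float = 10.0, w_cyc: float = 10.0) -> float:
    """Weighted sum using the 1 / 10 / 10 weights noted above."""
    return w_adv * adv + w_idt * idt + w_cyc * cyc


# Decoupled training: which (hypothetically named) parameter groups each step updates.
TRAINABLE = {
    "d_step": ("E_x", "C_x", "E_y", "C_y"),  # encoders trained together with the discriminators
    "g_step": ("G_x", "G_y"),                # encoders frozen; gradients still pass through them
}

print(lsgan_d_loss(np.array([1.0, 1.0]), np.array([0.0, 0.0])))  # 0.0: D is perfect
print(lsgan_g_loss(np.array([0.0, 0.0])))                        # 1.0: D rejects every translation
print(generator_objective(0.5, 0.1, 0.2))                        # 3.5
```

The point of the split is that the encoders only ever see the discriminator's objective, so they are never pulled in two opposite directions within one min-max round.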

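? About horse to zebra
Where does the zebra texture come from in horse-to-zebra? The input horse distribution is mapped to one particular zebra distribution, so it is still a one-to-one mapping, not the random zebra generation I had imagined, where every inference would give a different result, haha.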

Training configuration

ablation study

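In NICE's latent space, samples are more clustered within each domain and the two domains are closer to each other.
Which layers should serve as the encoder? Experiments favor the whole default encoder.
The multi-scale discriminator strategy seems to be the most critical part, and $C^1$ and $C^2$ are both indispensable.
Decoupled training analysis: training is more stable.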

features learned by a discriminatively trained network tend to be more expressive than those learned by an encoder network trained via maximum likelihood, and thus better suited for inference

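? Questions

NICE training makes the latent vectors of the two distributions more clustered and closer together, yet still separable.
What does tighter clustering of the latent vectors indicate? And what does it mean for the encodings of the two distributions to be closer?
Is the first question related to a shared latent space?
For the second, the paper explains: "By shortening the transition path between domains in the latent space", i.e., it shortens the distance (and so the difficulty) of translating between the domains.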

AdaLIN
What is AdaLIN (Adaptive Layer-Instance Normalization)?
The figure is from U-GAT-IT; NICE-GAN replaces the clip with a softmax.
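A minimal NumPy sketch of AdaLIN as formulated in U-GAT-IT: instance-normalized and layer-normalized activations are mixed by a learnable per-channel ρ, then scaled and shifted by γ and β. Here ρ is clipped to [0, 1] as in U-GAT-IT; the softmax variant attributed to NICE-GAN above is not shown. The function name and the single-sample (C, H, W) layout are assumptions:

```python
import numpy as np


def adalin(a: np.ndarray, gamma: np.ndarray, beta: np.ndarray, rho: np.ndarray,
           eps: float = 1e-5) -> np.ndarray:
    """AdaLIN for one (C, H, W) feature map.

    gamma, beta: per-channel scale and shift (U-GAT-IT computes them with an MLP).
    rho: per-channel mix; rho=1 gives pure instance norm, rho=0 pure layer norm.
    U-GAT-IT clips rho to [0, 1] after each optimizer step; it is clipped here instead.
    """
    rho = np.clip(rho, 0.0, 1.0)[:, None, None]
    a_in = (a - a.mean(axis=(1, 2), keepdims=True)) / np.sqrt(a.var(axis=(1, 2), keepdims=True) + eps)
    a_ln = (a - a.mean()) / np.sqrt(a.var() + eps)  # statistics over C, H, W together
    return gamma[:, None, None] * (rho * a_in + (1.0 - rho) * a_ln) + beta[:, None, None]


rng = np.random.default_rng(0)
feat = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 8))
out_in = adalin(feat, np.ones(4), np.zeros(4), rho=np.ones(4))
out_ln = adalin(feat, np.ones(4), np.zeros(4), rho=np.zeros(4))
print(out_in.mean(axis=(1, 2)))  # about 0 for every channel (instance norm)
print(out_ln.mean())             # about 0 over the whole map (layer norm)
```

Instance norm tends to remove per-image style statistics while layer norm keeps more of them, so a learned ρ lets each layer choose how much style information to keep.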

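I've read the paper once. It is very well written and much of it is persuasive, but I don't fully understand it yet and need to think it over. Judging by the writing and the metrics, the paper is very informative, but judging from the generated-image comparisons in the supplementary material (SP), NICE-GAN's improvement may not be as large as expected; as for backgrounds, CycleGAN seems to preserve them better.
Let's read the code and try it before judging.
From training on car-window reflection removal, this NICE + attention combination has pros and cons: if the encoder and attention extract good enough features, the restored image improves; if not, the features of some region are suppressed or lost, leaving a black patch, and I don't know how to tune it...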

DUNIT: Detection-based Unsupervised Image-to-Image Translation

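The goal of our car-window enhancement is to remove false-color reflections, but the final application is face recognition. To preserve, restore, or improve facial detail there are possibly two directions: 1. improve the CycleGAN pipeline so overall detail is better preserved or restored, although artifacts may follow; 2. improve locally (the face region), either by training a separate face-enhancement model (which may make the style of the fused image inconsistent) or by adding it into the overall training framework so the network attends to faces.
Let's see whether this paper gives a reasonable answer.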

Abstract
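I2I translation has been used to handle unpaired data and multimodal translation, but most methods produce unrealistic images for scenes rich in detail.
Detection-based Unsupervised Image-to-image Translation (DUNIT) represents the global image and the object instances separately, then fuses the feature representations to produce the translated image. This preserves the detailed content of object instances while producing a single, consistent scene.
The paper also introduces an instance consistency loss to maintain the coherence between the detections.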

Furthermore, by incorporating a detector into our architecture, we can still exploit object instances at test time

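introduction
To address the limitations of existing methods when translating content-rich images with many different object instances, INIT [34] and InstaGAN separate object instances from the global image/background.
InstaGAN separates instances from the background, translating the object instances while preserving the background's style.
INIT ?? It uses separate reconstruction losses for the different elements (global image and instances).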

INIT independently translates the global image and the instances, using separate reconstruction losses on these different elements. At test time, INIT then uses the global image translation module only, thus discarding the instance-level information. Further, INIT has not used the instance-boosted feature representation which is shown by the merged feature map in figure 2

DUNIT

Why couldn't I understand the three training-pipeline figures above?
Is the input style image from domain B?
Yes; Figure 2 (DUNIT architecture) in the paper makes it clear.

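It integrates the instance and global images, so instance information can also be used at inference. In addition, fusing instance-level and image-level representations yields a consistent scene translation.
Instance-consistency loss: to make full use of the detector during training, the paper proposes an instance-consistency loss that constrains the instances in the original and generated images.
The contributions section contains this passage: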

we only need access to ground-truth detections in a single domain. Therefore, our method can also be thought of as performing unsupervised domain adaptation for object detection.
Ground truth is needed for only one domain. For our task, could the instance-consistency loss be changed into a GAN loss, a perceptual loss, or a face-recognition constraint?
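One way to try the face-recognition idea above, sketched under loud assumptions: keep detector-derived boxes, but constrain the embeddings of each face crop in the input and the translated image to match. The boxes and the `embed` network are hypothetical stand-ins, and this is my variant, not DUNIT's instance consistency loss:

```python
import numpy as np


def crop(img: np.ndarray, box) -> np.ndarray:
    """Crop an (H, W, C) image with a pixel box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return img[y0:y1, x0:x1]


def face_consistency_loss(src: np.ndarray, translated: np.ndarray, boxes, embed) -> float:
    """Mean (1 - cosine similarity) between embeddings of the same face box in both images.

    `boxes` would come from a face detector run on `src` (hypothetical) and are reused on
    `translated`, assuming the translation keeps the image geometry. `embed` stands in
    for a frozen face-recognition network mapping a crop to a 1-D feature vector.
    """
    losses = []
    for box in boxes:
        f_src, f_out = embed(crop(src, box)), embed(crop(translated, box))
        cos = f_src @ f_out / (np.linalg.norm(f_src) * np.linalg.norm(f_out) + 1e-8)
        losses.append(1.0 - cos)
    return float(np.mean(losses)) if losses else 0.0


def flatten_embed(patch: np.ndarray) -> np.ndarray:
    """Placeholder embedding (raw pixels); a real setup would use a face-recognition model."""
    return patch.reshape(-1).astype(float)


rng = np.random.default_rng(0)
src = rng.uniform(size=(64, 64, 3))
boxes = [(10, 10, 30, 34)]
print(face_consistency_loss(src, src, boxes, flatten_embed))        # about 0: face unchanged
print(face_consistency_loss(src, 1.0 - src, boxes, flatten_embed))  # clearly > 0: face changed
```

With a real recognition network in place of `flatten_embed`, the loss would penalize changes to identity features rather than to raw pixels, which is closer to what the downstream face recognition cares about.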

UNIT

UNIT [25], which replaced the domain-specific latent spaces of cycleGAN with a single latent space shared by the domains

I think the UNIT and INIT papers are worth reading; their ideas look quite good.

Domain Adaptation
Having read this far, the term domain adaptation keeps coming up. What does it mean?

I didn't understand the above; see these two:
Zhihu
Domain Adaptation for Image Dehazing (paper to read)

Architecture

Does the architecture above still need a domain-B image at test time to extract style features?
How are global features and instance features merged? With a bilinear sampling strategy.
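A minimal NumPy sketch of merging an instance feature map into the global feature map at its box via bilinear resampling. The additive merge and the corner-aligned sampling grid are assumptions for illustration; DUNIT's exact fusion may differ:

```python
import numpy as np


def bilinear_resize(feat: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resample a (C, H, W) feature map to (C, out_h, out_w), corners aligned."""
    _, h, w = feat.shape
    ys, xs = np.linspace(0, h - 1, out_h), np.linspace(0, w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feat[:, y0][:, :, x0] * (1 - wx) + feat[:, y0][:, :, x1] * wx
    bottom = feat[:, y1][:, :, x0] * (1 - wx) + feat[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def merge_instance(global_feat: np.ndarray, inst_feat: np.ndarray, box) -> np.ndarray:
    """Resample inst_feat to the box (x0, y0, x1, y1, feature-map coordinates) and add it there."""
    x0, y0, x1, y1 = box
    merged = global_feat.copy()
    merged[:, y0:y1, x0:x1] += bilinear_resize(inst_feat, y1 - y0, x1 - x0)
    return merged


feat = np.array([[[0.0, 1.0], [2.0, 3.0]]])
print(bilinear_resize(feat, 3, 3)[0])  # centre value is 1.5, the average of the 4 inputs
merged = merge_instance(np.zeros((1, 6, 6)), np.ones((1, 2, 2)), (1, 2, 4, 5))
print(merged[0])                       # ones inside the box, zeros elsewhere
```

Because the instance features are resampled to the box size and written back at the box location, instance-level detail can influence the translation even though the decoder only sees one merged feature map.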

Training
loss

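Going through them one by one:
content adversarial loss
Discriminates between the content features of X and Y.
How is it optimized?
domain adversarial loss
The same as the two discriminators in CycleGAN: distinguishes domain X from domain Y.
cross-cycle consistency loss
A cycle-consistency loss.
self-reconstruction loss
A reconstruction consistency loss.
KL loss
Pushes the style features of X and Y toward a standard normal distribution.
I don't really understand this one.
latent regression loss
I don't understand this one.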
instance consistency loss
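Since the KL loss was the unclear item in the list above: a minimal sketch of the KL term for a Gaussian style/attribute encoder, assuming (as in a standard VAE) that the encoder predicts a mean `mu` and a log-variance `logvar` per dimension. Pulling the codes toward N(0, I) is what makes it valid to sample random style vectors at test time. The function names are hypothetical:

```python
import math
import random


def kl_to_standard_normal(mu, logvar) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions.

    KL = 0.5 * sum(mu^2 + exp(logvar) - logvar - 1); it is 0 only when mu = 0 and logvar = 0.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng: random.Random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), keeping z differentiable w.r.t. mu, sigma."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: already a standard normal
print(kl_to_standard_normal([1.0], [0.0]))            # 0.5: mean shifted by 1
print(reparameterize([0.0, 1.0], [0.0, 0.0], random.Random(0)))
```

Without this term the encoder could place style codes anywhere, and a z drawn from N(0, I) at test time would fall outside the region the generator was trained on.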

RetinaNet detector
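The paper seems overly complex. In the comparison experiments, DUNIT is indeed better than its backbone DRIT on instances, but few examples are given, so I can't judge further.
Let's read DRIT first; I don't really want to train this one...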

DRIT—Diverse Image-to-Image Translation via Disentangled Representations

abstract
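I2I translation has two difficulties:

Lack of paired data
A single input image may have multiple possible outputs
CycleGAN largely solves problem 1.
For problem 2, the paper proposes a disentangled representation for producing diverse outputs without paired training images.
Images are embedded into two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space.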

content features
latent attribute vector

Comparing the two, this seems very reasonable!

content adversarial loss
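Keeps the content features from carrying domain-specific cues.
latent regression loss
invertible mapping between the latent attribute vectors and the corresponding outputs
The attribute vectors and the corresponding outputs should map to each other invertibly??
cross-cycle consistency loss
How does this differ from CycleGAN?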
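The latent regression loss in the list above can be read as a check that the attribute vector survives generation: sample z, generate an image, re-encode its attribute, and compare with z. A minimal sketch; the toy generator and encoder (an "image" is just content values followed by attribute values) are hypothetical:

```python
import random


def latent_regression_loss(generate, encode_attr, content, dim: int, rng: random.Random) -> float:
    """Sample z ~ N(0, I), generate, re-encode the attribute, and return mean |z_rec - z|."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    z_rec = encode_attr(generate(content, z))
    return sum(abs(a - b) for a, b in zip(z_rec, z)) / dim


# Hypothetical toy modules: an "image" is its content values followed by its attribute values.
def toy_generate(content, z):
    return list(content) + list(z)


def collapsed_generate(content, z):
    return list(content) + [0.0] * len(z)  # ignores z entirely, i.e. mode collapse


def toy_encode_attr(img, dim: int = 2):
    return img[-dim:]


rng = random.Random(0)
print(latent_regression_loss(toy_generate, toy_encode_attr, [0.3, 0.7], 2, rng))        # 0.0
print(latent_regression_loss(collapsed_generate, toy_encode_attr, [0.3, 0.7], 2, rng))  # > 0
```

This is the sense in which the mapping is "invertible": if the generator ignores z, z cannot be recovered from the output and the loss stays high.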

Training and testing

first perform a cross-domain mapping to obtain intermediate results by swapping the attribute vectors from both images. We can then reconstruct the original input image pair by applying the cross-domain mapping one more time and use the proposed cross-cycle consistency loss to enforce the consistency between the original and the reconstructed images. At test time, we can use either 1) randomly sampled vectors from the attribute space to generate diverse outputs or 2) the transferred attribute vectors extracted from existing images for example-guided translation

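Flow (a) above looks different from CycleGAN, but only the notation differs: $G_x$ generates cats and $G_y$ generates dogs. The $L^{cc}$ term seems genuinely interesting.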
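A toy sketch of the cross-cycle consistency loss, which also answers the earlier question about how it differs from CycleGAN: instead of mapping one image there and back, a pair of images exchanges attribute codes, then exchanges them again, and the results are compared with the originals. The "image = 2 content values + attribute values" modules are hypothetical stand-ins for DRIT's encoders and generators:

```python
from typing import List

Vec = List[float]


def l1(a: Vec, b: Vec) -> float:
    """Mean absolute difference."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cross_cycle_loss(x: Vec, y: Vec, enc_c_x, enc_a_x, enc_c_y, enc_a_y, gen_x, gen_y) -> float:
    """DRIT-style cross-cycle consistency: swap attribute codes twice, compare with the inputs."""
    # First translation: exchange attributes between the pair.
    u = gen_x(enc_c_y(y), enc_a_x(x))   # domain X: content of y, attribute of x
    v = gen_y(enc_c_x(x), enc_a_y(y))   # domain Y: content of x, attribute of y
    # Second translation: exchange again, which should give back x and y.
    x_hat = gen_x(enc_c_y(v), enc_a_x(u))
    y_hat = gen_y(enc_c_x(u), enc_a_y(v))
    return l1(x_hat, x) + l1(y_hat, y)


# Hypothetical toy modules: the first two values of an "image" are content, the rest attribute.
def content(img: Vec) -> Vec:
    return img[:2]


def attribute(img: Vec) -> Vec:
    return img[2:]


def gen(c: Vec, a: Vec) -> Vec:
    return list(c) + list(a)


def lossy_gen(c: Vec, a: Vec) -> Vec:
    return [p + 0.1 for p in c] + list(a)   # corrupts the content on every pass


x, y = [1.0, 2.0, 0.5], [3.0, 4.0, -0.5]
print(cross_cycle_loss(x, y, content, attribute, content, attribute, gen, gen))        # 0.0
print(cross_cycle_loss(x, y, content, attribute, content, attribute, gen, lossy_gen))  # > 0
```

The loss only reaches zero when content and attribute are cleanly separated and both survive two rounds of swapping, which is what pushes the encoders toward a disentangled representation.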

Disentangle Content and Attribute Representation
Two strategies are used to disentangle the representations:

1. weight-sharing
Based on the assumption in [27] that the two domains share the same latent space, the paper shares the last layer of $E_x^c$ and $E_y^c$ and the first layer of $G_x$ and $G_y$.

force the content representation to be mapped onto the same space

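I don't quite get this. Does passing both through the same transformation K really help? Suppose the shared layer were an identity mapping...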
2. content discriminator

sharing the same high-level mapping functions cannot guarantee the same content representations encode the same information for both domains

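His point: the weight sharing above constrains the space the representations live in, while the content discriminator constrains the information they carry.

But is the equation above serious? It maximizes and minimizes the same quantity at the same time, although the two domains are trained jointly.
I don't quite get it.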
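A hedged sketch of how the two sides can be written as separate objectives, in the style of the public DRIT code as I understand it: the content discriminator D^c does an ordinary domain classification (content codes from X labeled 1, from Y labeled 0), while the content encoders are trained so that D^c outputs 0.5 on both, i.e. cross-entropy toward a 0.5 target. Under this reading the encoders and D^c do not literally max and min the same number. Here `p_x` and `p_y` are D^c's predicted probabilities that a content code comes from domain X:

```python
import math


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy of a predicted probability p against a (possibly soft) target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def content_disc_loss(p_x: float, p_y: float) -> float:
    """D^c step: classify content codes by domain (from X -> 1, from Y -> 0)."""
    return bce(p_x, 1.0) + bce(p_y, 0.0)


def content_encoder_loss(p_x: float, p_y: float) -> float:
    """Encoder step: push D^c toward 0.5 on both domains, i.e. make the content codes indistinguishable."""
    return bce(p_x, 0.5) + bce(p_y, 0.5)


print(content_encoder_loss(0.5, 0.5), content_encoder_loss(0.9, 0.1))  # lowest when D^c is confused
print(content_disc_loss(0.9, 0.1), content_disc_loss(0.5, 0.5))        # lowest when D^c classifies well
```

The encoder loss equals the negative of $\tfrac{1}{2}\log D^c + \tfrac{1}{2}\log(1 - D^c)$ per domain, so minimizing it maximizes that expression, and the maximum sits at $D^c = 0.5$.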

Cross-cycle Consistency Loss
The figure above makes it easy to understand.

other loss functions

Latent regression loss[49]

The formulation above is a bit odd, but also interesting: the two share one attribute z.

full objective function

training
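See the paper and code for the network architecture and hyperparameters.
An L1 weight regularization of 0.01 is applied to the content representation; does this refer to $E^c$?
The GAN uses DCGAN.

Judging by the results, they are surprisingly less realistic than CycleGAN's...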

DRIT++
A follow-up paper to DRIT.

Adds a mode seeking regularization $L_{ms}$
mode seeking regularization [37]
Multi-domain I2I framework

$E_x$, $E_y$, $G_x$, $G_y$ are no longer separate per domain
Domain classification loss

multi-scale generator-discriminator
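A minimal sketch of the mode seeking regularization listed above, written the way MSGAN-style code commonly implements it (as I understand it): for two outputs of the same content with different attribute codes, take the ratio of image distance to latent distance and minimize its inverse, so that different z cannot collapse to the same image. Names and the plain-list representation are assumptions:

```python
def mean_abs_diff(a, b) -> float:
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def mode_seeking_loss(img1, img2, z1, z2, eps: float = 1e-5) -> float:
    """1 / (d_img / d_z + eps) for outputs img1 = G(c, z1), img2 = G(c, z2) of the same content c."""
    ratio = mean_abs_diff(img1, img2) / mean_abs_diff(z1, z2)
    return 1.0 / (ratio + eps)


collapsed = mode_seeking_loss([0.2, 0.2], [0.2, 0.2], [0.0], [1.0])  # same image for different z
diverse = mode_seeking_loss([0.0, 0.0], [1.0, 1.0], [0.0], [1.0])    # outputs move with z
print(collapsed, diverse)  # roughly 1e5 versus roughly 1
```

The loss explodes when the generator ignores z, which complements the KL and latent regression terms in pushing DRIT++ toward diverse outputs.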

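What this paper offers us:

Multi-domain translation, which could be used for our reflection task with multiple styles
High-resolution image generation, which we urgently need right now
Whether a shared encoder and generator beats a single-domain generator, still to be verified
Multi-domain classifier
Multi-scale generator-discriminator for high-resolution image generation
There is still a lot in the paper I don't understand. Read the code, then reread the paper!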

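20200923 Summary

Since CycleGAN, that is mainly what I have trained, with a ResNet generator and a multi-scale PatchGAN discriminator; I also trained NICE-GAN and DRIT. Main takeaways and issues:
1. GAN loss weights are very hard to tune.
2. The transposed convolutions in the ResNet generator produce severe grid (checkerboard) artifacts. Why does everyone still use them? Could interpolation plus convolution replace them?
3. I learned to use multi-scale discriminators.
4. Network depth and UNet skip connections are not necessarily good for the generator.
5. I didn't understand DRIT before; after reading about VAEs, I understand it somewhat.
6. Looking at NICE-GAN again, it is a very good idea. Why didn't I tune it back then?
7. I haven't tried WGAN yet; I'll try it in later training.
8. GANs, or at least CycleGAN, are rather clumsy models: the styles they learn are fairly uniform.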
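On point 2: a small single-channel NumPy demo of why stride-2 transposed convolutions produce grid artifacts, compared with the resize-convolution alternative (nearest-neighbour upsampling followed by an ordinary convolution) discussed by Odena et al. in the Distill article on deconvolution and checkerboard artifacts. With a constant input and an all-ones 3x3 kernel, the transposed convolution's output varies with position because neighbouring kernel footprints overlap unevenly, while resize-convolution stays uniform away from the border. Function names are mine:

```python
import numpy as np


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Stride-1 'same' cross-correlation with zero padding on a 2-D array."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def resize_conv(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling followed by a stride-1 convolution."""
    up = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
    return conv2d_same(up, k)


def transposed_conv_stride2(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Stride-2 transposed convolution without padding or cropping."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros(((h - 1) * 2 + kh, (w - 1) * 2 + kw))
    for i in range(h):
        for j in range(w):
            out[2 * i:2 * i + kh, 2 * j:2 * j + kw] += x[i, j] * k  # overlapping footprints add up
    return out


x, k = np.ones((4, 4)), np.ones((3, 3))
tc, rc = transposed_conv_stride2(x, k), resize_conv(x, k)
print(np.unique(tc[2:-2, 2:-2]))  # several distinct values: a checkerboard on a flat input
print(np.unique(rc[1:-1, 1:-1]))  # a single value: no checkerboard in the interior
```

A learned kernel can in principle cancel the uneven overlap, but in practice it often does not, which is why swapping the upsampling layers is a cheap experiment for the generator.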

UNIT—Unsupervised Image-to-Image Translation Networks

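The paper opens by restating the unsupervised problem: inferring the joint distribution from the marginal distributions is underconstrained, so additional assumptions about the structure of the joint distribution are needed. This paper's assumption is that the two domains share a latent space.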

framework

Q: shared-latent space constraint implies the cycle-consistency constraint?
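A: The UNIT paper states that it does: the shared-latent space assumption implies the cycle-consistency constraint.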

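To implement the shared latent space assumption, the two domains are further constrained to share an intermediate representation h.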

VAE
weight-sharing
GANs
Cycle-consistency(CC)

LOSS

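Let's read the code.
Personally I find UNIT has a lot of code, and for a beginner more code feels messier. I trained with the version_1 code and the generated images were relatively blurry. Its model constrains the latent space to have zero mean through weight sharing; unlike a standard VAE, it does not predict a mean and a variance and then constrain the KL divergence.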
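To make the contrast in the last paragraph concrete, a NumPy sketch of the two latent schemes. The fixed-variance version (only a mean is predicted, unit Gaussian noise is added, and the penalty is the mean of μ²) reflects my reading of the UNIT code and is an assumption; the full VAE version predicts a log-variance too:

```python
import numpy as np


def unit_style_latent(mu: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Fixed unit variance: z = mu + N(0, I); only the mean is predicted."""
    return mu + rng.normal(size=mu.shape)


def unit_style_kl(mu: np.ndarray) -> float:
    """Penalty on the mean only: mean(mu^2)."""
    return float(np.mean(mu ** 2))


def vae_kl(mu: np.ndarray, logvar: np.ndarray) -> float:
    """Full VAE KL(N(mu, exp(logvar)) || N(0, I)), averaged over elements."""
    return float(np.mean(0.5 * (mu ** 2 + np.exp(logvar) - logvar - 1.0)))


mu = np.array([0.5, -1.0, 2.0])
print(unit_style_kl(mu), vae_kl(mu, np.zeros(3)))  # the VAE term is half of mean(mu^2) when logvar = 0
print(unit_style_latent(mu, np.random.default_rng(0)))
```

With the variance fixed at 1, the VAE KL reduces (up to a factor of 1/2) to a penalty on μ² alone, so the two schemes differ only in whether the encoder can also shrink or widen the noise per dimension.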

INIT—Towards Instance-level Image-to-Image Translation

TuiGAN

From the abstract, "one-shot unsupervised learning" looked promising, but after seeing the result figures, words fail me...
