Generative Adversarial Networks (CGAN, CycleGAN, CoGAN)


上杉翔二



Category: Deep Learning. Tags: generative adversarial networks, CGAN, CoGAN, CycleGAN, Python

Article link: https://blog.csdn.net/qq_39388410/article/details/96306813


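[Figure]
Some time ago I wrote up GAN and DCGAN, mainly covering the basic principles and training method of GAN, the application of DCGAN to images, the mode collapse problem, and so on. The core idea is to train two neural networks, one that generates data and one that classifies real data apart from fake data, and to train them simultaneously until they converge to a point at which the trained generator can produce "new and realistic" data.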

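Conditional Generative Adversarial Network (CGAN)
A GAN generates data from random noise. Trained on a dataset of cats, the network generates cat images; trained on dog images, it generates dog images. Trained on both datasets at once, however, it can only produce blurry "cat-dog" hybrids, and there is no way to ask specifically for a "cat" or a "dog". Likewise, the digits generated in the previous post were random: you could not request a particular "1" or "2".

As the word "conditional" suggests, CGAN feeds an extra condition to both the generator and the discriminator. The condition is the label we want to generate, which tells the generator to produce images of one specific class, for example only images of the digit "1". The discriminator must therefore judge not only whether an image is real but also whether the image matches the condition y, so the inputs of the GAN become: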

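Generator input: noise z + condition y; output: an image G(z|y) that satisfies the condition
Discriminator input: image x + condition y; output: the probability D(x|y) that the image is real under that condition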
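
In this notation, the objective from the CGAN paper is the usual GAN minimax game with both terms conditioned on y. The symbols $p_{data}$ and $p_z$ (the data and noise distributions) are the standard ones and are assumed here, since the post does not define them:

$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y)))]$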

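Concretely, as shown in the figure above, CGAN concatenates the one-hot encoded label vector y with the random noise z before training, so that a single GAN can be asked to generate a specified class.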
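
As a minimal sketch of this concatenation (NumPy only; the helper name `make_generator_input` and the sizes are illustrative assumptions, not part of the original code), the generator input for a batch can be built like this:

```python
import numpy as np

def make_generator_input(noise, class_ids, num_labels):
    """Concatenate noise vectors with one-hot labels along the feature axis."""
    one_hot = np.eye(num_labels)[class_ids]          # shape: (batch, num_labels)
    return np.concatenate([noise, one_hot], axis=1)  # shape: (batch, latent + num_labels)

rng = np.random.default_rng(0)
latent_size, num_labels = 100, 10
noise = rng.uniform(-1.0, 1.0, size=(4, latent_size))
class_ids = np.array([1, 1, 2, 7])   # request two "1"s, one "2" and one "7"
g_in = make_generator_input(noise, class_ids, num_labels)
print(g_in.shape)  # (4, 110)
```

The last `num_labels` columns of each row carry the requested class, so changing `class_ids` while keeping `noise` fixed asks the generator for a different digit drawn from the same noise.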

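# Imports (assumed: the original omits them; tf.keras is used here)
import numpy as np
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Conv2DTranspose, Dense, Flatten, LeakyReLU,
                                     Reshape, concatenate)
from tensorflow.keras.models import Model

# Generator
def build_generator(inputs, labels, image_size):
    image_resize = image_size // 4
    kernel_size = 5
    layer_filters = [128, 64, 32, 1]

    x = concatenate([inputs, labels], axis=1)  # merge the noise input with the condition label
    x = Dense(image_resize * image_resize * layer_filters[0])(x)
    x = Reshape((image_resize, image_resize, layer_filters[0]))(x)

    for filters in layer_filters:
        # The first two transposed convolutions use strides=2 (upsampling by 4 in total),
        # the last two use strides=1
        if filters > layer_filters[-2]:
            strides = 2
        else:
            strides = 1
        # BN-ReLU-Conv2DTranspose stack that produces the fake image
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        x = Conv2DTranspose(filters=filters,
                            kernel_size=kernel_size,
                            strides=strides,
                            padding='same')(x)

    x = Activation('sigmoid')(x)  # sigmoid rather than tanh, so outputs lie in [0, 1]
    generator = Model([inputs, labels], x, name='generator')  # the generator model
    return generator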

# Discriminator
def build_discriminator(inputs, labels, image_size):
    kernel_size = 5
    layer_filters = [32, 64, 128, 256]

    x = inputs
    # Project the label to image_size * image_size values and reshape it into an extra image channel
    y = Dense(image_size * image_size)(labels)
    y = Reshape((image_size, image_size, 1))(y)
    x = concatenate([x, y])  # merge the image with the projected label

    for filters in layer_filters:
        # The last convolution uses strides=1, the others strides=2
        if filters == layer_filters[-1]:
            strides = 1
        else:
            strides = 2
        # LeakyReLU-Conv2D stack that discriminates the image
        x = LeakyReLU(alpha=0.2)(x)
        x = Conv2D(filters=filters,
                   kernel_size=kernel_size,
                   strides=strides,
                   padding='same')(x)

    x = Flatten()(x)
    x = Dense(1)(x)
    x = Activation('sigmoid')(x)
    discriminator = Model([inputs, labels], x, name='discriminator')  # the discriminator model
    return discriminator

# Training function: alternately trains the discriminator and the generator
def train(models, data, params):
    # adversarial: the generator stacked with the discriminator (built elsewhere)
    generator, discriminator, adversarial = models
    x_train, y_train = data
    batch_size, latent_size, train_steps, num_labels, model_name = params
    save_interval = 500
    # Fixed noise used to monitor the generator during training
    noise_input = np.random.uniform(-1.0, 1.0, size=[16, latent_size])
    # One-hot encoded condition labels for the monitoring images
    noise_class = np.eye(num_labels)[np.arange(0, 16) % num_labels]
    train_size = x_train.shape[0]

    print(model_name, "Labels for generated images: ", np.argmax(noise_class, axis=1))

    for i in range(train_steps):
        # Randomly pick real images and their labels
        rand_indexes = np.random.randint(0, train_size, size=batch_size)
        real_images = x_train[rand_indexes]
        real_labels = y_train[rand_indexes]
        # Sample noise
        noise = np.random.uniform(-1.0, 1.0, size=[batch_size, latent_size])
        # Sample random one-hot labels for the fake images
        fake_labels = np.eye(num_labels)[np.random.choice(num_labels, batch_size)]

        # Generate fake images from the noise and the fake labels
        fake_images = generator.predict([noise, fake_labels])
        x = np.concatenate((real_images, fake_images))
        labels = np.concatenate((real_labels, fake_labels))

        # Real images are labeled 1, fake images 0
        y = np.ones([2 * batch_size, 1])
        y[batch_size:, :] = 0.0
        # Train the discriminator
        loss, acc = discriminator.train_on_batch([x, labels], y)
        log = "%d: [discriminator loss: %f, acc: %f]" % (i, loss, acc)

        # Fresh noise and labels for the generator update
        noise = np.random.uniform(-1.0, 1.0, size=[batch_size, latent_size])
        fake_labels = np.eye(num_labels)[np.random.choice(num_labels, batch_size)]
        # Label the fake images as real (1) so the generator learns to fool the discriminator
        y = np.ones([batch_size, 1])
        # Train the adversarial model
        loss, acc = adversarial.train_on_batch([noise, fake_labels], y)
        log = "%s [adversarial loss: %f, acc: %f]" % (log, loss, acc)
        print(log)
        if (i + 1) % save_interval == 0:
            # Only display the plot at the final step
            show = (i + 1) == train_steps

            # plot_images is a plotting helper that is not shown in this post
            plot_images(generator,
                        noise_input=noise_input,
                        noise_class=noise_class,
                        show=show,
                        step=(i + 1),
                        model_name=model_name)

    # Save the trained generator
    generator.save(model_name + ".h5")

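Several upgraded variants of CGAN followed, for example:

IcGAN, which refines the one-hot label encoding: an encoder first learns the mapping from an original image to its feature vector, and part of that feature vector is then modified and fed to the generator to produce an image with the desired attributes
ACGAN, which does not feed the condition (the sample's class) directly into the discriminator; instead the discriminator is trained to classify samples, so it must both judge whether each sample is real or fake and predict the known condition (the sample's class, via an added classification loss)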
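
To illustrate the ACGAN idea of "real/fake loss plus a classification loss", here is a rough NumPy sketch of a discriminator loss with the two terms. The function name, argument layout, and toy numbers are assumptions made for illustration; this is not the code of the ACGAN paper.

```python
import numpy as np

def acgan_discriminator_loss(p_real_src, y_src, p_class, y_class, eps=1e-12):
    """Binary cross-entropy on the source (real/fake) plus cross-entropy on the class.

    p_real_src: (batch,) predicted probability that each sample is real
    y_src:      (batch,) 1 for real samples, 0 for generated ones
    p_class:    (batch, num_classes) predicted class distribution
    y_class:    (batch,) integer class label of each sample
    """
    src_loss = -np.mean(y_src * np.log(p_real_src + eps)
                        + (1 - y_src) * np.log(1 - p_real_src + eps))
    cls_loss = -np.mean(np.log(p_class[np.arange(len(y_class)), y_class] + eps))
    return src_loss + cls_loss, src_loss, cls_loss

# Toy batch: the first two samples are real, the last two are generated.
p_src = np.array([0.9, 0.8, 0.2, 0.1])
y_src = np.array([1, 1, 0, 0])
p_cls = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6]])
y_cls = np.array([0, 1, 0, 2])
total, src, cls = acgan_discriminator_loss(p_src, y_src, p_cls, y_cls)
print(round(src, 4), round(cls, 4))
```

The class term is computed for real and generated samples alike, which is what lets the label influence the generator without being an input to the discriminator.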

paper: https://arxiv.org/abs/1411.1784
pix2pix online demo: https://affinelayer.com/pixsrv/index.html

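[Figure]
CycleGAN
CycleGAN mainly addresses image-to-image translation, such as the horse-and-zebra synthesis in the figure above. Unlike ordinary image style transfer, it does not need paired training data: the images used for training do not have to depict the same thing. This means that style transfer can be learned simply by collecting a large number of images from each domain.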

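The goal of CycleGAN is to learn a mapping G from one image domain to another (this kind of domain-to-domain conversion is also described as "dual"), and this mapping corresponds to a generator. G turns an image x in X into an image G(x) in Y, and a discriminator Dy then judges whether the generated G(x) is a real image. On its own, however, this cannot be trained properly because there is no paired data: G could simply map every image x to the same image in Y. Hence the name "Cycle": after X is translated to Y, a reverse mapping F should be able to bring it back to the original, which is the cycle-consistency constraint:
$F(G(x)) \approx x, x \in X$ and $G(F(y)) \approx y, y \in Y$
In practice there are two generators, G and F, and two discriminators, Dx and Dy. G takes an image from X and tries to map it to an image in Y; the discriminator Dy judges whether an image was generated by G or actually comes from Y. Likewise, F takes an image from Y and tries to map it to an image in X, and the discriminator Dx predicts whether an image was generated by F or actually comes from X.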
[Figure]
The cycle-consistency loss is:
$L_{cyc}(G,F,X,Y) = \mathbb{E}_{x \sim p_{data}(x)}[||F(G(x))-x||_1]+\mathbb{E}_{y \sim p_{data}(y)}[||G(F(y))-y||_1]$
and the overall loss becomes:
$L=L_{GAN}(G,D_Y,X,Y)+L_{GAN}(F,D_X,Y,X)+L_{cyc}(G,F,X,Y)$
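
To make the cycle-consistency term concrete, here is a small NumPy sketch that evaluates it on toy data. The mappings `G`, `F` and `G_collapsed` are simple stand-in functions chosen for illustration, not neural networks, and the per-sample L1 norm is averaged over the batch.

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y):
    """E_x[||F(G(x)) - x||_1] + E_y[||G(F(y)) - y||_1], averaged over the batch."""
    forward = np.mean(np.sum(np.abs(F(G(x)) - x), axis=1))
    backward = np.mean(np.sum(np.abs(G(F(y)) - y), axis=1))
    return forward + backward

x = np.array([[0.0, 1.0], [2.0, 3.0]])       # toy "images" from domain X
y = np.array([[10.0, 11.0], [12.0, 13.0]])   # toy "images" from domain Y

# An invertible pair: G shifts X into Y and F shifts back.
G = lambda a: a + 10.0
F = lambda b: b - 10.0
loss_ok = cycle_consistency_loss(G, F, x, y)
print(loss_ok)          # 0.0

# A collapsed G that maps every input to the same point cannot be undone by F.
G_collapsed = lambda a: np.full_like(a, 11.0)
loss_collapsed = cycle_consistency_loss(G_collapsed, F, x, y)
print(loss_collapsed)   # 4.0
```

The collapsed mapping is exactly the failure case described above: it could still fool a discriminator, but the cycle term penalizes it.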

# Excerpt from inside the model-building method of a TensorFlow CycleGAN implementation
# (Reader and the loss helpers are defined elsewhere in that code base)
# The readers load images from the X and Y domains
X_reader = Reader(self.X_train_file, name='X', image_size=self.image_size, batch_size=self.batch_size)
Y_reader = Reader(self.Y_train_file, name='Y', image_size=self.image_size, batch_size=self.batch_size)
# Fetch a batch from each domain
x = X_reader.feed()
y = Y_reader.feed()
# self.G maps x to y
# self.F maps y to x
# Cycle-consistency loss
cycle_loss = self.cycle_consistency_loss(self.G, self.F, x, y)

# x -> y
fake_y = self.G(x)
# Loss of generator G
G_gan_loss = self.generator_loss(self.D_Y, fake_y, use_lsgan=self.use_lsgan)
G_loss = G_gan_loss + cycle_loss
# Loss of discriminator D_Y (self.fake_y is kept as in the source; it is presumably defined elsewhere on the model)
D_Y_loss = self.discriminator_loss(self.D_Y, y, self.fake_y, use_lsgan=self.use_lsgan)
# y -> x
fake_x = self.F(y)
# Loss of generator F
F_gan_loss = self.generator_loss(self.D_X, fake_x, use_lsgan=self.use_lsgan)
F_loss = F_gan_loss + cycle_loss
# Loss of discriminator D_X, which judges images in the X domain
D_X_loss = self.discriminator_loss(self.D_X, x, self.fake_x, use_lsgan=self.use_lsgan)

code: https://github.com/junyanz/CycleGAN
paper: https://arxiv.org/abs/1703.10593

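Coupled Generative Adversarial Networks (CoGAN)
As the word "coupled" suggests, several GANs are coupled together; the core idea is strength in numbers. Like CycleGAN, CoGAN trains two GANs, but unlike CycleGAN the adversarial game is played across the two GANs acting as teams.
[Figure]
Specifically, the two generative models form a team that synthesizes a pair of images in two different domains in order to confuse the discriminative models. The two discriminative models also form a pair, each distinguishing images drawn from the training data distribution of its own domain from images drawn from its own generative model.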

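The network architecture is shown in the figure above. Note the weight sharing: CoGAN trains the GANs under a weight-sharing constraint on some of the network layers. Given the same input vector z, the authors want the two generated images to share the same high-level (semantic) content while differing in low-level details, so weights are shared in the first few generator layers, which produce the high-level features.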
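
The weight-sharing constraint can be illustrated with a toy NumPy sketch: two "generators" use the same first-layer matrix and separate last-layer matrices. The layer sizes, the ReLU/sigmoid choices, and all names are assumptions for illustration, not the CoGAN architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)
latent, hidden, out = 8, 16, 4

# First layer: a single weight matrix used by both generators (shared).
W_shared = rng.normal(size=(latent, hidden))
# Last layer: each generator keeps its own weights (not shared).
W_out1 = rng.normal(size=(hidden, out))
W_out2 = rng.normal(size=(hidden, out))

def generator(z, W_first, W_last):
    h = np.maximum(z @ W_first, 0.0)            # features from the shared layer (ReLU)
    x = 1.0 / (1.0 + np.exp(-(h @ W_last)))     # domain-specific output (sigmoid)
    return h, x

z = rng.normal(size=(2, latent))   # the same z drives both generators
h1, x1 = generator(z, W_shared, W_out1)
h2, x2 = generator(z, W_shared, W_out2)
print(np.array_equal(h1, h2), np.allclose(x1, x2))
```

Because `W_shared` is one array, any update to it affects both generators at once, while the two output layers are free to specialize to their own domain.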

# TensorFlow 1.x excerpt; prelu, linear and deconv2d are helper ops defined elsewhere in the implementation
def generator(self, z, y=None, share_params=False, reuse=False, name='G'):
    if '1' in name:
        branch = '1'
    elif '2' in name:
        branch = '2'

    # Layers whose parameters are shared between the two generators
    s = self.output_size
    s2, s4 = int(s/2), int(s/4)
    h0 = prelu(self.g_bn0(linear(z, self.gfc_dim, 'g_h0_lin', reuse=share_params), reuse=share_params),
               name='g_h0_prelu', reuse=share_params)
    # Note: as written in the source, h1 is computed from z rather than from h0
    h1 = prelu(self.g_bn1(linear(z, self.gf_dim*2*s4*s4, 'g_h1_lin', reuse=share_params), reuse=share_params),
               name='g_h1_prelu', reuse=share_params)
    h1 = tf.reshape(h1, [self.batch_size, s4, s4, self.gf_dim * 2])
    h2 = prelu(self.g_bn2(deconv2d(h1, [self.batch_size, s2, s2, self.gf_dim * 2],
               name='g_h2', reuse=share_params), reuse=share_params), name='g_h2_prelu', reuse=share_params)

    # Branch-specific layer (parameters not shared)
    with tf.variable_scope(name):
        if reuse:
            tf.get_variable_scope().reuse_variables()
        output = tf.nn.sigmoid(deconv2d(h2, [self.batch_size, s, s, self.c_dim], name='g'+branch+'_h3', reuse=False))
    return output

def discriminator(self, image, y=None, share_params=False, reuse=False, name='D'):
    # Select the batch-norm layer of the corresponding branch (not shared)
    if '1' in name:
        d_bn1 = self.d1_bn1
        branch = '1'
    elif '2' in name:
        d_bn1 = self.d2_bn1
        branch = '2'

    # Branch-specific layers (parameters not shared)
    with tf.variable_scope(name):
        if reuse:
            tf.get_variable_scope().reuse_variables()
        h0 = prelu(conv2d(image, self.c_dim, name='d'+branch+'_h0_conv', reuse=False),
                   name='d'+branch+'_h0_prelu', reuse=False)
        h1 = prelu(d_bn1(conv2d(h0, self.df_dim, name='d'+branch+'_h1_conv', reuse=False), reuse=reuse),
                   name='d'+branch+'_h1_prelu', reuse=False)
        h1 = tf.reshape(h1, [self.batch_size, -1])

    # Shared layers
    h2 = prelu(self.d_bn2(linear(h1, self.dfc_dim, 'd_h2_lin', reuse=share_params), reuse=share_params),
               name='d_h2_prelu', reuse=share_params)
    h3 = linear(h2, 1, 'd_h3_lin', reuse=share_params)
    return tf.nn.sigmoid(h3), h3

code: https://github.com/mingyuliutw/CoGAN
paper: https://arxiv.org/abs/1606.07536

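Measuring the accuracy and diversity of generated images
Inception Score (IS) gets its name from the Inception network used as the classifier when computing it. It balances accuracy and diversity through two quantities: the conditional distribution p(y|x), which should be sharply peaked when each image is clearly recognizable (accuracy), and the marginal distribution p(y), which should be spread across classes when the samples are varied (diversity).
$IS(G)=\exp\left(\mathbb{E}_{x \sim p_g} D_{KL}(p(y|x)\,||\,p(y))\right)$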

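import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.stats import entropy
from torch.autograd import Variable
from torchvision.models import inception_v3

def inception_score(imgs, cuda=True, batch_size=32, resize=False, splits=1):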
    """Computes the inception score of the generated images imgs
    imgs -- Torch dataset of (3xHxW) numpy images normalized in the range [-1, 1]
    cuda -- whether or not to run on GPU
    batch_size -- batch size for feeding into Inception v3
    splits -- number of splits
    """
    N = len(imgs)

    assert batch_size > 0
    assert N > batch_size

    # Set up dtype
    if cuda:
        dtype = torch.cuda.FloatTensor
    else:
        if torch.cuda.is_available():
            print("WARNING: You have a CUDA device, so you should probably set cuda=True")
        dtype = torch.FloatTensor

    # Set up dataloader
    dataloader = torch.utils.data.DataLoader(imgs, batch_size=batch_size)

    # Load inception model
    inception_model = inception_v3(pretrained=True, transform_input=False).type(dtype)
    inception_model.eval()
    up = nn.Upsample(size=(299, 299), mode='bilinear').type(dtype)
    def get_pred(x):
        if resize:
            x = up(x)
        x = inception_model(x)
        return F.softmax(x, dim=1).data.cpu().numpy()

    # Get predictions
    preds = np.zeros((N, 1000))

    for i, batch in enumerate(dataloader, 0):
        batch = batch.type(dtype)
        batchv = Variable(batch)
        batch_size_i = batch.size()[0]

        preds[i*batch_size:i*batch_size + batch_size_i] = get_pred(batchv)

    # Now compute the mean kl-div
    split_scores = []

    for k in range(splits):
        part = preds[k * (N // splits): (k+1) * (N // splits), :]
        py = np.mean(part, axis=0)
        scores = []
        for i in range(part.shape[0]):
            pyx = part[i, :]
            scores.append(entropy(pyx, py))
        split_scores.append(np.exp(np.mean(scores)))

    return np.mean(split_scores), np.std(split_scores)

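Drawback: IS only looks at the predicted class distributions. If the generator produces just one image per class, the marginal p(y) still looks uniform, so the score stays high and the evaluation is misleading. A model that has collapsed in this way may keep producing the same images without the score reflecting it.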
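
This drawback can be reproduced on synthetic classifier outputs, without running Inception at all. The sketch below computes the score directly from a matrix of class probabilities; the function name and the synthetic prediction matrices are assumptions made for illustration.

```python
import numpy as np

def inception_score_from_preds(preds, eps=1e-12):
    """exp(mean_x KL(p(y|x) || p(y))) for class probabilities, one row per image."""
    py = preds.mean(axis=0, keepdims=True)   # marginal p(y)
    kl = np.sum(preds * (np.log(preds + eps) - np.log(py + eps)), axis=1)
    return float(np.exp(kl.mean()))

num_classes = 10
# 500 confident predictions covering all classes evenly. The score cannot tell
# whether these come from 500 distinct images or from 10 images repeated 50 times.
balanced = np.tile(np.eye(num_classes), (50, 1))
# 500 images that are all assigned to class 0.
single = np.tile(np.eye(num_classes)[0], (500, 1))

is_balanced = inception_score_from_preds(balanced)
is_single = inception_score_from_preds(single)
print(round(is_balanced, 3), round(is_single, 3))   # 10.0 1.0
```

The balanced case reaches the maximum score (the number of classes) even if each class contains a single repeated image, which is the blind spot described above.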

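FID (Fréchet Inception Distance) instead compares generated images with real images. The same Inception network extracts intermediate-layer features from both sets, and each set of features is modeled by a normal distribution with mean μ and covariance C. Compared with IS, FID is more robust to noise, and it is more sensitive to mode collapse: if only one kind of image is generated, the distance becomes quite large. FID is therefore better suited to describing the diversity of a GAN.
$FID(G)=||\mu_r-\mu_g||^2+Tr(C_r+C_g-2(C_rC_g)^{1/2})$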
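
A minimal NumPy sketch of this distance, using the squared distance between the means as in the standard definition. It assumes the features have already been extracted; the random 8-dimensional "features" and the function name are stand-ins for illustration, whereas a real FID computation would use Inception features.

```python
import numpy as np

def frechet_distance(feat_r, feat_g):
    """||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2}); rows are samples."""
    mu_r, mu_g = feat_r.mean(axis=0), feat_g.mean(axis=0)
    c_r = np.cov(feat_r, rowvar=False)
    c_g = np.cov(feat_g, rowvar=False)
    # Tr((C_r C_g)^{1/2}) equals the sum of the square roots of the eigenvalues of
    # C_r C_g, which are real and non-negative for covariance matrices
    # (up to numerical noise, hence the clip).
    eigvals = np.linalg.eigvals(c_r @ c_g)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(c_r) + np.trace(c_g) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))
diverse = rng.normal(size=(2000, 8))                        # same distribution as `real`
collapsed = np.tile(rng.normal(size=(1, 8)), (2000, 1))    # one sample repeated

fid_diverse = frechet_distance(real, diverse)
fid_collapsed = frechet_distance(real, collapsed)
print(fid_diverse, fid_collapsed)
```

The collapsed set has zero covariance, so the trace term alone contributes roughly the total variance of the real features, and the distance is large compared with the diverse set.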
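However, neither metric captures the spatial relationships within an image (for example, swapping the eyes and nose of a face), a limitation inherited from the translation invariance of CNNs. Capsule networks have been proposed as a first step toward alleviating this problem.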