InfoGAN Explained and Implemented (with TensorFlow 2.x)


How InfoGAN Works

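The original GAN can produce meaningful outputs, but its drawback is that the attributes of those outputs cannot be controlled. For example, there is no way to explicitly ask the generator for the face of a female celebrity with black hair, a fair complexion, brown eyes, and a smile. The root cause is that the 100-dim noise vector entangles all of the salient attributes of the generator output.
If the original GAN is modified so that the representation is split into an entangled code and a disentangled, interpretable latent code vector, we can tell the generator what to synthesize.
The entangled and disentangled codes can be pictured as follows:
Figure: entangled code versus disentangled code.
A GAN with a disentangled representation can be optimized in the same way as a vanilla GAN. The generator output can be written as:
$$G(z, c) = G(\mathbf{z})$$
The code $\mathbf{z} = (z, c)$ contains two elements: $z$ is the entangled representation, and $c = c_1, c_2, \dots, c_L$ is the disentangled code.
To enforce disentanglement of the codes, InfoGAN adds a regularizer to the original loss function that maximizes the mutual information between the latent code $c$ and $G(z, c)$:
$$I(c; G(z, c)) = I(c; G(\mathbf{z}))$$
The regularizer forces the generator to take the latent code into account. In information theory, the mutual information between the latent code $c$ and $G(z, c)$ is defined as:
$$I(c; G(z, c)) = H(c) - H(c \mid G(z, c))$$
where $H(c)$ is the entropy of the latent code $c$, and $H(c \mid G(z, c))$ is the conditional entropy of $c$ after observing the generator output $G(z, c)$.
Maximizing the mutual information therefore means minimizing $H(c \mid G(z, c))$, that is, reducing the uncertainty about the latent code once the generated output is observed.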
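To make the definition $I = H(c) - H(c \mid x)$ concrete, here is a minimal sketch that evaluates it on a small discrete joint table. The function names and the two toy tables are illustrative assumptions, not part of the original article; real generator outputs are continuous, so this only demonstrates the arithmetic.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(c; x) = H(c) - H(c | x) for a joint probability table joint[c][x]."""
    p_c = [sum(row) for row in joint]          # marginal over codes
    p_x = [sum(col) for col in zip(*joint)]    # marginal over outputs
    h_c = entropy(p_c)
    # H(c | x) = sum_x p(x) * H(c | X = x)
    h_c_given_x = 0.0
    for j, px in enumerate(p_x):
        if px > 0:
            h_c_given_x += px * entropy([row[j] / px for row in joint])
    return h_c - h_c_given_x

# Toy case 1: the output fully determines the code, so H(c | x) = 0.
informative = [[0.5, 0.0], [0.0, 0.5]]
# Toy case 2: the output is independent of the code, so I(c; x) = 0.
ignored = [[0.25, 0.25], [0.25, 0.25]]

print(mutual_information(informative))
print(mutual_information(ignored))
```

In the first table the mutual information equals $H(c) = \log 2$, its largest possible value; in the second, the generator ignores the code and the mutual information vanishes.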
However, $H(c \mid G(z, c))$ is hard to estimate because it requires the posterior $p(c \mid G(z, c)) = p(c \mid x)$.
The workaround is to estimate a lower bound on the mutual information by approximating the posterior with an auxiliary distribution $Q(c \mid x)$. The lower bound is:
$$I(c; G(z, c)) \ge L_I(G, Q) = E_{c \sim p(c),\, x \sim G(z, c)}[\log Q(c \mid x)] + H(c)$$
InfoGAN assumes that $H(c)$ is a constant, so maximizing the mutual information reduces to maximizing the expectation. The generator must be confident that it has produced an output with the specified attributes. The maximum value of this expectation is zero, so the maximum of the lower bound is $H(c)$. In InfoGAN, $Q(c \mid x)$ for a discrete latent code can be represented by a softmax. The expectation is then the negative of the categorical_crossentropy loss in tf.keras.
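The relation between the lower bound and the cross-entropy can be checked by hand. The sketch below is an illustration under assumptions: a uniform 10-class code (so $H(c) = \log 10$), hand-written Q outputs instead of a trained network, and hypothetical helper names. It uses plain Python rather than tf.keras so the arithmetic is visible.

```python
import math

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Cross-entropy between a one-hot code and the Q output for one sample."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))

def lower_bound(codes, q_outputs, h_c):
    """L_I = E[log Q(c|x)] + H(c); the expectation is minus the mean cross-entropy."""
    mean_ce = sum(categorical_crossentropy(c, q)
                  for c, q in zip(codes, q_outputs)) / len(codes)
    return -mean_ce + h_c

num_classes = 10
h_c = math.log(num_classes)  # entropy of a uniform 10-class code
code = [1.0 if k == 3 else 0.0 for k in range(num_classes)]

# Q recovers the code with certainty: expectation ~ 0, bound ~ H(c).
confident_q = list(code)
# Q has learned nothing: expectation = -log 10, bound ~ 0.
uniform_q = [1.0 / num_classes] * num_classes

print(lower_bound([code], [confident_q], h_c))
print(lower_bound([code], [uniform_q], h_c))
```

The two cases bracket the bound: it reaches $H(c)$ only when Q is certain about the code that produced each sample.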
For a one-dimensional continuous code, the expectation is a double integral over $c$ and $x$, because the expectation draws samples from both the disentangled code distribution and the generator distribution. One way to estimate the expectation is to assume that the samples are a good measure of the continuous data. The expectation is then estimated as $c \log Q(c \mid x)$, whose negative is used as the loss.
To complete the InfoGAN network, we need an implementation of $\log Q(c \mid x)$. For simplicity, the network Q is an auxiliary network attached to the discriminator.
Figure: InfoGAN network architecture.
Discriminator loss function:
$$\mathcal{L}^{(D)} = -\mathbb{E}_{x \sim p_{data}} \log D(x) - \mathbb{E}_{z, c} \log[1 - D(G(z, c))] - \lambda I(c; G(z, c))$$
Generator loss function:
$$\mathcal{L}^{(G)} = -\mathbb{E}_{z, c} \log D(G(z, c)) - \lambda I(c; G(z, c))$$
where $\lambda$ is a positive constant.
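As a rough numerical illustration of the two loss functions, the sketch below evaluates them from lists of discriminator probabilities and a given mutual-information estimate. The function names, the sample probabilities, and the default `lam=1.0` are assumptions made for the example; in practice the mutual-information term is replaced by the lower bound computed through the Q network.

```python
import math

def discriminator_loss(d_real, d_fake, mi_estimate, lam=1.0):
    """-E[log D(x)] - E[log(1 - D(G(z,c)))] - lambda * I(c; G(z,c))."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term - lam * mi_estimate

def generator_loss(d_fake, mi_estimate, lam=1.0):
    """-E[log D(G(z,c))] - lambda * I(c; G(z,c))."""
    adv_term = -sum(math.log(p) for p in d_fake) / len(d_fake)
    return adv_term - lam * mi_estimate

# Hypothetical discriminator outputs D(x) on real and generated samples.
d_real = [0.9, 0.8]
d_fake = [0.3, 0.2]

print(discriminator_loss(d_real, d_fake, mi_estimate=0.0))
print(discriminator_loss(d_real, d_fake, mi_estimate=1.5))
print(generator_loss(d_fake, mi_estimate=1.5))
```

Because the mutual-information term enters both losses with a minus sign, a larger estimate lowers both of them, which is why the discriminator (through Q) and the generator are both pushed to preserve the code.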

InfoGAN Implementation

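Applied to the MNIST dataset, InfoGAN can learn disentangled discrete and continuous codes that modify attributes of the generator output. For example, as in CGAN and ACGAN, a discrete code in the form of a 10-dim one-hot label specifies which digit to generate. In addition, two continuous codes can be added: one to control the angle of the writing style and another to adjust the stroke width. A smaller entangled code is kept to represent all other attributes:

Figure: code layout for the MNIST dataset.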
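The code layout can be sketched with NumPy as follows. This is an illustration under assumptions: the helper name `sample_generator_inputs` is hypothetical, the entangled code size of 62 and the standard deviation of 0.5 are taken from the training script in this article, and `np.random.default_rng` is used here only to make the sample reproducible.

```python
import numpy as np

def sample_generator_inputs(batch_size, latent_size=62, num_labels=10,
                            code_std=0.5, seed=0):
    """Sample one batch of [noise, one-hot label, code1, code2]."""
    rng = np.random.default_rng(seed)
    # entangled code z: uniform noise in [-1, 1)
    noise = rng.uniform(-1.0, 1.0, size=(batch_size, latent_size))
    # discrete code: one-hot digit label
    labels = np.eye(num_labels)[rng.integers(0, num_labels, size=batch_size)]
    # two 1-dim continuous codes (e.g. writing angle and stroke width)
    code1 = rng.normal(scale=code_std, size=(batch_size, 1))
    code2 = rng.normal(scale=code_std, size=(batch_size, 1))
    return [noise, labels, code1, code2]

inputs = sample_generator_inputs(4)
merged = np.concatenate(inputs, axis=1)
print([a.shape for a in inputs])
print(merged.shape)
```

With these sizes, the generator sees a single 74-dim vector per sample after concatenation: 62 entangled dimensions, 10 for the label, and 2 for the continuous codes.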

Import the required libraries

import tensorflow as tf
import numpy as np
from tensorflow import keras
import os
from matplotlib import pyplot as plt
import math
from PIL import Image
from tensorflow.keras import backend as K

Generator

def generator(inputs,image_size,activation='sigmoid',labels=None,codes=None):
    """generator model
    Arguments:
        inputs (layer): input layer of generator
        image_size (int): Target size of one side
        activation (string): name of output activation layer
        labels (tensor): input labels
        codes (list): 2-dim disentangled codes for infoGAN
    returns:
        model: generator model
    """
    image_resize = image_size // 4
    kernel_size = 5
    layer_filters = [128,64,32,1]
    inputs = [inputs,labels] + codes
    x = keras.layers.concatenate(inputs,axis=1)
    
    x = keras.layers.Dense(image_resize*image_resize*layer_filters[0])(x)
    x = keras.layers.Reshape((image_resize,image_resize,layer_filters[0]))(x)
    for filters in layer_filters:
        if filters > layer_filters[-2]:
            strides = 2
        else:
            strides = 1
        x = keras.layers.BatchNormalization()(x)
        x = keras.layers.Activation('relu')(x)
        x = keras.layers.Conv2DTranspose(filters=filters,
                kernel_size=kernel_size,
                strides=strides,
                padding='same')(x)
    if activation is not None:
        x = keras.layers.Activation(activation)(x)
    return keras.Model(inputs,x,name='generator')
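The generator's upsampling schedule can be verified without building the model. The sketch below replays the stride logic of the loop above; the helper name is hypothetical, and it relies on the fact that `Conv2DTranspose` with `padding='same'` multiplies the spatial size by the stride.

```python
def generator_output_size(image_size, layer_filters=(128, 64, 32, 1)):
    """Trace the spatial size through the generator's transposed convolutions."""
    size = image_size // 4  # side length after the Dense + Reshape layers
    for filters in layer_filters:
        # same rule as the generator: the first two layers upsample by 2
        strides = 2 if filters > layer_filters[-2] else 1
        size *= strides
    return size

print(generator_output_size(28))
print(generator_output_size(30))
```

For MNIST the side length goes 7, 14, 28, 28, 28, so the output matches the 28x28 input images. The second call shows that this architecture only reproduces the target size when `image_size` is divisible by 4.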

Discriminator

def discriminator(inputs,activation='sigmoid',num_labels=None,num_codes=None):
    """discriminator model
    Arguments:
        inputs (Layer): input layer of the discriminator
        activation (string): name of output activation layer
        num_labels (int): dimension of one-hot labels for ACGAN & InfoGAN
        num_codes (int): number of continuous codes predicted by the Q network
            (2 for InfoGAN; unused here because the two heads are hard-coded)
    Returns:
        Model: Discriminator model
    """
    kernel_size = 5
    layer_filters = [32,64,128,256]
    x = inputs
    for filters in layer_filters:
        if filters == layer_filters[-1]:
            strides = 1
        else:
            strides = 2
        x = keras.layers.LeakyReLU(0.2)(x)
        x = keras.layers.Conv2D(filters=filters,
                kernel_size=kernel_size,
                strides=strides,
                padding='same')(x)
    x = keras.layers.Flatten()(x)
    outputs = keras.layers.Dense(1)(x)
    if activation is not None:
        outputs = keras.layers.Activation(activation)(outputs)
    if num_labels:
        # Q network: auxiliary heads that share the discriminator features
        layer = keras.layers.Dense(layer_filters[-2])(x)
        labels = keras.layers.Dense(num_labels)(layer)
        labels = keras.layers.Activation('softmax',name='label')(labels)
        # 1-dim continuous Q of 1st c given x
        code1 = keras.layers.Dense(1)(layer)
        code1 = keras.layers.Activation('sigmoid',name='code1')(code1)
        # 1-dim continuous Q of 2nd c given x
        code2 = keras.layers.Dense(1)(layer)
        code2 = keras.layers.Activation('sigmoid',name='code2')(code2)
        outputs = [outputs,labels,code1,code2]
    return keras.Model(inputs,outputs,name='discriminator')
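The size of the feature vector shared by the discriminator output and the Q heads follows from the stride schedule. The sketch below is a hypothetical helper that relies on the rule that `Conv2D` with `padding='same'` produces an output of side `ceil(input / stride)`.

```python
import math

def discriminator_feature_sizes(image_size, layer_filters=(32, 64, 128, 256)):
    """Trace the spatial size through the discriminator's convolutions."""
    sizes = []
    size = image_size
    for filters in layer_filters:
        # same rule as the discriminator: only the last layer keeps the size
        strides = 1 if filters == layer_filters[-1] else 2
        size = math.ceil(size / strides)
        sizes.append(size)
    return sizes

sizes = discriminator_feature_sizes(28)
flattened = sizes[-1] * sizes[-1] * 256
print(sizes, flattened)
```

For 28x28 MNIST images, the side length shrinks to 14, 7, 4, and stays at 4, so `Flatten` yields a 4096-dim vector that feeds both the real/fake output and the Q network.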

Building the models

#mi_loss
def mi_loss(c,q_of_c_give_x):
    """mi_loss = -c * log(Q(c|x))
    """
    return K.mean(-K.sum(K.log(q_of_c_give_x + K.epsilon()) * c,axis=1))
    
def build_and_train_models(latent_size=100):
    """Load the dataset, build InfoGAN models,
    Call the InfoGAN train routine.
    """
    (x_train,y_train),_ = keras.datasets.mnist.load_data()
    image_size = x_train.shape[1]
    x_train = np.reshape(x_train,[-1,image_size,image_size,1])
    x_train = x_train.astype('float32') / 255.
    num_labels = len(np.unique(y_train))
    y_train = keras.utils.to_categorical(y_train)
    
    # hyperparameters
    model_name = 'infogan_mnist'
    batch_size = 64
    train_steps = 40000
    lr = 2e-4
    decay = 6e-8
    input_shape = (image_size,image_size,1)
    label_shape = (num_labels,)
    code_shape = (1,)

    #discriminator model
    inputs = keras.layers.Input(shape=input_shape,name='discriminator_input')
    #discriminator with 4 outputs
    discriminator_model = discriminator(inputs,num_labels=num_labels,num_codes=2)
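    # `learning_rate` replaces the deprecated `lr` alias in TF 2.x.
    # `decay` is accepted by older TF 2.x optimizers only; newer releases
    # expect a learning-rate schedule (or the legacy optimizer) instead.
    optimizer = keras.optimizers.RMSprop(learning_rate=lr,decay=decay)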
    loss = ['binary_crossentropy','categorical_crossentropy',mi_loss,mi_loss]
    loss_weights = [1.0,1.0,0.5,0.5]
    discriminator_model.compile(loss=loss,
            loss_weights=loss_weights,
            optimizer=optimizer,
            metrics=['acc'])
    discriminator_model.summary()
    input_shape = (latent_size,)
    inputs = keras.layers.Input(shape=input_shape,name='z_input')
    labels = keras.layers.Input(shape=label_shape,name='labels')
    code1 = keras.layers.Input(shape=code_shape,name='code1')
    code2 = keras.layers.Input(shape=code_shape,name='code2')
    generator_model = generator(inputs,image_size,labels=labels,codes=[code1,code2])
    generator_model.summary()
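    # the adversarial model trains at half the discriminator's learning rate
    # (`learning_rate` replaces the deprecated `lr` alias in TF 2.x)
    optimizer = keras.optimizers.RMSprop(learning_rate=lr*0.5,decay=decay*0.5)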
    discriminator_model.trainable = False
    inputs = [inputs,labels,code1,code2]
    adversarial_model = keras.Model(inputs,
            discriminator_model(generator_model(inputs)),
            name=model_name)
    adversarial_model.compile(loss=loss,loss_weights=loss_weights,
            optimizer=optimizer,
            metrics=['acc'])
    adversarial_model.summary()

    models = (generator_model,discriminator_model,adversarial_model)
    data = (x_train,y_train)
    params = (batch_size,latent_size,train_steps,num_labels,model_name)
    train(models,data,params)
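The behavior of `mi_loss` is easier to see on concrete numbers. The sketch below is a NumPy mirror of the backend version above; the name `mi_loss_np` is hypothetical, and `eps=1e-7` assumes the default value of `K.epsilon()`.

```python
import numpy as np

def mi_loss_np(c, q_of_c_given_x, eps=1e-7):
    """NumPy mirror of mi_loss: mean over the batch of -sum(c * log(Q(c|x)))."""
    return np.mean(-np.sum(np.log(q_of_c_given_x + eps) * c, axis=1))

# Discrete code: with a one-hot c, the loss is the categorical cross-entropy.
c_onehot = np.array([[0.0, 1.0, 0.0]])
q_softmax = np.array([[0.1, 0.8, 0.1]])
discrete_loss = mi_loss_np(c_onehot, q_softmax)

# Continuous code: the sampled value c weights log Q(c|x) directly.
c_cont = np.array([[0.5]])
q_sigmoid = np.array([[0.5]])
continuous_loss = mi_loss_np(c_cont, q_sigmoid)

print(discrete_loss, continuous_loss)
```

For the one-hot case only the entry of the true class survives the sum, so the result is `-log(0.8)`; for the continuous case the result is `-0.5 * log(0.5)`.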

Training the model

def train(models,data,params):
    """Train the network
    #Arguments
        models (Models): generator,discriminator,adversarial model
        data (tuple): x_train,y_train data
        params (tuple): Network params
    """
    generator,discriminator,adversarial = models
    x_train,y_train = data
    batch_size,latent_size,train_steps,num_labels,model_name = params

    save_interval = 500
    code_std = 0.5
    noise_input = np.random.uniform(-1.0,1.,size=[16,latent_size])
    noise_label = np.eye(num_labels)[np.arange(0,16) % num_labels]
    noise_code1 = np.random.normal(scale=code_std,size=[16,1])
    noise_code2 = np.random.normal(scale=code_std,size=[16,1])
    train_size = x_train.shape[0]
    print(model_name,
            "Labels for generated images: ",
            np.argmax(noise_label, axis=1))
    for i in range(train_steps):
        rand_indexes = np.random.randint(0,train_size,size=batch_size)
        real_images = x_train[rand_indexes]
        real_labels = y_train[rand_indexes]
        #random codes for real images
        real_code1 = np.random.normal(scale=code_std,size=[batch_size,1])
        real_code2 = np.random.normal(scale=code_std,size=[batch_size,1])
        # generate fake images, labels and codes
        noise = np.random.uniform(-1.,1.,size=[batch_size,latent_size])
        fake_labels = np.eye(num_labels)[np.random.choice(num_labels,batch_size)]
        fake_code1 = np.random.normal(scale=code_std,size=[batch_size,1])
        fake_code2 = np.random.normal(scale=code_std,size=[batch_size,1])
        inputs = [noise,fake_labels,fake_code1,fake_code2]
        fake_images = generator.predict(inputs)
        x = np.concatenate((real_images,fake_images))
        labels = np.concatenate((real_labels,fake_labels))
        codes1 = np.concatenate((real_code1,fake_code1))
        codes2 = np.concatenate((real_code2,fake_code2))
        y = np.ones([2 * batch_size,1])
        y[batch_size:,:] = 0
        #train discriminator network
        outputs = [y,labels,codes1,codes2]
        # metrics = ['loss', 'activation_1_loss', 'label_loss',
        # 'code1_loss', 'code2_loss', 'activation_1_acc',
        # 'label_acc', 'code1_acc', 'code2_acc']
        metrics = discriminator.train_on_batch(x, outputs)
        fmt = "%d: [dis: %f, bce: %f, ce: %f, mi: %f, mi:%f, acc: %f]"
        log = fmt % (i, metrics[0], metrics[1], metrics[2], metrics[3], metrics[4], metrics[6])
        #train the adversarial network
        noise = np.random.uniform(-1.,1.,size=[batch_size,latent_size])
        fake_labels = np.eye(num_labels)[np.random.choice(num_labels,batch_size)]
        fake_code1 = np.random.normal(scale=code_std,size=[batch_size,1])
        fake_code2 = np.random.normal(scale=code_std,size=[batch_size,1])
        y = np.ones([batch_size,1])
        inputs = [noise,fake_labels,fake_code1,fake_code2]
        outputs = [y,fake_labels,fake_code1,fake_code2]
        metrics = adversarial.train_on_batch(inputs,outputs)
        fmt = "%s [adv: %f, bce: %f, ce: %f, mi: %f, mi:%f, acc: %f]"
        log = fmt % (log, metrics[0], metrics[1], metrics[2], metrics[3], metrics[4], metrics[6])

        print(log)
        if (i + 1) % save_interval == 0:
            # plot generator images on a periodic basis
            plot_images(generator,
                            noise_input=noise_input,
                            noise_label=noise_label,
                            noise_codes=[noise_code1, noise_code2],
                            show=False,
                            step=(i + 1),
                            model_name=model_name)
   
        # save the model
        if (i + 1) % (2 * save_interval) == 0:
            generator.save(model_name + ".h5")
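The way the training loop assembles the discriminator targets can be isolated into a small function. The sketch below is a hypothetical helper that repeats the same NumPy steps on toy arrays: real samples first with target 1, fake samples second with target 0, and labels and codes concatenated in the same order.

```python
import numpy as np

def discriminator_targets(real_labels, fake_labels, real_codes, fake_codes):
    """Build [y, labels, codes1, codes2] for one discriminator update."""
    batch_size = real_labels.shape[0]
    y = np.ones([2 * batch_size, 1])
    y[batch_size:, :] = 0  # second half of the batch is fake
    labels = np.concatenate((real_labels, fake_labels))
    codes1 = np.concatenate((real_codes[0], fake_codes[0]))
    codes2 = np.concatenate((real_codes[1], fake_codes[1]))
    return [y, labels, codes1, codes2]

batch_size, num_labels = 3, 10
real_labels = np.eye(num_labels)[[0, 1, 2]]
fake_labels = np.eye(num_labels)[[7, 8, 9]]
real_codes = [np.zeros((batch_size, 1)), np.zeros((batch_size, 1))]
fake_codes = [np.ones((batch_size, 1)), np.ones((batch_size, 1))]

targets = discriminator_targets(real_labels, fake_labels, real_codes, fake_codes)
print([t.shape for t in targets])
print(targets[0].ravel())
```

The four arrays line up with the four discriminator outputs (real/fake probability, label, code1, code2), which is the order `train_on_batch` expects.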

Results

# plot generated images
def plot_images(generator,
                noise_input,
                noise_label=None,
                noise_codes=None,
                show=False,
                step=0,
                model_name="gan"):
    """Generate fake images and plot them

    For visualization purposes, generate fake images
    then plot them in a square grid

    # Arguments
        generator (Model): The Generator Model for 
            fake images generation
        noise_input (ndarray): Array of z-vectors
        show (bool): Whether to show plot or not
        step (int): Appended to filename of the save images
        model_name (string): Model name

    """
    os.makedirs(model_name, exist_ok=True)
    filename = os.path.join(model_name, "%05d.png" % step)
    rows = int(math.sqrt(noise_input.shape[0]))
    if noise_label is not None:
        noise_input = [noise_input, noise_label]
        if noise_codes is not None:
            noise_input += noise_codes

    images = generator.predict(noise_input)
    plt.figure(figsize=(2.2, 2.2))
    num_images = images.shape[0]
    image_size = images.shape[1]
    for i in range(num_images):
        plt.subplot(rows, rows, i + 1)
        image = np.reshape(images[i], [image_size, image_size])
        plt.imshow(image, cmap='gray')
        plt.axis('off')
    plt.savefig(filename)
    if show:
        plt.show()
    else:
        plt.close('all')
# train the model
build_and_train_models(latent_size=62)
Figure: generated samples at steps = 500.

Figure: generated samples at steps = 16000.

Figure: effect of varying the disentangled code that controls the writing angle.
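One way to inspect what a continuous code has learned is to hold the noise and the digit label fixed and sweep a single code. The sketch below only builds the generator inputs for such a sweep; the helper name, the sweep range of -2 to 2, and the choice of `code1` as the swept code are assumptions, and which attribute each code ends up controlling depends on the training run.

```python
import numpy as np

def code_sweep_inputs(digit, num_steps=8, latent_size=62, num_labels=10,
                      low=-2.0, high=2.0, seed=0):
    """Inputs that vary code1 while noise, label and code2 stay fixed."""
    rng = np.random.default_rng(seed)
    # one noise vector repeated for every step of the sweep
    noise = np.repeat(rng.uniform(-1.0, 1.0, size=(1, latent_size)),
                      num_steps, axis=0)
    labels = np.eye(num_labels)[np.full(num_steps, digit)]
    code1 = np.linspace(low, high, num_steps).reshape(-1, 1)
    code2 = np.zeros((num_steps, 1))
    return [noise, labels, code1, code2]

sweep = code_sweep_inputs(digit=3)
print([a.shape for a in sweep])
```

With a trained generator loaded (for example the model saved by the training loop), `generator.predict(sweep)` would return one image per step, and plotting them side by side shows how the swept code changes the digit.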

Author: 盼小辉丶
