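Series Table of Contents
Deep Learning GAN (1): A Brief Introduction
Deep Learning GAN (2): A DCGAN Example on the CIFAR10 Dataset
Deep Learning GAN (3): A DCGAN Example on the MNIST Handwritten Digit Dataset
Deep Learning GAN (4): A cGAN (Conditional GAN) Example
Deep Learning GAN (5): A PIX2PIX GAN Example
Deep Learning GAN (6): A CycleGAN Example
A Conditional Generative Adversarial Network (cGAN) Example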
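1. What Is a cGAN
A generative adversarial network (GAN) is a type of deep learning network that can generate data with characteristics similar to the training data.
A GAN consists of two networks that are trained together:
- Generator: given a vector of random values as input, this network generates data with the same structure as the training data.
- Discriminator: given batches containing both real training data and data produced by the generator, this network tries to classify each observation as "real" or "generated".
A conditional generative adversarial network (cGAN) is a type of GAN that also makes use of labels during training.
- Generator: given a label and a random vector as input, this network generates data with the same structure as the training observations that carry that label.
- Discriminator: given batches of labeled data containing observations from both the training data and the generator, this network tries to classify each observation as "real" or "generated".
For example, with MNIST you can generate a specific handwritten digit, such as the digit 9; with CIFAR-10 a specific object photo, such as a "frog"; and with the Fashion MNIST dataset a specific clothing item, such as a "dress".
This type of model is called a conditional generative adversarial network, abbreviated cGAN or CGAN.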
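2. Preparing the Dataset
The Fashion MNIST clothing photo dataset
The training set consists of 60,000 small square 28×28 pixel grayscale images covering 10 types of clothing, such as shoes, T-shirts, and dresses.
Keras provides access to the Fashion MNIST dataset through the fashion_mnist.load_data() function. It returns two tuples: one with the input and output elements of the standard training dataset, and one with the input and output elements of the standard test dataset.
The example below loads the dataset and summarizes the shapes of the loaded arrays.
Note: the first time you load the dataset, Keras automatically downloads a compressed version of the images and saves it under ~/.keras/datasets/ in your home directory. The download is fast, since the compressed dataset is only about 25 megabytes.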
# example of loading the fashion_mnist dataset
from keras.datasets.fashion_mnist import load_data
# load the images into memory
(trainX, trainy), (testX, testy) = load_data()
# summarize the shape of the dataset
print('Train', trainX.shape, trainy.shape)
print('Test', testX.shape, testy.shape)
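The training set has 60,000 images of 28x28 pixels and the test set has 10,000 images of 28x28 pixels:
Train (60000, 28, 28) (60000,)
Test (10000, 28, 28) (10000,)
The next example uses the Matplotlib package to display 100 of the images.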
# example of loading the fashion_mnist dataset
from keras.datasets.fashion_mnist import load_data
from matplotlib import pyplot
# load the images into memory
(trainX, trainy), (testX, testy) = load_data()
# plot images from the training dataset
for i in range(100):
    # define subplot
    pyplot.subplot(10, 10, 1 + i)
    # turn off axis
    pyplot.axis('off')
    # plot raw pixel data
    pyplot.imshow(trainX[i], cmap='gray_r')
pyplot.show()
Running the example displays the first 100 training images in a 10×10 grid.
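3. Implementing the cGAN in Code
3.1. Defining the Discriminator
Starting with the discriminator model, a new second input is defined that takes an integer for the class label of the image. This makes the input image conditional on the provided class label.
The class label is then passed through an embedding layer of size 50. This means that each of the 10 classes of the Fashion MNIST dataset (0 to 9) maps to a different 50-element vector representation that is learned by the discriminator model.
The output of the embedding is then passed to a fully connected layer with a linear activation. Importantly, the fully connected layer has enough activations to be reshaped into one channel of a 28×28 image. The activations are reshaped into a single 28×28 activation map and concatenated with the input image. To the next convolutional layer, the result looks like a two-channel input image.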
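The label path described above can be traced with plain NumPy to see how an integer label ends up as a second image channel. This is only a shape walk-through under stated assumptions: the random matrices `embedding_table` and `dense_weights` are hypothetical stand-ins for weights that Keras would learn, the bias term is omitted, and the labels are passed as a flat vector rather than with the extra length-1 axis that `Input(shape=(1,))` carries.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, embed_dim, height, width = 10, 50, 28, 28

# Hypothetical stand-ins for the learned Embedding and Dense weights.
embedding_table = rng.normal(size=(n_classes, embed_dim))
dense_weights = rng.normal(size=(embed_dim, height * width))

# A batch of 4 synthetic grayscale images and their integer class labels.
images = rng.normal(size=(4, height, width, 1))
labels = np.array([9, 0, 3, 3])

# Embedding: each label selects one 50-element row of the table.
embedded = embedding_table[labels]                    # (4, 50)
# Dense layer with linear activation: a plain matrix product.
label_map = embedded @ dense_weights                  # (4, 784)
# Reshape into a single 28x28 channel.
label_map = label_map.reshape(-1, height, width, 1)   # (4, 28, 28, 1)
# Concatenate along the channel axis, giving a two-channel "image".
merged = np.concatenate([images, label_map], axis=-1)

print(merged.shape)  # (4, 28, 28, 2)
```

The two samples that share label 3 receive an identical second channel, which is what lets the convolutional layers associate image content with a class.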
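The define_discriminator() function below implements this update to the discriminator model. The shape of the input image is a parameter, and it is used after the embedding layer to set the number of activations in the fully connected layer so that its output can be reshaped. The number of classes in the problem is also a function parameter, with a default value of 10.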
def define_discriminator(in_shape=(28,28,1), n_classes=10):
    # label input
    in_label = Input(shape=(1,))
    # embedding for categorical input
    li = Embedding(n_classes, 50)(in_label)
    # scale up to image dimensions with linear activation
    n_nodes = in_shape[0] * in_shape[1]
    li = Dense(n_nodes)(li)
    # reshape to additional channel
    li = Reshape((in_shape[0], in_shape[1], 1))(li)
    # image input
    in_image = Input(shape=in_shape)
    # concat label as a channel
    merge = Concatenate()([in_image, li])
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(merge)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # dropout
    fe = Dropout(0.4)(fe)
    # output
    out_layer = Dense(1, activation='sigmoid')(fe)
    # define model
    model = Model([in_image, in_label], out_layer)
    # compile model (learning_rate replaces the deprecated lr argument)
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    #tf.keras.utils.plot_model(model, 'discriminator.png', show_shapes=True)
    return model
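To make the structure clear, below is a plot of the discriminator model.
The plot shows the two inputs: the class label, which passes through the embedding (left), and the image (right), and their concatenation into a two-channel 28×28 image or feature map (middle). The rest of the model is the same as the discriminator designed previously.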
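To check the rest of the discriminator, the spatial sizes after the two stride-2 convolutions can be worked out by hand. The sketch below assumes the standard rule for `padding='same'`, where the output size is the input size divided by the stride and rounded up; the helper `same_conv_output_size` is a hypothetical name introduced only for this illustration.

```python
import math

def same_conv_output_size(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

size = 28
sizes = [size]
for _ in range(2):  # the two stride-2 Conv2D layers
    size = same_conv_output_size(size, stride=2)
    sizes.append(size)

n_filters = 128
flattened = size * size * n_filters  # length of the vector after Flatten

print(sizes)      # [28, 14, 7]
print(flattened)  # 6272
```

So the final Dense(1) layer sees a 6272-element vector per sample.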
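3.2. Defining the Generator
As in the discriminator, the class label is passed through an embedding layer that maps it to a unique 50-element vector, which then passes through a fully connected layer with a linear activation before being reshaped. In this case, the activations of the fully connected layer are reshaped into a single 7×7 feature map. This matches the 7×7 feature map activations of the unconditional generator model. The new 7×7 feature map is added as one more channel to the existing 128, giving 129 feature maps that are then upsampled as in the earlier model.
The define_generator() function below implements this, again parameterizing the number of classes as in the discriminator model.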
# define the standalone generator model
def define_generator(latent_dim, n_classes=10):
    # label input
    in_label = Input(shape=(1,))
    # embedding for categorical input
    li = Embedding(n_classes, 50)(in_label)
    # linear multiplication
    n_nodes = 7 * 7
    li = Dense(n_nodes)(li)
    # reshape to additional channel
    li = Reshape((7, 7, 1))(li)
    # image generator input
    in_lat = Input(shape=(latent_dim,))
    # foundation for 7x7 image
    n_nodes = 128 * 7 * 7
    gen = Dense(n_nodes)(in_lat)
    gen = LeakyReLU(alpha=0.2)(gen)
    gen = Reshape((7, 7, 128))(gen)
    # merge image gen and label input
    merge = Concatenate()([gen, li])
    # upsample to 14x14
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(merge)
    gen = LeakyReLU(alpha=0.2)(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    # output
    out_layer = Conv2D(1, (7,7), activation='tanh', padding='same')(gen)
    # define model
    model = Model([in_lat, in_label], out_layer)
    return model
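To help understand the new model architecture, below is a plot of the conditional generator model.
In this case, you can see the 100-element point in latent space (latent_dim) as input and its subsequent reshaping (left), the new class label input and embedding layer (right), and then the concatenation of the two sets of feature maps (middle). The rest of the model is the same as in the unconditional case.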
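The channel count and the upsampling steps of the generator can be verified the same way. This is a minimal shape sketch, assuming random arrays as stand-ins for the two branches; it does not run any Keras layers, and it relies on the fact that a stride-2 Conv2DTranspose with `padding='same'` doubles the height and width.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4

# Stand-in for the noise branch after Dense + Reshape: 128 maps of 7x7.
noise_maps = rng.normal(size=(batch, 7, 7, 128))
# Stand-in for the label branch after Embedding + Dense + Reshape: one 7x7 map.
label_map = rng.normal(size=(batch, 7, 7, 1))

# Concatenate on the channel axis: 128 + 1 = 129 feature maps.
merged = np.concatenate([noise_maps, label_map], axis=-1)

# Each stride-2 Conv2DTranspose with padding='same' doubles height and width.
size = merged.shape[1]
upsampled_sizes = []
for _ in range(2):
    size = size * 2
    upsampled_sizes.append(size)

print(merged.shape)     # (4, 7, 7, 129)
print(upsampled_sizes)  # [14, 28]
```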
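3.3. Defining the GAN
Finally, the composite GAN model needs to be updated.
The new GAN model takes a point in latent space and a class label as input and, as before, outputs a prediction of whether the input is real or fake.
Using the functional API to design the model, it is important that we explicitly connect the image output of the generator and the class label input, both of which become inputs to the discriminator model. This lets the same class label input flow down into the generator and down into the discriminator.
The define_gan() function below implements the conditional version of the GAN.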
# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # get noise and label inputs from generator model
    gen_noise, gen_label = g_model.input
    # get image output from the generator model
    gen_output = g_model.output
    # connect image output and label input from generator as inputs to discriminator
    gan_output = d_model([gen_output, gen_label])
    # define gan model as taking noise and label and outputting a classification
    model = Model([gen_noise, gen_label], gan_output)
    # compile model (learning_rate replaces the deprecated lr argument)
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model
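The plot below summarizes the composite GAN model.
Importantly, it shows the generator model in full, with the point in latent space and the class label as inputs, the connection of the generator's output and the same class label as inputs to the discriminator model (the last box at the bottom of the plot), and the single output classifying the input as real or fake.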
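The routing of the class label through the composite model can be illustrated without Keras. The sketch below is a toy under loud assumptions: `toy_generator` and `toy_discriminator` are hypothetical NumPy stubs with made-up arithmetic, not trained networks, and `composite_gan` only mirrors the wiring of define_gan(), where one label array is passed to both models.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_generator(noise, labels):
    """Hypothetical stub: returns images in [-1, 1] with shape (n, 28, 28, 1)."""
    n = noise.shape[0]
    base = np.tanh(noise.mean(axis=1)).reshape(n, 1, 1, 1)
    scale = (labels.reshape(n, 1, 1, 1) + 1) / 10.0
    return np.ones((n, 28, 28, 1)) * base * scale

def toy_discriminator(images, labels):
    """Hypothetical stub: returns one probability per (image, label) pair."""
    score = images.mean(axis=(1, 2, 3)) - labels / 10.0
    prob = 1.0 / (1.0 + np.exp(-score))  # sigmoid, as in the real output layer
    return prob.reshape(-1, 1)

def composite_gan(noise, labels):
    """Same wiring as define_gan(): one label array feeds both models."""
    fake_images = toy_generator(noise, labels)
    return toy_discriminator(fake_images, labels)

noise = rng.normal(size=(8, 100))
labels = rng.integers(0, 10, size=8)
p_real = composite_gan(noise, labels)

print(p_real.shape)  # (8, 1)
```

The point of the wiring is that the discriminator always judges a generated image against the label the generator was asked to produce, never against an unrelated one.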
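3.4. Training
The hard part of converting an unconditional GAN into a conditional one is done, namely defining and configuring the model architecture.
All that remains is to update the training process to use the class labels as well.
First, the load_real_samples() and generate_real_samples() functions, which load the dataset and select a batch of samples respectively, must be updated to use the real class labels from the training dataset. Importantly, generate_real_samples() now returns the images, the clothing labels, and the class label for the discriminator (class=1).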
# load fashion mnist images
def load_real_samples():
    # load dataset
    (trainX, trainy), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    return [X, trainy]

# select real samples
def generate_real_samples(dataset, n_samples):
    # split into images and labels
    images, labels = dataset
    # choose random instances
    ix = randint(0, images.shape[0], n_samples)
    # select images and labels
    X, labels = images[ix], labels[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return [X, labels], y
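The pixel scaling in load_real_samples() maps [0, 255] to [-1, 1], which matches the range of the tanh activation on the generator's output layer. The sketch below shows the forward scaling and its inverse on a few synthetic values; the array `pixels` is made up for illustration and is not Fashion MNIST data, and the inverse step is an assumption about how one might convert generated images back to 8-bit pixels.

```python
import numpy as np

# Synthetic stand-in for a few uint8 pixel values.
pixels = np.array([0, 64, 127, 128, 255], dtype=np.uint8)

# Forward scaling used in load_real_samples(): [0, 255] -> [-1, 1].
scaled = (pixels.astype('float32') - 127.5) / 127.5

# Inverse scaling back to [0, 255], rounding to the nearest integer.
restored = np.rint(scaled * 127.5 + 127.5).astype(np.uint8)

print(scaled.min(), scaled.max())  # -1.0 1.0
print(restored)
```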
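Next, the generate_latent_points() function must be updated to also generate an array of randomly selected integer class labels to go along with the randomly selected points in latent space.
Then the generate_fake_samples() function must be updated to use these randomly generated class labels as input to the generator model when generating new fake images.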
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples, n_classes=10):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_input = x_input.reshape(n_samples, latent_dim)
    # generate labels
    labels = randint(0, n_classes, n_samples)
    return [z_input, labels]

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    z_input, labels_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    images = generator.predict([z_input, labels_input])
    # create class labels
    y = zeros((n_samples, 1))
    return [images, labels_input], y
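The shapes produced by these helpers can be checked without a generator model. The following sketch repeats the sampling steps of generate_latent_points() inline, using a half batch of 64 as in the training loop, and builds the all-zeros targets that generate_fake_samples() pairs with generated images.

```python
import numpy as np
from numpy.random import randn, randint

latent_dim, n_samples, n_classes = 100, 64, 10

# Same two steps as generate_latent_points(): draw a flat vector, then reshape.
z_input = randn(latent_dim * n_samples).reshape(n_samples, latent_dim)
# randint's upper bound is exclusive, so labels fall in 0..n_classes-1.
labels = randint(0, n_classes, n_samples)

# Fake samples get target 0 for the discriminator; real samples get 1.
y_fake = np.zeros((n_samples, 1))

print(z_input.shape, labels.shape, y_fake.shape)  # (64, 100) (64,) (64, 1)
```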
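Finally, the train() function must be updated to retrieve and use the class labels in the calls that update the discriminator and generator models.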
# train the generator and discriminator
def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=100, n_batch=128):
    bat_per_epo = int(dataset[0].shape[0] / n_batch)
    half_batch = int(n_batch / 2)
    # manually enumerate epochs
    for i in range(n_epochs):
        # enumerate batches over the training set
        for j in range(bat_per_epo):
            # get randomly selected 'real' samples
            [X_real, labels_real], y_real = generate_real_samples(dataset, half_batch)
            # update discriminator model weights
            d_loss1, _ = d_model.train_on_batch([X_real, labels_real], y_real)
            # generate 'fake' examples
            [X_fake, labels], y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # update discriminator model weights
            d_loss2, _ = d_model.train_on_batch([X_fake, labels], y_fake)
            # prepare points in latent space as input for the generator
            [z_input, labels_input] = generate_latent_points(latent_dim, n_batch)
            # create inverted labels for the fake samples
            y_gan = ones((n_batch, 1))
            # update the generator via the discriminator's error
            g_loss = gan_model.train_on_batch([z_input, labels_input], y_gan)
            # summarize loss on this batch
            print('>%d, %d/%d, d1=%.3f, d2=%.3f g=%.3f' %
                (i+1, j+1, bat_per_epo, d_loss1, d_loss2, g_loss))
    # save the generator model
    g_model.save('cgan_generator.h5')
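To get a feel for the size of the training run, the loop bounds in train() can be evaluated with its default arguments and the 60,000-image training set. This is plain arithmetic on the values already used above; the variable names `total_iterations`, `discriminator_samples`, and `generator_samples` are introduced here only for illustration.

```python
n_train = 60000  # training images in Fashion MNIST
n_epochs, n_batch = 100, 128

bat_per_epo = int(n_train / n_batch)  # the partial final batch is dropped
half_batch = int(n_batch / 2)         # real and fake halves for the discriminator
total_iterations = bat_per_epo * n_epochs

# Samples seen per iteration by each model.
discriminator_samples = 2 * half_batch  # two train_on_batch calls of 64 each
generator_samples = n_batch             # one train_on_batch call of 128

print(bat_per_epo, half_batch, total_iterations)  # 468 64 46800
```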
4. Complete Code
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.layers import Concatenate, Dense, Reshape, Embedding, Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LeakyReLU
from numpy import expand_dims
from numpy import zeros
from numpy import ones
from numpy.random import randn
from numpy.random import randint
# define the standalone discriminator model
def define_discriminator(in_shape=(28,28,1), n_classes=10):
    # label input
    in_label = Input(shape=(1,))
    # embedding for categorical input
    li = Embedding(n_classes, 50)(in_label)
    # scale up to image dimensions with linear activation
    n_nodes = in_shape[0] * in_shape[1]
    li = Dense(n_nodes)(li)
    # reshape to additional channel
    li = Reshape((in_shape[0], in_shape[1], 1))(li)
    # image input
    in_image = Input(shape=in_shape)
    # concat label as a channel
    merge = Concatenate()([in_image, li])
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(merge)
    fe = LeakyReLU(alpha=0.2)(fe)
    # downsample
    fe = Conv2D(128, (3,3), strides=(2,2), padding='same')(fe)
    fe = LeakyReLU(alpha=0.2)(fe)
    # flatten feature maps
    fe = Flatten()(fe)
    # dropout
    fe = Dropout(0.4)(fe)
    # output
    out_layer = Dense(1, activation='sigmoid')(fe)
    # define model
    model = Model([in_image, in_label], out_layer)
    # compile model (learning_rate replaces the deprecated lr argument)
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    #tf.keras.utils.plot_model(model, 'discriminator.png', show_shapes=True)
    return model

# define the standalone generator model
def define_generator(latent_dim, n_classes=10):
    # label input
    in_label = Input(shape=(1,))
    # embedding for categorical input
    li = Embedding(n_classes, 50)(in_label)
    # linear multiplication
    n_nodes = 7 * 7
    li = Dense(n_nodes)(li)
    # reshape to additional channel
    li = Reshape((7, 7, 1))(li)
    # image generator input
    in_lat = Input(shape=(latent_dim,))
    # foundation for 7x7 image
    n_nodes = 128 * 7 * 7
    gen = Dense(n_nodes)(in_lat)
    gen = LeakyReLU(alpha=0.2)(gen)
    gen = Reshape((7, 7, 128))(gen)
    # merge image gen and label input
    merge = Concatenate()([gen, li])
    # upsample to 14x14
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(merge)
    gen = LeakyReLU(alpha=0.2)(gen)
    # upsample to 28x28
    gen = Conv2DTranspose(128, (4,4), strides=(2,2), padding='same')(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    # output
    out_layer = Conv2D(1, (7,7), activation='tanh', padding='same')(gen)
    # define model
    model = Model([in_lat, in_label], out_layer)
    return model

# define the combined generator and discriminator model, for updating the generator
def define_gan(g_model, d_model):
    # make weights in the discriminator not trainable
    d_model.trainable = False
    # get noise and label inputs from generator model
    gen_noise, gen_label = g_model.input
    # get image output from the generator model
    gen_output = g_model.output
    # connect image output and label input from generator as inputs to discriminator
    gan_output = d_model([gen_output, gen_label])
    # define gan model as taking noise and label and outputting a classification
    model = Model([gen_noise, gen_label], gan_output)
    # compile model (learning_rate replaces the deprecated lr argument)
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model

# load fashion mnist images
def load_real_samples():
    # load dataset
    (trainX, trainy), (_, _) = load_data()
    # expand to 3d, e.g. add channels
    X = expand_dims(trainX, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    return [X, trainy]

# select real samples
def generate_real_samples(dataset, n_samples):
    # split into images and labels
    images, labels = dataset
    # choose random instances
    ix = randint(0, images.shape[0], n_samples)
    # select images and labels
    X, labels = images[ix], labels[ix]
    # generate class labels
    y = ones((n_samples, 1))
    return [X, labels], y

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples, n_classes=10):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_input = x_input.reshape(n_samples, latent_dim)
    # generate labels
    labels = randint(0, n_classes, n_samples)
    return [z_input, labels]

# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    z_input, labels_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    images = generator.predict([z_input, labels_input])
    # create class labels
    y = zeros((n_samples, 1))
    return [images, labels_input], y

# train the generator and discriminator
def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=100, n_batch=128):
    bat_per_epo = int(dataset[0].shape[0] / n_batch)
    half_batch = int(n_batch / 2)
    # manually enumerate epochs
    for i in range(n_epochs):
        # enumerate batches over the training set
        for j in range(bat_per_epo):
            # get randomly selected 'real' samples
            [X_real, labels_real], y_real = generate_real_samples(dataset, half_batch)
            # update discriminator model weights
            d_loss1, _ = d_model.train_on_batch([X_real, labels_real], y_real)
            # generate 'fake' examples
            [X_fake, labels], y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # update discriminator model weights
            d_loss2, _ = d_model.train_on_batch([X_fake, labels], y_fake)
            # prepare points in latent space as input for the generator
            [z_input, labels_input] = generate_latent_points(latent_dim, n_batch)
            # create inverted labels for the fake samples
            y_gan = ones((n_batch, 1))
            # update the generator via the discriminator's error
            g_loss = gan_model.train_on_batch([z_input, labels_input], y_gan)
            # summarize loss on this batch
            print('>%d, %d/%d, d1=%.3f, d2=%.3f g=%.3f' %
                (i+1, j+1, bat_per_epo, d_loss1, d_loss2, g_loss))
    # save the generator model
    g_model.save('cgan_generator.h5')

if __name__ == '__main__':
    # size of the latent space
    latent_dim = 100
    # create the discriminator
    d_model = define_discriminator()
    # create the generator
    g_model = define_generator(latent_dim)
    # create the gan
    gan_model = define_gan(g_model, d_model)
    # load image data
    dataset = load_real_samples()
    # train model
    train(g_model, d_model, gan_model, dataset, latent_dim)
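5. Generating Clothing Conditioned on the Class Label
We will use the trained generator model to conditionally generate new photos of clothing items.
We can update the earlier code example for generating new images with the model so that images are now generated conditional on the class label. We generate 10 examples for each class label, arranged in columns.
The complete example is listed below.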
# example of loading the generator model and generating images
from numpy import asarray
from numpy.random import randn
from numpy.random import randint
from tensorflow.keras.models import load_model
from matplotlib import pyplot

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples, n_classes=10):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    z_input = x_input.reshape(n_samples, latent_dim)
    # generate labels
    labels = randint(0, n_classes, n_samples)
    return [z_input, labels]

# create and show a plot of generated images
def save_plot(examples, n):
    # plot images
    for i in range(n * n):
        # define subplot
        pyplot.subplot(n, n, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(examples[i, :, :, 0], cmap='gray_r')
    pyplot.show()

# load model
model = load_model('cgan_generator.h5')
# generate latent points (the random labels returned here are replaced below)
latent_points, labels = generate_latent_points(100, 100)
# specify labels: 0..9 repeated ten times, one class per column
labels = asarray([x for _ in range(10) for x in range(10)])
# generate images
X = model.predict([latent_points, labels])
# scale from [-1,1] to [0,1]
X = (X + 1) / 2.0
# plot the result
save_plot(X, 10)
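Running the example loads the saved conditional GAN model and uses it to generate 100 items of clothing.
The clothing is organized in columns. From left to right they are "T-shirt", "trouser", "pullover", "dress", "coat", "sandal", "shirt", "sneaker", "bag", and "ankle boot".
We can see that the randomly generated clothing items are not only plausible, but also match their expected category.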
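The column layout follows from how the label array is built and how pyplot.subplot() numbers its cells row by row. The sketch below rebuilds that array and confirms that each column of the 10×10 grid holds a single class. The `class_names` list uses the standard Fashion MNIST label names and is added here only for illustration; it does not appear in the generation script.

```python
import numpy as np

# Fashion MNIST class names, indexed by integer label.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Same label array as in the generation script: 0..9 repeated ten times.
labels = np.asarray([x for _ in range(10) for x in range(10)])

# pyplot.subplot(n, n, 1 + i) fills the grid row by row, so reshape row-major.
grid = labels.reshape(10, 10)

# Check that every column holds a single class.
columns_are_uniform = all((grid[:, col] == col).all() for col in range(10))

print(grid[0])                   # [0 1 2 3 4 5 6 7 8 9]
print(columns_are_uniform)       # True
print(class_names[grid[0, 3]])   # Dress
```

To generate only one class, for example dresses, the same idea applies with a constant label array such as `np.full(100, 3)`.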
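6. Personal Summary
Training on a GPU took about 1 hour in total. The results are acceptable: some garments are missing a sleeve or similar details, but overall they look reasonable.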