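Ever since Ian Goodfellow published the paper Generative Adversarial Nets in 2014, generative adversarial networks (GANs) have drawn wide attention. On top of that, Yann LeCun said in a Quora answer that the deep learning development he was most excited about was generative adversarial networks. GANs have since become the darling of machine learning. It's fair to say that anyone doing machine learning research who doesn't know GANs would be too embarrassed to leave the house.
Below is a brief introduction to GANs, focusing on three papers: 1) Generative Adversarial Nets; 2) Conditional Generative Adversarial Nets; 3) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
First, let's look at the first paper to understand how a GAN works: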
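GANs are inspired by the two-player zero-sum game from game theory. The two players in a GAN are a generative model and a discriminative model. The generative model G captures the distribution of the sample data. From noise z drawn from some distribution (uniform, Gaussian, etc.), it generates a sample that resembles the real training data, and the closer to a real sample, the better. The discriminative model D is a binary classifier that estimates the probability that a sample came from the training data rather than from the generator. If the sample comes from the real training data, D outputs a high probability; otherwise it outputs a low probability. An analogy: the generator G is a gang of counterfeiters producing fake money, and the discriminator D is the police, checking whether the money in circulation is genuine or counterfeit. G's goal is to produce money indistinguishable from the real thing so that D cannot tell the difference. D's goal is to catch the counterfeits that G produces. As shown in the figure:
During training, one side is held fixed while the other side's network weights are updated, alternating back and forth. Each side works hard to optimize its own network, which creates the adversarial competition. This continues until the two reach a dynamic equilibrium (a Nash equilibrium). At that point the generator G has recovered the distribution of the training data and produces samples indistinguishable from real data. The discriminator can no longer tell them apart: its accuracy is 50%, no better than random guessing.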
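This process can be written as the following objective:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
When the generator G is fixed, optimizing the discriminator D works like this: for inputs from the real data, D adjusts its network to output 1; for inputs from generated data, D adjusts its network to output 0. When D is fixed, G adjusts its network to output samples as close to the real data as possible, so that D outputs a high probability for the generated samples.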
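The 50% figure follows from the optimal discriminator derived in the first paper. For a fixed G, the best D is D*(x) = p_data(x) / (p_data(x) + p_g(x)). The minimal numpy sketch below uses two 1-D Gaussian densities, chosen purely for illustration. It shows that D* equals 0.5 everywhere once p_g matches p_data:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution N(mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator."""
    return p_data / (p_data + p_g)

x = np.linspace(-3.0, 3.0, 7)
p_data = gaussian_pdf(x, mu=0.0, sigma=1.0)

# Untrained generator: its distribution is shifted away from the data
d_early = optimal_discriminator(p_data, gaussian_pdf(x, mu=2.0, sigma=1.0))
# Generator has recovered the data distribution (Nash equilibrium)
d_final = optimal_discriminator(p_data, gaussian_pdf(x, mu=0.0, sigma=1.0))

print(np.round(d_early, 3))
print(d_final)  # every entry is 0.5: D can only guess
```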
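The first paper's results on the MNIST handwritten digit dataset are shown below:
The rightmost column shows real samples, and the five columns before it show samples produced by the generator. The generated samples look a lot like real ones. However, their classes are random and need not match the real samples.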
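The idea of the second paper is simple: add a condition to the GAN so that the generated samples match what we want. The condition can be a class label (for example, the labels of the MNIST dataset) or other multimodal information (for example, a text description of an image). As a formula:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))]$$
Here y is the added condition. The network structure is shown below:
The generated results are shown below:
In this figure, the condition y is the class label.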
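In practice, "adding the condition" means concatenating the one-hot label y onto the input of G (and of D). The numpy sketch below covers the generator-input side. It assumes a batch of 64, 100-dimensional noise and 10 MNIST classes, the same sizes used in the code later in this article:

```python
import numpy as np

BATCH_SIZE, Z_DIM, NUM_CLASSES = 64, 100, 10

rng = np.random.default_rng(0)
# Noise z drawn from a uniform distribution on [-1, 1]
z = rng.uniform(-1.0, 1.0, size=(BATCH_SIZE, Z_DIM))
# Condition y: one-hot class labels (here every sample asks for the digit 7)
labels = np.full(BATCH_SIZE, 7)
y = np.eye(NUM_CLASSES)[labels]

# The generator's actual input is [z, y] concatenated along the feature axis
g_input = np.concatenate([z, y], axis=1)
print(g_input.shape)  # (64, 110)
```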
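The third paper, DCGAN for short, is the one whose code is most widely used in practice. The code in this series is a simplified version of that code. DCGAN improves the network architecture by adding convolutional layers, batch normalization and similar layers, which makes the network easier to train. The architecture is shown below:
The network can be built with or without a condition. The paper also runs many experiments showing results on various datasets; interested readers can consult the paper. Here we only try to understand it from the code's point of view.
Now let's build a GAN with TensorFlow. Strictly speaking it is a DCGAN; unless otherwise stated, "GAN" in this series means DCGAN. As mentioned above, GANs come in conditional and unconditional variants. We'll start with a simple conditional GAN on the MNIST dataset.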
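First, download the data. Create the folder data/mnist under /home/your_name/TensorFlow/DCGAN/. Then download the MNIST files train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz and t10k-labels-idx1-ubyte.gz from http://yann.lecun.com/exdb/mnist/ into the mnist folder. Decompress the four .gz files (for example with gunzip), because the code below reads the uncompressed files.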
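The reading code below opens the uncompressed files (train-images-idx3-ubyte and so on), so the four .gz archives have to be unpacked first. The small sketch below does this with the standard-library gzip and shutil modules. The helper name gunzip_all is hypothetical:

```python
import glob
import gzip
import os
import shutil

def gunzip_all(data_dir):
    """Decompress every *.gz file in data_dir next to the original file."""
    written = []
    for gz_path in sorted(glob.glob(os.path.join(data_dir, '*.gz'))):
        out_path = gz_path[:-3]  # strip the '.gz' suffix
        with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written

# Usage (path as in this article):
# gunzip_all('/home/your_name/TensorFlow/DCGAN/data/mnist')
```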
Once the data is ready, create read_data.py under /home/your_name/TensorFlow/DCGAN/ to read it, and enter the following code:
import os
import numpy as np

def read_data():
    # Data directory (holding the decompressed MNIST files)
    data_dir = '/home/your_name/TensorFlow/DCGAN/data/mnist'
    # Read the training images as a flat uint8 array
    loaded = np.fromfile(os.path.join(data_dir, 'train-images-idx3-ubyte'),
                         dtype=np.uint8)
    # Per the format described on the MNIST site, pixel data starts at byte 16
    trX = loaded[16:].reshape((60000, 28, 28, 1)).astype(np.float32)
    # Training labels: label data starts at byte 8
    loaded = np.fromfile(os.path.join(data_dir, 'train-labels-idx1-ubyte'),
                         dtype=np.uint8)
    trY = loaded[8:].reshape((60000)).astype(np.float32)
    # Test images
    loaded = np.fromfile(os.path.join(data_dir, 't10k-images-idx3-ubyte'),
                         dtype=np.uint8)
    teX = loaded[16:].reshape((10000, 28, 28, 1)).astype(np.float32)
    # Test labels
    loaded = np.fromfile(os.path.join(data_dir, 't10k-labels-idx1-ubyte'),
                         dtype=np.uint8)
    teY = loaded[8:].reshape((10000)).astype(np.float32)
    trY = np.asarray(trY)
    teY = np.asarray(teY)
    # The generator creates images from noise drawn from some distribution,
    # so no test set is needed: merge the training and test sets
    X = np.concatenate((trX, teX), axis=0)
    y = np.concatenate((trY, teY), axis=0)
    # Shuffle images and labels with the same seed so they stay aligned
    seed = 547
    np.random.seed(seed)
    np.random.shuffle(X)
    np.random.seed(seed)
    np.random.shuffle(y)
    # y_vec is the condition fed to the network, i.e. the class label.
    # It is simply the one-hot encoding of y; for one-hot encoding see
    # http://www.cnblogs.com/Charles-Wan/p/6207039.html
    y_vec = np.zeros((len(y), 10), dtype=np.float32)
    for i, label in enumerate(y):
        # Labels are stored as floats; array indices must be integers
        y_vec[i, int(label)] = 1.0
    return X / 255., y_vec
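A side note: the MNIST data takes little memory (the largest file, once decompressed, is about 45 MB), so reading it this way is fine. When a dataset is very large, it's better to convert it to TFRecords, TensorFlow's standard data format, which reads much more efficiently.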
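The offsets 16 and 8 in read_data() come from the IDX file header. An image file starts with four big-endian 32-bit integers (magic number, image count, rows, columns); a label file starts with two (magic number, label count). The numpy sketch below parses the header instead of hard-coding the sizes. The function name parse_idx is hypothetical, and the test feeds it tiny synthetic files rather than real MNIST data:

```python
import numpy as np

def parse_idx(raw):
    """Parse the bytes of an (uncompressed) IDX file into a numpy array.

    Byte 3 of the magic number is the data type (0x08 = unsigned byte),
    byte 4 is the number of dimensions.
    """
    header = np.frombuffer(raw[:4], dtype=np.uint8)
    if header[2] != 0x08:
        raise ValueError('only unsigned-byte IDX files are handled here')
    ndim = int(header[3])
    # Each dimension size is a big-endian 32-bit unsigned integer
    dims = np.frombuffer(raw[4:4 + 4 * ndim], dtype='>u4').astype(int)
    offset = 4 + 4 * ndim  # 16 for images (ndim=3), 8 for labels (ndim=1)
    data = np.frombuffer(raw[offset:], dtype=np.uint8)
    return data.reshape(tuple(dims))

# Usage sketch on a real file:
# with open('train-images-idx3-ubyte', 'rb') as f:
#     images = parse_idx(f.read())   # shape (60000, 28, 28)
```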
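Next, define some basic layer operations: fully connected, convolution, transposed convolution and batch normalization layers. Create ops.py under /home/your_name/TensorFlow/DCGAN/ and enter the following code: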
import tensorflow as tf
from tensorflow.contrib.layers.python.layers import batch_norm as batch_norm

# Bias variable with a constant initial value
def bias(name, shape, bias_start = 0.0, trainable = True):
    dtype = tf.float32
    var = tf.get_variable(name, shape, tf.float32, trainable = trainable,
                          initializer = tf.constant_initializer(
                              bias_start, dtype = dtype))
    return var

# Weight variable with a random (Gaussian) initial value
def weight(name, shape, stddev = 0.02, trainable = True):
    dtype = tf.float32
    var = tf.get_variable(name, shape, tf.float32, trainable = trainable,
                          initializer = tf.random_normal_initializer(
                              stddev = stddev, dtype = dtype))
    return var

# Fully connected layer
def fully_connected(value, output_shape, name = 'fully_connected', with_w = False):
    shape = value.get_shape().as_list()
    with tf.variable_scope(name):
        weights = weight('weights', [shape[1], output_shape], 0.02)
        biases = bias('biases', [output_shape], 0.0)
        if with_w:
            return tf.matmul(value, weights) + biases, weights, biases
        else:
            return tf.matmul(value, weights) + biases

# Leaky ReLU layer
def lrelu(x, leak=0.2, name = 'lrelu'):
    with tf.variable_scope(name):
        return tf.maximum(x, leak*x, name = name)

# ReLU layer
def relu(value, name = 'relu'):
    with tf.variable_scope(name):
        return tf.nn.relu(value)

# Transposed convolution ("deconvolution") layer
def deconv2d(value, output_shape, k_h = 5, k_w = 5, strides =[1, 2, 2, 1],
             name = 'deconv2d', with_w = False):
    with tf.variable_scope(name):
        weights = weight('weights',
                         [k_h, k_w, output_shape[-1], value.get_shape()[-1]])
        deconv = tf.nn.conv2d_transpose(value, weights,
                                        output_shape, strides = strides)
        biases = bias('biases', [output_shape[-1]])
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())
        if with_w:
            return deconv, weights, biases
        else:
            return deconv

# Convolution layer
def conv2d(value, output_dim, k_h = 5, k_w = 5,
           strides =[1, 2, 2, 1], name = 'conv2d'):
    with tf.variable_scope(name):
        weights = weight('weights',
                         [k_h, k_w, value.get_shape()[-1], output_dim])
        conv = tf.nn.conv2d(value, weights, strides = strides, padding = 'SAME')
        biases = bias('biases', [output_dim])
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
        return conv

# Concatenate the condition onto the feature maps
def conv_cond_concat(value, cond, name = 'concat'):
    # Convert the tensor shapes to Python lists
    value_shapes = value.get_shape().as_list()
    cond_shapes = cond.get_shape().as_list()
    # Concatenate the condition and the input along axis 3 (the feature-map
    # axis). The condition is pre-shaped as a 4-D tensor [64, 1, 1, 10] and
    # broadcast to the spatial size by multiplying with tf.ones. E.g. if the
    # input is [64, 32, 32, 32], the condition becomes [64, 32, 32, 10]
    # and the output is a [64, 32, 32, 42] tensor
    with tf.variable_scope(name):
        return tf.concat(3, [value,
                             cond * tf.ones(value_shapes[0:3] + cond_shapes[3:])])

# Batch Normalization layer
def batch_norm_layer(value, is_train = True, name = 'batch_norm'):
    with tf.variable_scope(name) as scope:
        if is_train:
            return batch_norm(value, decay = 0.9, epsilon = 1e-5, scale = True,
                              is_training = is_train,
                              updates_collections = None, scope = scope)
        else:
            return batch_norm(value, decay = 0.9, epsilon = 1e-5, scale = True,
                              is_training = is_train, reuse = True,
                              updates_collections = None, scope = scope)
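In batch_norm_layer, scale = True means a learnable scale factor gamma is applied; otherwise gamma is not used. is_train is also a boolean: True means training and False means testing. Batch normalization behaves differently during training and testing; see the paper https://arxiv.org/abs/1502.03167. For the other parameters of batch_norm, see reference 2.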
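To make the training/testing difference concrete: in training mode, batch normalization normalizes with the statistics of the current batch and updates moving averages (decay = 0.9 above). In testing mode it normalizes with those moving averages instead. Below is a minimal numpy sketch of that logic. The class name SimpleBatchNorm is hypothetical; it is not tf.contrib's implementation. Only the per-channel mean/variance over axes (0, 1, 2) of an NHWC batch mirrors the code above:

```python
import numpy as np

class SimpleBatchNorm:
    """Per-channel batch norm for NHWC arrays (illustrative only)."""

    def __init__(self, channels, decay=0.9, epsilon=1e-5):
        self.decay, self.epsilon = decay, epsilon
        self.gamma = np.ones(channels)    # scale, used because scale=True
        self.beta = np.zeros(channels)    # offset
        self.moving_mean = np.zeros(channels)
        self.moving_var = np.ones(channels)

    def __call__(self, x, is_train):
        if is_train:
            # Training: normalize with the statistics of the current batch ...
            mean = x.mean(axis=(0, 1, 2))
            var = x.var(axis=(0, 1, 2))
            # ... and update the exponential moving averages
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            # Testing: normalize with the accumulated moving averages
            mean, var = self.moving_mean, self.moving_var
        x_hat = (x - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
bn = SimpleBatchNorm(channels=3)
batch = rng.normal(loc=5.0, scale=2.0, size=(64, 8, 8, 3))
out_train = bn(batch, is_train=True)
out_test = bn(batch, is_train=False)
print(out_train.mean(axis=(0, 1, 2)).round(6))  # about 0 for each channel
```

After a single training step the moving averages are still far from the batch statistics, so the test-mode output differs noticeably. This is why a network trained with batch norm needs enough iterations before is_train = False gives sensible results.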
Create utils.py under /home/your_name/TensorFlow/DCGAN/ and enter the following code:
import scipy.misc
import numpy as np

# Save a batch of images as one grid image
def save_images(images, size, path):
    """
    Save the sample images.
    The best size number is
    int(max(sqrt(image.shape[0]), sqrt(image.shape[1]))) + 1
    example:
    The batch_size is 64, then the size is recommended [8, 8]
    The batch_size is 32, then the size is recommended [6, 6]
    """
    # Rescale from [-1, 1] to [0, 1]; mainly for generators with a tanh output
    img = (images + 1.0) / 2.0
    h, w = img.shape[1], img.shape[2]
    # Create a large canvas to hold the batch_size images
    merge_img = np.zeros((h * size[0], w * size[1], 3))
    # Copy each rescaled image into its own cell of the canvas
    for idx, image in enumerate(img):
        i = idx % size[1]
        j = idx // size[1]
        merge_img[j*h:j*h+h, i*w:i*w+w, :] = image
    # Save the canvas
    return scipy.misc.imsave(path, merge_img)
This function saves the images sampled during training.
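The tiling in save_images is plain index arithmetic: image idx goes to row idx // size[1] and column idx % size[1] of the canvas. Below is a numpy-only sketch of the same layout. The name merge_grid is hypothetical. It returns the canvas instead of writing a file, so it does not depend on scipy.misc.imsave, which newer SciPy releases have removed:

```python
import numpy as np

def merge_grid(images, size):
    """Tile a batch of NHWC images into one (rows*h, cols*w, c) canvas.

    images: array of shape (n, h, w, c) with n <= size[0] * size[1].
    size:   [rows, cols] of the grid.
    """
    n, h, w, c = images.shape
    canvas = np.zeros((h * size[0], w * size[1], c), dtype=images.dtype)
    for idx in range(n):
        i = idx % size[1]   # column
        j = idx // size[1]  # row
        canvas[j * h:(j + 1) * h, i * w:(i + 1) * w, :] = images[idx]
    return canvas

# 64 constant 28x28 "images" whose pixel value equals their index
batch = np.stack([np.full((28, 28, 1), k, dtype=np.float32) for k in range(64)])
grid = merge_grid(batch, [8, 8])
print(grid.shape)  # (224, 224, 1)
```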
Create model.py under /home/your_name/TensorFlow/DCGAN/ to define the generator, the discriminator and the sampling network used during training. Enter the following code in model.py:
import tensorflow as tf
from ops import *

BATCH_SIZE = 64

# Generator
def generator(z, y, train = True):
    # y is a [BATCH_SIZE, 10] vector; reshape it into a 4-D tensor
    yb = tf.reshape(y, [BATCH_SIZE, 1, 1, 10], name = 'yb')
    # Concatenate the condition y with the noise z
    z = tf.concat(1, [z, y], name = 'z_concat_y')
    # Fully connected layer, then BN and a ReLU activation
    h1 = tf.nn.relu(batch_norm_layer(fully_connected(z, 1024, 'g_fully_connected1'),
                                     is_train = train, name = 'g_bn1'))
    # Concatenate the condition with the previous layer's output
    h1 = tf.concat(1, [h1, y], name = 'active1_concat_y')
    h2 = tf.nn.relu(batch_norm_layer(fully_connected(h1, 128 * 49, 'g_fully_connected2'),
                                     is_train = train, name = 'g_bn2'))
    h2 = tf.reshape(h2, [BATCH_SIZE, 7, 7, 128], name = 'h2_reshape')
    # Concatenate the condition with the previous layer's output
    h2 = conv_cond_concat(h2, yb, name = 'active2_concat_y')
    h3 = tf.nn.relu(batch_norm_layer(deconv2d(h2, [BATCH_SIZE, 14, 14, 128],
                                              name = 'g_deconv2d3'),
                                     is_train = train, name = 'g_bn3'))
    h3 = conv_cond_concat(h3, yb, name = 'active3_concat_y')
    # A sigmoid squashes the output values into the range 0~1
    h4 = tf.nn.sigmoid(deconv2d(h3, [BATCH_SIZE, 28, 28, 1],
                                name = 'g_deconv2d4'), name = 'generate_image')
    return h4

# Discriminator
def discriminator(image, y, reuse = False):
    # Both real and generated images pass through the discriminator,
    # so the second call must reuse the variables created by the first
    if reuse:
        tf.get_variable_scope().reuse_variables()
    # Like the generator, the discriminator also concatenates the condition
    yb = tf.reshape(y, [BATCH_SIZE, 1, 1, 10], name = 'yb')
    x = conv_cond_concat(image, yb, name = 'image_concat_y')
    # Convolution, activation, concatenate the condition
    h1 = lrelu(conv2d(x, 11, name = 'd_conv2d1'), name = 'lrelu1')
    h1 = conv_cond_concat(h1, yb, name = 'h1_concat_yb')
    h2 = lrelu(batch_norm_layer(conv2d(h1, 74, name = 'd_conv2d2'),
                                name = 'd_bn2'), name = 'lrelu2')
    h2 = tf.reshape(h2, [BATCH_SIZE, -1], name = 'reshape_lrelu2_to_2d')
    h2 = tf.concat(1, [h2, y], name = 'lrelu2_concat_y')
    h3 = lrelu(batch_norm_layer(fully_connected(h2, 1024, name = 'd_fully_connected3'),
                                name = 'd_bn3'), name = 'lrelu3')
    h3 = tf.concat(1, [h3, y], name = 'lrelu3_concat_y')
    # Fully connected layer with a one-dimensional output (the logit)
    h4 = fully_connected(h3, 1, name = 'd_result_withouts_sigmoid')
    return tf.nn.sigmoid(h4, name = 'discriminator_result_with_sigmoid'), h4

# Sampling function used during training (reuses the generator's variables)
def sampler(z, y, train = True):
    tf.get_variable_scope().reuse_variables()
    return generator(z, y, train = train)
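The generator grows the image from 7 × 7 to 14 × 14 and then to 28 × 28, and the condition y is concatenated at every layer, faithfully following the network in the paper. The is_train parameter is needed because batch normalization behaves differently in training and testing; this parameter tells the two apart. The generator's last layer uses a sigmoid to squash the values into 0~1. The unconditional network uses tanh instead, which is why save_images contains the line img = (images + 1.0) / 2.0.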
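The transposed convolutions receive their output shapes explicitly, so the shape bookkeeping that matters here is the channel concatenation. The numpy sketch below traces it. It assumes the sizes in model.py (batch 64, 10 classes, 128 feature maps). conv_cond_concat_np is a hypothetical numpy stand-in for conv_cond_concat:

```python
import numpy as np

BATCH_SIZE, NUM_CLASSES = 64, 10

def conv_cond_concat_np(value, cond):
    """numpy version of conv_cond_concat: broadcast cond [N,1,1,C] to
    [N,H,W,C] by multiplying with ones, then concatenate on the channel axis."""
    n, h, w, _ = value.shape
    tiled = cond * np.ones((n, h, w, cond.shape[3]))
    return np.concatenate([value, tiled], axis=3)

y = np.eye(NUM_CLASSES)[np.arange(BATCH_SIZE) % NUM_CLASSES]  # one-hot labels
yb = y.reshape(BATCH_SIZE, 1, 1, NUM_CLASSES)

h2 = np.zeros((BATCH_SIZE, 7, 7, 128))      # after the fc layer and reshape
h2 = conv_cond_concat_np(h2, yb)
print(h2.shape)                              # (64, 7, 7, 138)
h3 = np.zeros((BATCH_SIZE, 14, 14, 128))    # deconv output, shape given explicitly
h3 = conv_cond_concat_np(h3, yb)
print(h3.shape)                              # (64, 14, 14, 138)
```

Every spatial position carries the same label vector, so each convolution in the next layer can see the condition wherever it looks.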
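The sampler function samples images from the generator during training, so it must enable reuse of the generator's variables. For an explanation of reuse, see http://www.cnblogs.com/Charles-Wan/p/6200446.html.
(3) Training and testing the GAN
Create train.py under /home/your_name/TensorFlow/DCGAN/, along with two folders, logs and samples. The first stores the training logs and models; the second stores the images sampled by the sampler during training. Enter the following code in train.py: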
# -*- coding: utf-8 -*-
import os
import numpy as np
import tensorflow as tf
from read_data import *
from utils import *
from ops import *
from model import *
from model import BATCH_SIZE

def train():
    # global_step records the step count during training
    global_step = tf.Variable(0, name = 'global_step', trainable = False)
    # Directory for the training logs
    train_dir = '/home/your_name/TensorFlow/DCGAN/logs'
    # Three placeholders: y is the condition, images are the images fed to
    # the discriminator, and z is the random noise
    y = tf.placeholder(tf.float32, [BATCH_SIZE, 10], name='y')
    images = tf.placeholder(tf.float32, [BATCH_SIZE, 28, 28, 1], name='real_images')
    z = tf.placeholder(tf.float32, [None, 100], name='z')
    # Images G produced by the generator
    G = generator(z, y)
    # Real images through the discriminator
    D, D_logits = discriminator(images, y)
    # Images from the sampler
    samples = sampler(z, y)
    # Generated images through the discriminator (reusing its variables)
    D_, D_logits_ = discriminator(G, y, reuse = True)
    # Losses
    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(D_logits, tf.ones_like(D)))
    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(D_logits_, tf.zeros_like(D_)))
    d_loss = d_loss_real + d_loss_fake
    g_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(D_logits_, tf.ones_like(D_)))
    # Summary ops
    z_sum = tf.histogram_summary("z", z)
    d_sum = tf.histogram_summary("d", D)
    d__sum = tf.histogram_summary("d_", D_)
    G_sum = tf.image_summary("G", G)
    d_loss_real_sum = tf.scalar_summary("d_loss_real", d_loss_real)
    d_loss_fake_sum = tf.scalar_summary("d_loss_fake", d_loss_fake)
    d_loss_sum = tf.scalar_summary("d_loss", d_loss)
    g_loss_sum = tf.scalar_summary("g_loss", g_loss)
    # Merge the summaries for each network
    g_sum = tf.merge_summary([z_sum, d__sum, G_sum, d_loss_fake_sum, g_loss_sum])
    d_sum = tf.merge_summary([z_sum, d_sum, d_loss_real_sum, d_loss_sum])
    # Variables each network updates, used as var_list for tf.train.Optimizer
    t_vars = tf.trainable_variables()
    d_vars = [var for var in t_vars if 'd_' in var.name]
    g_vars = [var for var in t_vars if 'g_' in var.name]
    saver = tf.train.Saver()
    # Adam optimizer
    d_optim = tf.train.AdamOptimizer(0.0002, beta1 = 0.5) \
        .minimize(d_loss, var_list = d_vars, global_step = global_step)
    g_optim = tf.train.AdamOptimizer(0.0002, beta1 = 0.5) \
        .minimize(g_loss, var_list = g_vars, global_step = global_step)
    # Use GPU 0 and cap its memory use at 20%
    os.environ['CUDA_VISIBLE_DEVICES'] = str(0)
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    sess = tf.InteractiveSession(config=config)
    init = tf.initialize_all_variables()
    writer = tf.train.SummaryWriter(train_dir, sess.graph)
    # Load the data, and fix one batch of noise and labels for the sampler
    data_x, data_y = read_data()
    sample_z = np.random.uniform(-1, 1, size=(BATCH_SIZE, 100))
    # sample_images = data_x[0: 64]
    sample_labels = data_y[0: 64]
    sess.run(init)
    # Train the network for 25 epochs
    for epoch in range(25):
        # 70000 images // 64 per batch = 1093 full batches per epoch
        batch_idxs = 1093
        for idx in range(batch_idxs):
            batch_images = data_x[idx*64: (idx+1)*64]
            batch_labels = data_y[idx*64: (idx+1)*64]
            batch_z = np.random.uniform(-1, 1, size=(BATCH_SIZE, 100))
            # Update the parameters of D
            _, summary_str = sess.run([d_optim, d_sum],
                                      feed_dict = {images: batch_images,
                                                   z: batch_z,
                                                   y: batch_labels})
            writer.add_summary(summary_str, idx+1)
            # Update the parameters of G
            _, summary_str = sess.run([g_optim, g_sum],
                                      feed_dict = {z: batch_z,
                                                   y: batch_labels})
            writer.add_summary(summary_str, idx+1)
            # Update G a second time to keep training stable
            # (so that d_loss does not go to zero)
            _, summary_str = sess.run([g_optim, g_sum],
                                      feed_dict = {z: batch_z,
                                                   y: batch_labels})
            writer.add_summary(summary_str, idx+1)
            # Compute the training losses and print them
            errD_fake = d_loss_fake.eval({z: batch_z, y: batch_labels})
            errD_real = d_loss_real.eval({images: batch_images, y: batch_labels})
            errG = g_loss.eval({z: batch_z, y: batch_labels})
            if idx % 20 == 0:
                print("Epoch: [%2d] [%4d/%4d] d_loss: %.8f, g_loss: %.8f"
                      % (epoch, idx, batch_idxs, errD_fake+errD_real, errG))
            # During training, sample with the sampler and save the images to
            # /home/your_name/TensorFlow/DCGAN/samples/
            if idx % 100 == 1:
                sample = sess.run(samples, feed_dict = {z: sample_z, y: sample_labels})
                samples_path = '/home/your_name/TensorFlow/DCGAN/samples/'
                save_images(sample, [8, 8],
                            samples_path + 'test_%d_epoch_%d.png' % (epoch, idx))
                print('samples saved')
            # Save the model every 500 iterations
            if idx % 500 == 2:
                checkpoint_path = os.path.join(train_dir, 'DCGAN_model.ckpt')
                saver.save(sess, checkpoint_path, global_step = idx+1)
    sess.close()

if __name__ == '__main__':
    train()
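Once the code is entered, run it. As training proceeds, the digit at each row and column position stays the same across the saved sample images, because we fixed the condition labels. The images sampled by sampler are saved in the samples folder. They go from blurry to sharp, starting as noise and gradually turning into handwritten digits, until you can't tell generated images from real ones. I certainly can't; if you think you can do better, be my guest.
Meanwhile, if you open TensorBoard during training, you'll see D's distribution hovering around 0.5. This means the discriminator D can no longer tell real from fake and can only guess randomly, with an accuracy of about 0.5.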
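If D settles at 0.5 for every input (logit 0), the printed losses approach known constants: d_loss = d_loss_real + d_loss_fake = 2 ln 2 ≈ 1.386 and g_loss = ln 2 ≈ 0.693. The numpy sketch below checks this. It uses the numerically stable form of sigmoid cross-entropy, max(x, 0) - x*z + log(1 + exp(-|x|)), which is the formula documented for tf.nn.sigmoid_cross_entropy_with_logits:

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    """Stable elementwise sigmoid cross-entropy."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

D_logits = np.zeros(64)    # D(x) = sigmoid(0) = 0.5 on real images
D_logits_ = np.zeros(64)   # D(G(z)) = 0.5 on generated images

d_loss_real = sigmoid_cross_entropy_with_logits(D_logits, np.ones(64)).mean()
d_loss_fake = sigmoid_cross_entropy_with_logits(D_logits_, np.zeros(64)).mean()
g_loss = sigmoid_cross_entropy_with_logits(D_logits_, np.ones(64)).mean()
print(d_loss_real + d_loss_fake, g_loss)
```

In practice the printed values fluctuate around these numbers rather than sitting exactly on them.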
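Strictly speaking, our GAN is complete at this point: testing was already done by sampling during training. If that's not enough and you want a separate test script, that works too.
Create the file eval.py and the folder eval under /home/your_name/TensorFlow/DCGAN/; the eval folder stores the test images. Enter the following code in eval.py: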
# -*- coding: utf-8 -*-
import os
import numpy as np
import tensorflow as tf
from read_data import *
from utils import *
from ops import *
from model import *
from model import BATCH_SIZE

def eval():
    # Directory for the test images
    test_dir = '/home/your_name/TensorFlow/DCGAN/eval/'
    # Directory to load the model from
    checkpoint_dir = '/home/your_name/TensorFlow/DCGAN/logs/'
    y = tf.placeholder(tf.float32, [BATCH_SIZE, 10], name='y')
    z = tf.placeholder(tf.float32, [None, 100], name='z')
    G = generator(z, y)
    data_x, data_y = read_data()
    sample_z = np.random.uniform(-1, 1, size=(BATCH_SIZE, 100))
    sample_labels = data_y[120: 184]
    # Reading a checkpoint needs a session and a saver
    print("Reading checkpoints...")
    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
    # Saver
    saver = tf.train.Saver(tf.all_variables())
    # Session
    os.environ['CUDA_VISIBLE_DEVICES'] = str(0)
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    sess = tf.InteractiveSession(config=config)
    # Restore the variables from the saved model
    if ckpt and ckpt.model_checkpoint_path:
        ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
        saver.restore(sess, os.path.join(checkpoint_dir, ckpt_name))
    # Run the generator with the restored variables
    test_sess = sess.run(G, feed_dict = {z: sample_z, y: sample_labels})
    # Save the generated test images to the eval folder
    save_images(test_sess, [8, 8], test_dir + 'test_%d.png' % 500)
    sess.close()

if __name__ == '__main__':
    eval()
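Run it, and test_500.png appears in the eval folder: the generator G already produces good results.
After training and testing, you can open TensorBoard to view the network's Graph. Because we didn't make careful use of name scopes (tf.name_scope) and variable scopes (tf.variable_scope), the graph is rather messy and the network structure is only faintly recognizable.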
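That basically completes our TensorFlow GAN. Careful readers will notice the following problems in the program:
1) In eval(), generator() is called without train = False, so the BN layers do not distinguish between training and testing;
2) In my post http://www.cnblogs.com/Charles-Wan/p/6197019.html I mentioned using tfrecords for GAN input, but this program does not do so;
3) name_scope and variable_scope are not used carefully, so the graph is messy;
4) The program contains too many unexplained magic numbers, and every path is absolute;
5) Training cannot be resumed from a checkpoint, among other issues.
We will address these problems in the unconditional GAN.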
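The GAN field moves incredibly fast, with new variants appearing all the time. A few days ago I came across an excellent article on Wasserstein GAN, which I share here: https://zhuanlan.zhihu.com/p/25071913. Next to Wasserstein GAN, our DCGAN looks a notch lower. But as our great educator Lu Xun once said: "A tree as thick as a man's embrace grows from a tiny sprout; a tower of nine stories rises from a heap of earth; a journey of a thousand li begins with a single step." (I vaguely remember I was about 7 or 8 when Mr. Lu Xun, sitting beside me, told me this in his warm and kindly voice. He added: young man, remember, if you don't know who said a famous quote, then Lu Xun said it.) So we still need to lay a solid foundation, and DCGAN is that foundation. With experience writing DCGAN code, Wasserstein GAN will come much more easily. So next we continue with our unconditional DCGAN.
In the previous article we trained a GAN on MNIST handwritten digits, and the generator G produced fairly good digits. This time we switch to the CelebA face dataset. Compared with handwritten digits, the distribution of faces is much more complex and varied: long or short hair, different skin tones, with or without glasses, men and women, and so on. Let's see whether our generator G can learn the distribution of the face dataset.
First, prepare the data. Download img_align_celeba.zip from the Baidu Cloud link shared on the official site, https://pan.baidu.com/s/1eSNpdRG#list/path=%2FCelebA%2FImg. Unzip it in /home/your_name/TensorFlow/DCGAN/data to get the img_align_celeba folder, which contains 202,599 face images. In /home/your_name/TensorFlow/DCGAN/data, create a folder img_align_celeba_tfrecords to hold the tfrecords files. Then create convert_data.py under /home/your_name/TensorFlow/DCGAN/ with the following code, which converts the face images to tfrecords: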
import os
import time
from PIL import Image
import tensorflow as tf

# Center-crop the images to 128 x 128
OUTPUT_SIZE = 128
# Number of image channels; 3 means color
DEPTH = 3

def _int64_feature(value):
    return tf.train.Feature(int64_list = tf.train.Int64List(value = [value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list = tf.train.BytesList(value = [value]))

def convert_to(data_path, name):
    """
    Converts a dataset to tfrecords
    """
    rows = 64
    cols = 64
    depth = DEPTH
    # List the files once, sorted, so every chunk sees the same order
    img_names = sorted(os.listdir(data_path))
    # Loop 12 times to produce 12 .tfrecords files
    for ii in range(12):
        writer = tf.python_io.TFRecordWriter(name + str(ii) + '.tfrecords')
        # Each tfrecords file holds 16384 images
        for img_name in img_names[ii*16384 : (ii+1)*16384]:
            # Open the image
            img_path = data_path + img_name
            img = Image.open(img_path)
            # Compute the crop box (PIL's size is (width, height))
            w, h = img.size[:2]
            j, k = (w - OUTPUT_SIZE) // 2, (h - OUTPUT_SIZE) // 2
            box = (j, k, j + OUTPUT_SIZE, k + OUTPUT_SIZE)
            # Crop the image
            img = img.crop(box = box)
            # Resize the image to rows x cols (64 x 64)
            img = img.resize((rows, cols))
            # Convert to raw bytes
            img_raw = img.tobytes()
            # Write into an Example
            example = tf.train.Example(features = tf.train.Features(feature = {
                'height': _int64_feature(rows),
                'width': _int64_feature(cols),
                'depth': _int64_feature(depth),
                'image_raw': _bytes_feature(img_raw)}))
            writer.write(example.SerializeToString())
        writer.close()

if __name__ == '__main__':
    current_dir = os.getcwd()
    data_path = current_dir + '/data/img_align_celeba/'
    name = current_dir + '/data/img_align_celeba_tfrecords/train'
    start_time = time.time()
    print('Convert start')
    print('\n' * 2)
    convert_to(data_path, name)
    print('\n' * 2)
    print('Convert done, take %.2f seconds' % (time.time() - start_time))
After running it, 12 .tfrecords files appear under /home/your_name/TensorFlow/DCGAN/data/img_align_celeba_tfrecords/. This is the data format we need.
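The aligned CelebA images are 178 × 218 pixels (width × height), so the center-crop box is the same for every image. The small pure-Python sketch below reproduces that arithmetic; center_crop_box is a hypothetical helper name. It also checks the file layout: 12 files × 16384 images = 196,608, so the last 5,991 of the 202,599 aligned images are not written to any file.

```python
def center_crop_box(width, height, crop):
    """Return the PIL crop box (left, upper, right, lower) for a centered square."""
    left = (width - crop) // 2
    upper = (height - crop) // 2
    return (left, upper, left + crop, upper + crop)

box = center_crop_box(178, 218, 128)
print(box)  # (25, 45, 153, 173)

FILES, PER_FILE, TOTAL = 12, 16384, 202599
print(FILES * PER_FILE, TOTAL - FILES * PER_FILE)  # 196608 5991
```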
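With the data ready, and building on what we learned earlier, let's write the unconditional DCGAN. Create none_cond_DCGAN.py under /home/your_name/TensorFlow/DCGAN/ and type in the code. For simplicity, the code is only lightly commented and everything is collected into a single file. Note that we write our own batch_norm layer, which fixes the is_train = False problem in the evaluation function. Training can also be resumed from a checkpoint: just set LOAD_MODEL at the top to True. In addition, the program defines many constants at the top. These can easily be turned into command-line arguments with tf.app.flags, so training can be launched from the terminal. The code can also be extended into a class, for example: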
class DCGAN(object):
    def __init__(self):
        self.BATCH_SIZE = 64
        ...
    def bias(self):
        ...
    ...
We won't go further into class extension here.
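Resuming from a checkpoint relies on the checkpoint file name. saver.save(..., global_step=idx+1) appends "-<step>" to the checkpoint path, and train() recovers the step with split('/')[-1].split('-')[-1]. The plain-Python sketch below reproduces that parsing and the loop start points. The helpers resume_start and iteration_plan are hypothetical, and the checkpoint path is only an example:

```python
def resume_start(checkpoint_path):
    """'.../my_dcgan_tfrecords.ckpt-1537' -> 1537, the iteration to resume from."""
    return int(checkpoint_path.split('/')[-1].split('-')[-1])

def iteration_plan(start, epochs):
    """(epoch, first_idx) pairs, following the loop structure of train()."""
    plan = []
    for epoch in range(epochs):
        if epoch:
            start = 0
        plan.append((epoch, start))
    return plan

path = '/home/your_name/TensorFlow/DCGAN/logs_without_condition/my_dcgan_tfrecords.ckpt-1537'
start = resume_start(path)
print(start)                        # 1537
print(iteration_plan(start, 3))     # [(0, 1537), (1, 0), (2, 0)]
```

Only the iteration index is stored in the name, not the epoch. A resumed run therefore always counts its epochs from 0 again.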
Type the following code into none_cond_DCGAN.py:
import os
import numpy as np
import scipy.misc
import tensorflow as tf

BATCH_SIZE = 64
OUTPUT_SIZE = 64
GF = 64              # Dimension of G filters in first conv layer. default [64]
DF = 64              # Dimension of D filters in first conv layer. default [64]
Z_DIM = 100
IMAGE_CHANNEL = 3
LR = 0.0002          # Learning rate
EPOCH = 5
LOAD_MODEL = False   # Whether or not to continue training from a saved model
TRAIN = True
CURRENT_DIR = os.getcwd()

def bias(name, shape, bias_start = 0.0, trainable = True):
    dtype = tf.float32
    var = tf.get_variable(name, shape, tf.float32, trainable = trainable,
                          initializer = tf.constant_initializer(
                              bias_start, dtype = dtype))
    return var

def weight(name, shape, stddev = 0.02, trainable = True):
    dtype = tf.float32
    var = tf.get_variable(name, shape, tf.float32, trainable = trainable,
                          initializer = tf.random_normal_initializer(
                              stddev = stddev, dtype = dtype))
    return var

def fully_connected(value, output_shape, name = 'fully_connected', with_w = False):
    shape = value.get_shape().as_list()
    with tf.variable_scope(name):
        weights = weight('weights', [shape[1], output_shape], 0.02)
        biases = bias('biases', [output_shape], 0.0)
        if with_w:
            return tf.matmul(value, weights) + biases, weights, biases
        else:
            return tf.matmul(value, weights) + biases

def lrelu(x, leak=0.2, name = 'lrelu'):
    with tf.variable_scope(name):
        return tf.maximum(x, leak*x, name = name)

def relu(value, name = 'relu'):
    with tf.variable_scope(name):
        return tf.nn.relu(value)

def deconv2d(value, output_shape, k_h = 5, k_w = 5, strides =[1, 2, 2, 1],
             name = 'deconv2d', with_w = False):
    with tf.variable_scope(name):
        weights = weight('weights',
                         [k_h, k_w, output_shape[-1], value.get_shape()[-1]])
        deconv = tf.nn.conv2d_transpose(value, weights,
                                        output_shape, strides = strides)
        biases = bias('biases', [output_shape[-1]])
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())
        if with_w:
            return deconv, weights, biases
        else:
            return deconv

def conv2d(value, output_dim, k_h = 5, k_w = 5,
           strides =[1, 2, 2, 1], name = 'conv2d'):
    with tf.variable_scope(name):
        weights = weight('weights',
                         [k_h, k_w, value.get_shape()[-1], output_dim])
        conv = tf.nn.conv2d(value, weights, strides = strides, padding = 'SAME')
        biases = bias('biases', [output_dim])
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
        return conv

def conv_cond_concat(value, cond, name = 'concat'):
    """
    Concatenate conditioning vector on feature map axis.
    """
    value_shapes = value.get_shape().as_list()
    cond_shapes = cond.get_shape().as_list()
    with tf.variable_scope(name):
        return tf.concat(3,
                         [value, cond * tf.ones(value_shapes[0:3] + cond_shapes[3:])])

# Hand-written batch norm: batch statistics plus moving averages when
# training, the stored moving averages when is_train = False
def batch_norm(value, is_train = True, name = 'batch_norm',
               epsilon = 1e-5, momentum = 0.9):
    with tf.variable_scope(name):
        ema = tf.train.ExponentialMovingAverage(decay = momentum)
        shape = value.get_shape().as_list()[-1]
        beta = bias('beta', [shape], bias_start = 0.0)
        gamma = bias('gamma', [shape], bias_start = 1.0)
        if is_train:
            batch_mean, batch_variance = tf.nn.moments(
                value, [0, 1, 2], name = 'moments')
            moving_mean = bias('moving_mean', [shape], 0.0, False)
            moving_variance = bias('moving_variance', [shape], 1.0, False)
            ema_apply_op = ema.apply([batch_mean, batch_variance])
            assign_mean = moving_mean.assign(ema.average(batch_mean))
            assign_variance = \
                moving_variance.assign(ema.average(batch_variance))
            with tf.control_dependencies([ema_apply_op]):
                mean, variance = \
                    tf.identity(batch_mean), tf.identity(batch_variance)
            with tf.control_dependencies([assign_mean, assign_variance]):
                return tf.nn.batch_normalization(
                    value, mean, variance, beta, gamma, epsilon)
        else:
            mean = bias('moving_mean', [shape], 0.0, False)
            variance = bias('moving_variance', [shape], 1.0, False)
            return tf.nn.batch_normalization(
                value, mean, variance, beta, gamma, epsilon)

def generator(z, is_train = True, name = 'generator'):
    with tf.name_scope(name):
        # Integer division keeps the shapes integral under Python 3
        s2, s4, s8, s16 = \
            OUTPUT_SIZE//2, OUTPUT_SIZE//4, OUTPUT_SIZE//8, OUTPUT_SIZE//16
        # Project z and reshape to [batch, 4, 4, GF*8]
        h1 = tf.reshape(fully_connected(z, GF*8*s16*s16, 'g_fc1'),
                        [-1, s16, s16, GF*8], name = 'reshap')
        h1 = relu(batch_norm(h1, name = 'g_bn1', is_train = is_train))
        # Each stride-2 transposed convolution doubles the spatial size
        h2 = deconv2d(h1, [BATCH_SIZE, s8, s8, GF*4], name = 'g_deconv2d1')
        h2 = relu(batch_norm(h2, name = 'g_bn2', is_train = is_train))
        h3 = deconv2d(h2, [BATCH_SIZE, s4, s4, GF*2], name = 'g_deconv2d2')
        h3 = relu(batch_norm(h3, name = 'g_bn3', is_train = is_train))
        h4 = deconv2d(h3, [BATCH_SIZE, s2, s2, GF*1], name = 'g_deconv2d3')
        h4 = relu(batch_norm(h4, name = 'g_bn4', is_train = is_train))
        h5 = deconv2d(h4, [BATCH_SIZE, OUTPUT_SIZE, OUTPUT_SIZE, 3],
                      name = 'g_deconv2d4')
        # tanh maps the generated image into [-1, 1]
        return tf.nn.tanh(h5)

def discriminator(image, reuse = False, name = 'discriminator'):
    with tf.name_scope(name):
        if reuse:
            tf.get_variable_scope().reuse_variables()
        h0 = lrelu(conv2d(image, DF, name='d_h0_conv'), name = 'd_h0_lrelu')
        h1 = lrelu(batch_norm(conv2d(h0, DF*2, name='d_h1_conv'),
                              name = 'd_h1_bn'), name = 'd_h1_lrelu')
        h2 = lrelu(batch_norm(conv2d(h1, DF*4, name='d_h2_conv'),
                              name = 'd_h2_bn'), name = 'd_h2_lrelu')
        h3 = lrelu(batch_norm(conv2d(h2, DF*8, name='d_h3_conv'),
                              name = 'd_h3_bn'), name = 'd_h3_lrelu')
        h4 = fully_connected(tf.reshape(h3, [BATCH_SIZE, -1]), 1, 'd_h4_fc')
        return tf.nn.sigmoid(h4), h4

def sampler(z, is_train = False, name = 'sampler'):
    with tf.name_scope(name):
        tf.get_variable_scope().reuse_variables()
        return generator(z, is_train = is_train)

def read_and_decode(filename_queue):
    """
    read and decode tfrecords
    """
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example, features = {
        'image_raw': tf.FixedLenFeature([], tf.string)})
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    image = tf.reshape(image, [OUTPUT_SIZE, OUTPUT_SIZE, 3])
    image = tf.cast(image, tf.float32)
    # Scale to [-1, 1] to match the range of the generator's tanh output
    image = image / 127.5 - 1.0
    return image

def inputs(data_dir, batch_size, name = 'input'):
    """
    Reads shuffled batches of images from the tfrecords files.
    """
    with tf.name_scope(name):
        filenames = [
            os.path.join(data_dir, 'train%d.tfrecords' % ii) for ii in range(12)]
        filename_queue = tf.train.string_input_producer(filenames)
        image = read_and_decode(filename_queue)
        images = tf.train.shuffle_batch([image], batch_size = batch_size,
                                        num_threads = 4,
                                        capacity = 20000 + 3 * batch_size,
                                        min_after_dequeue = 20000)
        return images

def save_images(images, size, path):
    """
    Save the sample images.
    The best size number is
    int(max(sqrt(image.shape[0]), sqrt(image.shape[1]))) + 1
    """
    # Rescale the tanh output from [-1, 1] to [0, 1]
    img = (images + 1.0) / 2.0
    h, w = img.shape[1], img.shape[2]
    merge_img = np.zeros((h * size[0], w * size[1], 3))
    for idx, image in enumerate(img):
        i = idx % size[1]
        j = idx // size[1]
        merge_img[j*h:j*h+h, i*w:i*w+w, :] = image
    return scipy.misc.imsave(path, merge_img)

def train():
    global_step = tf.Variable(0, name = 'global_step', trainable = False)
    train_dir = CURRENT_DIR + '/logs_without_condition/'
    data_dir = CURRENT_DIR + '/data/img_align_celeba_tfrecords/'
    # Real images now come from the tfrecords input pipeline
    images = inputs(data_dir, BATCH_SIZE)
    z = tf.placeholder(tf.float32, [None, Z_DIM], name='z')
    G = generator(z)
    D, D_logits = discriminator(images)
    samples = sampler(z)
    D_, D_logits_ = discriminator(G, reuse = True)
    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(D_logits, tf.ones_like(D)))
    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(D_logits_, tf.zeros_like(D_)))
    d_loss = d_loss_real + d_loss_fake
    g_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(D_logits_, tf.ones_like(D_)))
    z_sum = tf.histogram_summary('z', z)
    d_sum = tf.histogram_summary('d', D)
    d__sum = tf.histogram_summary('d_', D_)
    G_sum = tf.image_summary('G', G)
    d_loss_real_sum = tf.scalar_summary('d_loss_real', d_loss_real)
    d_loss_fake_sum = tf.scalar_summary('d_loss_fake', d_loss_fake)
    d_loss_sum = tf.scalar_summary('d_loss', d_loss)
    g_loss_sum = tf.scalar_summary('g_loss', g_loss)
    g_sum = tf.merge_summary([z_sum, d__sum, G_sum, d_loss_fake_sum, g_loss_sum])
    d_sum = tf.merge_summary([z_sum, d_sum, d_loss_real_sum, d_loss_sum])
    t_vars = tf.trainable_variables()
    d_vars = [var for var in t_vars if 'd_' in var.name]
    g_vars = [var for var in t_vars if 'g_' in var.name]
    saver = tf.train.Saver()
    d_optim = tf.train.AdamOptimizer(LR, beta1 = 0.5) \
        .minimize(d_loss, var_list = d_vars, global_step = global_step)
    g_optim = tf.train.AdamOptimizer(LR, beta1 = 0.5) \
        .minimize(g_loss, var_list = g_vars, global_step = global_step)
    os.environ['CUDA_VISIBLE_DEVICES'] = str(0)
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    sess = tf.InteractiveSession(config=config)
    writer = tf.train.SummaryWriter(train_dir, sess.graph)
    sample_z = np.random.uniform(-1, 1, size = (BATCH_SIZE, Z_DIM))
    # Start the threads that fill the input queues
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess = sess, coord = coord)
    init = tf.initialize_all_variables()
    sess.run(init)
    start = 0
    if LOAD_MODEL:
        print(" [*] Reading checkpoints...")
        ckpt = tf.train.get_checkpoint_state(train_dir)
        if ckpt and ckpt.model_checkpoint_path:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            saver.restore(sess, os.path.join(train_dir, ckpt_name))
            # The step is encoded in the checkpoint name, e.g. '...ckpt-1537'
            global_step = ckpt.model_checkpoint_path.split('/')[-1]\
                .split('-')[-1]
            print('Loading success, global_step is %s' % global_step)
            start = int(global_step)
    for epoch in range(EPOCH):
        # 12 files x 16384 images / 64 images per batch = 3072 batches
        batch_idxs = 3072
        # Only the first epoch after a restore starts mid-epoch
        if epoch:
            start = 0
        for idx in range(start, batch_idxs):
            batch_z = np.random.uniform(-1, 1, size = (BATCH_SIZE, Z_DIM))
            # Update D network
            _, summary_str = sess.run([d_optim, d_sum], feed_dict = {z: batch_z})
            writer.add_summary(summary_str, idx+1)
            # Update G network
            _, summary_str = sess.run([g_optim, g_sum], feed_dict = {z: batch_z})
            writer.add_summary(summary_str, idx+1)
            # Run g_optim twice to make sure that d_loss does not go to zero
            _, summary_str = sess.run([g_optim, g_sum], feed_dict = {z: batch_z})
            writer.add_summary(summary_str, idx+1)
            errD_fake = d_loss_fake.eval({z: batch_z})
            errD_real = d_loss_real.eval()
            errG = g_loss.eval({z: batch_z})
            if idx % 20 == 0:
                print("[%4d/%4d] d_loss: %.8f, g_loss: %.8f"
                      % (idx, batch_idxs, errD_fake+errD_real, errG))
            if idx % 100 == 0:
                sample = sess.run(samples, feed_dict = {z: sample_z})
                samples_path = CURRENT_DIR + '/samples_without_condition/'
                save_images(sample, [8, 8],
                            samples_path +
                            'sample_%d_epoch_%d.png' % (epoch, idx))
                print('\n' * 2)
                print('=========== %d_epoch_%d.png save down ==========='
                      % (epoch, idx))
                print('\n' * 2)
            if (idx % 512 == 0) or (idx + 1 == batch_idxs):
                checkpoint_path = os.path.join(train_dir,
                                               'my_dcgan_tfrecords.ckpt')
                saver.save(sess, checkpoint_path, global_step = idx+1)
                print('********* model saved *********')
                print('******* start with %d *******' % start)
    coord.request_stop()
    coord.join(threads)
    sess.close()

def evaluate():
    eval_dir = CURRENT_DIR + '/eval/'
    checkpoint_dir = CURRENT_DIR + '/logs_without_condition/'
    z = tf.placeholder(tf.float32, [None, Z_DIM], name='z')
    G = generator(z, is_train = False)
    # Two random noise batches plus three points between them
    sample_z1 = np.random.uniform(-1, 1, size=(BATCH_SIZE, Z_DIM))
    sample_z2 = np.random.uniform(-1, 1, size=(BATCH_SIZE, Z_DIM))
    sample_z3 = (sample_z1 + sample_z2) / 2
    sample_z4 = (sample_z1 + sample_z3) / 2
    sample_z5 = (sample_z2 + sample_z3) / 2
    print("Reading checkpoints...")
    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
    saver = tf.train.Saver(tf.all_variables())
    os.environ['CUDA_VISIBLE_DEVICES'] = str(0)
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    sess = tf.InteractiveSession(config=config)
    if ckpt and ckpt.model_checkpoint_path:
        ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
        global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
        saver.restore(sess, os.path.join(checkpoint_dir, ckpt_name))
        print('Loading success, global_step is %s' % global_step)
    # Generate images along the path z1 -> z4 -> z3 -> z5 -> z2
    eval_sess1 = sess.run(G, feed_dict = {z: sample_z1})
    eval_sess2 = sess.run(G, feed_dict = {z: sample_z4})
    eval_sess3 = sess.run(G, feed_dict = {z: sample_z3})
    eval_sess4 = sess.run(G, feed_dict = {z: sample_z5})
    eval_sess5 = sess.run(G, feed_dict = {z: sample_z2})
    print(eval_sess3.shape)
    save_images(eval_sess1, [8, 8], eval_dir + 'eval_%d.png' % 1)
    save_images(eval_sess2, [8, 8], eval_dir + 'eval_%d.png' % 2)
    save_images(eval_sess3, [8, 8], eval_dir + 'eval_%d.png' % 3)
    save_images(eval_sess4, [8, 8], eval_dir + 'eval_%d.png' % 4)
    save_images(eval_sess5, [8, 8], eval_dir + 'eval_%d.png' % 5)
    sess.close()

if __name__ == '__main__':
    if TRAIN:
        train()
    else:
        evaluate()
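When you're done, run the code and the network starts training; it takes roughly 1~2 hours to finish. During training, the sampler's results improve steadily, and the final result is shown in the figure below. The distribution of faces is more complex and varied than that of handwritten digits, so the generator cannot fully capture facial features. For example, the image in row 6, column 7 below is a very poor sample.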
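evaluate() feeds the generator five noise batches: z1, z4 = (z1 + z3)/2, z3 = (z1 + z2)/2, z5 = (z2 + z3)/2 and z2. These are exactly the points at t = 0, 0.25, 0.5, 0.75 and 1 on the straight line from z1 to z2. So the five saved images eval_1.png to eval_5.png show a linear walk through the latent space. The numpy check below confirms this:

```python
import numpy as np

BATCH_SIZE, Z_DIM = 64, 100
rng = np.random.default_rng(0)
z1 = rng.uniform(-1, 1, size=(BATCH_SIZE, Z_DIM))
z2 = rng.uniform(-1, 1, size=(BATCH_SIZE, Z_DIM))
z3 = (z1 + z2) / 2
z4 = (z1 + z3) / 2
z5 = (z2 + z3) / 2

# The same points written as linear interpolation z(t) = (1 - t) * z1 + t * z2
path = [(1 - t) * z1 + t * z2 for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
for expected, actual in zip(path, [z1, z4, z3, z5, z2]):
    assert np.allclose(expected, actual)
print('z1, z4, z3, z5, z2 lie on the line from z1 to z2 at t = 0, 1/4, 1/2, 3/4, 1')
```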
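After training, open the network's graph in TensorBoard to see what our carefully designed structure now looks like:
As you can see, this graph is much tidier than the previous one.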
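First, let's review the code we wrote earlier. The objective function optimized by the original GAN is:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
This objective can be viewed in two parts:
① Fix the generator G and optimize the discriminator D. The objective then becomes:
$$\max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
which can be turned into a minimization. The constant factor 1/2, which weights real and generated samples equally, does not change the optimum:
$$\min_D \; -\frac{1}{2}\mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] - \frac{1}{2}\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$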
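In our code, d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = D_logits, labels = tf.ones_like(D))). The discriminator's output is D = sigmoid(D_logits), and the sigmoid is applied inside sigmoid_cross_entropy_with_logits. So d_loss_real equals -E[log D(x)], the first term above without the constant factor 1/2, and d_loss_fake is the second term.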
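The correspondence can be checked numerically. With D = sigmoid(logit), sigmoid cross-entropy against label 1 is -log D, and against label 0 it is -log(1 - D). The numpy sketch below uses the stable formula documented for tf.nn.sigmoid_cross_entropy_with_logits; random logits stand in for D_logits and D_logits_:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_cross_entropy_with_logits(logits, labels):
    """Stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

rng = np.random.default_rng(0)
D_logits = rng.normal(size=64)    # stand-in for D's logits on real images
D_logits_ = rng.normal(size=64)   # stand-in for D's logits on generated images

d_loss_real = sigmoid_cross_entropy_with_logits(D_logits, np.ones(64)).mean()
d_loss_fake = sigmoid_cross_entropy_with_logits(D_logits_, np.zeros(64)).mean()

# First term: -E[log D(x)];  second term: -E[log(1 - D(G(z)))]
assert np.isclose(d_loss_real, -np.log(sigmoid(D_logits)).mean())
assert np.isclose(d_loss_fake, -np.log(1.0 - sigmoid(D_logits_)).mean())
```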
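② Fix the discriminator D and optimize the generator G. Dropping the first term, which does not depend on G, this amounts to minimizing:
$$\min_G \; \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
In practice this is replaced by minimizing the following expression. It has the same fixed point but gives much stronger gradients early in training, as suggested in the original GAN paper:
$$\min_G \; -\mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$$
In our code, g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = D_logits_, labels = tf.ones_like(D_))), which corresponds exactly to this expression.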
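The practical reason for preferring -log D(G(z)) shows up in the gradients with respect to D's logit a on a fake sample. For log(1 - sigmoid(a)) the gradient is -sigmoid(a). This vanishes when D confidently rejects fakes (a very negative), which is typical early in training. For -log sigmoid(a) the gradient is -(1 - sigmoid(a)), which stays near -1 in that regime. The numpy sketch below computes both derivatives:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# D's logits on fake samples: very negative means D is sure they are fake
a = np.array([-8.0, -4.0, 0.0, 4.0])

grad_saturating = -sigmoid(a)               # d/da  log(1 - sigmoid(a))
grad_non_saturating = -(1.0 - sigmoid(a))   # d/da  -log(sigmoid(a))

for ai, gs, gn in zip(a, grad_saturating, grad_non_saturating):
    print('a=%5.1f  saturating: %+.4f  non-saturating: %+.4f' % (ai, gs, gn))
```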
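Now let's start our WGAN journey. As described in https://zhuanlan.zhihu.com/p/25071913, we build a discriminator D whose parameters are kept within a fixed constant range (by clipping) and whose last layer has no nonlinearity (the sigmoid is removed). We then make the expression
$$L = \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{x \sim P_g}[D(x)]$$
as large as possible, where P_r is the real data distribution and P_g the generated one. L then approximates the Wasserstein distance between the two distributions (up to a constant factor). The generator's goal is to minimize this distance. Dropping the first term, which does not depend on the generator, gives the generator's loss -E_{x~P_g}[D(x)]. Negating L gives the loss for D: after negation, the first term is d_loss_real and the second term is d_loss_fake.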
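The WGAN losses are simple means over the raw (pre-sigmoid) discriminator outputs, with no log and no sigmoid. In the numpy sketch below, critic_real and critic_fake are hypothetical stand-ins for D's outputs on a real batch and a generated batch:

```python
import numpy as np

rng = np.random.default_rng(0)
critic_real = rng.normal(loc=1.0, size=64)   # D(x) on real images, no sigmoid
critic_fake = rng.normal(loc=-1.0, size=64)  # D(G(z)) on generated images

# L = E_real[D(x)] - E_fake[D(G(z))], the Wasserstein estimate D maximizes
L = critic_real.mean() - critic_fake.mean()

# Loss for D = -L, split into the two terms named in the text
d_loss_real = -critic_real.mean()
d_loss_fake = critic_fake.mean()
d_loss = d_loss_real + d_loss_fake

# Generator loss: drop the term that does not depend on G
g_loss = -critic_fake.mean()
print(L, d_loss, g_loss)
```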
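Now let's write the code.
For convenience, we modify the corresponding code directly in last section's none_cond_DCGAN.py.
Add the following to the constant definitions at the top: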
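CLIP = [-0.01, 0.01]  # range the discriminator weights are clipped to
CRITIC_NUM = 5        # D updates per G update (after the first 25 iterations)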
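As shown in the figure:
Comment out the original return statement of discriminator and add a new return statement as follows:
In the train function, modify the following places:
Inside the loop, change the following. A brief explanation: when idx < 25, D is updated 25 times before each G update, so that D roughly approximates the Wasserstein distance. This is a small trick.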
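The exact modifications appear in the screenshots, which are not reproduced here. The numpy sketch below illustrates the two mechanisms described in the text: weight clipping to CLIP after each discriminator update, and the critic schedule (25 D updates per G update while idx < 25, otherwise CRITIC_NUM). The helper names critic_iters and clip_weights are hypothetical. In TensorFlow the clipping would typically be a list of tf.assign(v, tf.clip_by_value(v, CLIP[0], CLIP[1])) ops over d_vars, run after each d_optim step:

```python
import numpy as np

CLIP = [-0.01, 0.01]
CRITIC_NUM = 5

def critic_iters(idx):
    """Number of discriminator updates before one generator update."""
    return 25 if idx < 25 else CRITIC_NUM

def clip_weights(weights):
    """Clip every discriminator weight array into [CLIP[0], CLIP[1]]."""
    return [np.clip(w, CLIP[0], CLIP[1]) for w in weights]

rng = np.random.default_rng(0)
d_weights = [rng.normal(scale=0.02, size=(5, 5, 3, 64)), rng.normal(size=64)]

for idx in (0, 24, 25, 100):
    print(idx, critic_iters(idx))

# Total D updates over the first 100 G updates: 25 * 25 + 75 * 5
print(sum(critic_iters(idx) for idx in range(100)))

d_weights = clip_weights(d_weights)
print(max(np.abs(w).max() for w in d_weights))  # at most 0.01
```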
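After making these changes, run the training. WGAN converges quickly: after a little over a thousand iterations, the generated images already look quite convincing. The final generated images are shown below; some noise and bad pixels remain.
Finally, here is the network's Graph:
References: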
1. https://github.com/carpedm20/DCGAN-tensorflow
2. https://github.com/tensorflow/tensorflow/blob/b826b79718e3e93148c3545e7aa3f90891744cc0/tensorflow/contrib/layers/python/layers/layers.py#L100
3. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/how_tos/reading_data/convert_to_records.py
4. https://zhuanlan.zhihu.com/p/25071913
5. https://www.cnblogs.com/Charles-Wan/p/6501945.html