This post mainly summarizes the details of building these two models. Finishing them took me three or four days, and the DCGAN hyperparameters were hard to tune.
First, a quick recap of DCGAN. The core idea is still the GAN setup: a generator network and a discriminator network. Compared with the basic GAN, the difference is simply that DCGAN uses convolutions in both the generator and the discriminator.
At first I made both networks fairly deep (about 10 convolutional layers each) and threw in LeakyReLU and the like, but the results stayed poor. My guess is that the MNIST images are small and simple, and a GAN is already hard to fit, so I cut the networks down to 2-3 layers, switched the activations to plain tanh, and replaced batch normalization with dropout (the difference was surprisingly large, and I'm not sure why). That improved the results.
Generator network:
import tensorflow as tf
import tensorflow.contrib.slim as slim

def G_Net(input, reuse=False, Training=True):
    with tf.variable_scope('G_Net', reuse=reuse):
        with tf.variable_scope('dense'):
            net = tf.layers.dense(input, 8*8*256, activation=tf.nn.tanh)
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.dropout(net, 0.5)
            net = tf.reshape(net, shape=(-1, 8, 8, 256))
        with tf.variable_scope('deconv_1'):
            # upsample 8x8 -> 16x16
            net = slim.conv2d_transpose(net, num_outputs=128, kernel_size=[5, 5], stride=2, activation_fn=None,
                                        weights_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.02))
            net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.tanh(net)
        # with tf.variable_scope('conv_1'):
        #     net = slim.conv2d(net, num_outputs=128, kernel_size=[5, 5], stride=1, activation_fn=None,
        #                       weights_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.02))
        #     net = tf.layers.batch_normalization(net, training=Training)
        #     net = tf.nn.relu(net)
        with tf.variable_scope('deconv_2'):
            # upsample 16x16 -> 32x32, single output channel
            net = slim.conv2d_transpose(net, num_outputs=1, kernel_size=[5, 5], stride=2, activation_fn=None,
                                        weights_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.02))
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.tanh(net)
        return net
Discriminator network:
def D_Net(input, reuse=False, Training=True):
    with tf.variable_scope('D_Net', reuse=reuse):
        with tf.variable_scope('conv_1'):
            net = slim.conv2d(input, kernel_size=[5, 5], num_outputs=64, stride=1, activation_fn=None,
                              weights_initializer=tf.truncated_normal_initializer(mean=0, stddev=0.02))
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.tanh(net)
            net = slim.max_pool2d(net, kernel_size=[2, 2], stride=2, padding='SAME')
        with tf.variable_scope('conv_2'):
            net = slim.conv2d(net, kernel_size=[5, 5], num_outputs=128, stride=1, activation_fn=None,
                              weights_initializer=tf.truncated_normal_initializer(mean=0, stddev=0.02))
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.tanh(net)
            net = slim.max_pool2d(net, kernel_size=[2, 2], stride=2, padding='SAME')
        # with tf.variable_scope('conv_3'):
        #     net = slim.conv2d(net, kernel_size=[3, 3], num_outputs=16, stride=2, activation_fn=None,
        #                       weights_initializer=tf.truncated_normal_initializer(mean=0, stddev=0.02))
        #     net = tf.layers.batch_normalization(net, training=Training)
        #     net = LeakyRelu(net)
        with tf.variable_scope('dense'):
            net = slim.flatten(net)
            net = tf.nn.dropout(net, 0.5)
            net = tf.layers.dense(net, 1024, activation=tf.nn.relu)
            net = tf.nn.dropout(net, 0.5)
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.layers.dense(net, 1, activation=None)
            p = tf.nn.sigmoid(net)
        return p
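The post doesn't show the plain-DCGAN training losses, so here is only a minimal sketch of one way these two networks could be trained against each other. The placeholder names, the latent size of 100, and the 32x32 input size (the generator above upsamples 8 -> 16 -> 32) are my assumptions, and the losses are the standard non-saturating GAN losses applied to the probability that D_Net already returns:

eps = 1e-8                                                 # guards against log(0)
noise = tf.placeholder(tf.float32, [None, 100])            # latent noise (size 100 assumed)
real_img = tf.placeholder(tf.float32, [None, 32, 32, 1])   # assuming MNIST padded/resized to 32x32

fake_img = G_Net(noise)
p_real = D_Net(real_img)                                   # D's probability that the input is real
p_fake = D_Net(fake_img, reuse=True)

loss_D = -tf.reduce_mean(tf.log(p_real + eps) + tf.log(1.0 - p_fake + eps))
loss_G = -tf.reduce_mean(tf.log(p_fake + eps))             # non-saturating generator loss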
Results:
The samples look acceptable, but the outputs are far from diverse: basically everything the generator produces is a 7 or a 9. My analysis: MNIST really is simple (28x28), but simple as it is, the digit classes still differ a lot from each other. With a small network the generated digits don't look convincing; with a bigger network the generator latches onto a few classes. So conditioning the GAN on the label should give a big improvement on MNIST.
================================================================================
cDCGAN is DCGAN with a conditional GAN (cGAN) on top. The principle is simple: concatenate the label onto intermediate layers, which amounts to adding a condition. A plain GAN can be seen as a mapping from noise to the target distribution, e.g. G(noise) = fake_img and D(img) = p_of_real; with the condition added this becomes G(noise | y=label) = fake_img | y=label and D(img | y=label) = p_of_real | y=label. My notation here is loose; for a proper treatment see this write-up on conditional GANs: http://baijiahao.baidu.com/s?id=1602483050633464000&wfr=spider&for=pc
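For reference, the objective of the original conditional GAN (Mirza & Osindero, 2014) just conditions both players on y:

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid y)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z \mid y) \mid y)\right)\right]
\]

(The implementation below deviates from this and instead trains D as an 11-way classifier: the ten digit classes plus one extra "fake" class.)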
Since the label tensor is 2-D (including the batch dimension), to add the condition inside the convolutional part you first need to reshape it to (batch_size, 1, 1, num_of_class). Note that num_of_class needs one extra class for "fake". After the reshape you still have to broadcast it to the spatial size of the current layer by multiplying an all-ones tensor of the matching size, then concat. The official DCGAN implementation has a helper for concatenating the condition onto a feature map:
def conv_cond_concat(x, y):
    """Concatenate conditioning vector on feature map axis."""
    x_shapes = x.get_shape()
    y_shapes = y.get_shape()
    return tf.concat(
        [x, y * tf.ones([x_shapes[0], x_shapes[1], x_shapes[2], y_shapes[3]])], 3)
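Just to make the shapes concrete (the numbers here are illustrative only, assuming a batch of 64 and 11 classes):

# x: (64, 14, 14, 64)   feature map after the first pooling layer
# y: (64, 1, 1, 11)     reshaped one-hot label (10 digits + 1 fake class)
# y * tf.ones([64, 14, 14, 11]) broadcasts the label to every spatial position,
# so conv_cond_concat(x, y) returns a tensor of shape (64, 14, 14, 75).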
Then the G_Net:
def G_Net(input, label, reuse=False, Training=True):
    with tf.variable_scope('G_Net', reuse=reuse):
        with tf.variable_scope('dense'):
            net = tf.concat([input, label], axis=1)
            net = tf.layers.dense(net, 8*8*256, activation=tf.nn.tanh)
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.dropout(net, 0.5)
            net = tf.reshape(net, shape=(-1, 8, 8, 256))
        with tf.variable_scope('deconv_1'):
            net = slim.conv2d_transpose(net, num_outputs=128, kernel_size=[5, 5], stride=2, activation_fn=None,
                                        weights_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.02))
            net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.tanh(net)
        # ... (remaining layers omitted; they are identical to the DCGAN generator above)
The only change in G_Net is a single concat at the fully-connected layer: noise and label are concatenated before the dense layer, and everything after that is unchanged.
Then the D_Net:
def D_Net(input, label, reuse=False, Training=True):
    batch_size = label.get_shape()[0]
    label = tf.reshape(label, shape=(batch_size, 1, 1, 11))
    with tf.variable_scope('D_Net', reuse=reuse):
        with tf.variable_scope('conv_1'):
            net = conv_cond_concat(input, label)
            net = slim.conv2d(net, kernel_size=[5, 5], num_outputs=64, stride=1, activation_fn=None,
                              weights_initializer=tf.truncated_normal_initializer(mean=0, stddev=0.02))
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.tanh(net)
            net = slim.max_pool2d(net, kernel_size=[2, 2], stride=2, padding='SAME')
        with tf.variable_scope('conv_2'):
            net = conv_cond_concat(net, label)
            net = slim.conv2d(net, kernel_size=[5, 5], num_outputs=128, stride=1, activation_fn=None,
                              weights_initializer=tf.truncated_normal_initializer(mean=0, stddev=0.02))
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.nn.tanh(net)
            net = slim.max_pool2d(net, kernel_size=[2, 2], stride=2, padding='SAME')
        # with tf.variable_scope('conv_3'):
        #     net = slim.conv2d(net, kernel_size=[3, 3], num_outputs=16, stride=2, activation_fn=None,
        #                       weights_initializer=tf.truncated_normal_initializer(mean=0, stddev=0.02))
        #     net = tf.layers.batch_normalization(net, training=Training)
        #     net = LeakyRelu(net)
        with tf.variable_scope('dense'):
            net = slim.flatten(net)
            net = tf.nn.dropout(net, 0.5)
            net = tf.layers.dense(net, 1024, activation=tf.nn.relu)
            net = tf.nn.dropout(net, 0.5)
            # net = tf.layers.batch_normalization(net, training=Training)
            net = tf.layers.dense(net, 11, activation=None)
        return net
Since the discriminator no longer just makes a real/fake decision, it returns num_of_class logits to be fed into a softmax.
The loss functions:
loss_G = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label, logits=logit_fake))
loss_D = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label, logits=logit_real) +
                        tf.nn.softmax_cross_entropy_with_logits(labels=fake_label, logits=logit_fake))
fake_label is a label tensor whose class is always 10, i.e. everything is marked as the fake class.
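For completeness, here is a minimal sketch of how the labels and the two optimizers could be wired together. This is my own wiring, not the original code: the placeholder names, the latent size, the learning rate and the 32x32 input size are all assumptions.

y = tf.placeholder(tf.int64, [None])                       # integer digit labels 0-9
noise = tf.placeholder(tf.float32, [None, 100])            # latent noise (size 100 assumed)
real_img = tf.placeholder(tf.float32, [None, 32, 32, 1])   # assuming MNIST padded/resized to the 32x32 generator output

num_of_class = 11                                          # 10 digits + 1 extra "fake" class
label = tf.one_hot(y, depth=num_of_class)                  # real labels, classes 0-9
fake_label = tf.one_hot(tf.fill(tf.shape(y), 10), depth=num_of_class)  # everything marked as class 10

fake_img = G_Net(noise, label)
logit_real = D_Net(real_img, label)
logit_fake = D_Net(fake_img, label, reuse=True)
# loss_G / loss_D as defined above.

# Each optimizer only updates its own network, selected via the variable_scope prefixes.
# (With tf.layers.batch_normalization one would also run the UPDATE_OPS; omitted here.)
g_vars = [v for v in tf.trainable_variables() if v.name.startswith('G_Net')]
d_vars = [v for v in tf.trainable_variables() if v.name.startswith('D_Net')]
train_G = tf.train.AdamOptimizer(2e-4, beta1=0.5).minimize(loss_G, var_list=g_vars)
train_D = tf.train.AdamOptimizer(2e-4, beta1=0.5).minimize(loss_D, var_list=d_vars)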
The generated results are much better this time:
I'm still a beginner, and much of this analysis is just my own understanding, so please point out any mistakes.