DCGAN Source Code Analysis (Part 1)

nongfu_spring

Article link: https://blog.csdn.net/nongfu_spring/article/details/54342861

DCGAN

https://github.com/carpedm20/DCGAN-tensorflow

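DCGAN works on the same principle as a GAN; it simply replaces the generator G and the discriminator D with two convolutional neural networks (CNNs). It is not a drop-in replacement, though: DCGAN changes the CNN architecture in several ways to improve sample quality and speed up convergence: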

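Remove all pooling layers. The G network upsamples with transposed convolutions (often called deconvolutional layers), and the D network uses strided convolutions in place of pooling.
Use batch normalization in both D and G.
Remove the fully connected (FC) hidden layers so that the networks become fully convolutional.
In the G network, use ReLU as the activation function, with tanh in the last layer.
In the D network, use LeakyReLU as the activation function, with sigmoid in the last layer (the discriminator code below returns tf.nn.sigmoid(h4)).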

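The basic architecture of the DCGAN generator is a stack of "deconvolution" layers, more precisely transposed convolutions. A transposed convolution is like a convolution run backwards: it is the same operation that backpropagation uses to compute the gradient of a convolution with respect to its input when training a supervised CNN.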
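To make the link between "deconvolution" and backpropagation concrete, here is a minimal pure-Python sketch in one dimension. It is not code from the repository: the helper names conv1d and conv1d_transpose and the sample values are assumptions chosen for illustration. It checks the adjoint identity that defines a transposed convolution and shows that a stride of 2 shrinks the signal in one direction and enlarges it in the other.

```python
def conv1d(x, w, stride):
    """Strided 1-D convolution without padding (cross-correlation form)."""
    n_out = (len(x) - len(w)) // stride + 1
    return [sum(w[k] * x[i * stride + k] for k in range(len(w)))
            for i in range(n_out)]


def conv1d_transpose(y, w, stride, n_in):
    """Scatter every y[i] back through w; this is conv1d's gradient w.r.t. x."""
    x = [0] * n_in
    for i, y_i in enumerate(y):
        for k, w_k in enumerate(w):
            x[i * stride + k] += w_k * y_i
    return x


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


x = [1, 2, 3, 4, 5, 6, 7]   # hypothetical input signal
w = [1, 2, 3]               # hypothetical kernel
g = [1, -1, 2]              # hypothetical upstream gradient, one value per output

y = conv1d(x, w, stride=2)                          # length 7 -> length 3
up = conv1d_transpose(g, w, stride=2, n_in=len(x))  # length 3 -> length 7

# Adjoint identity: <conv(x), g> == <x, conv_transpose(g)>
assert dot(y, g) == dot(x, up)
print(len(x), "->", len(y), "and", len(g), "->", len(up))
```

The same kernel w is used in both directions; only the roles of input and output are swapped, which is why the generator can use it for upsampling.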

The core of DCGAN-tensorflow is model.py.
model.py defines the generator and the discriminator; the generator is built with deconv2d and the discriminator with conv2d.

Implementation of the discriminator:

def discriminator(self, image, y=None, reuse=False):
    with tf.variable_scope("discriminator") as scope:
        if reuse:
            scope.reuse_variables()

        if not self.y_dim:
            h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
            h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim*2, name='d_h1_conv')))
            h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
            h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim*8, name='d_h3_conv')))
            h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h3_lin')

            return tf.nn.sigmoid(h4), h4
        else:
            yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
            x = conv_cond_concat(image, yb)

            h0 = lrelu(conv2d(x, self.c_dim + self.y_dim, name='d_h0_conv'))
            h0 = conv_cond_concat(h0, yb)

            h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim + self.y_dim, name='d_h1_conv')))
            h1 = tf.reshape(h1, [self.batch_size, -1])            
            h1 = tf.concat(1, [h1, y])

            h2 = lrelu(self.d_bn2(linear(h1, self.dfc_dim, 'd_h2_lin')))
            h2 = tf.concat(1, [h2, y])

            h3 = linear(h2, 1, 'd_h3_lin')

            return tf.nn.sigmoid(h3), h3

Here conv2d calls TensorFlow's tf.nn.conv2d to convolve the input with the weights, then uses tf.nn.bias_add to add the bias term.

def conv2d(input_, output_dim, 
       k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
       name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.truncated_normal_initializer(stddev=stddev))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')

        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())

        return conv

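[Notes]

1. The weight w in the conv2d function is created as follows:

w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],initializer=tf.truncated_normal_initializer(stddev=stddev))

The second argument is the shape of w. Its entries are, in order, the kernel height, the kernel width, the number of input feature maps, and the number of output feature maps. The number of output feature maps is also the number of convolution kernels.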
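As a rough illustration of what that shape implies, the sketch below counts the trainable parameters of one conv2d layer. The helper names and the example sizes (a 5x5 kernel, 3 input maps, 64 output maps) are assumptions for illustration, not values read from the repository's configuration.

```python
def conv2d_weight_shape(k_h, k_w, in_maps, out_maps):
    """Shape of w as laid out in conv2d: [height, width, in, out]."""
    return [k_h, k_w, in_maps, out_maps]


def conv2d_param_count(k_h, k_w, in_maps, out_maps):
    """Weights plus one bias per output feature map."""
    weights = k_h * k_w * in_maps * out_maps
    biases = out_maps
    return weights + biases


# Hypothetical first layer: 5x5 kernel, RGB input, 64 output feature maps.
shape = conv2d_weight_shape(5, 5, 3, 64)
params = conv2d_param_count(5, 5, 3, 64)
print(shape, params)
```

Each of the 64 output maps has its own 5x5x3 kernel, which is why the output count and the kernel count are the same number.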

2. tf.nn.conv2d is used as follows:

conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')

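The first argument is the input, i.e. the output of the previous layer.

The second argument is the weight w created in note 1; pay attention to its shape.

The third argument is the stride of the kernel, [1, d_h, d_w, 1]. The first entry is the step over images in the batch, the second entry d_h is the step over rows of the image, the third entry d_w is the step over columns, and the fourth entry is the step over channels. Here it is set to [1, 2, 2, 1], so the kernel slides 2 pixels at a time in each spatial direction. With padding='SAME', each convolution halves the height and width of the feature map, so its area shrinks to 1/4 of the original.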
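The effect of the stride on the feature-map size can be sketched with a small helper. With 'SAME' padding the output size along one spatial dimension is ceil(input / stride); the function name same_conv_out and the 64x64 starting size are assumptions for illustration.

```python
import math


def same_conv_out(size, stride=2):
    """Output size along one spatial dimension for padding='SAME'."""
    return math.ceil(size / stride)


# Follow a hypothetical 64x64 input through four stride-2 convolutions.
sizes = [64]
for _ in range(4):
    sizes.append(same_conv_out(sizes[-1]))

# Area ratio between consecutive layers.
area_ratios = [(b * b) / (a * a) for a, b in zip(sizes, sizes[1:])]
print(sizes, area_ratios)
```

Odd sizes are rounded up, so for example a 7-pixel side becomes 4 rather than 3.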

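3. The number of output feature maps of conv2d is a single scalar, output_dim. It is the second parameter of the conv2d function and is passed in by the caller.

For example, in the line below h1 is the input and, after the convolution, the number of output feature maps is df_dim*4. Taking df_dim = 128 as an example, the output has 128*4 = 512 feature maps, i.e. this layer has 512 convolution kernels.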

 h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
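The feature-map counts of all four discriminator convolutions follow the same pattern and can be listed with a short sketch. The helper name is hypothetical, df_dim = 128 is only an example value, and the flattened length assumes a 64x64 input that four stride-2 convolutions reduce to 4x4.

```python
def discriminator_feature_maps(df_dim):
    """Output feature maps of d_h0_conv .. d_h3_conv (unconditional branch)."""
    multipliers = [1, 2, 4, 8]
    return [df_dim * m for m in multipliers]


maps = discriminator_feature_maps(128)  # example value for df_dim

# Assumed 4x4 spatial size after four stride-2 convolutions on a 64x64 image.
flattened = 4 * 4 * maps[-1]  # per-image length fed to the final linear layer
print(maps, flattened)
```

The channel count doubles at every layer while the spatial area shrinks to a quarter, so the number of activations per layer halves each time.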

Implementation of the generator:

def generator(self, z, y=None):
    with tf.variable_scope("generator") as scope:
        if not self.y_dim:
            # s is the output image size; e.g. if s is 64, then s2 is 32, s4 is 16, s8 is 8, and s16 is 4
            s = self.output_size
            s2, s4, s8, s16 = int(s/2), int(s/4), int(s/8), int(s/16)

            # project `z` and reshape
            self.z_, self.h0_w, self.h0_b = linear(z, self.gf_dim*8*s16*s16, 'g_h0_lin', with_w=True)

            self.h0 = tf.reshape(self.z_, [-1, s16, s16, self.gf_dim * 8])
            h0 = tf.nn.relu(self.g_bn0(self.h0))

            self.h1, self.h1_w, self.h1_b = deconv2d(h0, 
                [self.batch_size, s8, s8, self.gf_dim*4], name='g_h1', with_w=True)
            h1 = tf.nn.relu(self.g_bn1(self.h1))

            h2, self.h2_w, self.h2_b = deconv2d(h1,
                [self.batch_size, s4, s4, self.gf_dim*2], name='g_h2', with_w=True)
            h2 = tf.nn.relu(self.g_bn2(h2))

            h3, self.h3_w, self.h3_b = deconv2d(h2,
                [self.batch_size, s2, s2, self.gf_dim*1], name='g_h3', with_w=True)
            h3 = tf.nn.relu(self.g_bn3(h3))

            h4, self.h4_w, self.h4_b = deconv2d(h3,
                [self.batch_size, s, s, self.c_dim], name='g_h4', with_w=True)

            return tf.nn.tanh(h4)
        else:
            s = self.output_size
            s2, s4 = int(s/2), int(s/4) 

            # yb = tf.expand_dims(tf.expand_dims(y, 1),2)
            yb = tf.reshape(y, [self.batch_size, 1, 1, self.y_dim])
            z = tf.concat(1, [z, y])

            h0 = tf.nn.relu(self.g_bn0(linear(z, self.gfc_dim, 'g_h0_lin')))
            h0 = tf.concat(1, [h0, y])

            h1 = tf.nn.relu(self.g_bn1(linear(h0, self.gf_dim*2*s4*s4, 'g_h1_lin')))
            h1 = tf.reshape(h1, [self.batch_size, s4, s4, self.gf_dim * 2])

            h1 = conv_cond_concat(h1, yb)

            h2 = tf.nn.relu(self.g_bn2(deconv2d(h1, [self.batch_size, s2, s2, self.gf_dim * 2], name='g_h2')))
            h2 = conv_cond_concat(h2, yb)

            return tf.nn.sigmoid(deconv2d(h2, [self.batch_size, s, s, self.c_dim], name='g_h3'))

Here deconv2d calls TensorFlow's tf.nn.conv2d_transpose to apply the weights, then uses tf.nn.bias_add to add the bias term.

def deconv2d(input_, output_shape,
         k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
         name="deconv2d", with_w=False):
    with tf.variable_scope(name):
        # filter : [height, width, output_channels, in_channels]
        w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
                            initializer=tf.random_normal_initializer(stddev=stddev))

        try:
            deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape,
                                strides=[1, d_h, d_w, 1])

        # Support for versions of TensorFlow before 0.7.0
        except AttributeError:
            deconv = tf.nn.deconv2d(input_, w, output_shape=output_shape,
                                strides=[1, d_h, d_w, 1])

        biases = tf.get_variable('biases', [output_shape[-1]], initializer=tf.constant_initializer(0.0))
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())

        if with_w:
            return deconv, w, biases
        else:
            return deconv

Both deconv2d and conv2d are defined in ops.py.

[Notes]

1. The weight w in the deconv2d function is created as follows:

w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]], initializer=tf.random_normal_initializer(stddev=stddev))

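The second argument is the shape of w. Its entries are, in order, the kernel height, the kernel width, the number of output feature maps, and the number of input feature maps.
Only the last entry of output_shape is used here: output_shape[-1] is the number of output feature maps. output_shape[0] is batch_size, output_shape[1] is the height of the output feature map, and output_shape[2] is its width.

Note that the order of the output and input feature-map counts in this shape is the reverse of the order used for the weight w in conv2d.

In conv2d, w is created as shown below: the entries of the shape are, in order, the kernel height, the kernel width, the number of input feature maps, and the number of output feature maps (the number of convolution kernels).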

w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],initializer=tf.truncated_normal_initializer(stddev=stddev))
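The swapped order is easier to see side by side. In the sketch below the two helper functions are hypothetical stand-ins that only build the shape lists, and the channel counts 256 and 512 are example values.

```python
def conv2d_weight_shape(k, in_maps, out_maps):
    """conv2d layout: [height, width, in, out]."""
    return [k, k, in_maps, out_maps]


def deconv2d_weight_shape(k, in_maps, out_maps):
    """deconv2d layout: [height, width, out, in]."""
    return [k, k, out_maps, in_maps]


# A convolution going 256 -> 512 maps and a deconvolution going 512 -> 256 maps.
conv_w = conv2d_weight_shape(5, in_maps=256, out_maps=512)
deconv_w = deconv2d_weight_shape(5, in_maps=512, out_maps=256)
print(conv_w, deconv_w)
```

Both shapes come out as [5, 5, 256, 512]: a transposed convolution stores its filter in the layout of the forward convolution it reverses, which is why the last two entries appear swapped from the point of view of the deconv2d caller.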

2. tf.nn.conv2d_transpose is used as follows:

deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape,
                                strides=[1, d_h, d_w, 1])

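The first argument is the input, i.e. the output of the previous layer.

The second argument is the weight w created in note 1. The output_shape argument is the shape of the output tensor, a list of 4 values.

The strides argument is the stride, [1, d_h, d_w, 1]. The first entry is the step over images in the batch, the second entry d_h is the step over rows of the image, the third entry d_w is the step over columns, and the fourth entry is the step over channels. Here it is set to [1, 2, 2, 1]. In a transposed convolution the stride acts as an upsampling factor: each deconvolution doubles the height and width of the feature map, so its area grows to 2*2 = 4 times the original.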
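A short sketch of the size arithmetic, assuming stride 2 and 'SAME'-style padding so that the transposed convolution exactly undoes the halving of a forward convolution. The helper names and the starting size of 4 are assumptions for illustration.

```python
import math


def same_conv_out(size, stride=2):
    """Forward convolution with padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)


def deconv_out(size, stride=2):
    """Size requested from the transposed convolution: size * stride."""
    return size * stride


# Start from a hypothetical 4x4 map and apply four stride-2 deconvolutions.
sizes = [4]
for _ in range(4):
    sizes.append(deconv_out(sizes[-1]))

# Each enlarged size maps back to the previous one under the forward convolution.
round_trip_ok = all(same_conv_out(big) == small
                    for small, big in zip(sizes, sizes[1:]))
print(sizes, round_trip_ok)
```

This is the generator's path in reverse order of the discriminator: 4, 8, 16, 32, 64 instead of 64, 32, 16, 8, 4.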

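3. The output shape of deconv2d is a list of 4 values, output_shape. It is the second parameter of the deconv2d function and is passed in by the caller.

For example, in the line below h1 is the input and, after the deconvolution, the output shape is [self.batch_size, s4, s4, self.gf_dim*2]. Here batch_size is the configured batch size, e.g. 64, and s4 is the output image size divided by 4: if the final output image is 64*64, this layer's output feature maps are 16*16. The number of feature maps is gf_dim*2; taking gf_dim = 128 as an example, that gives 128*2 = 256 feature maps.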

deconv2d(h1, [self.batch_size, s4, s4, self.gf_dim*2], name='g_h2', with_w=True)
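Putting the pieces together, the output shapes of every layer in the unconditional generator branch can be listed with a small sketch. The function generator_shapes is hypothetical and only mirrors the shape arithmetic of the generator code above; batch_size = 64, s = 64, gf_dim = 128 and c_dim = 3 are example values, not confirmed defaults.

```python
def generator_shapes(batch_size, s, gf_dim, c_dim):
    """Output shapes of h0 .. h4 in the unconditional generator branch."""
    s2, s4, s8, s16 = int(s / 2), int(s / 4), int(s / 8), int(s / 16)
    return [
        [batch_size, s16, s16, gf_dim * 8],  # h0: reshaped linear projection
        [batch_size, s8, s8, gf_dim * 4],    # h1: g_h1
        [batch_size, s4, s4, gf_dim * 2],    # h2: g_h2
        [batch_size, s2, s2, gf_dim * 1],    # h3: g_h3
        [batch_size, s, s, c_dim],           # h4: g_h4, passed through tanh
    ]


shapes = generator_shapes(batch_size=64, s=64, gf_dim=128, c_dim=3)

# Per-sample width of the linear projection of z before the reshape.
projection_size = shapes[0][1] * shapes[0][2] * shapes[0][3]

for shape in shapes:
    print(shape)
print(projection_size)
```

Every deconvolution doubles the spatial size and halves the number of feature maps until the last layer, which outputs c_dim image channels.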