Keras: A DCGAN Pitfall Diary

 

Preface (a bit of venting):

    Last week I finally finished my microcomputer exam, so I could happily get down to some GAN fun. Judging by the survey papers, DCGAN is clearly the best entry point (don't ask why I didn't try the original GAN first; it just looks painful to tune...).

    Honestly, the last few days of debugging nearly finished me off. BatchNorm is a huge pit, the learning rate is a huge pit, even the network architecture has pits... endless experiments, mode collapses, and sudden breakthroughs, and only yesterday did I mostly figure it out. These days also taught me, once again, the value of a good GPU. Thanks, Dad, for the birthday present! (a second-hand 1080 Ti)

 

Special thanks:

《卷积神经网络:原理与实践》(Convolutional Neural Networks: Principles and Practice): borrowed from the library and read for a solid month; it lit up my deep-learning path and gave me a reasonably broad foundation.

《Deep Learning with Python》(Manning): very pragmatic; it provided the runnable Keras code that this whole debugging adventure is built on.

 

Environment:

Anaconda3 5.1.0 + Keras-gpu 2.2.4 (TensorFlow backend) + PyCharm + CIFAR10

 

About DCGAN:

The paper: https://arxiv.org/abs/1511.06434

The training settings described in the paper:

No pre-processing was applied to training images besides scaling to the range of the tanh activation function [-1, 1]. All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128. All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work has used momentum to accelerate training, we used the Adam optimizer (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested learning rate of 0.001, to be too high, using 0.0002 instead. Additionally, we found leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability while reducing it to 0.5 helped stabilize training
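The numeric ingredients in that paragraph are easy to sanity-check without any Keras at all. Below is a small numpy sketch (the variable names are mine) of the three of them: scaling pixels into tanh's [-1, 1] range, sampling weights from a zero-centered Normal with stddev 0.02, and the LeakyReLU leak of 0.2:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Scale uint8 pixels [0, 255] into tanh's output range [-1, 1]
pixels = np.array([0.0, 127.5, 255.0])
scaled = pixels / 127.5 - 1          # -> [-1, 0, 1]

# 2. Zero-centered Normal init with stddev 0.02 (empirical std should land near 0.02)
w = rng.normal(0.0, 0.02, size=(5, 5, 3, 128))

# 3. LeakyReLU with leak slope 0.2
def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)
```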

Problems encountered:

    First, the architecture. The DCGAN in Deep Learning with Python and the one in the paper are completely! not! the! same!

    You may not believe this, but I found the model below trains far more stably than the paper's architecture: recognizable shapes appear after roughly 500 iterations, versus about 5,000 for the paper's version. Incidentally, Deep Learning with Python is by the creator of Keras, so his modifications are well worth taking seriously.

def Gnet():
    generator_input = keras.Input(shape=(latent_dim,))
    x = layers.Dense(128 * 16 * 16)(generator_input)
    x = layers.LeakyReLU()(x)
    x = layers.Reshape((16, 16, 128))(x)
    x = layers.Conv2D(256, 5, padding='same')(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(256, 5, padding='same')(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(256, 5, padding='same')(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)
    generator = keras.models.Model(generator_input, x)
    generator.summary()
    return generator


def Dnet():
    discriminator_input = layers.Input(shape=(height, width, channels))
    x = layers.Conv2D(64, 3)(discriminator_input)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(128, 4, strides=2)(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(256, 4, strides=2)(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(512, 4, strides=2)(x)
    x = layers.LeakyReLU()(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(1, activation='sigmoid')(x)
    discriminator = keras.models.Model(discriminator_input, x)
    discriminator.summary()
    discriminator_optimizer = keras.optimizers.RMSprop(
        lr=0.0008,
        clipvalue=1.0,
        decay=1e-8)
    discriminator.compile(optimizer=discriminator_optimizer,
                          loss='binary_crossentropy')
    return discriminator


# Network structures for G and D
G = Gnet()
D = Dnet()
# Training setup: stack G and D into the adversarial model (D frozen)
D.trainable = False
GAN_input = keras.Input(shape=(latent_dim,))
GAN_output = D(G(GAN_input))
GAN = keras.models.Model(GAN_input, GAN_output)
GAN_optimizer = keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)
GAN.compile(optimizer=GAN_optimizer, loss='binary_crossentropy')

Several clear differences from the paper:

1. Optimizer: the paper uses Adam, while this code uses RMSprop. The learning rates here are on the high side, and G and D use different rates; surprisingly, D's is the larger one (for a newcomer this is worth pausing on; think about why).

2. Architecture: the paper's channel counts progress layer by layer, while this G network's do not. The paper has no Dropout, but the Dropout here is critical to training quality (especially the one right before the output layer)! And every BatchNorm layer from the paper has been dropped from this code!

3. Tricks: the full version of this code bundles most of the fixes people later discovered for GAN training problems, e.g. label smoothing (0.95/0.05).
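The label-smoothing trick in point 3 is simple enough to show in numpy. This mirrors the labeling used in the full training loop at the end of the post (note the book's flipped convention: generated images get the ~1 labels, real images the ~0 labels):

```python
import numpy as np

batch_size = 4
rng = np.random.default_rng(0)

# Hard targets would be 1.0 (generated) and 0.0 (real); soften both sides
labels = np.concatenate([0.90 * np.ones((batch_size, 1)),  # generated images
                         np.zeros((batch_size, 1))])       # real images
# Then add a little random noise, landing around 0.95 / 0.05
labels += 0.05 * rng.random(labels.shape) + 0.05
```

Smoothed targets keep the discriminator from becoming over-confident, which in turn keeps useful gradients flowing to the generator.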

(Comparison images: the originals are 32x32, which is tiny, so I resized them to 64x64 for easier viewing.)

GAN-generated horses (30,000 iterations)
Horses from CIFAR10

My attempts to morph this code back toward the paper's model were basically one long string of mode collapses... All I can say is that the Keras creator's hand-tuned legacy hyperparameters really do deliver.

I'll skip the blow-by-blow and go straight to the problems I found and how I understand them.

Below is the closest-to-the-paper model I managed to get working:

def Conv_Down(x, kernel_size, channel, name='Conv_Down?_'):
    # x = layers.BatchNormalization(axis=-1, name=name + 'BN')(x)
    x = layers.Conv2D(channel, kernel_size, strides=2, padding='same', name=name+'Conv1',
                      kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(x)
    x = layers.LeakyReLU(0.2, name=name+'LeakyReLU1')(x)
    # x = layers.Dropout(0.2)(x)
    # x = layers.Conv2D(channel, kernel_size, padding='same', name=name + 'Conv2',
    #                   kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(x)
    # x = layers.LeakyReLU(0.2, name=name + 'LeakyReLU2')(x)
    return x


def Conv_Up(x, kernel_size, channel, name='Conv_Up?_'):
    x = layers.Conv2DTranspose(channel, kernel_size, strides=2, padding='same', name=name+'ConvT',
                               kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(x)
    x = layers.BatchNormalization(axis=-1, name=name+'BN')(x)
    # x = layers.LeakyReLU(0.2, name=name + 'LeakyReLU1')(x)
    x = layers.ReLU(name=name + 'ReLU')(x)
    return x


def G_net():
    generator_input = keras.Input(shape=(latent_dim,), name='G_input')
    x = layers.Dense(512 * 4 * 4, name='G_First_Dense',
                     kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(generator_input)
    x = layers.Reshape((4, 4, 512))(x)
    x = layers.BatchNormalization(name='G_BN0')(x)
    x = layers.ReLU(name='G_ReLU')(x)
    x = Conv_Up(x, kernel_size=5, channel=256, name='Conv_Up1_')
    # x = layers.BatchNormalization(name='G_BN1')(x)
    x = Conv_Up(x, kernel_size=5, channel=128, name='Conv_Up2_')
    # x = layers.BatchNormalization(name='G_BN2')(x)
    # x = Conv_Up(x, kernel_size=3, channel=128, name='Conv_Up3_')
    x = layers.Conv2DTranspose(channels, 5, activation='tanh', strides=2, padding='same', name='G_output',
                               kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(x)    # output layer
    generator = keras.models.Model(generator_input, x, name='G')
    generator.summary()
    return generator


def D_net():
    discriminator_input = layers.Input(shape=(height, width, channels), name='D_input')
    # x = layers.Conv2D(64, 3, padding='same', name='D_First_Conv')(discriminator_input)
    # # x = layers.BatchNormalization(name='D_First_BN')(x)
    # x = layers.LeakyReLU()(x)
    x = Conv_Down(discriminator_input, kernel_size=5, channel=128, name='Conv_Down1_')
    x = layers.Dropout(0.3)(x)
    x = Conv_Down(x, kernel_size=5, channel=256, name='Conv_Down2_')
    x = layers.Dropout(0.3)(x)
    x = Conv_Down(x, kernel_size=5, channel=512, name='Conv_Down3_')
    x = layers.Flatten()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(1, activation='sigmoid', name='D_output')(x)
    discriminator = keras.models.Model(discriminator_input, x, name='D')
    discriminator.summary()
    # discriminator_optimizer = keras.optimizers.RMSprop(lr=0.0001, clipvalue=1.0, decay=1e-8)
    discriminator_optimizer = keras.optimizers.Adam(lr=0.0002, beta_1=0.5)
    discriminator.compile(optimizer=discriminator_optimizer,
                          loss='binary_crossentropy')
    return discriminator


# Network structures for G and D
G = G_net()
D = D_net()
# G.load_weights(save_dir+'/G_3.5w.h5', by_name=True)
# D.load_weights(save_dir+'/D_3.5w.h5', by_name=True)
# G, D, WGAN = WGAN_net()
# Training setup: stack G and D into the adversarial model (D frozen)
D.trainable = False
WGAN_input = keras.Input(shape=(latent_dim,))
WGAN_output = D(G(WGAN_input))
WGAN = keras.models.Model(WGAN_input, WGAN_output)
WGAN.summary()
# WGAN_optimizer = keras.optimizers.RMSprop(lr=0.0002, clipvalue=1.0, decay=1e-8)
WGAN_optimizer = keras.optimizers.Adam(lr=0.0004, beta_1=0.5)
WGAN.compile(optimizer=WGAN_optimizer, loss='binary_crossentropy')

 1. The optimizer:

Testing optimizers is too time-consuming, so I didn't try many. The WGAN paper advises against momentum-based optimizers like Adam, while WGAN-GP reports Adam converging faster than RMSprop (delicious), so in practice the choice seems fairly free.

What really needs attention is the learning rate. Around 0.0002 is more or less a general-purpose value ("around" meaning up to roughly 4x in either direction); you have to experiment, and whether G and D share the same rate depends on your situation. If D is being crushed so that G has nothing to learn from, strengthen D; if D is so dominant that G flails around generating garbage, strengthen G. [...apparently either failure mode can cause mode collapse... call that "theoretical grounding", not hand-waving!]

Every network has a learning rate that suits it best; if you're chasing quality (or just trying to get anything to generate at all), be prepared to tune patiently.

2. The architecture:

Supposedly the layer-by-layer channel progression helps stability... maybe my model is too small, because I couldn't see the effect, but NVIDIA's super-resolution work also grows layer by layer, so presumably there's something to it.

I tried replacing the 5x5 kernels with two stacked 3x3s, and the results were underwhelming: images formed more slowly and convergence was slower. There may be differences late in training, but they fluctuated too much for me to tell, and who knows whether it was just the random initialization? I also tried 7x7 and found it collapse-prone, probably a padding issue.

As for the conv layers, keep the strided convolutions simple; stacking extra channels and depth after them didn't seem to help, and only made training harder (slower; stability unclear; no visible difference in the final output). [For example, don't follow a Conv2DTranspose with a Conv2D; just apply the activation and move straight on to the next Conv2DTranspose.]
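One concrete fact that makes "keep the strided convolutions simple" easy to reason about: with Keras's padding='same', a stride-s Conv2DTranspose produces an output exactly s times the input size, so the generator above climbs 4 -> 8 -> 16 -> 32 in three clean doublings. A few lines (function name is mine) confirm the arithmetic:

```python
# Output size of a Conv2DTranspose with padding='same': out = in * stride
def convT_same_out(size, stride=2):
    return size * stride

sizes = [4]                      # the generator starts from a 4x4 feature map
for _ in range(3):               # three stride-2 upsampling layers
    sizes.append(convT_same_out(sizes[-1]))
print(sizes)                     # [4, 8, 16, 32]
```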

Not much to say about activations: LeakyReLU(0.2) per the paper works well. In the generator, ReLU may make your images a bit sharper? (I didn't run this for many iterations; after about 2,000 the images looked slightly sharper than before, which may just be randomness, and the speed difference was small.)

BatchNorm! A huge pit! Two of my three days went into wrestling with it. Add it anywhere and the model collapsed!

In hindsight, BN just doesn't mesh with the Keras-creator version of the code; its hyperparameters and structure don't line up. Once I mostly reverted to the paper's architecture it finally ran, but slowly and badly, to the point that I thought it was stuck in permanent mode collapse. The old code showed outlines at 500 iterations and decent output at 5,000; now it took 2,000 just to show outlines.

BN placement matters a lot. I've seen it placed between layers (after the activation), which is easy to tune and barely disturbs the network (though it may also do nothing). Then I looked at a Keras WGAN implementation and saw its BN layers sitting before the ReLU, so I gritted my teeth, tuned furiously, and finally got blurry images after 2,000 iterations... which were still blurry at 5,000, like oil paintings (maybe it's time to look into style transfer).

Eventually I hit on an important conclusion: besides the known rule that BN doesn't belong at the end of G or the start of D, the first BN layer in G must go after the Reshape, not before it!!

This comes down to how BN works. Placed before the Reshape, it normalizes per MLP unit; placed after the Reshape, it normalizes per convolutional channel, with the normalization parameters shared across the whole 4x4 spatial grid!!! So they are genuinely different operations! (As for the deeper math, well, I'm a lazy engineering student; of course I didn't dig into it...)
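The Reshape point can be made concrete without running Keras at all. BN before the Reshape learns one mean/variance pair per each of the 4*4*512 = 8192 Dense units; after the Reshape it learns one pair per each of the 512 channels, pooling statistics over the 4x4 grid. A numpy sketch of the two sets of statistics (names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
act = rng.normal(size=(64, 4 * 4 * 512))     # Dense output for a batch of 64

# BN before Reshape: statistics per flat unit -> 8192 means
mean_before = act.mean(axis=0)

# BN after Reshape: statistics per channel, shared over the 4x4 grid -> 512 means
mean_after = act.reshape(64, 4, 4, 512).mean(axis=(0, 1, 2))

print(mean_before.shape, mean_after.shape)   # (8192,) (512,)
```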

3. The tricks:

The world is full of clever people, and these tricks are each more entertaining than the last: forgetfulness (Dropout), selling fake answers (I forget the name; I only saw it in passing), grading answers on a curve (label smoothing), and putting a thumb on the scale (training D extra times per step). A real eye-opener. If you're interested, look them up (I've lost the link, but there seems to be a paper collecting twenty-some of these tips).
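The last trick ("putting a thumb on the scale") is just an update-ratio knob. A minimal sketch, with stand-in counters where the real D.train_on_batch / WGAN.train_on_batch calls would go:

```python
n_critic = 2                     # discriminator updates per generator update
d_updates = g_updates = 0

for step in range(100):
    for _ in range(n_critic):
        d_updates += 1           # stand-in for D.train_on_batch(...)
    g_updates += 1               # stand-in for WGAN.train_on_batch(...)

print(d_updates, g_updates)      # 200 100
```

In the full code below, the `for i in range(1):` loops around the two train_on_batch calls are exactly this knob, currently set to a 1:1 ratio.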

Conclusion:

Ah... this is my first blog post on CSDN (written partly so our study group has material for the weekly meeting); I hope you all enjoy it~

I'm just a third-year undergrad in Intelligent Science and Technology (though the department has mostly been teaching automation... insert muttered curse here), and I only started deep learning this semester. My skills are limited: my only prior projects were image classification, and my biggest achievement so far is reproducing CheXNet with classmates, with high accuracy but rather magical behavior (a true quack-doctor network). GANs are the direction I plan to focus on, but I really am a beginner and only half-understand many things. If I've gotten anything wrong, please bear with me and help point it out. Thanks!

The full working DCGAN code is attached below (the formatting is a bit rough; apologies):

# By hansen97 @USTB-ML
# 2018.12.27

import keras
from keras import layers
from keras import backend as K
import numpy as np
import os
import PIL.Image as Image
from keras.preprocessing import image
latent_dim = 100
height = 32
width = 32
channels = 3
col = 5
row = 5
image_size = 32
save_dir = './WGAN_Car'


def wasserstein(y_true, y_pred):
    # Wasserstein loss, left over from WGAN experiments; unused in this DCGAN run
    return K.mean(y_true * y_pred)


def Conv_Down(x, kernel_size, channel, name='Conv_Down?_'):
    # x = layers.BatchNormalization(axis=-1, name=name + 'BN')(x)
    x = layers.Conv2D(channel, kernel_size, strides=2, padding='same', name=name+'Conv1',
                      kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(x)
    x = layers.LeakyReLU(0.2, name=name+'LeakyReLU1')(x)
    # x = layers.Dropout(0.2)(x)
    # x = layers.Conv2D(channel, kernel_size, padding='same', name=name + 'Conv2',
    #                   kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(x)
    # x = layers.LeakyReLU(0.2, name=name + 'LeakyReLU2')(x)
    return x


def Conv_Up(x, kernel_size, channel, name='Conv_Up?_'):
    x = layers.Conv2DTranspose(channel, kernel_size, strides=2, padding='same', name=name+'ConvT',
                               kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(x)
    x = layers.BatchNormalization(axis=-1, name=name+'BN')(x)
    # x = layers.LeakyReLU(0.2, name=name + 'LeakyReLU1')(x)
    x = layers.ReLU(name=name + 'ReLU')(x)
    return x


def G_net():
    generator_input = keras.Input(shape=(latent_dim,), name='G_input')
    x = layers.Dense(512 * 4 * 4, name='G_First_Dense',
                     kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(generator_input)
    x = layers.Reshape((4, 4, 512))(x)
    x = layers.BatchNormalization(name='G_BN0')(x)
    x = layers.ReLU(name='G_ReLU')(x)
    x = Conv_Up(x, kernel_size=5, channel=256, name='Conv_Up1_')
    # x = layers.BatchNormalization(name='G_BN1')(x)
    x = Conv_Up(x, kernel_size=5, channel=128, name='Conv_Up2_')
    # x = layers.BatchNormalization(name='G_BN2')(x)
    # x = Conv_Up(x, kernel_size=3, channel=128, name='Conv_Up3_')
    x = layers.Conv2DTranspose(channels, 5, activation='tanh', strides=2, padding='same', name='G_output',
                               kernel_initializer=keras.initializers.RandomNormal(stddev=0.02))(x)    # output layer
    generator = keras.models.Model(generator_input, x, name='G')
    generator.summary()
    return generator


def D_net():
    discriminator_input = layers.Input(shape=(height, width, channels), name='D_input')
    # x = layers.Conv2D(64, 3, padding='same', name='D_First_Conv')(discriminator_input)
    # # x = layers.BatchNormalization(name='D_First_BN')(x)
    # x = layers.LeakyReLU()(x)
    x = Conv_Down(discriminator_input, kernel_size=5, channel=128, name='Conv_Down1_')
    x = layers.Dropout(0.3)(x)
    x = Conv_Down(x, kernel_size=5, channel=256, name='Conv_Down2_')
    x = layers.Dropout(0.3)(x)
    x = Conv_Down(x, kernel_size=5, channel=512, name='Conv_Down3_')
    x = layers.Flatten()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(1, activation='sigmoid', name='D_output')(x)
    discriminator = keras.models.Model(discriminator_input, x, name='D')
    discriminator.summary()
    # discriminator_optimizer = keras.optimizers.RMSprop(lr=0.0001, clipvalue=1.0, decay=1e-8)
    discriminator_optimizer = keras.optimizers.Adam(lr=0.0002, beta_1=0.5)
    discriminator.compile(optimizer=discriminator_optimizer,
                          loss='binary_crossentropy')
    return discriminator


# Network structures for G and D
G = G_net()
D = D_net()
# G.load_weights(save_dir+'/G_3.5w.h5', by_name=True)
# D.load_weights(save_dir+'/D_3.5w.h5', by_name=True)
# G, D, WGAN = WGAN_net()
# Training setup: stack G and D into the adversarial model (D frozen)
D.trainable = False
WGAN_input = keras.Input(shape=(latent_dim,))
WGAN_output = D(G(WGAN_input))
WGAN = keras.models.Model(WGAN_input, WGAN_output)
WGAN.summary()
# WGAN_optimizer = keras.optimizers.RMSprop(lr=0.0002, clipvalue=1.0, decay=1e-8)
WGAN_optimizer = keras.optimizers.Adam(lr=0.0004, beta_1=0.5)
WGAN.compile(optimizer=WGAN_optimizer, loss='binary_crossentropy')
# WGAN.load_weights(save_dir+'/WGAN_3.5w.h5', by_name=True)


# Helper: tile the generated images into a col x row grid and save it
def image_compose(ori_images, save_path):
    imgs = []
    for i in range(col * row):
        imgs.append(image.array_to_img((ori_images[i]+1) * 127.5, scale=False))
    to_image = Image.new('RGB', (col * image_size, row * image_size))   # blank canvas for the grid
    # Paste each tile into its position, left to right, top to bottom
    for y in range(row):
        for x in range(col):
            from_image = imgs[y*col+x].resize((image_size, image_size), Image.ANTIALIAS)
            to_image.paste(from_image, (x * image_size, y * image_size))
    return to_image.save(save_path)  # save the composed image


# DCGAN training loop
(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()
x_train = x_train[y_train.flatten() == 1]   # class 1 in CIFAR10: automobile
x_train = x_train.reshape((x_train.shape[0],) + (height, width, channels)).astype('float32') / 127.5 - 1
iterations = 40000
batch_size = 64
start = 0
for step in range(iterations):
    # Samples random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Decodes them to fake images
    G_images = G.predict(random_latent_vectors)
    # Combines them with real images
    stop = start + batch_size
    real_images = x_train[start: stop]
    combined_images = np.concatenate([G_images, real_images])
    # Assembles labels, discriminating real from fake images
    labels = np.concatenate([0.90*np.ones((batch_size, 1)),
                             np.zeros((batch_size, 1))])
    # Add random noise to the labels: an important trick!
    labels += 0.05 * np.random.random(labels.shape)+0.05
    for i in range(1):   # raise the range to train D multiple times per step
        d_loss = D.train_on_batch(combined_images, labels)

    # Samples random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Assembles labels that say "these are all real images" (it's a lie!)
    misleading_targets = np.zeros((batch_size, 1))
    for i in range(1):   # raise the range to train G multiple times per step
        a_loss = WGAN.train_on_batch(random_latent_vectors, misleading_targets)
    # a_loss = WGAN.train_on_batch(random_latent_vectors, misleading_targets)
    start += batch_size
    if start > len(x_train) - batch_size:
        start = 0
    # Occasionally saves and plots (every 100 steps)
    # print('step:', step, '/', iterations, '------', round(100 * step / iterations, 3), '%',
    #       '-- discriminator loss:', d_loss, '-- adversarial loss:', a_loss)
    if step % 125 == 0:
        if d_loss < 0: break    # leftover guard from WGAN experiments; binary crossentropy never goes negative
        print('step:', step, '/', iterations, '------', round(100 * step / iterations, 3), '%',
              '-- discriminator loss:', d_loss, '-- adversarial loss:', a_loss)
        if step < 5000 or step % 625 == 0:
            image_compose(G_images, os.path.join(save_dir, 'generated_pic' + str(step) + '.png'))
    if step % 5000 == 0:
        G.save(save_dir + '/G_' + str(step / 10000) + 'w.h5')
        D.save(save_dir + '/D_' + str(step / 10000) + 'w.h5')
        WGAN.save(save_dir + '/WGAN_' + str(step / 10000) + 'w.h5')
    if step % 10000 == 0:
        image_compose(real_images, os.path.join(save_dir, 'real_pic' + str(step) + '.png'))
G.save('./G_END.h5')
D.save('./D_END.h5')
WGAN.save('./WGAN_END.h5')

 
