Dropout in TensorFlow 2

Dropout is mainly used to keep a model from overfitting during training. When a DNN is complex and slow to produce output, adding dropout layers can also make training run a tiny bit faster.

Typical dropout methods:

1. Bernoulli dropout: each neuron is independently kept active at random, classically with probability 0.5; this is a discrete form of dropout. In TensorFlow this is tf.nn.dropout(y, rate=0.5), which takes y as input, sets each element to 0 with probability rate, and scales the surviving elements by 1/(1 - rate) so the expected sum is unchanged (note that in TF2, rate is the drop probability, not the keep probability). Here y can be the output y = wx + b of some layer of the network, or the raw feature input itself. As a hyperparameter, the drop rate can be annealed over the course of training, e.g. starting at 0.5 and decaying toward 0, so that the keep probability rises from 0.5 toward 1. A minimal sketch of these semantics follows.
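A minimal sketch of the tf.nn.dropout semantics described above (the seed and tensor shape are arbitrary):

import tensorflow as tf

tf.random.set_seed(0)
y = tf.ones([1, 8])                    # stand-in for some layer output y = wx + b
dropped = tf.nn.dropout(y, rate=0.5)   # each element zeroed with probability 0.5
print(dropped.numpy())                 # survivors are scaled by 1/(1 - rate) = 2.0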

2. Uniform dropout and Gaussian dropout, which are continuous forms of dropout; a minimal Keras sketch follows the paper link below.

Paper on continuous dropout (Baidu Netdisk link): https://pan.baidu.com/s/1STcI_D6x5NPHJPceZnxzlg
Extraction code: ole9
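Keras ships a continuous variant out of the box. A minimal sketch using layers.GaussianDropout, which multiplies activations by Gaussian noise centered at 1 during training instead of zeroing them (the rate and shape here are arbitrary):

import tensorflow as tf
from tensorflow.keras import layers

gd = layers.GaussianDropout(0.5)      # noise std is sqrt(rate / (1 - rate))
x = tf.ones([1, 8])
print(gd(x, training=True).numpy())   # multiplicatively noisy activations
print(gd(x, training=False).numpy())  # identity at inference time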

For the theory behind dropout, see:

https://microstrong.blog.csdn.net/article/details/80737724

The dataset is the MNIST handwritten digits. When building the network, two Dropout layers are added, and the test set is evaluated every 500 training steps.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # silence TensorFlow info/warning logs

import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
tf.random.set_seed(1)

def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y


batchsz = 128
(x, y), (x_val, y_val) = datasets.mnist.load_data()
print('datasets:', x.shape, y.shape, x.min(), x.max())

db = tf.data.Dataset.from_tensor_slices((x, y))
db = db.map(preprocess).shuffle(60000).batch(batchsz).repeat(10)

ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
ds_val = ds_val.map(preprocess).batch(batchsz)

network = Sequential([layers.Dense(256, activation='relu'),
                      layers.Dropout(0.5),  # 0.5 rate to drop
                      layers.Dense(128, activation='relu'),
                      layers.Dropout(0.5),  # 0.5 rate to drop
                      layers.Dense(64, activation='relu'),
                      layers.Dense(32, activation='relu'),
                      layers.Dense(10)])
network.build(input_shape=(None, 28 * 28))
network.summary()

optimizer = optimizers.Adam(learning_rate=0.01)  # 'lr' is a deprecated alias in TF2
flag_regulation = True  # toggle the manual L2 penalty in the training loop
for step, (x, y) in enumerate(db):

    with tf.GradientTape() as tape:
        # [b, 28, 28] => [b, 784]
        x = tf.reshape(x, (-1, 28 * 28))
        # [b, 784] => [b, 10]
        out = network(x, training=True)
        # [b] => [b, 10]
        y_onehot = tf.one_hot(y, depth=10)
        # [b]
        loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_onehot, out, from_logits=True))
        if flag_regulation:
            loss_regularization = []
            for p in network.trainable_variables:
                loss_regularization.append(tf.nn.l2_loss(p))
            loss_regularization = tf.reduce_sum(tf.stack(loss_regularization))

            loss = loss + 0.0001 * loss_regularization

    grads = tape.gradient(loss, network.trainable_variables)
    optimizer.apply_gradients(zip(grads, network.trainable_variables))

    if step % 500 == 0:
        # evaluate
        total, total_correct = 0., 0
        for episode, (x, y) in enumerate(ds_val):
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 28 * 28))
            # [b, 784] => [b, 10] 
            out = network(x, training=False)  # dropout disabled during evaluation
            # [b, 10] => [b] 
            pred = tf.argmax(out, axis=1)
            pred = tf.cast(pred, dtype=tf.int32)
            # bool type 
            correct = tf.equal(pred, y)
            # bool tensor => int tensor => numpy
            total_correct += tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
            total += x.shape[0]

        print(step, 'loss:', float(loss),'Evaluate Acc with drop:', total_correct / total)

With the dropout layers only (flag_regulation = False, no L2 penalty), the output is:

datasets: (60000, 28, 28) (60000,) 0 255
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               32896     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
=================================================================
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________
0 loss: 2.3945655822753906 Evaluate Acc with drop: 0.2225
500 loss: 0.36276036500930786 Evaluate Acc with drop: 0.9414
1000 loss: 0.41202312707901 Evaluate Acc with drop: 0.9523
1500 loss: 0.25829312205314636 Evaluate Acc with drop: 0.9559
2000 loss: 0.2626938223838806 Evaluate Acc with drop: 0.9541
2500 loss: 0.31651824712753296 Evaluate Acc with drop: 0.9557
3000 loss: 0.2286975234746933 Evaluate Acc with drop: 0.9623
3500 loss: 0.3360612988471985 Evaluate Acc with drop: 0.963
4000 loss: 0.26416313648223877 Evaluate Acc with drop: 0.961
4500 loss: 0.24929602444171906 Evaluate Acc with drop: 0.9635
4689 loss: 0.25821831822395325 Evaluate Acc with drop: 0.9635

Process finished with exit code 0

With the dropout layers in place and regularization enabled as well (flag_regulation = True), the output is:

datasets: (60000, 28, 28) (60000,) 0 255
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               32896     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
=================================================================
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________
0 loss: 2.4295244216918945 Evaluate Acc with drop: 0.223
500 loss: 0.44218504428863525 Evaluate Acc with drop: 0.9446
1000 loss: 0.5258310437202454 Evaluate Acc with drop: 0.9497
1500 loss: 0.47201406955718994 Evaluate Acc with drop: 0.9488
2000 loss: 0.47350579500198364 Evaluate Acc with drop: 0.9535
2500 loss: 0.6400591135025024 Evaluate Acc with drop: 0.9492
3000 loss: 0.7657594680786133 Evaluate Acc with drop: 0.9511
3500 loss: 0.5626079440116882 Evaluate Acc with drop: 0.9401
4000 loss: 0.5259348154067993 Evaluate Acc with drop: 0.9522
4500 loss: 0.38989314436912537 Evaluate Acc with drop: 0.9496
4689 loss: 0.6070573329925537 Evaluate Acc with drop: 0.9496

Process finished with exit code 0

So dropout and L2 regularization together performed worse here (even slightly worse than either technique alone). Still, it is worth having the technique in hand~
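For reference, the same kind of L2 penalty can be attached declaratively instead of summing tf.nn.l2_loss over trainable_variables by hand. A sketch of the Keras idiom (not numerically identical to the manual loop above: kernel_regularizer skips biases, and regularizers.l2 omits the 1/2 factor that tf.nn.l2_loss includes):

from tensorflow.keras import layers, regularizers, Sequential

network = Sequential([
    layers.Dense(256, activation='relu',
                 kernel_regularizer=regularizers.l2(0.0001)),
    layers.Dropout(0.5),
    layers.Dense(10),
])
# Keras collects the penalty terms in network.losses; add them to the data loss:
# loss = cross_entropy + tf.add_n(network.losses)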

With regularization only and no dropout layers, the output is:

datasets: (60000, 28, 28) (60000,) 0 255
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dense_1 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
=================================================================
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________
0 loss: 2.354185104370117 Evaluate Acc with drop: 0.2086
500 loss: 0.2510494589805603 Evaluate Acc with drop: 0.942
1000 loss: 0.20196089148521423 Evaluate Acc with drop: 0.9467
1500 loss: 0.18167926371097565 Evaluate Acc with drop: 0.9648
2000 loss: 0.24093079566955566 Evaluate Acc with drop: 0.9653
2500 loss: 0.38632088899612427 Evaluate Acc with drop: 0.9577
3000 loss: 0.3047362267971039 Evaluate Acc with drop: 0.9623
3500 loss: 0.3375288248062134 Evaluate Acc with drop: 0.956
4000 loss: 0.2577934265136719 Evaluate Acc with drop: 0.9651
4500 loss: 0.13722452521324158 Evaluate Acc with drop: 0.9652
4689 loss: 0.18181777000427246 Evaluate Acc with drop: 0.9652

Process finished with exit code 0

Dropout alone or regularization alone beats the two combined here. Considering that the MNIST dataset is large, which by itself goes a long way toward avoiding overfitting, piling extra tricks on top actually lowers performance. Match the technique to the problem at hand~

The blog post below shows that placing dropout before or after the convolutional layers benefits the model, while placing it between convolutional layers works poorly; an illustrative placement sketch follows the link.

https://blog.csdn.net/u010960155/article/details/104103795
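As an illustration of the placements discussed there, a minimal convolutional sketch (the layer sizes are arbitrary assumptions, not taken from that post), with dropout after the convolutional block and before the dense head rather than between successive conv layers:

from tensorflow.keras import layers, Sequential

cnn = Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),   # after the convolutional block
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),    # before the output layer
    layers.Dense(10),
])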

