Dropout in TensorFlow 2

Dropout is mainly used to keep a model from overfitting during training. When a DNN is complex and slow to produce output, adding dropout layers can also make training run a tiny bit faster.

Typical dropout methods:

1. Bernoulli dropout: each neuron is independently kept active at random, classically with probability 0.5; this is a discrete form of dropout. In TensorFlow this is tf.nn.dropout(y, rate=0.5), which takes y as input, sets each element to 0 with probability rate, and scales the surviving elements by 1/(1 - rate) so the expected sum is unchanged (note that in TF2, rate is the drop probability, not the keep probability). Here y can be the output y = wx + b of some layer of the network, or the raw feature input itself. As a hyperparameter, the drop rate can be annealed over the course of training, e.g. starting at 0.5 and decaying toward 0, so that the keep probability rises from 0.5 toward 1. A minimal sketch of these semantics follows.
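A minimal sketch of the tf.nn.dropout semantics described above (the seed and tensor shape are arbitrary):

import tensorflow as tf

tf.random.set_seed(0)
y = tf.ones([1, 8])                    # stand-in for some layer output y = wx + b
dropped = tf.nn.dropout(y, rate=0.5)   # each element zeroed with probability 0.5
print(dropped.numpy())                 # survivors are scaled by 1/(1 - rate) = 2.0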

2. Uniform dropout and Gaussian dropout, which are continuous forms of dropout; a minimal Keras sketch follows the paper link below.

Paper on continuous dropout (Baidu Netdisk link): https://pan.baidu.com/s/1STcI_D6x5NPHJPceZnxzlg
Extraction code: ole9
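Keras ships a continuous variant out of the box. A minimal sketch using layers.GaussianDropout, which multiplies activations by Gaussian noise centered at 1 during training instead of zeroing them (the rate and shape here are arbitrary):

import tensorflow as tf
from tensorflow.keras import layers

gd = layers.GaussianDropout(0.5)      # noise std is sqrt(rate / (1 - rate))
x = tf.ones([1, 8])
print(gd(x, training=True).numpy())   # multiplicatively noisy activations
print(gd(x, training=False).numpy())  # identity at inference time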

For the theory behind dropout, see:

https://microstrong.blog.csdn.net/article/details/80737724

The dataset is the MNIST handwritten digits. When building the network, two Dropout layers are added, and the test set is evaluated every 500 training steps.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # silence TensorFlow info/warning logs

import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
tf.random.set_seed(1)

def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y


batchsz = 128
(x, y), (x_val, y_val) = datasets.mnist.load_data()
print('datasets:', x.shape, y.shape, x.min(), x.max())

db = tf.data.Dataset.from_tensor_slices((x, y))
db = db.map(preprocess).shuffle(60000).batch(batchsz).repeat(10)

ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
ds_val = ds_val.map(preprocess).batch(batchsz)

network = Sequential([layers.Dense(256, activation='relu'),
                      layers.Dropout(0.5),  # 0.5 rate to drop
                      layers.Dense(128, activation='relu'),
                      layers.Dropout(0.5),  # 0.5 rate to drop
                      layers.Dense(64, activation='relu'),
                      layers.Dense(32, activation='relu'),
                      layers.Dense(10)])
network.build(input_shape=(None, 28 * 28))
network.summary()

optimizer = optimizers.Adam(learning_rate=0.01)  # 'lr' is a deprecated alias in TF2
flag_regulation = True  # toggle the manual L2 penalty in the training loop
for step, (x, y) in enumerate(db):

    with tf.GradientTape() as tape:
        # [b, 28, 28] => [b, 784]
        x = tf.reshape(x, (-1, 28 * 28))
        # [b, 784] => [b, 10]
        out = network(x, training=True)
        # [b] => [b, 10]
        y_onehot = tf.one_hot(y, depth=10)
        # [b]
        loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_onehot, out, from_logits=True))
        if flag_regulation:
            loss_regularization = []
            for p in network.trainable_variables:
                loss_regularization.append(tf.nn.l2_loss(p))
            loss_regularization = tf.reduce_sum(tf.stack(loss_regularization))

            loss = loss + 0.0001 * loss_regularization

    grads = tape.gradient(loss, network.trainable_variables)
    optimizer.apply_gradients(zip(grads, network.trainable_variables))

    if step % 500 == 0:
        # evaluate
        total, total_correct = 0., 0
        for episode, (x, y) in enumerate(ds_val):
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 28 * 28))
            # [b, 784] => [b, 10] 
            out = network(x, training=False)  # dropout disabled during evaluation
            # [b, 10] => [b] 
            pred = tf.argmax(out, axis=1)
            pred = tf.cast(pred, dtype=tf.int32)
            # bool type 
            correct = tf.equal(pred, y)
            # bool tensor => int tensor => numpy
            total_correct += tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
            total += x.shape[0]

        print(step, 'loss:', float(loss),'Evaluate Acc with drop:', total_correct / total)

With the dropout layers only (flag_regulation = False, no L2 penalty), the output is:

datasets: (60000, 28, 28) (60000,) 0 255
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               32896     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
=================================================================
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________
0 loss: 2.3945655822753906 Evaluate Acc with drop: 0.2225
500 loss: 0.36276036500930786 Evaluate Acc with drop: 0.9414
1000 loss: 0.41202312707901 Evaluate Acc with drop: 0.9523
1500 loss: 0.25829312205314636 Evaluate Acc with drop: 0.9559
2000 loss: 0.2626938223838806 Evaluate Acc with drop: 0.9541
2500 loss: 0.31651824712753296 Evaluate Acc with drop: 0.9557
3000 loss: 0.2286975234746933 Evaluate Acc with drop: 0.9623
3500 loss: 0.3360612988471985 Evaluate Acc with drop: 0.963
4000 loss: 0.26416313648223877 Evaluate Acc with drop: 0.961
4500 loss: 0.24929602444171906 Evaluate Acc with drop: 0.9635
4689 loss: 0.25821831822395325 Evaluate Acc with drop: 0.9635

Process finished with exit code 0

With the dropout layers in place and regularization enabled as well (flag_regulation = True), the output is:

datasets: (60000, 28, 28) (60000,) 0 255
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               32896     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
=================================================================
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________
0 loss: 2.4295244216918945 Evaluate Acc with drop: 0.223
500 loss: 0.44218504428863525 Evaluate Acc with drop: 0.9446
1000 loss: 0.5258310437202454 Evaluate Acc with drop: 0.9497
1500 loss: 0.47201406955718994 Evaluate Acc with drop: 0.9488
2000 loss: 0.47350579500198364 Evaluate Acc with drop: 0.9535
2500 loss: 0.6400591135025024 Evaluate Acc with drop: 0.9492
3000 loss: 0.7657594680786133 Evaluate Acc with drop: 0.9511
3500 loss: 0.5626079440116882 Evaluate Acc with drop: 0.9401
4000 loss: 0.5259348154067993 Evaluate Acc with drop: 0.9522
4500 loss: 0.38989314436912537 Evaluate Acc with drop: 0.9496
4689 loss: 0.6070573329925537 Evaluate Acc with drop: 0.9496

Process finished with exit code 0

So dropout and L2 regularization together performed worse here (even slightly worse than either technique alone). Still, it is worth having the technique in hand~
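For reference, the same kind of L2 penalty can be attached declaratively instead of summing tf.nn.l2_loss over trainable_variables by hand. A sketch of the Keras idiom (not numerically identical to the manual loop above: kernel_regularizer skips biases, and regularizers.l2 omits the 1/2 factor that tf.nn.l2_loss includes):

from tensorflow.keras import layers, regularizers, Sequential

network = Sequential([
    layers.Dense(256, activation='relu',
                 kernel_regularizer=regularizers.l2(0.0001)),
    layers.Dropout(0.5),
    layers.Dense(10),
])
# Keras collects the penalty terms in network.losses; add them to the data loss:
# loss = cross_entropy + tf.add_n(network.losses)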

With regularization only and no dropout layers, the output is:

datasets: (60000, 28, 28) (60000,) 0 255
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dense_1 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
=================================================================
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________
0 loss: 2.354185104370117 Evaluate Acc with drop: 0.2086
500 loss: 0.2510494589805603 Evaluate Acc with drop: 0.942
1000 loss: 0.20196089148521423 Evaluate Acc with drop: 0.9467
1500 loss: 0.18167926371097565 Evaluate Acc with drop: 0.9648
2000 loss: 0.24093079566955566 Evaluate Acc with drop: 0.9653
2500 loss: 0.38632088899612427 Evaluate Acc with drop: 0.9577
3000 loss: 0.3047362267971039 Evaluate Acc with drop: 0.9623
3500 loss: 0.3375288248062134 Evaluate Acc with drop: 0.956
4000 loss: 0.2577934265136719 Evaluate Acc with drop: 0.9651
4500 loss: 0.13722452521324158 Evaluate Acc with drop: 0.9652
4689 loss: 0.18181777000427246 Evaluate Acc with drop: 0.9652

Process finished with exit code 0

Dropout alone or regularization alone beats the two combined here. Considering that the MNIST dataset is large, which by itself goes a long way toward avoiding overfitting, piling extra tricks on top actually lowers performance. Match the technique to the problem at hand~

The blog post below shows that placing dropout before or after the convolutional layers benefits the model, while placing it between convolutional layers works poorly; an illustrative placement sketch follows the link.

https://blog.csdn.net/u010960155/article/details/104103795
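As an illustration of the placements discussed there, a minimal convolutional sketch (the layer sizes are arbitrary assumptions, not taken from that post), with dropout after the convolutional block and before the dense head rather than between successive conv layers:

from tensorflow.keras import layers, Sequential

cnn = Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),   # after the convolutional block
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),    # before the output layer
    layers.Dense(10),
])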

