[lzy study notes: Dive into Deep Learning] 4.6 Dropout: Principles and Code Implementation

DadongDer

Last modified: 2022-02-16 23:07:28


First published: 2022-02-16 22:59:48

Article link: https://blog.csdn.net/lzydadong/article/details/122971725


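4.6.1 Revisiting Overfitting

Linear models

When there are more features than examples, linear models tend to overfit. Conversely, given more examples than features, linear models can generally be counted on not to overfit.
Unfortunately, the reliability with which linear models generalize comes at a cost. Put simply, linear models do not take interactions among features into account. For every feature, a linear model must assign either a positive or a negative weight, regardless of the other features.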

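Deep neural networks

In 2017, a group of researchers demonstrated the extreme flexibility of neural networks by training deep networks on randomly labeled images. People would struggle to associate the inputs with randomly assigned outputs, yet a neural network optimized by stochastic gradient descent can label every image in the training set perfectly. Consider what this means: if the labels are assigned uniformly at random and there are 10 classes, no classifier can be expected to exceed 10% accuracy on test data, so the generalization gap here is as high as 90%, which is severe overfitting.
The generalization properties of deep networks remain puzzling, and their mathematical foundations are still an open research question.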
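The arithmetic above can be checked with a small simulation. This is a minimal sketch, not the 2017 experiment: a lookup table stands in for a network that has perfectly fit its training set, the inputs are just integer IDs, and the dataset sizes and the `predict` helper are illustrative assumptions.

```python
import random

NUM_CLASSES = 10
rng = random.Random(0)  # fixed seed so the run is repeatable

def random_labels(n):
    """Assign each of n examples a label drawn uniformly at random."""
    return [rng.randrange(NUM_CLASSES) for _ in range(n)]

# Inputs are plain integer IDs; the labels carry no information about them.
train_x = list(range(5000))
test_x = list(range(5000, 15000))
train_y = random_labels(len(train_x))
test_y = random_labels(len(test_x))

# A lookup table stands in for a model that has memorized the training set.
memory = dict(zip(train_x, train_y))

def predict(x):
    # Seen inputs get their memorized label; unseen inputs get a guess.
    return memory.get(x, x % NUM_CLASSES)

def accuracy(xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)

train_acc = accuracy(train_x, train_y)
test_acc = accuracy(test_x, test_y)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"generalization gap: {train_acc - test_acc:.3f}")
```

Because the test labels are independent of the inputs, any rule for unseen inputs lands near chance level, so the gap stays close to 90% regardless of how `predict` guesses.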

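Linear models vs. deep neural networks

This fundamental tradeoff between generalizability and flexibility is described as the bias-variance tradeoff.

Linear models	Deep neural networks
Do not account for interactions among features. For every feature, a linear model must assign either a positive or a negative weight, regardless of the other features.	Learn interactions among features. E.g. they might infer that "Nigeria" and "Western Union" appearing together in an email indicates spam, while either one appearing alone does not.
Usually do not overfit when given more examples than features.	Can overfit even when there are far more examples than features.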

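4.6.2 Robustness through Perturbations

A good predictive model

We expect a good predictive model to perform well on unseen data. Classical generalization theory suggests that, to close the gap between training and test performance, we should aim for a simple model.
① Simplicity can take the form of a small number of dimensions.
A related measure: the norm of the parameters, which weight decay (L2 regularization) penalizes, is also a useful measure of simplicity.
② Another angle on simplicity is smoothness: the function should not be sensitive to small changes in its input.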
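For a linear function f(x) = w·x, these two views of simplicity are directly linked. By the Cauchy-Schwarz inequality, |f(x + δ) - f(x)| = |w·δ| ≤ ‖w‖‖δ‖, so a small weight norm bounds how far the output can move under a small input perturbation. The sketch below checks this numerically; the dimension, the noise scale, and the weight scales are arbitrary illustrative choices.

```python
import math
import random

rng = random.Random(0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def f(w, x):
    """A linear function of the input x with weights w."""
    return dot(w, x)

dim = 20
x = [rng.gauss(0, 1) for _ in range(dim)]
delta = [rng.gauss(0, 0.01) for _ in range(dim)]  # small input perturbation
x_perturbed = [xi + di for xi, di in zip(x, delta)]

results = []
for scale in (0.1, 1.0, 10.0):
    w = [scale * rng.gauss(0, 1) for _ in range(dim)]
    change = abs(f(w, x_perturbed) - f(w, x))
    bound = norm(w) * norm(delta)
    results.append((norm(w), change, bound))
    print(f"||w||={norm(w):8.3f}  output change={change:.5f}  bound={bound:.5f}")
```

For deep networks no such simple bound applies, which is part of the motivation for injecting noise during training.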

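The emergence of dropout

Original paper:
The authors proposed injecting noise into each layer of the network before computing the subsequent layer during training. Their reasoning was that, in a deep network with many layers, injecting noise only at the input enforces smoothness only on the input-output mapping.
Interpretation:
Dropout injects noise while computing each internal layer during forward propagation, and it has become a standard technique for training neural networks. The method is called dropout because, on the surface, some neurons are dropped out during training. At every iteration throughout training, standard dropout zeroes out some fraction of the nodes in the current layer before computing the next layer.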

How the noise is injected

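Written out, the rule that the code in 4.6.4 cites as Eq. 4.6.1 is as follows. This is a reconstruction from `dropout_layer` (which returns `mask * X / (1.0 - dropout)`), with p the dropout probability and h an intermediate activation:

```latex
h' =
\begin{cases}
0 & \text{with probability } p, \\
\dfrac{h}{1-p} & \text{otherwise,}
\end{cases}
\qquad
E[h'] = p \cdot 0 + (1-p) \cdot \frac{h}{1-p} = h .
```

Dividing the surviving activations by 1 - p is what keeps the expectation of each activation unchanged.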

4.6.3 Dropout in Practice

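A minimal sketch of what applying dropout to a single hidden layer looks like, in plain Python rather than PyTorch. The layer sizes (3 inputs, 5 hidden units, 1 output), the random weights, and the helper names `linear`, `dropout`, and `forward` are illustrative assumptions.

```python
import random

rng = random.Random(0)

def relu(v):
    return [max(0.0, vi) for vi in v]

def linear(weights, bias, x):
    """Compute weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def dropout(h, p, training):
    """Zero each unit with probability p and rescale the rest by 1/(1-p).

    At inference time (training=False) the activations pass through unchanged.
    """
    if not training or p == 0:
        return list(h)
    if p == 1:
        return [0.0 for _ in h]
    return [0.0 if rng.random() < p else hi / (1 - p) for hi in h]

# Hypothetical tiny network: 3 inputs -> 5 hidden units -> 1 output.
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
b1 = [0.5] * 5
W2 = [[rng.uniform(-1, 1) for _ in range(5)]]
b2 = [0.0]

def forward(x, p=0.5, training=True):
    h = relu(linear(W1, b1, x))
    h = dropout(h, p, training)  # applied after the activation
    return linear(W2, b2, h)[0], h

x = [1.0, 2.0, 3.0]
out_eval_1, h_eval = forward(x, training=False)
out_eval_2, _ = forward(x, training=False)
out_train, h_train = forward(x, training=True)
print("hidden (inference):", h_eval)
print("hidden (training): ", h_train)
```

In training mode the output is computed from a random subset of the hidden units, so it cannot depend too heavily on any single one; in inference mode the full layer is used and the result is deterministic.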

4.6.4 Implementation from Scratch

Issues encountered with the source code:
 class Net

import torch
from torch import nn
import commfuncs  # local helper module (not included in this post)


def dropout_layer(X, dropout):
    assert 0 <= dropout <= 1
    if dropout == 1:
        return torch.zeros_like(X)
    if dropout == 0:
        return X
    # Draw samples from the uniform distribution U[0, 1], one per element of X
    # Keep the nodes whose sample is greater than the dropout probability; zero out the rest
    mask = (torch.rand(X.shape) > dropout).float()
    # Example printouts. Each torch.rand call draws fresh samples, so the
    # three random printouts below come from different draws and do not
    # correspond to one another.
    # print(X)
    # tensor([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
    #         [ 8.,  9., 10., 11., 12., 13., 14., 15.]])
    # print(torch.rand(X.shape))
    # tensor([[0.1022, 0.1470, 0.5858, 0.9003, 0.0412, 0.0870, 0.3523, 0.0864],
    #         [0.9840, 0.3919, 0.8694, 0.2050, 0.4681, 0.3243, 0.5055, 0.5013]])
    # print(torch.rand(X.shape) > dropout)
    # tensor([[ True, False, False,  True, False, False, False, False],
    #         [ True, False, False, False,  True, False, False, False]])
    # print(mask)
    # tensor([[1., 0., 0., 0., 0., 1., 0., 1.],
    #         [0., 1., 0., 0., 1., 0., 0., 1.]])
    return mask * X / (1.0 - dropout)  # per Eq. 4.6.1


X = torch.arange(16, dtype=torch.float32).reshape((2, 8))
# print(X)
# print(dropout_layer(X, 0))
# print(dropout_layer(X, 1))
# print(dropout_layer(X, 0.5))

# Layer sizes for the Fashion-MNIST dataset
num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256
dropout1, dropout2 = 0.2, 0.5

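# Dropout can be applied to the output of each hidden layer (after the activation function),
# and the dropout probability can be set separately for each layer.
# A common trick is to use a lower dropout probability closer to the input layer.
# Dropout is only active during training.
class Net(nn.Module):  # custom model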
    def __init__(self, num_inputs, num_outputs, num_hiddens1, num_hiddens2, is_training=True):
        super(Net, self).__init__()
        self.num_inputs = num_inputs
        self.training = is_training  # nn.Module.train() / .eval() also set this flag
        self.lin1 = nn.Linear(num_inputs, num_hiddens1)
        self.lin2 = nn.Linear(num_hiddens1, num_hiddens2)
        self.lin3 = nn.Linear(num_hiddens2, num_outputs)
        self.relu = nn.ReLU()

    def forward(self, X):  # defines how the layers are connected
        H1 = self.relu(self.lin1(X.reshape((-1, self.num_inputs))))
        # Apply dropout only while the model is in training mode
        if self.training:
            H1 = dropout_layer(H1, dropout1)
        H2 = self.relu(self.lin2(H1))
        if self.training:
            H2 = dropout_layer(H2, dropout2)
        out = self.lin3(H2)
        return out

net = Net(num_inputs, num_outputs, num_hiddens1, num_hiddens2)

# for param in net.parameters():
    # print("param",param)
    # print("param.shape",param.shape)

# Training is the same as for the multilayer perceptron
num_epochs, lr, batch_size = 2, 0.5, 256
loss = nn.CrossEntropyLoss(reduction='none')
trainer_iter, test_iter = commfuncs.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
commfuncs.train_ch3(net, trainer_iter, test_iter, loss, num_epochs, trainer)

4.6.5 Concise Implementation

import torch
from torch import nn
import commfuncs

dropout1, dropout2 = 0.2, 0.5
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Dropout(dropout1),
                    nn.Linear(256, 256),
                    nn.ReLU(),
                    nn.Dropout(dropout2),
                    nn.Linear(256, 10))
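# During training, each Dropout layer randomly drops outputs of the previous layer
# (i.e. inputs to the next layer) with the given dropout probability.
# In evaluation mode (at test time), the Dropout layers simply pass the data through.
def init_weights(m):
    if isinstance(m, nn.Linear):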
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

num_epochs, lr, batch_size = 20, 0.5, 256
loss = nn.CrossEntropyLoss(reduction='none')
trainer_iter, test_iter = commfuncs.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
commfuncs.train_ch3(net, trainer_iter, test_iter, loss, num_epochs, trainer)
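To see the train/test behaviour described in the comments above, the following sketch toggles a standalone `nn.Dropout` layer between modes with `train()` and `eval()`. The input tensor of ones and the seed are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Dropout(p=0.5)
x = torch.ones(2, 8)

layer.train()        # training mode: elements are zeroed at random
y_train = layer(x)   # surviving entries are scaled by 1 / (1 - p) = 2

layer.eval()         # evaluation mode: the layer passes its input through
y_eval = layer(x)

print(y_train)
print(y_eval)
```

Calling `net.eval()` on the `nn.Sequential` model switches all of its `Dropout` layers at once. Whether `commfuncs.train_ch3` does this before computing test accuracy is not shown in this post, so it is worth checking in the helper itself.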

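4.6.6 Summary

• During forward propagation, dropout drops some neurons while computing each internal layer.
• Dropout helps reduce overfitting; it is often used together with controlling the dimensionality and size of the weight vectors.
• Dropout replaces an activation h with a random variable whose expected value is h.
• Dropout is used only during training.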
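The third point can be checked empirically. The sketch below applies the dropout rule to a single activation many times and compares the sample mean with the original value; the values h = 3.0 and p = 0.2, the sample count, and the helper name `dropout_scalar` are arbitrary illustrative choices.

```python
import random

rng = random.Random(0)

def dropout_scalar(h, p):
    """Return 0 with probability p, otherwise h / (1 - p)."""
    assert 0 <= p < 1
    return 0.0 if rng.random() < p else h / (1 - p)

h, p, n = 3.0, 0.2, 200_000
samples = [dropout_scalar(h, p) for _ in range(n)]
mean = sum(samples) / n
dropped = sum(s == 0.0 for s in samples) / n

print(f"sample mean: {mean:.4f} (original activation: {h})")
print(f"fraction dropped: {dropped:.4f} (dropout probability: {p})")
```

Without the division by 1 - p, the sample mean would shrink toward (1 - p) · h, and the next layer would see systematically smaller inputs during training than at test time.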
