For the Sake of My Short Paper: Learning AI with Li Mu (Part 6)

Latest recommended article published on 2023-12-02 12:35:17

70pice


Tags: regularization, weight decay, Dropout, overfitting, neural networks

Article link: https://blog.csdn.net/qq_36309174/article/details/121333105


Improving Generalization

What we want is a student who scores well on the real college entrance exam (the gaokao), not one who only does well on mock exams.

!pip install d2l
import math
import numpy as np
import torch
from torch import nn
from d2l import torch as d2l

Import the libraries we need to get set up.

max_degree = 20  # maximum degree of the polynomial
n_train, n_test = 100, 100  # training and test set sizes: 100 examples each
true_w = np.zeros(max_degree)  # polynomial coefficients, all initialized to 0
true_w[0:4] = np.array([5, 1.2, -3.4, 5.6])  # only the first four coefficients are set; the rest stay 0

features = np.random.normal(size=(n_train + n_test, 1))  # the inputs x, a 200 x 1 matrix
np.random.shuffle(features)  # shuffle the inputs
# Raise every x to the powers 0..max_degree-1; each row holds the powers of one x
poly_features = np.power(features, np.arange(max_degree).reshape(1, -1))
for i in range(max_degree):
    poly_features[:, i] /= math.gamma(i + 1)  # `gamma(n)` = (n-1)!, so this divides column i by i!
# Shape of `labels`: (`n_train` + `n_test`,)
labels = np.dot(poly_features, true_w)  # multiply by the coefficient vector
labels += np.random.normal(scale=0.1, size=labels.shape)  # add Gaussian noise
# Here `labels` is y, and `poly_features` holds the polynomial features x^i / i! (the model inputs, not predictions)

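# `train` is not defined in this post; it is assumed to be the training helper
# from the d2l book's polynomial-regression section.
# Select the first 4 dimensions of the polynomial features: 1, x, x^2/2!, x^3/3!
train(poly_features[:n_train, :4], poly_features[n_train:, :4],
      labels[:n_train], labels[n_train:])
# Select the first 2 dimensions of the polynomial features: 1, x
train(poly_features[:n_train, :2], poly_features[n_train:, :2],
      labels[:n_train], labels[n_train:])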

Train with either the first four features or only the first two, and compare how well each model fits.

Clearly, the more parameters we give the model, the more closely it can fit the training set. But that brings a problem of its own: overfitting.
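The effect of the number of features can be checked with a small self-contained experiment. This is only a sketch: it reuses the data recipe from this post, but it swaps the SGD-based `train` helper for a closed-form least-squares fit (`np.linalg.lstsq`), and the seed and the helper name `fit_and_score` are my own choices, not part of the original.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
max_degree, n_train, n_test = 20, 100, 100
true_w = np.zeros(max_degree)
true_w[:4] = [5, 1.2, -3.4, 5.6]

# Same recipe as in the post: features x^i / i!, labels with Gaussian noise
x = rng.normal(size=(n_train + n_test, 1))
poly = np.power(x, np.arange(max_degree).reshape(1, -1))
poly /= np.array([math.gamma(i + 1) for i in range(max_degree)])
y = poly @ true_w + rng.normal(scale=0.1, size=n_train + n_test)

def fit_and_score(num_features):
    """Least-squares fit on the first `num_features` columns (hypothetical helper).

    Returns (train MSE, test MSE).
    """
    X_tr = poly[:n_train, :num_features]
    X_te = poly[n_train:, :num_features]
    w, *_ = np.linalg.lstsq(X_tr, y[:n_train], rcond=None)
    train_mse = float(np.mean((X_tr @ w - y[:n_train]) ** 2))
    test_mse = float(np.mean((X_te @ w - y[n_train:]) ** 2))
    return train_mse, test_mse

results = {k: fit_and_score(k) for k in (2, 4, 20)}
for k, (tr, te) in results.items():
    print(f"{k:2d} features: train MSE = {tr:.4f}, test MSE = {te:.4f}")
```

With two features the model cannot represent the cubic, so both errors stay high (underfitting). With four features it matches the true polynomial. With all twenty, the training error can only go down further, while the test error is what tells you whether the extra capacity helped or hurt.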

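I found the explanation for this on Zhihu, and it was a real eye-opener. When a model overfits, it tries to pass through every single point in the training set, noise included. Because of the noise, some neighboring points differ sharply in value, so a curve that passes through all of them has to change very quickly. That means large derivatives, and large derivatives require large coefficients. Weight decay was introduced to limit exactly this. The theory is as follows.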

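The center of the loss contours is where the unregularized loss is smallest. Once we add a regularization term, the solution is pulled away from that center to the point where the loss contours meet the constraint region (the black dot in the original figure). That is the effect of regularization.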
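This pull can be seen numerically. The sketch below is a toy setup of my own (the sine target, the sizes, and the penalty strength `lam` are all hypothetical): it compares the plain least-squares solution with the closed-form ridge solution, which minimizes the squared error plus `lam * ||w||^2`.

```python
import numpy as np

rng = np.random.default_rng(1)      # fixed seed, only for reproducibility
n, num_features, lam = 20, 15, 0.1  # hypothetical toy sizes and penalty strength
x = rng.uniform(-1, 1, size=n)
X = np.power(x[:, None], np.arange(num_features))  # columns 1, x, ..., x^14
y = np.sin(3 * x) + rng.normal(scale=0.1, size=n)  # noisy targets

# Unregularized least squares: the minimum of the training loss
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
# Ridge solution: minimizes ||Xw - y||^2 + lam * ||w||^2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(num_features), X.T @ y)

def mse(w):
    """Mean squared error on the training points."""
    return float(np.mean((X @ w - y) ** 2))

norm_ls = float(np.linalg.norm(w_ls))
norm_ridge = float(np.linalg.norm(w_ridge))
print(f"least squares: ||w|| = {norm_ls:.2f}, train MSE = {mse(w_ls):.5f}")
print(f"ridge:         ||w|| = {norm_ridge:.2f}, train MSE = {mse(w_ridge):.5f}")
```

The ridge solution gives up a little training loss in exchange for a much smaller weight norm, which is the trade the figure describes.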

The derivation itself is simple: it just takes one partial derivative, nothing more.
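Since the figure with the derivation is not available here, the following is the standard form of it, written under my own notation: $L$ is the original loss, $\lambda$ the penalty strength, and $\eta$ the learning rate. It assumes a squared L2 penalty with the conventional factor of $1/2$.

```latex
\begin{aligned}
\tilde{L}(\mathbf{w}, b) &= L(\mathbf{w}, b) + \frac{\lambda}{2}\,\|\mathbf{w}\|^2, \\
\frac{\partial \tilde{L}}{\partial \mathbf{w}} &= \frac{\partial L}{\partial \mathbf{w}} + \lambda\,\mathbf{w}, \\
\mathbf{w}_{t+1} &= \mathbf{w}_t - \eta\left(\frac{\partial L}{\partial \mathbf{w}_t} + \lambda\,\mathbf{w}_t\right)
 = (1 - \eta\lambda)\,\mathbf{w}_t - \eta\,\frac{\partial L}{\partial \mathbf{w}_t}.
\end{aligned}
```

The last line is where the name comes from: as long as $\eta\lambda < 1$, each step first shrinks the weights by the factor $(1 - \eta\lambda)$ and then takes the usual gradient step.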

I only show the concise implementation here.

# Apply weight decay (coefficient `wd`) to the weights only; the bias is not decayed
trainer = torch.optim.SGD([
        {"params": net[0].weight, "weight_decay": wd},
        {"params": net[0].bias}], lr=lr)

From now on, it is enough to recognize what this does when you see it.
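To make the `weight_decay` option less of a black box, here is a minimal NumPy sketch of what it amounts to for plain gradient descent. `torch.optim.SGD` adds `weight_decay * param` to the gradient before the step; the sketch checks that this equals shrinking the weights by `(1 - lr * wd)` first. The data, `lr`, and `wd` are hypothetical values picked for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=50)
lr, wd = 0.1, 0.5  # hypothetical learning rate and weight-decay coefficient

def grad_mse(w):
    """Gradient of 0.5 * mean((Xw - y)^2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

w_penalty = np.zeros(5)  # gradient step on loss + wd/2 * ||w||^2
w_decay = np.zeros(5)    # shrink by (1 - lr * wd), then a plain gradient step
w_plain = np.zeros(5)    # no regularization, for comparison
for _ in range(200):
    w_penalty = w_penalty - lr * (grad_mse(w_penalty) + wd * w_penalty)
    w_decay = (1 - lr * wd) * w_decay - lr * grad_mse(w_decay)
    w_plain = w_plain - lr * grad_mse(w_plain)

print("penalty form == decay form:", np.allclose(w_penalty, w_decay))
print("||w|| with decay:   ", np.linalg.norm(w_decay))
print("||w|| without decay:", np.linalg.norm(w_plain))
```

The two regularized variants follow the same trajectory, and both end with a smaller weight norm than the unregularized run.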

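Dropout: still very widely used today
Its core purpose is to prevent overfitting. The idea is that, for a hidden layer, each unit's output is set to 0 at random with the dropout probability p, and the outputs that survive are scaled up by 1/(1 - p). Let's look at how it is implemented.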


import torch
from torch import nn
from d2l import torch as d2l


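def dropout_layer(X, dropout):
    assert 0 <= dropout <= 1  # the dropout probability must lie in [0, 1]; anything else is invalid
    # In this case, all elements are dropped.
    if dropout == 1:
        return torch.zeros_like(X)  # with dropout = 1, every element of this layer is zeroed
    # In this case, all elements are kept.
    if dropout == 0:  # with dropout = 0, the operation is a no-op
        return X
    # Draw uniform random numbers in [0, 1); keep an element where the draw exceeds `dropout`
    mask = (torch.rand(X.shape) > dropout).float()
    return mask * X / (1.0 - dropout)  # scale the kept elements by 1 / (1 - dropout)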
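Why divide by `1 - dropout`? A quick numerical check makes it concrete. The sketch below is a NumPy re-implementation of the same idea (the name `dropout_numpy` and the test sizes are mine, not from the post): dropped entries become 0, survivors are scaled up, and the mean of the output stays close to the mean of the input.

```python
import numpy as np

def dropout_numpy(X, p, rng):
    """Inverted dropout: zero each entry with probability p, scale survivors by 1 / (1 - p)."""
    assert 0 <= p <= 1
    if p == 1:
        return np.zeros_like(X)
    mask = rng.uniform(size=X.shape) > p  # True with probability 1 - p
    return mask * X / (1.0 - p)

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
X = np.ones((1000, 1000))
out = dropout_numpy(X, 0.5, rng)

print(np.unique(out).tolist())  # [0.0, 2.0]: dropped entries and survivors scaled by 1 / 0.5
print(out.mean())               # close to 1.0, the mean of the input
```

Because the expected value of each activation is unchanged, the layers that follow see inputs on the same scale during training and at prediction time, when dropout is switched off.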

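dropout1, dropout2 = 0.2, 0.5  # not set anywhere in the original post; values assumed from the d2l book

class Net(nn.Module):
    def __init__(self, num_inputs, num_outputs, num_hiddens1, num_hiddens2, is_training=True):
        # Why `is_training`? Dropout should be active only during training;
        # at prediction time we want the full network and deterministic outputs.
        super(Net, self).__init__()  # the usual way to initialize the parent class in Python
        self.num_inputs = num_inputs  # input dimension
        self.training = is_training  # whether the model is in training mode
        self.lin1 = nn.Linear(num_inputs, num_hiddens1)  # first layer
        self.lin2 = nn.Linear(num_hiddens1, num_hiddens2)  # second layer
        self.lin3 = nn.Linear(num_hiddens2, num_hiddens2)  # third layer (the extra one)
        self.lin4 = nn.Linear(num_hiddens2, num_outputs)  # fourth layer (output)
        self.relu = nn.ReLU()  # activation function

    def forward(self, X):
        H1 = self.relu(self.lin1(X.reshape((-1, self.num_inputs))))
        # Use dropout only when training the model
        if self.training:
            # Add a dropout layer after the first fully connected layer
            H1 = dropout_layer(H1, dropout1)
        H2 = self.relu(self.lin2(H1))
        if self.training:
            # Add a dropout layer after the second fully connected layer
            H2 = dropout_layer(H2, dropout2)
        H3 = self.lin3(H2)  # no activation is applied after this extra layer
        out = self.lin4(H3)
        return out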

num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256
net = Net(num_inputs, num_outputs, num_hiddens1, num_hiddens2)
num_epochs, lr, batch_size = 10, 0.5, 256
loss = nn.CrossEntropyLoss()
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

Dropout simply goes right after each hidden layer. Here I added one extra linear layer to mimic overfitting, for comparison.

net = nn.Sequential(nn.Flatten(),
        nn.Linear(784, 256),
        nn.ReLU(),
        # Add a dropout layer after the first fully connected layer
        nn.Dropout(dropout1),
        nn.Linear(256, 256),
        nn.ReLU(),
        # Add a dropout layer after the second fully connected layer
        nn.Dropout(dropout2),
        nn.Linear(256, 10))

def init_weights(m):
    # Initialize the weights of every linear layer from N(0, 0.01^2)
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

That is the concise implementation.