Two Methods for Dealing with Overfitting

This post is based on my study of the book Dive-into-DL-PyTorch, so most of its content comes from that book. The framework is PyTorch and the development tool is PyCharm.
Reference: Dive into Deep Learning (Dive-into-DL-PyTorch)
Reference links: https://github.com/ShusenTang/Dive-into-DL-PyTorch
https://github.com/zergtant/pytorch-handbook

Method 1: Weight Decay (L2 Regularization)

Weight decay adds an L2-norm penalty to the model's original loss function. The L2 penalty is the sum of the squares of all elements of the weight parameters, multiplied by a positive constant.

Take the loss of linear regression as an example. The original loss is

$$\ell(w_1, w_2, b) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}\left(x_1^{(i)}w_1 + x_2^{(i)}w_2 + b - y^{(i)}\right)^2.$$

After adding the L2 penalty, the loss becomes

$$\ell(w_1, w_2, b) + \frac{\lambda}{2}\|\boldsymbol{w}\|^2,$$

where the hyperparameter $\lambda > 0$ and $\|\boldsymbol{w}\|^2 = w_1^2 + w_2^2$. When all weight parameters are 0, the penalty term is at its minimum; when $\lambda$ is large, the penalty has a strong influence on the loss and usually drives the elements of the learned weights close to 0. In mini-batch stochastic gradient descent, the original weight update for linear regression,

$$w_1 \leftarrow w_1 - \frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}} x_1^{(i)}\left(x_1^{(i)}w_1 + x_2^{(i)}w_2 + b - y^{(i)}\right),\qquad
w_2 \leftarrow w_2 - \frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}} x_2^{(i)}\left(x_1^{(i)}w_1 + x_2^{(i)}w_2 + b - y^{(i)}\right),$$

becomes

$$w_1 \leftarrow (1-\eta\lambda)\,w_1 - \frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}} x_1^{(i)}\left(x_1^{(i)}w_1 + x_2^{(i)}w_2 + b - y^{(i)}\right),\qquad
w_2 \leftarrow (1-\eta\lambda)\,w_2 - \frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}} x_2^{(i)}\left(x_1^{(i)}w_1 + x_2^{(i)}w_2 + b - y^{(i)}\right).$$

L2 regularization therefore first multiplies the weights by a number smaller than 1 and then subtracts the usual gradient that does not involve the penalty term, which is why it is also called weight decay. By penalizing parameters with large absolute values, weight decay adds a constraint to the model being learned. (The squared bias term can also be included in the penalty.)
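As a quick sanity check (this snippet is my own, not from the book), the weight_decay argument of torch.optim.SGD implements exactly the decayed update above, i.e. it adds $\lambda\boldsymbol{w}$ to the gradient before the step:

import torch

lr, lambd = 0.1, 3.0
w = torch.tensor([1.0, -2.0], requires_grad=True)
(w ** 2).sum().backward()                           # any differentiable loss will do
manual = (1 - lr * lambd) * w.data - lr * w.grad    # hand-written decayed update

optimizer = torch.optim.SGD([w], lr=lr, weight_decay=lambd)
optimizer.step()                                    # PyTorch uses grad + lambd * w
print(torch.allclose(w.data, manual))               # True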
A high-dimensional linear regression example:

Let the feature dimension of each sample be p. For any sample in the training or test set, the label is generated by

$$y = 0.05 + \sum_{i=1}^{p} 0.01\,x_i + \epsilon,$$

where the noise term $\epsilon$ follows a normal distribution with mean 0 and standard deviation 0.01.

'''
Citation
@book{zhang2019dive,
    title={Dive into Deep Learning},
    author={Aston Zhang and Zachary C. Lipton and Mu Li and Alexander J. Smola},
    note={\url{http://www.d2l.ai}},
    year={2020}
}
'''
import torch
import torch.nn as nn
import numpy as np
import sys
import matplotlib.pyplot as plt
sys.path.append("..")
from IPython import display
# Generate synthetic data
n_train, n_test, num_inputs = 20, 100, 200
true_w, true_b = torch.ones(num_inputs, 1) * 0.01, 0.05
features = torch.randn((n_train + n_test, num_inputs))
labels = torch.matmul(features, true_w) + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)
train_features, test_features = features[:n_train, :], features[n_train:, :]
train_labels, test_labels = labels[:n_train], labels[n_train:]
# Initialize model parameters
def init_params():
    w = torch.randn((num_inputs, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return [w, b]
# Define the L2 penalty term
def l2_penalty(w):
    return (w**2).sum() / 2
# Model: linear regression
def linreg(X, w, b):
    return torch.mm(X, w) + b

# Loss function
def squared_loss(y_hat, y):
    # Note: this returns a vector; also, PyTorch's MSELoss does not divide by 2
    return ((y_hat - y.view(y_hat.size())) ** 2) / 2



# Training and testing setup
batch_size, num_epochs, lr = 1, 100, 0.003
net, loss = linreg, squared_loss
dataset = torch.utils.data.TensorDataset(train_features, train_labels)
train_iter = torch.utils.data.DataLoader(dataset, batch_size, shuffle=True)
# Plotting helpers
def set_figsize(figsize=(3.5, 2.5)):
    display.set_matplotlib_formats('svg')
    # Set the figure size
    plt.rcParams['figure.figsize'] = figsize
def semilogy(x_vals, y_vals, x_label, y_label, x2_vals=None, y2_vals=None,
             legend=None, figsize=(3.5, 2.5)):
    set_figsize(figsize)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.semilogy(x_vals, y_vals)
    if x2_vals and y2_vals:
        plt.semilogy(x2_vals, y2_vals, linestyle=':')
        plt.legend(legend)
    plt.show()
def fit_and_plot_pytorch(wd):
    # The weight-decay hyperparameter is set via the weight_decay argument when
    # constructing the optimizer. By default PyTorch would decay both weights and
    # biases, so we create separate optimizers and apply decay only to the weights.
    net = nn.Linear(num_inputs, 1)
    nn.init.normal_(net.weight, mean=0, std=1)
    nn.init.normal_(net.bias, mean=0, std=1)
    optimizer_w = torch.optim.SGD(params=[net.weight], lr=lr, weight_decay=wd)  # decay the weight parameter
    optimizer_b = torch.optim.SGD(params=[net.bias], lr=lr)  # no decay for the bias parameter
    train_ls, test_ls = [], []
    for _ in range(num_epochs):
        for X, y in train_iter:
            l = loss(net(X), y).mean()
            optimizer_w.zero_grad()
            optimizer_b.zero_grad()
            l.backward()
            # Call step on both optimizers to update the weight and the bias separately
            optimizer_w.step()
            optimizer_b.step()
        train_ls.append(loss(net(train_features), train_labels).mean().item())
        test_ls.append(loss(net(test_features), test_labels).mean().item())
    semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'loss',
                 range(1, num_epochs + 1), test_ls, ['train', 'test'])
    print('L2 norm of w:', net.weight.data.norm().item())
fit_and_plot_pytorch(0)
fit_and_plot_pytorch(3)

Two runs: without weight decay the model overfits; with weight decay the training error is slightly higher, but the error on the test set decreases.

L2 norm of w: 13.700491905212402
L2 norm of w: 0.0327015146613121

Process finished with exit code 0

Loss-curve comparison: overfitting without weight decay (wd = 0) vs. with weight decay (wd = 3).
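For comparison, the book also trains this model from scratch by adding lambd * l2_penalty(w) directly to the loss instead of using the optimizer's weight_decay argument. The sketch below follows that version and assumes the definitions from the listing above (init_params, l2_penalty, net, loss, train_iter, semilogy, and the data and hyperparameters) are still in scope:

def fit_and_plot(lambd):
    w, b = init_params()
    train_ls, test_ls = [], []
    for _ in range(num_epochs):
        for X, y in train_iter:
            # Add the L2 penalty term to the loss
            l = (loss(net(X, w, b), y) + lambd * l2_penalty(w)).sum()
            if w.grad is not None:
                w.grad.data.zero_()
                b.grad.data.zero_()
            l.backward()
            # Manual SGD step on w and b
            w.data -= lr * w.grad / batch_size
            b.data -= lr * b.grad / batch_size
        train_ls.append(loss(net(train_features, w, b), train_labels).mean().item())
        test_ls.append(loss(net(test_features, w, b), test_labels).mean().item())
    semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'loss',
             range(1, num_epochs + 1), test_ls, ['train', 'test'])
    print('L2 norm of w:', w.norm().item())

fit_and_plot(lambd=0)  # no penalty: overfits
fit_and_plot(lambd=3)  # with the L2 penalty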

Method 2: Dropout

Consider a fully connected network with a hidden layer (the hidden layer in the book's figure has 5 hidden units). When dropout is applied to that hidden layer, each hidden unit h_i is cleared to zero with probability p and, with probability 1 - p, is divided by 1 - p (stretched). The dropout probability p is the hyperparameter of the method. Because hidden units are dropped at random during training, dropout acts as a form of regularization that counters overfitting. At test time, however, dropout is generally not used, so that the predictions are deterministic.
(The book shows a diagram of the network after random dropping.) The example uses the Fashion-MNIST dataset and a network with two hidden layers, each with 256 output units.
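As a quick illustration of this inverted-dropout scaling (a toy snippet of my own, not from the book): kept elements are divided by 1 - p, so the expected value of each unit stays unchanged.

import torch

p = 0.5                                         # dropout probability
X = torch.arange(8, dtype=torch.float32)
mask = (torch.rand(X.shape) < 1 - p).float()    # keep each element with probability 1 - p
print(mask * X / (1 - p))                       # kept elements are stretched by 1 / (1 - p)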

'''
Citation
@book{zhang2019dive,
    title={Dive into Deep Learning},
    author={Aston Zhang and Zachary C. Lipton and Mu Li and Alexander J. Smola},
    note={\url{http://www.d2l.ai}},
    year={2020}
}
'''
import torch
import torchvision
import torch.nn as nn
import numpy as np
import sys
 
def dropout(X, drop_prob):
    X = X.float()
    assert 0 <= drop_prob <= 1
    keep_prob = 1 - drop_prob
    # In this case all elements are dropped
    if keep_prob == 0:
        return torch.zeros_like(X)
    mask = (torch.rand(X.shape) < keep_prob).float()  # keep each element with probability keep_prob
    return mask * X / keep_prob

# Load the data
def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    """Download the fashion mnist dataset and then load into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())

    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means no extra worker processes are used to speed up data loading
    else:
        num_workers = 4
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    return train_iter, test_iter
# Define model parameters
num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256
W1 = torch.tensor(np.random.normal(0, 0.01, size=(num_inputs, num_hiddens1)), dtype=torch.float, requires_grad=True)
b1 = torch.zeros(num_hiddens1, requires_grad=True)
W2 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens1, num_hiddens2)), dtype=torch.float, requires_grad=True)
b2 = torch.zeros(num_hiddens2, requires_grad=True)
W3 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens2, num_outputs)), dtype=torch.float, requires_grad=True)
b3 = torch.zeros(num_outputs, requires_grad=True)
params = [W1, b1, W2, b2, W3, b3]

# Define the model
drop_prob1, drop_prob2 = 0.2, 0.5
def net(X, is_training=True):
    X = X.view(-1, num_inputs)
    H1 = (torch.matmul(X, W1) + b1).relu()
    if is_training:  # use dropout only when training the model
        H1 = dropout(H1, drop_prob1)  # add a dropout layer after the first fully connected layer
    H2 = (torch.matmul(H1, W2) + b2).relu()
    if is_training:
        H2 = dropout(H2, drop_prob2)  # add a dropout layer after the second fully connected layer
    return torch.matmul(H2, W3) + b3

# Model evaluation
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        if isinstance(net, torch.nn.Module):
            net.eval()  # evaluation mode; this disables dropout
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            net.train()  # switch back to training mode
        else:  # custom model (a plain function)
           if('is_training' in net.__code__.co_varnames):  # if the function has an is_training argument
               # call it with is_training=False
               acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
           else:
               acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n

# Train and test the model
num_epochs, lr, batch_size = 5, 100.0, 256
loss = torch.nn.CrossEntropyLoss()
# Define training
# Mini-batch stochastic gradient descent
def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size  # note: param.data is used so the update is not tracked by autograd

def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()  # loss on this mini-batch (CrossEntropyLoss already returns the mean, so .sum() keeps a scalar)

            # Zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()  # zero the gradient of each parameter

            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used in the "concise implementation of softmax regression" section

            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))  # average loss, training accuracy, test accuracy
    
train_iter, test_iter = load_data_fashion_mnist(batch_size)
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)


Output:

epoch 1, loss 0.0042, train acc 0.583, test acc 0.721
epoch 2, loss 0.0022, train acc 0.791, test acc 0.776
epoch 3, loss 0.0019, train acc 0.826, test acc 0.811
epoch 4, loss 0.0017, train acc 0.842, test acc 0.832
epoch 5, loss 0.0016, train acc 0.852, test acc 0.822

Process finished with exit code 0
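For reference, the same network can also be written concisely with nn.Sequential and nn.Dropout, which PyTorch disables automatically when net.eval() is called. This is only a sketch, not part of the script above; it reuses the hyperparameters, data iterators, and train_ch3 defined there, uses nn.Flatten (PyTorch >= 1.2) for the reshape, and the learning rate of 0.5 is taken from the book's concise implementation:

net = nn.Sequential(
    nn.Flatten(),                            # (batch, 1, 28, 28) -> (batch, 784)
    nn.Linear(num_inputs, num_hiddens1),
    nn.ReLU(),
    nn.Dropout(drop_prob1),                  # active in training mode only
    nn.Linear(num_hiddens1, num_hiddens2),
    nn.ReLU(),
    nn.Dropout(drop_prob2),
    nn.Linear(num_hiddens2, num_outputs)
)
for param in net.parameters():
    nn.init.normal_(param, mean=0, std=0.01)
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)
train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
          None, None, optimizer)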