Solutions to Overfitting in Deep Learning (with PyTorch Implementations)

枫林扬

Categories: Deep Learning, pytorch

Article link: https://blog.csdn.net/zhang2010hao/article/details/89339327

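Description

I recently ran into overfitting on a project. With a simple model, it took a dozen or so epochs to reach a good result, after which both the loss and the f1 score leveled off. After switching to a more complex model, two or three epochs were enough to reach a better result, but from then on the f1 score dropped even as the loss kept falling. This is a fairly clear case of overfitting.

Solutions

Common remedies for overfitting in deep networks include: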

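1. Early stopping

Stop training before the planned number of epochs is reached. The stopping condition can be based on a chosen metric, such as the loss value or the epoch-to-epoch change in accuracy or f1.

2. More data

Use a larger dataset. Adding samples is one option; data augmentation methods vary with the scenario and the kind of data.

3. Regularization

L1 and L2 regularization are the most common.

4. Dropout

Deactivate each neuron with a given probability during training.

5. BatchNorm

Normalize the activations of each layer over the mini-batch.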
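As an illustration of item 1, here is a minimal sketch of an early-stopping check. The class name `EarlyStopping`, its `patience` and `min_delta` parameters, and the f1 values are all hypothetical and chosen only for the example; a real project would feed in its own validation metric.

```python
class EarlyStopping:
    """Signal a stop when a monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest change that counts as an improvement
        self.best_score = None
        self.bad_epochs = 0

    def step(self, score):
        """Record this epoch's score (higher is better); return True to stop."""
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation f1 values, for illustration only.
f1_per_epoch = [0.70, 0.78, 0.81, 0.80, 0.79, 0.77, 0.76]

stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, f1 in enumerate(f1_per_epoch):
    if stopper.step(f1):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "best f1", stopper.best_score)
```

In practice one would also save a checkpoint whenever the best score improves, so that the best model can be restored after stopping.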

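Implementation

This section covers how to apply these remedies in pytorch. Early stopping and more data depend on the specific task, and different tasks call for different approaches, so they are not discussed further here. The remaining remedies have a uniform structure or recipe in pytorch, and are covered one by one below.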

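1. Regularization

torch.optim bundles many optimizers, such as SGD, Adadelta, Adam, Adagrad and RMSprop. Each takes a weight_decay parameter that sets the weight decay rate and plays the role of λ in L2 regularization. Note that these optimizers offer only L2 regularization. The API documents the parameter as: weight_decay (float, optional): weight decay (L2 penalty) (default: 0), which shows that weight_decay acts as the regularization term. L2 regularization can be set like this: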

optimizer = optim.Adam(model.parameters(),lr=0.001,weight_decay=0.01)

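However, this approach has a few issues:

(1) Regularization usually penalizes only the weights W, not the biases b. The weight_decay parameter of a torch.optim optimizer, however, applies to every parameter handed to the optimizer, weights w and biases b alike. Penalizing b with L2 regularization can lead to underfitting, so in most cases only the weights w should be regularized. (PS: the source describes the parameter only as weight decay (L2 penalty), and I was not sure about this at first. The decay is applied to each parameter in a parameter group, so biases are included unless they are placed in a separate group with weight_decay=0.)

(2) Drawback: the torch.optim optimizers implement only L2 regularization, not L1.

(3) By the regularization formula, adding a penalty makes the loss larger than before, and the penalty term grows in proportion to weight_decay: if the penalty contributes 10 to the loss at weight_decay=1, it should contribute about 1000 at weight_decay=100. With the torch.optim approach, though, if you still compute the loss with loss_fun = nn.CrossEntropyLoss(), you will find that however you change weight_decay, the reported loss stays close to the unregularized value. This is because the optimizer applies the decay directly in the parameter update, so loss_fun never adds the penalty on the weights W to the loss.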
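Point (3) can be checked on a toy problem. The sketch below is only an illustration: the helper `one_sgd_step`, the two-element parameter and the toy loss are made up for the example, and plain SGD is used so that the arithmetic is easy to follow by hand.

```python
import torch
import torch.optim as optim


def one_sgd_step(weight_decay):
    """Run a single SGD step on a toy parameter; return the reported loss and new weights."""
    w = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
    optimizer = optim.SGD([w], lr=0.1, weight_decay=weight_decay)

    loss = w.sum()            # toy loss; its gradient is [1, 1]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()          # update uses grad + weight_decay * w
    return loss.item(), w.detach().clone()


loss_plain, w_plain = one_sgd_step(weight_decay=0.0)
loss_decay, w_decay = one_sgd_step(weight_decay=0.5)

# The reported loss is identical, because the penalty is never added to it.
print(loss_plain, loss_decay)
# The updated weights differ, because the decay enters through the update.
print(w_plain, w_decay)
```

With weight_decay=0.5 the effective gradient is [1 + 0.5 * 1, 1 + 0.5 * 2] = [1.5, 2.0], so the weights move further than in the plain run even though both runs report the same loss.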

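To work around these limitations of the torch.optim optimizers (L2 only, and a penalty on every parameter in the network), here is an implementation modeled on how regularization is done in TensorFlow. A custom regularization class: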

import torch


class Regularization(torch.nn.Module):
    def __init__(self, model, weight_decay, p=2):
        '''
        :param model: the model
        :param weight_decay: regularization coefficient
        :param p: order of the norm, 2 by default;
                  p=2 gives an L2 penalty, p=1 an L1 penalty
        '''
        super(Regularization, self).__init__()
        if weight_decay <= 0:
            raise ValueError("param weight_decay must be > 0")
        self.model = model
        self.weight_decay = weight_decay
        self.p = p
        self.weight_list = self.get_weight(model)
        self.weight_info(self.weight_list)

    def to(self, device):
        '''
        Select the device to run on
        :param device: cuda or cpu
        :return: self
        '''
        self.device = device
        super().to(device)
        return self

    def forward(self, model):
        self.weight_list = self.get_weight(model)  # fetch the latest weights
        reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
        return reg_loss

    def get_weight(self, model):
        '''
        Collect the model's weights as (name, parameter) pairs
        :param model:
        :return: list of (name, parameter)
        '''
        weight_list = []
        for name, param in model.named_parameters():
            # Matches every parameter whose name contains 'weight'; biases are skipped
            if 'weight' in name:
                weight = (name, param)
                weight_list.append(weight)
        return weight_list

    def regularization_loss(self, weight_list, weight_decay, p=2):
        '''
        Sum the p-norms of the weights and scale by weight_decay
        :param weight_list:
        :param p: order of the norm, 2 by default
        :param weight_decay:
        :return: the regularization loss
        '''
        reg_loss = 0
        for name, w in weight_list:
            # torch.norm returns the norm itself, not its square
            l2_reg = torch.norm(w, p=p)
            reg_loss = reg_loss + l2_reg

        reg_loss = weight_decay * reg_loss
        return reg_loss

    def weight_info(self, weight_list):
        '''
        Print the names of the regularized weights
        :param weight_list:
        :return:
        '''
        print("---------------regularization weight---------------")
        for name, w in weight_list:
            print(name)
        print("---------------------------------------------------")

How to use the Regularization class:

import torch
import torch.nn as nn
import torch.optim as optim

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))

weight_decay = 100.0  # regularization coefficient
learning_rate = 0.001  # assumed value, same as the Adam example above

model = my_net().to(device)  # my_net stands for your own nn.Module subclass
# Initialize the regularizer
if weight_decay > 0:
    reg_loss = Regularization(model, weight_decay, p=2).to(device)
else:
    print("no regularization")


criterion = nn.CrossEntropyLoss().to(device)  # CrossEntropyLoss = softmax + cross entropy
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # no weight_decay argument needed

# train
batch_train_data = ...
batch_train_label = ...

out = model(batch_train_data)

# loss and regularization
loss = criterion(input=out, target=batch_train_label)
if weight_decay > 0:
    loss = loss + reg_loss(model)
total_loss = loss.item()  # plain Python float, for logging only

# backprop
optimizer.zero_grad()  # clear all accumulated gradients
loss.backward()  # call backward on the tensor, not on the float from .item()
optimizer.step()

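Notes on regularization:

Overall, comparing the training loss and evaluation metrics of models with and without regularization shows that, with regularization, the loss falls more slowly and the metrics rise more slowly, while both curves are smoother. The larger the regularization weight lambda, the smoother they become. This is the penalty at work: regularization constrains the model so that it behaves more smoothly, which helps mitigate overfitting.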
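If only issue (1) matters, that is, L2 is fine but biases should not be penalized, an alternative to the custom class is to give the optimizer separate parameter groups. The following is a sketch under assumptions: the small `nn.Sequential` model is a stand-in for your own network, and the rule "name ends with bias" is just one possible way to split the parameters.

```python
import torch.nn as nn
import torch.optim as optim

# Toy model standing in for your own network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Split parameters into those that get weight decay and those that do not.
decay, no_decay = [], []
for name, param in model.named_parameters():
    if name.endswith("bias"):
        no_decay.append(param)
    else:
        decay.append(param)

# Per-group options override the optimizer-wide defaults.
optimizer = optim.Adam(
    [
        {"params": decay, "weight_decay": 0.01},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=0.001,
)

for group in optimizer.param_groups:
    print(len(group["params"]), group["weight_decay"])
```

This keeps the built-in L2 weight decay for the weights while leaving the biases unpenalized; it does not provide L1 regularization, which still has to be added to the loss by hand.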

2. Dropout implementation

There are two ways to implement dropout in pytorch:

1) Use the nn.Dropout class: instantiate it once, then call it wherever it is needed.

import torch.nn as nn


class Exmp(nn.Module):
    def __init__(self, drop_rate):
        super().__init__()  # must run before submodules are assigned
        self.dropout = nn.Dropout(drop_rate)
        ...

    def forward(self, input):
        ...
        output = self.dropout(input)
        ...

2) Use the torch.nn.functional.dropout function:

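import torch.nn as nn
import torch.nn.functional as F


class Exmp(nn.Module):
    def __init__(self, drop_rate):
        super().__init__()
        self.drop_rate = drop_rate
        ...

    def forward(self, input):
        ...
        # Pass training=self.training so that model.eval() switches dropout off
        output = F.dropout(input, p=self.drop_rate, training=self.training)
        ...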

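The above is only an illustration. In practice the second form is more flexible, since a different drop_rate can be used between different layers. The advantage of the first is that it is initialized once and every later call applies the same dropout; as a module it also follows model.train() and model.eval() automatically. With F.dropout, pass training=self.training explicitly, because its training argument defaults to True.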
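To make the difference between training and evaluation mode concrete, here is a small runnable sketch that uses both forms side by side. The class name `TwoDropouts` and the input of all ones are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoDropouts(nn.Module):
    """Applies module-style and functional dropout to the same input."""

    def __init__(self, drop_rate):
        super().__init__()
        self.drop_rate = drop_rate
        self.dropout = nn.Dropout(drop_rate)

    def forward(self, x):
        a = self.dropout(x)                                          # module form
        b = F.dropout(x, p=self.drop_rate, training=self.training)   # functional form
        return a, b


net = TwoDropouts(drop_rate=0.5)
x = torch.ones(4, 10)

net.train()
a_train, b_train = net(x)   # each entry is either 0 or 1 / (1 - 0.5) = 2

net.eval()
a_eval, b_eval = net(x)     # dropout is disabled, both outputs equal x

print(a_train)
print(a_eval)
```

During training the surviving activations are scaled by 1 / (1 - p), which is why no rescaling is needed at evaluation time.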

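3. BatchNorm

Put simply, batch normalization normalizes the inputs to each layer of the network over the current mini-batch. I will not go into the theory here; plenty of material is available online.

pytorch provides three variants, BatchNorm1d, BatchNorm2d and BatchNorm3d; choose the one that matches the shape of your data. A BatchNorm layer is used much like any other layer.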
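As a rough guide to which variant fits which data, the sketch below applies each layer to a random tensor of a matching shape. The tensor sizes and variable names are arbitrary and chosen only for illustration; num_features must equal the size of the channel (or feature) dimension.

```python
import torch
import torch.nn as nn

bn1d = nn.BatchNorm1d(num_features=16)   # input (N, C) or (N, C, L)
bn2d = nn.BatchNorm2d(num_features=3)    # input (N, C, H, W), e.g. images
bn3d = nn.BatchNorm3d(num_features=3)    # input (N, C, D, H, W), e.g. volumes or video

features = torch.randn(8, 16)
images = torch.randn(8, 3, 32, 32)
volumes = torch.randn(2, 3, 4, 8, 8)

out1 = bn1d(features)
out2 = bn2d(images)
out3 = bn3d(volumes)

# The output shape always matches the input shape.
print(out1.shape, out2.shape, out3.shape)

# In training mode each feature is normalized over the batch,
# so the per-feature mean of the output is close to zero.
print(out1.mean(dim=0).abs().max())
```

Like dropout, BatchNorm behaves differently in training and evaluation: after model.eval() it uses the running statistics collected during training instead of the statistics of the current batch.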
