Methods for Preventing Overfitting and Underfitting in Neural Network Training

Author: 知识推荐号

Last modified: 2023-09-22 19:52:03

Category: Deep Learning. Tags: neural networks, deep learning, artificial intelligence

First published: 2023-09-19 10:47:48

Article link: https://blog.csdn.net/m0_46256255/article/details/133016300

1 The concept of overfitting
2 The concept of underfitting
3 Methods for preventing overfitting and underfitting

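1 The concept of overfitting

[Figure: training and validation curves]
As the figure above shows, a model that performs well during training but noticeably worse during validation or testing is overfitting.
In essence, the model has learned the training samples too closely and lost its ability to generalize. When overfitting appears, it usually means the model's fitting capacity is sufficient, but its generalization needs to improve.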

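2 The concept of underfitting

[Figure: training and test curves]
As the figure above shows, the model's training accuracy saturates below 30%, so it fits the training data poorly, and it performs equally poorly on the test set. This situation is called underfitting.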
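The two definitions above can be turned into a rough check on the final accuracies of a run. The sketch below is only an illustration: the function name `diagnose_fit` and the thresholds `min_train_acc` and `max_gap` are hypothetical choices, not values from this article, and sensible thresholds depend on the task.

```python
def diagnose_fit(train_acc, val_acc, min_train_acc=0.9, max_gap=0.05):
    """Classify a run from its final accuracies (both in the range [0, 1]).

    min_train_acc and max_gap are hypothetical thresholds chosen for illustration.
    """
    # Poor accuracy even on the training set: the model cannot fit the data.
    if train_acc < min_train_acc:
        return "underfitting"
    # Good training accuracy but a large drop on held-out data: poor generalization.
    if train_acc - val_acc > max_gap:
        return "overfitting"
    return "ok"


print(diagnose_fit(0.99, 0.80))  # high training accuracy, large gap
print(diagnose_fit(0.28, 0.25))  # training accuracy saturates below 30%
print(diagnose_fit(0.95, 0.93))  # small gap
```

In practice it is more informative to look at the whole training and validation curves than at a single pair of numbers.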

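3 Methods for preventing overfitting and underfitting

First, build a model that is able to overfit. This confirms that the model has enough capacity, and the same measures address underfitting:

1. Increase the model depth by adding more layers.
2. Make each layer larger by increasing its number of channels or neurons.
3. Train for more epochs.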
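To get a feel for how depth and width affect capacity, the sketch below counts the trainable parameters of a plain fully connected network. The helper `mlp_param_count` and the layer sizes are hypothetical and serve only as an illustration; parameter count is a rough proxy for capacity, not an exact measure.

```python
def mlp_param_count(in_features, hidden_units, depth, out_features):
    """Count weights and biases of an MLP with `depth` hidden layers of equal width."""
    sizes = [in_features] + [hidden_units] * depth + [out_features]
    # Each linear layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


baseline = mlp_param_count(100, 32, depth=1, out_features=4)
deeper = mlp_param_count(100, 32, depth=3, out_features=4)   # measure 1: more layers
wider = mlp_param_count(100, 128, depth=1, out_features=4)   # measure 2: larger layers
print(baseline, deeper, wider)
```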

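When overfitting occurs, the following measures can be taken:

1. Reduce the effective number of neurons, for example with Dropout, which randomly drops some neurons during training. In the network below, dropout is applied to the flattened features before fc1 and again to the activated output of fc1: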

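import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.conv3 = nn.Conv2d(32, 64, 3)
        # 64*10*10 assumes the conv stack yields 10x10 feature maps (e.g. 3x96x96 input)
        self.fc1 = nn.Linear(64*10*10, 1024)
        self.fc2 = nn.Linear(1024, 4)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 64*10*10)                         # flatten for the linear layers
        # F.dropout defaults to training=True, so pass self.training
        # to make sure dropout is switched off in eval mode.
        x = F.dropout(x, 0.5, training=self.training)    # 0.5 is the default probability
        x = F.relu(self.fc1(x))
        x = F.dropout(x, 0.2, training=self.training)
        x = self.fc2(x)
        return x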
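Dropout should only be active during training. The minimal sketch below uses the module form `nn.Dropout`, which follows `model.train()` and `model.eval()` automatically, to show the difference between the two modes. The tensor size and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)          # arbitrary seed so the run is repeatable
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

drop.train()                  # training mode: elements are zeroed at random
y_train = drop(x)             # survivors are scaled by 1 / (1 - p) = 2

drop.eval()                   # evaluation mode: dropout is a no-op
y_eval = drop(x)

print("fraction zeroed in train mode:", (y_train == 0).float().mean().item())
print("eval output equals input:", torch.equal(y_eval, x))
```

The functional form `F.dropout(x, p)` does not know which mode the model is in, so it needs `training=self.training` to behave the same way.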

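2. Batch normalization. It keeps the distribution of each layer's inputs stable, which allows a larger learning rate, makes training less sensitive to the initial parameter values, speeds up training, and makes the network more stable. It also has a regularizing effect similar to Dropout. PyTorch provides three batch normalization layers:
nn.BatchNorm1d: for 2D or 3D input, such as the output of fully connected layers or 1D convolution layers
nn.BatchNorm2d: for 4D input, such as image data with the four dimensions (batch, channel, height, width) coming from 2D convolution layers
nn.BatchNorm3d: for 5D input, such as video or image sequences
The num_features argument of a BN layer is the output size of the previous layer, that is, its number of channels or features.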
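The following sketch feeds random tensors of the shapes listed above through the three layers. All sizes are arbitrary values chosen for illustration; the point is that `num_features` always matches dimension 1 of the input and that the output shape equals the input shape.

```python
import torch
import torch.nn as nn

bn1d = nn.BatchNorm1d(8)     # num_features = 8 features or channels
bn2d = nn.BatchNorm2d(16)    # num_features = 16 channels
bn3d = nn.BatchNorm3d(4)     # num_features = 4 channels

out_fc = bn1d(torch.randn(32, 8))               # 2D: (batch, features)
out_seq = bn1d(torch.randn(32, 8, 50))          # 3D: (batch, channels, length)
out_img = bn2d(torch.randn(32, 16, 24, 24))     # 4D: (batch, channels, height, width)
out_vid = bn3d(torch.randn(2, 4, 5, 12, 12))    # 5D: (batch, channels, depth, height, width)

for name, out in [("fc", out_fc), ("seq", out_seq), ("img", out_img), ("vid", out_vid)]:
    print(name, tuple(out.shape))

# In training mode each channel is normalized with the statistics of the current batch.
print(out_img.mean(dim=(0, 2, 3)).abs().max().item())
```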

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.bn1 = nn.BatchNorm2d(16)       # num_features = out_channels of conv1
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 64, 3)
        self.bn3 = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(64*10*10, 1024)
        self.fc2 = nn.Linear(1024, 4)

    def forward(self, x):
        # Each block: convolution, ReLU, batch normalization, 2x2 max pooling
        x = F.relu(self.conv1(x))
        x = self.bn1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = self.bn2(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = self.bn3(x)
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 64*10*10)
        # Pass self.training so that dropout is disabled in eval mode
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x
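Both networks hard-code `64*10*10` as the input size of `fc1`, which only matches certain image sizes. The article does not state the input resolution, so the sketch below works it out under the assumption of square inputs, using the standard output-size formulas for an unpadded 3x3 convolution and 2x2 max pooling. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Spatial output size of max pooling with stride equal to the kernel size."""
    return size // kernel


def flattened_features(input_size, channels=64, num_blocks=3):
    """Size of the flattened tensor after num_blocks conv + pool blocks."""
    size = input_size
    for _ in range(num_blocks):
        size = pool_out(conv_out(size))
    return channels * size * size


for side in (94, 96, 101, 102, 224):
    print(side, flattened_features(side))
```

Under these assumptions a 96x96 image shrinks as 96, 94, 47, 45, 22, 20, 10, which gives 64*10*10 = 6400. Inputs from 94 to 101 pixels per side give the same value; other sizes need a different `fc1` input size.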

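Once the model no longer overfits, tune the hyperparameters again:

1. Learning rate. If the accuracy or loss changes slowly during training, the learning rate needs adjusting: the smaller the learning rate, the more slowly the accuracy and loss change.
The code below uses an exponential decay strategy: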

import torch

# Assumed to be defined earlier: Net, train_dl, test_dl, loss_fn,
# and the per-epoch helpers train() and test().
device = "cuda" if torch.cuda.is_available() else "cpu"

epochs = 50
model = Net().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Multiply the learning rate by gamma after every epoch.
# (The verbose flag is deprecated in recent PyTorch releases, so it is omitted here.)
exp_lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

def fit(epochs, train_dl, test_dl, model, loss_fn, optimizer, exp_lr_scheduler=None):
    train_loss = []
    train_acc = []
    test_loss = []
    test_acc = []

    for epoch in range(epochs):
        # Train for one epoch, then evaluate on the test set
        epoch_loss, epoch_acc = train(train_dl, model, loss_fn, optimizer)
        epoch_test_loss, epoch_test_acc = test(test_dl, model)
        train_loss.append(epoch_loss)
        train_acc.append(epoch_acc)
        test_loss.append(epoch_test_loss)
        test_acc.append(epoch_test_acc)
        if exp_lr_scheduler is not None:
            exp_lr_scheduler.step()       # decay the learning rate once per epoch

        template = ("epoch:{:2d}, train_loss: {:.5f}, train_acc: {:.1f}%, "
                    "test_loss: {:.5f}, test_acc: {:.1f}%")
        print(template.format(
              epoch, epoch_loss, epoch_acc*100, epoch_test_loss, epoch_test_acc*100))

    print("Done!")
    return train_loss, test_loss, train_acc, test_acc

train_loss, test_loss, train_acc, test_acc = fit(epochs, train_dl, test_dl, model, loss_fn, optimizer, exp_lr_scheduler)
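With exponential decay, the learning rate used in epoch t is the initial rate multiplied by gamma raised to the power t. The small sketch below evaluates that formula for the values used above (initial rate 0.001, gamma 0.98). The helper `exponential_lr` is a hypothetical stand-in for the schedule, not part of PyTorch.

```python
def exponential_lr(initial_lr, gamma, epoch):
    """Learning rate after `epoch` decay steps: initial_lr * gamma ** epoch."""
    return initial_lr * gamma ** epoch


for epoch in (0, 10, 25, 50):
    print(f"epoch {epoch:2d}: lr = {exponential_lr(0.001, 0.98, epoch):.6f}")
```

After 50 epochs the rate has dropped to roughly 36% of its initial value. A gamma closer to 1 decays more gently, and a smaller gamma decays faster.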

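2. Network depth.
3. Number of hidden units, that is, the number of neurons or of convolution channels.
4. Number of training epochs.
5. More training samples, which improves the model's ability to generalize.
6. Other hyperparameters.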
