Methods for Preventing Overfitting and Underfitting in Neural Network Training
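1 The Concept of Overfitting
As the figure above shows, a model that performs well during training but noticeably worse on the validation or test set is overfitting.
In essence, the model has learned the training samples too closely and lost its ability to generalize. Overfitting usually indicates that the model's fitting capacity is adequate, but its generalization needs to improve.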
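2 The Concept of Underfitting
As the figure above shows, the model's training accuracy saturates below 30%, so it fits the training data poorly, and it fits the test data just as poorly. This situation is called underfitting.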
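The two definitions above can be turned into a rough check on the accuracy curves. The following is only a minimal sketch: the function name `diagnose_fit` and the thresholds `min_train_acc` and `max_gap` are hypothetical choices for illustration, not standard values, and should be adapted to the task.

```python
def diagnose_fit(train_acc, val_acc, min_train_acc=0.9, max_gap=0.1):
    """Classify a run from per-epoch accuracy histories (values in [0, 1]).

    min_train_acc and max_gap are illustrative thresholds, not standard values.
    """
    final_train = train_acc[-1]
    final_val = val_acc[-1]
    # Training accuracy itself is low: the model cannot even fit the training set.
    if final_train < min_train_acc:
        return "underfitting"
    # Training accuracy is high but validation lags far behind: poor generalization.
    if final_train - final_val > max_gap:
        return "overfitting"
    return "ok"


print(diagnose_fit([0.20, 0.28, 0.29], [0.20, 0.25, 0.26]))
print(diagnose_fit([0.70, 0.95, 0.99], [0.60, 0.70, 0.72]))
print(diagnose_fit([0.70, 0.90, 0.95], [0.68, 0.88, 0.92]))
```

Only the final epoch is compared here; in practice it is worth looking at the whole curve, since the gap between training and validation accuracy typically widens gradually.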
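3 Methods for Preventing Overfitting and Underfitting
- First, build a model that is able to overfit. The following measures increase its capacity:
1. Increase the depth of the model by adding more layers
2. Make each layer larger by increasing its number of channels or neurons
3. Train for more epochs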
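As an illustration of the first two measures, the sketch below builds a fully connected network whose depth and width are parameters, so capacity can be raised until the model overfits. The helper names `make_mlp` and `count_parameters` and the sizes used are assumptions made for this example only.

```python
import torch
from torch import nn


def make_mlp(in_features, num_classes, hidden_units=64, num_hidden_layers=1):
    """Build an MLP whose capacity is set by its depth and width (hypothetical helper)."""
    layers = []
    width_in = in_features
    for _ in range(num_hidden_layers):
        layers += [nn.Linear(width_in, hidden_units), nn.ReLU()]
        width_in = hidden_units
    layers.append(nn.Linear(width_in, num_classes))
    return nn.Sequential(*layers)


def count_parameters(model):
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# A shallow, narrow model versus a deeper, wider one on the same input size.
small = make_mlp(32, 4, hidden_units=16, num_hidden_layers=1)
large = make_mlp(32, 4, hidden_units=256, num_hidden_layers=4)
print(count_parameters(small), count_parameters(large))

# Both accept the same input and produce one score per class.
out = large(torch.randn(8, 32))
print(out.shape)
```

The parameter count is only a rough proxy for capacity, but it makes the effect of adding layers or widening them easy to compare.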
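- When overfitting appears, the following measures can be taken:
1. Reduce the number of active neurons, for example with Dropout, which randomly drops some neurons during training. It is typically placed ahead of a layer and its activation function; the network below applies it to the input of each fully connected layer: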
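import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.conv3 = nn.Conv2d(32, 64, 3)
        # 64*10*10 matches 3x96x96 inputs after three conv + pool stages
        self.fc1 = nn.Linear(64*10*10, 1024)
        self.fc2 = nn.Linear(1024, 4)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 64*10*10)  # flatten to (batch, features)
        # F.dropout defaults to training=True, so pass self.training
        # to switch it off after model.eval()
        x = F.dropout(x, 0.5, training=self.training)  # 0.5 is also the default rate
        x = F.relu(self.fc1(x))
        x = F.dropout(x, 0.2, training=self.training)
        x = self.fc2(x)
        return x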
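Dropout should only be active during training. As a small sketch of that behavior, the example below uses the module form `nn.Dropout`, which follows `train()` and `eval()` automatically, whereas the functional `F.dropout` has to be told through its `training` argument. The tensor size and the seed are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.
drop.train()
y_train = drop(x)

# Evaluation mode: Dropout is the identity function.
drop.eval()
y_eval = drop(x)

print("fraction zeroed in train mode:", (y_train == 0).float().mean().item())
print("unchanged in eval mode:", torch.equal(y_eval, x))
```

Declaring `nn.Dropout` layers in `__init__` is a common alternative to calling `F.dropout` in `forward`, because a single `model.eval()` call then disables every Dropout layer at once.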
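2. Batch normalization. It keeps the inputs to each layer on a consistent scale, which allows a larger learning rate, makes training less sensitive to the initial parameter values, speeds up training, and makes the network more stable. It also has a regularizing effect similar to Dropout. PyTorch provides three batch normalization layers:
nn.BatchNorm1d: for 2D or 3D input, such as the output of fully connected layers and 1D convolutional layers
nn.BatchNorm2d: for 4D input, such as image data with the four dimensions (batch, channel, height, width) produced by 2D convolutional layers
nn.BatchNorm3d: for 5D input, such as video or image sequences
The BN parameter num_features is the output size of the previous layer, that is, its number of output channels or features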
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.bn1 = nn.BatchNorm2d(16)  # num_features = out_channels of conv1
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 64, 3)
        self.bn3 = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(64*10*10, 1024)
        self.fc2 = nn.Linear(1024, 4)

    def forward(self, x):
        # Each stage: convolution -> ReLU -> batch norm -> 2x2 max pooling
        x = F.relu(self.conv1(x))
        x = self.bn1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = self.bn2(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv3(x))
        x = self.bn3(x)
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 64*10*10)  # flatten to (batch, features)
        # Pass self.training so dropout is switched off after model.eval()
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x
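To make the input shapes of the three batch normalization layers concrete, here is a minimal sketch that passes random tensors through each of them. The batch size, the 16 channels, and the spatial sizes are arbitrary values chosen for illustration; in every case `num_features` equals the size of dimension 1 of the input.

```python
import torch
from torch import nn

bn1d = nn.BatchNorm1d(num_features=16)
bn2d = nn.BatchNorm2d(num_features=16)
bn3d = nn.BatchNorm3d(num_features=16)

# BatchNorm1d: 2D input (batch, features) or 3D input (batch, channels, length)
out_fc = bn1d(torch.randn(8, 16))
out_seq = bn1d(torch.randn(8, 16, 50))

# BatchNorm2d: 4D input (batch, channels, height, width)
out_img = bn2d(torch.randn(8, 16, 32, 32))

# BatchNorm3d: 5D input (batch, channels, depth, height, width)
out_vid = bn3d(torch.randn(8, 16, 4, 32, 32))

# Batch norm never changes the shape of its input.
print(out_fc.shape, out_seq.shape, out_img.shape, out_vid.shape)

# In training mode each channel is normalized with the statistics of the
# current batch, so the per-channel mean of the output is close to zero.
channel_mean = out_img.mean(dim=(0, 2, 3))
print(channel_mean.abs().max().item())
```

Like Dropout, batch normalization behaves differently after `model.eval()`: it then normalizes with the running statistics collected during training instead of the statistics of the current batch.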
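- Once the model no longer overfits, tune the hyperparameters again:
1. Learning rate. If accuracy or loss changes slowly during training, the learning rate needs adjusting: the smaller the learning rate, the more slowly accuracy and loss change
The following uses an exponential decay strategy: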
import torch

# Net, device, train_dl, test_dl, loss_fn, train() and test() are assumed
# to be defined earlier
epochs = 50
model = Net().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Multiply the learning rate by gamma after every epoch.
# verbose=True is deprecated in recent PyTorch releases;
# call exp_lr_scheduler.get_last_lr() to inspect the current rate instead
exp_lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

def fit(epochs, train_dl, test_dl, model, loss_fn, optimizer, exp_lr_scheduler=None):
    train_loss = []
    train_acc = []
    test_loss = []
    test_acc = []
    for epoch in range(epochs):
        epoch_loss, epoch_acc = train(train_dl, model, loss_fn, optimizer)
        epoch_test_loss, epoch_test_acc = test(test_dl, model)
        train_loss.append(epoch_loss)
        train_acc.append(epoch_acc)
        test_loss.append(epoch_test_loss)
        test_acc.append(epoch_test_acc)
        if exp_lr_scheduler:
            exp_lr_scheduler.step()  # decay the learning rate once per epoch
        template = ("epoch:{:2d}, train_loss: {:.5f}, train_acc: {:.1f}% ,"
                    "test_loss: {:.5f}, test_acc: {:.1f}%")
        print(template.format(
            epoch, epoch_loss, epoch_acc*100, epoch_test_loss, epoch_test_acc*100))
    print("Done!")
    return train_loss, test_loss, train_acc, test_acc

train_loss, test_loss, train_acc, test_acc = fit(epochs, train_dl, test_dl, model, loss_fn, optimizer, exp_lr_scheduler)
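2. Network depth
3. Number of hidden units, that is, the number of neurons or convolutional channels
4. Number of training epochs
5. Add more training samples to improve the model's generalization
6. Tune other parameters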
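For the exponential decay strategy in item 1 above, the learning rate after n scheduler steps is the initial rate multiplied by gamma to the power n. The sketch below records the schedule for the same settings (lr=0.001, gamma=0.98, 50 epochs) without training anything; the `nn.Linear` model is only a placeholder assumed for this example so that the optimizer has parameters to manage.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)  # placeholder model, not the network from the text
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)

lrs = []
for epoch in range(50):
    # Learning rate that would be used during this epoch.
    lrs.append(scheduler.get_last_lr()[0])
    # A real loop would run the training batches here. optimizer.step() is
    # called so the scheduler is stepped in the order PyTorch expects.
    optimizer.step()
    scheduler.step()

final_lr = scheduler.get_last_lr()[0]
print("first epoch lr:", lrs[0])
print("lr in epoch 10:", lrs[10], "expected:", 0.001 * 0.98 ** 10)
print("lr after 50 epochs:", final_lr, "expected:", 0.001 * 0.98 ** 50)
```

With gamma=0.98 the rate falls to a little over a third of its initial value after 50 epochs, so this is a gentle schedule; a smaller gamma decays faster.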