Manually Implementing K-Fold Cross-Validation
K-fold cross-validation splits the training data into K equal parts and, in turn, holds out one part as the validation set (valid set), training K models in total; at the end, the best-performing model is kept. The cost is roughly K times the training compute.
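To illustrate the splitting idea on its own, here is a toy example using scikit-learn's `KFold` on a plain index array (the 10-element array is made up for demonstration):

```python
from sklearn.model_selection import KFold
import numpy as np

data = np.arange(10)  # a toy "dataset" of 10 samples
kf = KFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(kf.split(data)):
    # each fold holds out a different 2-sample slice as the validation set
    print(fold, train_idx.tolist(), val_idx.tolist())
```

Every sample appears in exactly one validation set, so across the 5 folds the whole dataset is used for validation once.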
Neural networks are now used widely across many fields. Although plenty of open-source code templates and excellent integrated frameworks exist online, for domain-specific problems you sometimes still need to write the training procedure yourself. This article shows how to implement K-fold cross-validation by hand using the PyTorch framework. The data here are sequential signals; the Dataset and DataLoader code needs no modification. The main content is the K-fold cross-validation loop itself and saving the best model (highest accuracy) among the K trained models.
Please adapt this code to your own needs; questions and discussion are welcome.
First, import the package and define some hyperparameters and bookkeeping variables:
from sklearn.model_selection import KFold

num_folds = 5
kf = KFold(n_splits=num_folds)
best_accuracy = 0.0   # highest validation accuracy seen across folds
best_model = None     # model that achieved best_accuracy
fold_losses = []      # per-fold validation loss
fold_accuracies = []  # per-fold validation accuracy
The training loop:
# Enumerate over the K splits instead of over the original dataset directly
for fold, (train_idx, val_idx) in enumerate(kf.split(dataset)):
    train_data = torch.utils.data.Subset(dataset, train_idx)
    val_data = torch.utils.data.Subset(dataset, val_idx)
    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=False)

    # Re-initialize the model, optimizer, and loss for each fold
    model = Net(input_dim, hidden_dim, output_dim)
    model.cuda()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)
    criterion = nn.CrossEntropyLoss()

    # Move the original training code inside the cross-validation loop
    print(f'Fold {fold + 1}/{num_folds}')
    for epoch in range(num_epochs):
        train_loss = train(model, train_loader, optimizer, criterion)
        print(f'Fold {fold + 1} - Epoch [{epoch + 1}/{num_epochs}] - Train Loss: {train_loss:.4f}')

    # Validate on this fold's held-out set
    test_loss, accuracy = test(model, val_loader, criterion)
    print(f'Fold {fold + 1} - Validation Loss: {test_loss:.4f}, Validation Accuracy: {accuracy:.2f}%')

    # Record per-fold metrics and keep the best model
    fold_losses.append(test_loss)
    fold_accuracies.append(accuracy)
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_model = model
# Evaluation and saving
average_loss = sum(fold_losses) / num_folds
average_accuracy = sum(fold_accuracies) / num_folds
print(f'Average Loss: {average_loss:.4f}, Average Accuracy: {average_accuracy:.2f}%')
# Save the best model's weights (note: `torch.save(Net, ...)` would save the
# class object, not the trained model)
torch.save(best_model.state_dict(), "models/model3.pkl")
Run output: