Data Augmentation
The goal of data augmentation is to artificially increase the diversity of the training data, improving the model's ability to generalize so that it performs better on data it has never seen.
In the earlier baseline, the data loaders were set up as follows:
train_loader = torch.utils.data.DataLoader(
    FFDIDataset(train_label['path'].head(1000), train_label['target'].head(1000),
        transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.RandomHorizontalFlip(),
            transforms.RandomVerticalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
    ), batch_size=40, shuffle=True, num_workers=4, pin_memory=True
)
val_loader = torch.utils.data.DataLoader(
    FFDIDataset(val_label['path'].head(1000), val_label['target'].head(1000),
        transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
    ), batch_size=40, shuffle=False, num_workers=4, pin_memory=True
)
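FFDIDataset itself is defined elsewhere in the baseline. As a rough sketch of what it is assumed to do (load each image from its path, apply the transform pipeline, and return an (image, label) pair; the real implementation may differ in details):

from PIL import Image
import torch
from torch.utils.data import Dataset

class FFDIDataset(Dataset):
    # Hypothetical reconstruction: img_paths and labels are parallel sequences,
    # transform is a torchvision transform pipeline like the ones above.
    def __init__(self, img_paths, labels, transform=None):
        self.img_paths = list(img_paths)
        self.labels = list(labels)
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(self.img_paths[index]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, torch.tensor(self.labels[index], dtype=torch.long)

    def __len__(self):
        return len(self.img_paths)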
- Image resizing: transforms.Resize((256, 256)) resizes every image to 256x256 pixels, which keeps the input dimensions consistent.
- Random horizontal flip: transforms.RandomHorizontalFlip() flips images horizontally at random, simulating objects seen from different directions and improving the model's generalization.
- Random vertical flip: transforms.RandomVerticalFlip() flips images vertically at random, likewise adding data diversity so the model can learn features from different viewpoints.
- Tensor conversion: transforms.ToTensor() converts the image into a PyTorch tensor, the standard format for image data in deep learning.
- Normalization: transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) standardizes each channel using the mean and standard deviation computed on ImageNet, which helps training stability and convergence speed. (A quick sanity check of the whole pipeline is sketched below.)
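To see what this pipeline actually produces, it can be run on a single image. The synthetic image here is a stand-in so the snippet is self-contained:

from PIL import Image
from torchvision import transforms

pipeline = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),                       # HWC uint8 in [0, 255] -> CHW float in [0.0, 1.0]
    transforms.Normalize([0.485, 0.456, 0.406],  # per channel: (x - mean) / std
                         [0.229, 0.224, 0.225])
])

img = Image.new('RGB', (512, 384), color=(128, 64, 200))  # stand-in for a real frame
x = pipeline(img)
print(x.shape)           # torch.Size([3, 256, 256])
print(x.min(), x.max())  # values are no longer confined to [0, 1] after normalization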
Hands-On
1. Tuning the baseline training
Earlier runs of the baseline scored 0.55, 0.96, and 0.97. The most recent run used the following settings:
train_loader = torch.utils.data.DataLoader(
    FFDIDataset(train_label['path'].head(300000), train_label['target'].head(300000),
        transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.RandomHorizontalFlip(),
            transforms.RandomVerticalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
    ), batch_size=64, shuffle=True, num_workers=4, pin_memory=True
)
val_loader = torch.utils.data.DataLoader(
    FFDIDataset(val_label['path'].head(50000), val_label['target'].head(50000),
        transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])
    ), batch_size=64, shuffle=False, num_workers=4, pin_memory=True
)
The submission scored 0.974; with the full dataset plus additional augmentation methods, the score should reach 0.98 or above.
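As an illustration of what "additional augmentation methods" could mean, one possible extended training transform is sketched below. These particular choices (color jitter, small rotations, random erasing) are my assumptions rather than part of the baseline, and aggressive augmentations can hurt face-forgery detection, so they would need to be validated:

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # lighting/color variation
    transforms.RandomRotation(10),     # keep rotations small for face crops
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25)   # tensor-only transform, must come after ToTensor()
])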
2. Training a simple CNN model
import torch
import torch.nn as nn

class DeepfakeDetector(nn.Module):
    def __init__(self):
        super(DeepfakeDetector, self).__init__()
        # Five conv blocks, each followed by max pooling that halves the
        # spatial size, ending with global average pooling down to 1x1.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1))
        )
        # MLP head with dropout; two logits for real vs. fake.
        self.classifier = nn.Sequential(
            nn.Linear(512, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(1024, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, 2)
        )

    def forward(self, x):  # the forward pass was missing from the original snippet
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 512, 1, 1) -> (N, 512)
        return self.classifier(x)
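A quick shape check confirms that the backbone and the classifier head line up, assuming 256x256 inputs as produced by the loaders above:

model = DeepfakeDetector()
dummy = torch.randn(2, 3, 256, 256)  # batch of two random "images"
logits = model(dummy)
print(logits.shape)                  # torch.Size([2, 2]): one logit per class (real / fake)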
This implements binary classification, and early stopping is used so compute is not wasted once validation accuracy stops improving:
import copy

criterion = nn.CrossEntropyLoss()                           # assumed: standard choice for 2-class logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # assumed optimizer and learning rate
best_acc = 0.0               # best validation accuracy so far
patience = 3                 # epochs to wait without improvement (assumed value)
epochs_since_improvement = 0

for epoch in range(20):
    print(f'Epoch {epoch + 1}/20')
    train_loss = train(train_loader, model, criterion, optimizer, epoch)
    val_acc = validate(val_loader, model, criterion)
    if val_acc > best_acc:
        best_acc = val_acc
        # deepcopy is needed here: state_dict().copy() is shallow, so the saved
        # tensors would keep changing as training continues
        best_model_wts = copy.deepcopy(model.state_dict())
        print(f'Validation Accuracy improved to {best_acc:.4f}')
        epochs_since_improvement = 0
    else:
        print(f'Validation Accuracy did not improve from {best_acc:.4f}')
        epochs_since_improvement += 1
        if epochs_since_improvement >= patience:
            print("Early stopping triggered")
            break
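The train and validate helpers come from the baseline and are not reproduced in these notes. A minimal sketch of what they are assumed to do, with names and signatures taken from the calls above and the bodies reconstructed by me (device transfers omitted for brevity):

def train(train_loader, model, criterion, optimizer, epoch):
    # One epoch of optimization; returns the mean training loss.
    model.train()
    total_loss = 0.0
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(train_loader)

def validate(val_loader, model, criterion):
    # Returns accuracy in [0, 1], which the loop above compares to best_acc.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, targets in val_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.size(0)
    return correct / total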
Because the architecture is relatively simple, among other issues, the accuracy barely updated from epoch to epoch. Still, the run exercised the full training and validation process, with some data augmentation and early stopping added along the way, and gave me a better grasp of the overall workflow.
Reference: Datawhale AI Learning Guide