While running comparison experiments recently, I reimplemented someone else's very simple network as a baseline, but found it would not train. The main possible causes are:
1. The learning rate is set inappropriately. Adjust its magnitude, and check whether a learning-rate decay schedule is in place.
2. The backward pass may have been forgotten. Check that loss.backward() is actually called.
3. Check that the learning-rate adjustment function is called in the right place. It belongs inside the epoch for loop, not inside the per-batch loop in train(); otherwise the learning rate decays every few batches, quickly approaches 0, and the network stops learning.
for epoch in range(args.start_epoch, args.epochs):
    adjust_learning_rate(optimizer, epoch)  # called once per epoch, in the epoch loop
    # train for one epoch
    train(train_loader, model, criterion, optimizer, epoch)
    # evaluate on validation set
    prec1 = validate(val_loader, model, criterion, epoch)
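The adjust_learning_rate function itself is not shown above; a minimal sketch of a typical step-decay version is below. The base_lr, step, and gamma values are illustrative assumptions, not the original code's settings:

```python
def adjust_learning_rate(optimizer, epoch, base_lr=0.1, step=30, gamma=0.1):
    """Step decay: scale base_lr by gamma once every `step` epochs."""
    lr = base_lr * (gamma ** (epoch // step))
    for param_group in optimizer.param_groups:  # works for any torch.optim optimizer
        param_group['lr'] = lr
    return lr
```

Called once per epoch as in the loop above, this yields 0.1 for epochs 0-29, 0.01 for 30-59, and so on. Called once per batch instead, the same decay would be applied hundreds of times faster.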
4. This turned out to be why my reimplementation would not train: no batch normalization. Older networks did not use batchnorm, and some non-CV fields (e.g. communications) design fairly simple networks that omit it as well.
BatchNorm helps in several ways:
1. It speeds up training and convergence.
2. It controls exploding gradients and mitigates vanishing gradients.
3. It helps reduce overfitting.
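Per channel, batchnorm in training mode just normalizes the batch to zero mean and unit variance, then applies a learnable scale (gamma) and shift (beta). A minimal pure-Python sketch of that forward pass over one batch of scalars:

```python
import math

def batchnorm_forward(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batchnorm over one batch of scalar activations:
    normalize to zero mean / unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance, as used in training
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]
```

Keeping activations in this normalized range from layer to layer is what stabilizes gradient magnitudes and makes training much less sensitive to initialization and learning rate.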
The network as described in the original paper:
import torch.nn as nn
import torch.nn.functional as F

class scnn(nn.Module):
    def __init__(self):
        super(scnn, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=3, padding=1, bias=True)
        self.conv2 = nn.Conv2d(64, 32, kernel_size=3, padding=1, bias=True)
        self.conv3 = nn.Conv2d(32, 12, kernel_size=3, padding=1, bias=True)
        self.conv4 = nn.Conv2d(12, 8, kernel_size=3, padding=1, bias=True)
        self.cls = nn.Sequential(
            nn.Flatten(),
            nn.Linear(8 * 8 * 8, 54, bias=False),
        )

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, kernel_size=2, stride=2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, kernel_size=2, stride=2)
        out = F.relu(self.conv3(out))
        out = F.max_pool2d(out, kernel_size=2, stride=2)
        out = F.relu(self.conv4(out))
        out = self.cls(out)
        return out
After adding batchnorm (a BatchNorm2d layer between each convolution and its ReLU):
class scnn(nn.Module):
    def __init__(self):
        super(scnn, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=3, padding=1, bias=True)
        self.conv2 = nn.Conv2d(64, 32, kernel_size=3, padding=1, bias=True)
        self.conv3 = nn.Conv2d(32, 12, kernel_size=3, padding=1, bias=True)
        self.conv4 = nn.Conv2d(12, 8, kernel_size=3, padding=1, bias=True)
        self.cls = nn.Sequential(
            nn.Flatten(),
            nn.Linear(8 * 8 * 8, 54, bias=False),
        )
        self.bn1 = nn.BatchNorm2d(64)
        self.bn2 = nn.BatchNorm2d(32)
        self.bn3 = nn.BatchNorm2d(12)
        self.bn4 = nn.BatchNorm2d(8)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.max_pool2d(out, kernel_size=2, stride=2)
        out = F.relu(self.bn2(self.conv2(out)))
        out = F.max_pool2d(out, kernel_size=2, stride=2)
        out = F.relu(self.bn3(self.conv3(out)))
        out = F.max_pool2d(out, kernel_size=2, stride=2)
        out = F.relu(self.bn4(self.conv4(out)))
        out = self.cls(out)
        return out
With batchnorm added, the network trains!
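As a sanity check on the nn.Linear(8 * 8 * 8, 54) input size: each 2x2 max-pool with stride 2 halves the spatial resolution, so the 8-channel, 8x8 feature map going into the classifier implies a 64x64 input image (an assumption inferred from the layer sizes, not stated in the snippet):

```python
def pooled_side(side, n_pools, kernel=2, stride=2):
    """Spatial side length after n_pools max-pool layers (no padding)."""
    for _ in range(n_pools):
        side = (side - kernel) // stride + 1
    return side

side = pooled_side(64, 3)        # 64 -> 32 -> 16 -> 8
flat_features = 8 * side * side  # conv4 channels * H * W, fed to nn.Linear
```

If your input resolution differs, the first Linear dimension must be recomputed the same way, or the Flatten output will not match and forward() will raise a shape error.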