Train-Val-Test: Checking for Overfitting
Use the validation set to prevent overfitting, and use the test set to evaluate the final result.
The split can be done manually:
train_db, val_db = torch.utils.data.random_split(train_db, [50000, 10000])
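A minimal sketch of wrapping the two subsets into loaders (the batch size and shuffling choices here are illustrative assumptions):

train_loader = torch.utils.data.DataLoader(train_db, batch_size=128, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_db, batch_size=128, shuffle=False)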
K-fold cross-validation
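PyTorch has no built-in k-fold helper, so the folds have to be built by hand. A minimal sketch using torch.utils.data.Subset (the fold count, batch size, and the train_db name are assumptions carried over from above):

import numpy as np
from torch.utils.data import Subset, DataLoader

k = 5
indices = np.random.permutation(len(train_db))
folds = np.array_split(indices, k)
for i in range(k):
    # fold i is the validation set, the remaining k-1 folds form the training set
    val_idx = folds[i].tolist()
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i]).tolist()
    train_loader = DataLoader(Subset(train_db, train_idx), batch_size=128, shuffle=True)
    val_loader = DataLoader(Subset(train_db, val_idx), batch_size=128, shuffle=False)
    # train on train_loader, evaluate on val_loader, then average the k validation scores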
Regularization
Occam’s Razor: More things should not be used than are necessary.
L2 regularization
optimizer = optim.SGD(net.parameters(), lr=0.01, weight_decay=0.01)
Setting weight_decay=0.01 sets the coefficient of the L2-norm penalty to 0.01.
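In formula terms (a sketch; $\lambda$ stands for the weight_decay value and $J(\theta)$ for the original loss), the objective becomes

$\tilde{J}(\theta) = J(\theta) + \frac{\lambda}{2}\lVert\theta\rVert_2^2$

whose gradient contributes the extra $\lambda\theta$ term that weight_decay adds to each parameter's gradient.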
L1 regularization
There is no built-in switch for it, so the penalty has to be computed manually:
# compute the L1 regularization term
regularization_loss = 0
for param in model.parameters():
    regularization_loss += torch.sum(torch.abs(param))
# add the penalty to the classification loss
classify_loss = criteon(logits, target)
loss = classify_loss + 0.01 * regularization_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
Momentum
The update is equivalent to subtracting an extra $\beta z^k$ term, i.e., the parameters also move along the direction of $\beta z^k$, which represents the previous gradient (update) direction.
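Written out as a standard formulation (assuming the notation matches: $w^k$ the weights, $\alpha$ the learning rate, $z^k$ the accumulated gradient):

$z^{k+1} = \beta z^k + \nabla f(w^k)$
$w^{k+1} = w^k - \alpha z^{k+1}$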
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=args.momentum, weight_decay=0.01)
Optimizers like Adam already have momentum built in.
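A minimal sketch of using it (the learning rate here is an illustrative assumption):

optimizer = optim.Adam(net.parameters(), lr=1e-3, weight_decay=0.01)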
Learning rate decay
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=args.momentum, weight_decay=0.01)
# create a learning-rate scheduler
scheduler = ReduceLROnPlateau(optimizer, 'min')
for epoch in range(epochs):
    train()
    result_avg, loss_val = validate()
    # if the monitored loss stops improving for `patience` epochs (default 10), reduce the lr
    scheduler.step(loss_val)
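If a fixed schedule is preferred over monitoring the loss, StepLR is a common alternative (step_size and gamma below are illustrative assumptions):

from torch.optim.lr_scheduler import StepLR

scheduler = StepLR(optimizer, step_size=30, gamma=0.1)  # multiply the lr by 0.1 every 30 epochs
for epoch in range(epochs):
    train()
    validate()
    scheduler.step()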
Early Stopping
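Stop training once validation performance stops improving and keep the best checkpoint. A minimal sketch, assuming validate() returns a validation accuracy and using an illustrative patience of 10:

best_acc, wait, patience = 0.0, 0, 10
for epoch in range(epochs):
    train()
    val_acc = validate()
    if val_acc > best_acc:
        best_acc, wait = val_acc, 0
        torch.save(net.state_dict(), 'best.ckpt')  # keep the best model seen so far
    else:
        wait += 1
        if wait >= patience:
            break  # no improvement for `patience` epochs: stop training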
Dropout
Learning less to learn better
Each connection has a probability $p \in [0, 1]$ of being dropped.
self.model = nn.Sequential(
    nn.Linear(784, 200),
    nn.Dropout(0.5),
    nn.ReLU(inplace=True),
    nn.Linear(200, 200),
    nn.Dropout(0.5),
    nn.ReLU(inplace=True),
    nn.Linear(200, 10),
    nn.ReLU(inplace=True),
)
What this implements is dropping the one-to-one connections between one layer's output (L1.out) and the next layer's input (L2.in).
The larger the argument, the higher the drop probability; in PyTorch it specifies the probability of dropping, which is the opposite of TensorFlow, where keep_prob (in TF 1.x) specifies the probability of keeping a unit.
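One practical note: Dropout behaves differently in training and evaluation mode, so the network has to be switched explicitly; a minimal sketch:

net.train()                 # training mode: dropout randomly zeroes units
logits = net(data)

net.eval()                  # evaluation mode: dropout is disabled
with torch.no_grad():
    logits = net(data)      # deterministic forward pass for val/test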