Model Training Tricks: EMA (Exponential Moving Average)
1. One-sentence summary
EMA (Exponential Moving Average) boils down to: keep a copy of all the model weights (call them Weights) as a backup (EMA_weights, the "shadow" parameters). During training, every time the weights are updated, EMA_weights is also updated as a moving average of them; once training is finished, EMA_weights replaces the model weights for prediction.
Concretely, the EMA hyperparameter decay is usually set close to 1, which keeps every update of EMA_weights stable. The per-batch update is:
Weights = Weights - LR * Grad (the model's normal gradient-descent step)
EMA_weights = EMA_weights * decay + (1 - decay) * Weights (update EMA_weights from the new Weights)
Note that EMA_weights plays no part in the training updates themselves; it is only loaded at test time for prediction.
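As a toy illustration of the two update rules above, here is a minimal sketch in plain NumPy; the scalar settings (lr, decay) and the grad() function are placeholders for illustration, not part of the original post:

import numpy as np

lr, decay = 0.1, 0.999
weights = np.array([1.0, -2.0, 0.5])   # model weights
ema_weights = weights.copy()           # shadow copy, initialized from the weights

def grad(w):                           # placeholder gradient, for illustration only
    return 2 * w

for step in range(1000):
    weights = weights - lr * grad(weights)                      # normal gradient-descent step
    ema_weights = decay * ema_weights + (1 - decay) * weights   # moving-average update

# At test time, ema_weights (not weights) would be loaded into the model.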
2. How to save the best EMA_weights?
In Keras, you can use an on_epoch_end callback: at the end of each epoch, load EMA_weights into the model, call save_weights, and compute the validation metrics to decide whether to keep that checkpoint. (Remember to switch the model parameters back to Weights before the next epoch starts.) A sketch of such a callback is given below.
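A minimal sketch of such a callback, assuming the EMA bookkeeping is kept inside the callback itself; the class name EMACheckpoint, the decay value, and the checkpoint path are assumptions for illustration, and the sketch saves unconditionally where the post would first compare validation metrics:

import tensorflow as tf

class EMACheckpoint(tf.keras.callbacks.Callback):
    """Keeps an EMA copy of the weights and saves it at the end of each epoch."""

    def __init__(self, decay=0.999, path="ema_weights.h5"):
        super().__init__()
        self.decay = decay
        self.path = path
        self.ema_weights = None

    def on_train_batch_end(self, batch, logs=None):
        weights = self.model.get_weights()
        if self.ema_weights is None:
            self.ema_weights = [w.copy() for w in weights]     # first batch: plain copy
        else:
            self.ema_weights = [self.decay * e + (1 - self.decay) * w
                                for e, w in zip(self.ema_weights, weights)]

    def on_epoch_end(self, epoch, logs=None):
        backup = self.model.get_weights()          # keep the raw training weights
        self.model.set_weights(self.ema_weights)   # swap in the EMA weights
        # here you would evaluate on the validation set and save only if the metric improves
        self.model.save_weights(self.path)
        self.model.set_weights(backup)             # restore before the next epoch starts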
3. PyTorch implementation
class EMA():
    def __init__(self, model, decay):
        self.model = model
        self.decay = decay
        self.shadow = {}   # EMA (shadow) copy of the parameters
        self.backup = {}   # temporary backup of the raw parameters during eval

    def register(self):
        # Snapshot the current parameters as the initial shadow weights
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                self.shadow[name] = param.data.clone()

    def update(self):
        # Called after each optimizer step: shadow = decay * shadow + (1 - decay) * param
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                assert name in self.shadow
                new_average = (1.0 - self.decay) * param.data + self.decay * self.shadow[name]
                self.shadow[name] = new_average.clone()

    def apply_shadow(self):
        # Swap the shadow weights into the model (for evaluation / saving)
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                assert name in self.shadow
                self.backup[name] = param.data
                param.data = self.shadow[name]

    def restore(self):
        # Put the original training weights back
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                assert name in self.backup
                param.data = self.backup[name]
        self.backup = {}
# Initialization
ema = EMA(model, 0.999)
ema.register()

# During training: after the parameters are updated, also update the shadow weights
def train():
    optimizer.step()
    ema.update()

# Before eval, apply the shadow weights; after eval, restore the original parameters
def evaluate():
    ema.apply_shadow()
    # ... run evaluation ...
    ema.restore()
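Putting it together, here is a minimal end-to-end sketch of a training loop with EMA evaluation using the class above; the model, data, loss, optimizer, and all hyperparameters are placeholders for illustration, not part of the original post:

import torch
import torch.nn as nn

# Placeholder model / data / optimizer, for illustration only
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

ema = EMA(model, 0.999)
ema.register()                       # snapshot the initial weights as the shadow copy

for epoch in range(5):
    model.train()
    for _ in range(100):             # stand-in for iterating over a real dataloader
        x = torch.randn(32, 10)
        y = torch.randint(0, 2, (32,))
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        ema.update()                 # EMA update right after the gradient step

    model.eval()
    ema.apply_shadow()               # evaluate / checkpoint with the EMA weights
    with torch.no_grad():
        val_loss = criterion(model(torch.randn(32, 10)), torch.randint(0, 2, (32,)))
    ema.restore()                    # back to the raw weights before the next epoch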