The default method in Keras
import tensorflow as tf

# Halve the learning rate once val_loss has stopped improving for 10 epochs.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, verbose=1, mode='min', cooldown=0)
model.fit(X_train, Y_train, callbacks=[reduce_lr])
Parameter descriptions:
Parameter | Description
---|---
monitor | quantity to be monitored.
factor | factor by which the learning rate will be reduced. new_lr = lr * factor
patience | number of epochs with no improvement after which the learning rate will be reduced.
verbose | verbosity mode; 0 = quiet, 1 = print a message when the learning rate is updated.
mode | one of {auto, min, max}. In min mode, the lr will be reduced when the monitored quantity has stopped decreasing; in max mode it will be reduced when the monitored quantity has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity.
min_delta | threshold for measuring the new optimum, to only focus on significant changes.
cooldown | number of epochs to wait before resuming normal operation after the lr has been reduced.
min_lr | lower bound on the learning rate.
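To show how the remaining arguments interact, here is a minimal sketch of a fuller configuration; the min_delta, cooldown and min_lr values are illustrative choices, not taken from the original text:

# Illustrative values for the less commonly used arguments.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=10,
    min_delta=1e-4,  # improvements smaller than 1e-4 do not reset patience
    cooldown=5,      # after a reduction, wait 5 epochs before monitoring again
    min_lr=1e-6,     # never reduce the learning rate below this floor
)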
Custom learning rate decay
# Hold the learning rate at 0.001 for the first 10 epochs,
# then decay it exponentially from there.
def scheduler(epoch):
    if epoch < 10:
        return 0.001
    else:
        return 0.001 * tf.math.exp(0.1 * (10 - epoch))

callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
model.fit(data, labels, epochs=100, callbacks=[callback],
          validation_data=(val_data, val_labels))
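LearningRateScheduler can also pass the current learning rate as a second argument, which lets the same schedule be written multiplicatively instead of recomputing it from the epoch index; a minimal sketch:

def scheduler(epoch, lr):
    # Keep the initial learning rate for 10 epochs, then decay ~10% per epoch.
    if epoch < 10:
        return lr
    return lr * tf.math.exp(-0.1)

callback = tf.keras.callbacks.LearningRateScheduler(scheduler)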
Snapshot Ensembles
Reference: 使用余弦退火逃离局部最优点——快照集成(Snapshot Ensembles)在Keras上的应用 ("Escaping local optima with cosine annealing: applying Snapshot Ensembles in Keras")
In the same way, we define a cosine-annealing callback to adjust the learning rate, and save the model at the lowest point of each learning-rate cycle.
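The cosine_annealing method below implements the per-cycle schedule from the Snapshot Ensembles paper (Huang et al., 2017). With $T$ total epochs (n_epochs), $M$ cycles (n_cycles), and $\alpha_{\max}$ the peak learning rate (lrate_max), the learning rate at epoch $t$ is

$$
\alpha(t) = \frac{\alpha_{\max}}{2}\left(\cos\left(\frac{\pi\,(t \bmod \lfloor T/M \rfloor)}{\lfloor T/M \rfloor}\right) + 1\right)
$$

so each cycle starts at $\alpha_{\max}$ and anneals toward 0, and a snapshot is saved at the annealed end of every cycle.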
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import Callback

class SnapshotEnsemble(Callback):
    def __init__(self, n_epochs, n_cycles, lrate_max, verbose=0):
        super().__init__()
        self.epochs = n_epochs
        self.cycles = n_cycles
        self.lr_max = lrate_max
        self.verbose = verbose
        self.lrates = list()

    # Cosine annealing within one cycle: starts at lrate_max and
    # decays toward zero by the end of the cycle.
    def cosine_annealing(self, epoch, n_epochs, n_cycles, lrate_max):
        epochs_per_cycle = n_epochs // n_cycles
        cos_inner = (np.pi * (epoch % epochs_per_cycle)) / epochs_per_cycle
        return lrate_max / 2 * (np.cos(cos_inner) + 1)

    def on_epoch_begin(self, epoch, logs=None):
        lr = self.cosine_annealing(epoch, self.epochs, self.cycles, self.lr_max)
        if self.verbose:
            print(f'epoch {epoch+1}, lr {lr}')
        K.set_value(self.model.optimizer.lr, lr)
        self.lrates.append(lr)

    # Save a snapshot at the end of each cycle, i.e. at the lr minimum.
    def on_epoch_end(self, epoch, logs=None):
        epochs_per_cycle = self.epochs // self.cycles
        if epoch != 0 and (epoch + 1) % epochs_per_cycle == 0:
            filename = f"snapshot_model_{int((epoch+1) / epochs_per_cycle)}.h5"
            self.model.save(filename)
            print(f'>saved snapshot {filename}, epoch {epoch}')
The number of epochs and the batch size are kept the same as the baseline model. To make sure each snapshot model is trained sufficiently, each snapshot is trained for 20 epochs, so there are 3 cycles in total, i.e. 3 snapshot models.
model2 = my_model()
model2.compile('sgd', loss='categorical_crossentropy', metrics=['accuracy'])
n_epochs = 60
n_cycles = n_epochs // 20  # 3 cycles of 20 epochs each
ca = SnapshotEnsemble(n_epochs, n_cycles, 0.1)
hist2 = model2.fit(trainX, trainY_cat, validation_data=(testX, testY_cat),
                   epochs=n_epochs, batch_size=batch_size, callbacks=[ca])
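After training, the saved snapshots can be combined into an ensemble by averaging their predictions. A minimal sketch, assuming the snapshot_model_*.h5 files written by the callback above and the testX/testY_cat arrays from this section; the averaging step itself is illustrative, not from the original text:

from tensorflow.keras.models import load_model

# Load the three snapshots saved at the end of each cycle.
snapshots = [load_model(f"snapshot_model_{i}.h5") for i in range(1, 4)]

# Ensemble by averaging the softmax outputs of all snapshots.
preds = np.mean([m.predict(testX) for m in snapshots], axis=0)
ensemble_acc = np.mean(np.argmax(preds, axis=1) == np.argmax(testY_cat, axis=1))
print(f"ensemble accuracy: {ensemble_acc:.4f}")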