1. Learning Rate Decay
(1) Overview
Empirically, the learning rate should start out large to ensure fast convergence, and shrink as training approaches the optimum to avoid oscillating back and forth around it.
A simple way to adjust the learning rate over time is learning rate decay.
(2) Common Formulas
Let the initial learning rate be \alpha_{0} and the learning rate at iteration t be \alpha_{t}. Common learning rate decay functions include the following:
a. Inverse time decay
\alpha_{t} = \alpha_{0} \frac{1}{1 + \beta t}
where \beta is the decay rate.
b. Exponential decay
\alpha_{t} = \alpha_{0} \beta^{t}
where \beta \in (0, 1) is the decay rate.
c. Cosine decay
\alpha_{t} = \frac{1}{2} \alpha_{0} \left(1 + \cos\left(\frac{t\pi}{T}\right)\right)
where T is the total number of iterations.
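The three decay schedules above can be sketched directly in plain Python (the function names and parameter values below are illustrative, not from the original text). All three start at \alpha_{0} and decrease as t grows:

```python
import math

def inverse_time_decay(alpha0, beta, t):
    # alpha_t = alpha_0 * 1 / (1 + beta * t)
    return alpha0 / (1.0 + beta * t)

def exponential_decay(alpha0, beta, t):
    # alpha_t = alpha_0 * beta ** t
    return alpha0 * beta ** t

def cosine_decay(alpha0, t, T):
    # alpha_t = (1/2) * alpha_0 * (1 + cos(t * pi / T))
    return 0.5 * alpha0 * (1.0 + math.cos(t * math.pi / T))

# Compare the schedules over 100 iterations with alpha_0 = 0.2
for t in (0, 25, 50, 75, 100):
    print(t,
          inverse_time_decay(0.2, 0.1, t),
          exponential_decay(0.2, 0.96, t),
          cosine_decay(0.2, t, 100))
```

Note that cosine decay reaches exactly 0 at t = T, while the other two schedules only approach 0 asymptotically.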
(3) Implementing Exponential Learning Rate Decay
import tensorflow as tf

# Epoch 0: loss = (5+1)^2 = 36, grads = 2*(5+1) = 12, w = 5 - 12*0.2 = 2.6
w = tf.Variable(tf.constant(5, dtype=tf.float32))
epochs = 5
LR_BASE = 0.2    # initial learning rate alpha_0
LR_DECAY = 0.99  # decay rate beta

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        loss = tf.square(w + 1)
    grads = tape.gradient(loss, w)
    w.assign_sub(LR_BASE * grads)  # w <- w - lr * gradient
    LR_BASE *= LR_DECAY            # exponential decay: lr <- lr * beta
    print('After %d epoch,w is %f,loss is %f,lr is %f' % (epoch, w.numpy(), loss, LR_BASE))
Output:
After 0 epoch,w is 2.600000,loss is 36.000000,lr is 0.198000
After 1 epoch,w is 1.174400,loss is 12.959999,lr is 0.196020
After 2 epoch,w is 0.321948,loss is 4.728015,lr is 0.194060
After 3 epoch,w is -0.191126,loss is 1.747547,lr is 0.192119
After 4 epoch,w is -0.501926,loss is 0.654277,lr is 0.190198
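Since the loop above only involves a scalar quadratic, the printed trajectory can be reproduced in plain Python without TensorFlow as a sanity check (float64 arithmetic here versus TensorFlow's float32 may shift the last digit):

```python
# Plain-Python re-run of the gradient-descent loop above.
# loss = (w + 1)^2, so d(loss)/dw = 2 * (w + 1).
w = 5.0
lr = 0.2      # LR_BASE
decay = 0.99  # LR_DECAY
for epoch in range(5):
    loss = (w + 1) ** 2
    grad = 2 * (w + 1)
    w -= lr * grad   # gradient-descent step with the current lr
    lr *= decay      # exponential decay of the learning rate
    print('After %d epoch,w is %f,loss is %f,lr is %f' % (epoch, w, loss, lr))
```

Note that because the learning rate is multiplied by the decay rate once per epoch, the rate used at epoch t equals LR_BASE * LR_DECAY ** t, matching the exponential decay formula from section (2).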