In deep learning, the initial learning rate has a large effect on the final result of training: a rate that is too large or too small can prevent the network from converging, or make convergence very slow. It is therefore important to choose a suitable initial value and to decay the learning rate sensibly over the course of training.
Reference blog: https://www.cnblogs.com/chenzhen0530/p/10632937.html
This post covers the following common learning-rate decay schemes:
- exponential decay
- piecewise constant decay
The focus is on the TensorFlow syntax for learning-rate decay:
1. Exponential decay
tf.train.exponential_decay(
    learning_rate,    # initial learning rate
    global_step,      # variable tracking the global training step
    decay_steps,      # decay period: with staircase=True the rate stays constant within each window of decay_steps steps, otherwise it decays exponentially at every step
    decay_rate,       # decay rate
    staircase=False,  # whether to decay in discrete steps; defaults to False
    name=None)
Formula:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
When staircase=True, (global_step / decay_steps) is rounded down to an integer (floor), so the learning rate stays constant within each window of decay_steps steps.
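The formula above can be reproduced in a few lines of plain Python (a minimal sketch, independent of TensorFlow, just to illustrate the staircase behaviour):

```python
import math

def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Re-implementation of the decay formula above, for illustration only."""
    exponent = global_step / decay_steps
    if staircase:
        # floor makes the exponent an integer, so the rate is constant
        # within each window of decay_steps steps
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent

print(exponential_decay(0.5, 9, 10, 0.9, staircase=True))   # 0.5  (still in the first window)
print(exponential_decay(0.5, 10, 10, 0.9, staircase=True))  # 0.45 (first drop at step 10)
print(exponential_decay(0.5, 5, 10, 0.9, staircase=False))  # ~0.4743 (smooth decay at every step)
```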
Usage example (for simplicity the decay op is rebuilt on every iteration with the current step passed in as a Python int; in real training you would build it once and let a global_step variable drive it):
import matplotlib.pyplot as plt
import tensorflow as tf

y = []
z = []
EPOCH = 200

with tf.Session() as sess:
    for global_step in range(EPOCH):
        # staircase (discrete) decay
        learning_rate1 = tf.train.exponential_decay(
            learning_rate=0.5, global_step=global_step, decay_steps=10, decay_rate=0.9, staircase=True)
        # standard (continuous) exponential decay
        learning_rate2 = tf.train.exponential_decay(
            learning_rate=0.5, global_step=global_step, decay_steps=10, decay_rate=0.9, staircase=False)
        lr1 = sess.run(learning_rate1)
        lr2 = sess.run(learning_rate2)
        y.append(lr1)
        z.append(lr2)

x = range(EPOCH)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_ylim([0, 0.55])
plt.plot(x, y, 'r-', linewidth=2)
plt.plot(x, z, 'g-', linewidth=2)
plt.title('exponential_decay')
ax.set_xlabel('step')
ax.set_ylabel('learning rate')
plt.legend(labels=['staircase', 'continuous'], loc='upper right')
plt.show()
2. Piecewise constant decay
tf.train.piecewise_constant(
    global_step,  # global training step
    boundaries,   # boundaries between the constant segments
    values,       # learning-rate value for each segment
    name=None)
How it works:
# parameters
global_step = tf.Variable(0, trainable=False)
boundaries = [100, 200]
values = [1.0, 0.5, 0.1]
# learning rate
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
# explanation
# when global_step <= 100:        learning_rate = 1.0
# when 100 < global_step <= 200:  learning_rate = 0.5
# when global_step > 200:         learning_rate = 0.1
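The interval lookup that piecewise_constant performs can be sketched in plain Python with bisect (an illustration of the segment semantics above, not TensorFlow's actual implementation):

```python
import bisect

def piecewise_constant(global_step, boundaries, values):
    """Plain-Python sketch of the segment lookup, for illustration only."""
    # bisect_left counts how many boundaries lie strictly below global_step,
    # which is exactly the index of the segment it falls in
    # (a step equal to a boundary still belongs to the earlier segment)
    return values[bisect.bisect_left(boundaries, global_step)]

boundaries = [100, 200]
values = [1.0, 0.5, 0.1]
print(piecewise_constant(50, boundaries, values))   # 1.0
print(piecewise_constant(100, boundaries, values))  # 1.0 (boundary belongs to the earlier segment)
print(piecewise_constant(150, boundaries, values))  # 0.5
print(piecewise_constant(250, boundaries, values))  # 0.1
```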
Usage example:
import matplotlib.pyplot as plt
import tensorflow as tf

boundaries = [10, 20, 30]
learning_rates = [0.1, 0.07, 0.025, 0.0125]
y = []
N = 40

with tf.Session() as sess:
    for global_step in range(N):
        learning_rate = tf.train.piecewise_constant(global_step, boundaries=boundaries, values=learning_rates)
        lr = sess.run(learning_rate)
        y.append(lr)

x = range(N)
plt.plot(x, y, 'r-', linewidth=2)
plt.title('piecewise_constant')
plt.show()