Learning Rate
Learning rate: the magnitude of each parameter update. If the learning rate is too large, the parameter being optimized oscillates around the minimum and fails to converge; if it is too small, the parameter converges very slowly. During training, parameters are updated in the direction of gradient descent on the loss function.
- The parameter update formula is:

  $w_{n+1} = w_n - lr \cdot \nabla$

  where $\nabla$ is the gradient of the loss function with respect to $w$.

Example:
```python
import tensorflow as tf

lr = 0.2
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(40):
        sess.run(train_op)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print('After %s steps: w is %f, loss is %f.' % (i, w_val, loss_val))
```
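Since the gradient of loss = (w + 1)² is 2(w + 1), the first few updates can also be traced by hand in plain Python (a small sketch, not part of the original example, useful as a check on the run below):

```python
# Gradient descent by hand for loss = (w + 1)^2, starting from w = 5.
lr = 0.2
w = 5.0
for i in range(3):
    grad = 2 * (w + 1)   # d(loss)/dw
    w = w - lr * grad    # w_{n+1} = w_n - lr * grad
    print('After %d steps: w is %f' % (i, w))
# prints w = 2.600000, 1.160000, 0.296000 -- matching the TensorFlow run
```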
Output:
After 0 steps: w is 2.600000, loss is 12.959999.
After 1 steps: w is 1.160000, loss is 4.665599.
After 2 steps: w is 0.296000, loss is 1.679616.
After 3 steps: w is -0.222400, loss is 0.604662.
After 4 steps: w is -0.533440, loss is 0.217678.
After 5 steps: w is -0.720064, loss is 0.078364.
……
After 35 steps: w is -1.000000, loss is 0.000000.
After 36 steps: w is -1.000000, loss is 0.000000.
After 37 steps: w is -1.000000, loss is 0.000000.
After 38 steps: w is -1.000000, loss is 0.000000.
After 39 steps: w is -1.000000, loss is 0.000000.

Learning rate too large
lr = 1
Output:
After 0 steps: w is -7.000000, loss is 36.000000.
After 1 steps: w is 5.000000, loss is 36.000000.
After 2 steps: w is -7.000000, loss is 36.000000.
After 3 steps: w is 5.000000, loss is 36.000000.
After 4 steps: w is -7.000000, loss is 36.000000.
After 5 steps: w is 5.000000, loss is 36.000000.
……
After 35 steps: w is 5.000000, loss is 36.000000.
After 36 steps: w is -7.000000, loss is 36.000000.
After 37 steps: w is 5.000000, loss is 36.000000.
After 38 steps: w is -7.000000, loss is 36.000000.
After 39 steps: w is 5.000000, loss is 36.000000.

Learning rate too small
lr = 0.0001
Output:
After 0 steps: w is 4.998800, loss is 35.985600.
After 1 steps: w is 4.997600, loss is 35.971207.
After 2 steps: w is 4.996400, loss is 35.956818.
After 3 steps: w is 4.995201, loss is 35.942436.
After 4 steps: w is 4.994002, loss is 35.928059.
After 5 steps: w is 4.992803, loss is 35.913689.
……
After 35 steps: w is 4.956947, loss is 35.485222.
After 36 steps: w is 4.955756, loss is 35.471027.
After 37 steps: w is 4.954565, loss is 35.456841.
After 38 steps: w is 4.953373, loss is 35.442654.
After 39 steps: w is 4.952183, loss is 35.428478.
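All three behaviours above follow from the shape of this particular loss. For loss = (w + 1)², one gradient-descent step multiplies the error (w + 1) by the constant factor (1 − 2·lr), so that factor alone predicts convergence, oscillation, or crawling. A sketch of this reasoning (not code from the original):

```python
# One step: w' = w - lr * 2 * (w + 1)  =>  (w' + 1) = (1 - 2 * lr) * (w + 1)
def error_factor(lr):
    return 1 - 2 * lr

print(error_factor(0.2))     # 0.6    -> geometric convergence
print(error_factor(1.0))     # -1.0   -> w oscillates between 5 and -7 forever
print(error_factor(0.0001))  # 0.9998 -> converges, but extremely slowly
```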
Exponentially decaying learning rate: the learning rate is updated dynamically as the number of training steps grows.
- The learning rate is computed as:

  $decayed\_lr = learning\_rate \cdot decay\_rate^{\,global\_step / decay\_steps}$

Expressed with TensorFlow's API:
```python
LR_BASE = 0.1
LR_DECAY_RATE = 0.99
LR_DECAY_STEPS = 1

global_step = tf.Variable(0, trainable=False)
lr = tf.train.exponential_decay(
    learning_rate=LR_BASE,       # initial learning rate
    global_step=global_step,
    decay_steps=LR_DECAY_STEPS,  # number of batches between learning-rate updates;
                                 # usually total_samples / BATCH_SIZE, i.e. the number of
                                 # iterations in one full pass over the training data
    decay_rate=LR_DECAY_RATE,    # decay rate
    staircase=True               # or False
    # staircase=True: global_step / decay_steps is truncated to an integer,
    #                 so the learning rate decays in a staircase pattern;
    # staircase=False: the learning rate decays as a smooth curve.
)
```
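The decay formula itself can be checked in plain Python (a sketch using the same constants; `decayed_lr` here is a hypothetical helper, not a TensorFlow function):

```python
LR_BASE = 0.1
LR_DECAY_RATE = 0.99
LR_DECAY_STEPS = 1

def decayed_lr(global_step, staircase=True):
    exponent = global_step / LR_DECAY_STEPS
    if staircase:
        exponent = int(exponent)  # truncate -> staircase decay
    return LR_BASE * LR_DECAY_RATE ** exponent

print('%f' % decayed_lr(1))  # 0.099000, the rate after one decay step
```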
Example:
```python
import tensorflow as tf

LR_BASE = 0.1
LR_DECAY_RATE = 0.99
LR_DECAY_STEPS = 1

global_step = tf.Variable(0, trainable=False)
lr = tf.train.exponential_decay(
    learning_rate=LR_BASE,       # initial learning rate
    global_step=global_step,
    decay_steps=LR_DECAY_STEPS,  # number of batches between learning-rate updates
    decay_rate=LR_DECAY_RATE,    # decay rate
    staircase=True               # True: staircase decay; False: smooth decay
)

w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_op = tf.train.GradientDescentOptimizer(lr).minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(40):
        sess.run(train_op)
        global_step_val = sess.run(global_step)
        lr_val = sess.run(lr)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print('After %s steps: global_step is %f, lr is %f, w is %f, loss is %f.'
              % (i, global_step_val, lr_val, w_val, loss_val))
```
Output:
After 0 steps: global_step is 1.000000, lr is 0.099000, w is 3.800000, loss is 23.040001.
After 1 steps: global_step is 2.000000, lr is 0.098010, w is 2.849600, loss is 14.819419.
After 2 steps: global_step is 3.000000, lr is 0.097030, w is 2.095001, loss is 9.579033.
After 3 steps: global_step is 4.000000, lr is 0.096060, w is 1.494386, loss is 6.221961.
After 4 steps: global_step is 5.000000, lr is 0.095099, w is 1.015167, loss is 4.060896.
After 5 steps: global_step is 6.000000, lr is 0.094148, w is 0.631886, loss is 2.663051.
……
After 35 steps: global_step is 36.000000, lr is 0.069641, w is -0.992297, loss is 0.000059.
After 36 steps: global_step is 37.000000, lr is 0.068945, w is -0.993369, loss is 0.000044.
After 37 steps: global_step is 38.000000, lr is 0.068255, w is -0.994284, loss is 0.000033.
After 38 steps: global_step is 39.000000, lr is 0.067573, w is -0.995064, loss is 0.000024.

The results show that the learning rate keeps decreasing as the number of training steps grows.
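The first lines of this output can be reproduced by hand (a plain-Python sketch). Note that `minimize` applies the learning rate for the current `global_step` and then increments it, so the `lr` printed after each step is the rate for the *next* update, not the one just used:

```python
w = 5.0
for step in range(2):
    lr = 0.1 * 0.99 ** step    # lr actually used for this update
    w = w - lr * 2 * (w + 1)   # gradient of (w + 1)^2 is 2 * (w + 1)
    print('After %d steps: w is %f' % (step, w))
# After 0 steps: w is 3.800000
# After 1 steps: w is 2.849600
```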