An introduction to learning rate decay functions
### Everything described here applies to TensorFlow r1.15 ###
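For a specific problem, none of the commonly used optimizers may train the neural network well. Setting a learning-rate decay curve is one targeted way to deal with problems such as getting stuck in a local optimum.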
TensorFlow's train module provides many learning-rate decay functions:
- tf.train.cosine_decay (cosine)
- tf.train.exponential_decay (exponential)
- tf.train.inverse_time_decay (inverse time)
- tf.train.linear_cosine_decay (linear cosine)
- tf.train.natural_exp_decay (natural exponential)
- tf.train.noisy_linear_cosine_decay (noisy linear cosine)
- tf.train.piecewise_constant_decay (piecewise constant)
- tf.train.polynomial_decay (polynomial)
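A framework-free sketch of two of these schedules makes the list less abstract. It covers cosine_decay (with its alpha floor) and polynomial_decay with cycle=False. The formulas are transcribed from the TF r1.15 docstrings as I read them. The real functions return Tensors (or callables in eager mode) and accept more options, so the plain Python functions below are illustrations only.

```python
import math


def cosine_decay(learning_rate, global_step, decay_steps, alpha=0.0):
    """Plain-Python transcription of the tf.train.cosine_decay docstring formula."""
    step = min(global_step, decay_steps)  # clamp: the rate stays flat after decay_steps
    cosine = 0.5 * (1 + math.cos(math.pi * step / decay_steps))
    decayed = (1 - alpha) * cosine + alpha  # alpha is the minimum fraction of learning_rate
    return learning_rate * decayed


def polynomial_decay(learning_rate, global_step, decay_steps,
                     end_learning_rate=0.0001, power=1.0):
    """Plain-Python transcription of tf.train.polynomial_decay with cycle=False."""
    step = min(global_step, decay_steps)  # clamp, as for cosine_decay
    return ((learning_rate - end_learning_rate)
            * (1 - step / decay_steps) ** power + end_learning_rate)


for step in (0, 50, 100, 150):
    print(step,
          round(cosine_decay(0.1, step, 100), 5),
          round(polynomial_decay(0.1, step, 100), 5))
```

Both functions clamp the step at decay_steps, so once decay_steps is reached the learning rate stays at its final value.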
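Most of these decay functions take learning_rate, global_step and decay_steps, plus function-specific arguments such as decay_rate, and use them to compute decayed_learning_rate. The exception is piecewise_constant_decay, which takes x (the step), boundaries and values instead.
As training progresses, global_step changes (normally it increases), and the learning rate decays as a function of the ratio global_step / decay_steps.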
For example: usage and visualization of tf.train.exponential_decay (by 梦沁清风)
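The exponential family shows most clearly how the learning rate depends on global_step / decay_steps. The sketch below is plain Python, not TensorFlow. It transcribes the docstring formulas of exponential_decay, inverse_time_decay and natural_exp_decay. With staircase=True, the ratio is replaced by its floor, so the rate drops in discrete steps. The helper name `_progress` is my own.

```python
import math


def _progress(global_step, decay_steps, staircase):
    # The common ingredient: how many decay periods have elapsed
    ratio = global_step / decay_steps
    return math.floor(ratio) if staircase else ratio


def exponential_decay(lr, global_step, decay_steps, decay_rate, staircase=False):
    # lr * decay_rate ^ (global_step / decay_steps)
    return lr * decay_rate ** _progress(global_step, decay_steps, staircase)


def inverse_time_decay(lr, global_step, decay_steps, decay_rate, staircase=False):
    # lr / (1 + decay_rate * global_step / decay_steps)
    return lr / (1 + decay_rate * _progress(global_step, decay_steps, staircase))


def natural_exp_decay(lr, global_step, decay_steps, decay_rate, staircase=False):
    # lr * exp(-decay_rate * global_step / decay_steps)
    return lr * math.exp(-decay_rate * _progress(global_step, decay_steps, staircase))


for step in (0, 50, 100, 150, 200):
    print(step,
          round(exponential_decay(0.1, step, 100, 0.96), 6),
          round(exponential_decay(0.1, step, 100, 0.96, staircase=True), 6))
```

Take lr=0.1, decay_steps=100 and decay_rate=0.96. The smooth and staircase curves agree at multiples of 100. In between, the staircase version holds the previous value.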
Pitfalls of using learning_rate_decay in eager mode
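1. In eager mode, every learning-rate decay function returns a function instead of a Tensor. Specifically, it returns a functools.partial object that binds global_step to a schedule object.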
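Here is a sketch of that pattern without TensorFlow. It assumes, as the printed repr later in this post suggests, that the eager branch builds a schedule object and binds the step to it with functools.partial. `ToyExponentialSchedule` and `toy_exponential_decay` are hypothetical stand-ins, not the Keras classes.

```python
import functools


class ToyExponentialSchedule:
    """Hypothetical stand-in for a Keras schedule object: called with the current step."""

    def __init__(self, initial_learning_rate, decay_steps, decay_rate):
        self.initial_learning_rate = initial_learning_rate
        self.decay_steps = decay_steps
        self.decay_rate = decay_rate

    def __call__(self, step):
        return self.initial_learning_rate * self.decay_rate ** (step / self.decay_steps)


def toy_exponential_decay(learning_rate, global_step, decay_steps, decay_rate, eager=True):
    schedule = ToyExponentialSchedule(learning_rate, decay_steps, decay_rate)
    if not eager:
        return schedule(global_step)  # graph mode: a value (a Tensor in real TF)
    return functools.partial(schedule, global_step)  # eager mode: a callable


lr = toy_exponential_decay(0.1, 0, 100, 0.96)
print(lr)    # prints a functools.partial object, not a number
print(lr())  # 0.1: only calling it evaluates the schedule
```

The partial stores whatever object was passed as global_step. The learning rate can therefore only change if that same object changes, which is the root of pitfall 2.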
The docstring in tensorflow.python.training.learning_rate_decay says:
@compatibility(eager) When eager execution is enabled, this function returns a function which in turn returns the decayed learning rate Tensor. This can be useful for changing the learning rate value across different invocations of optimizer functions.
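In other words, with eager execution enabled you get a callable. Each call evaluates the schedule at the current global_step, so an optimizer that calls it on every update can pick up a changing learning rate.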
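"Changing the learning rate value across invocations" means, in practice, that the optimizer holds the callable and calls it on every update. Below is a minimal sketch with a hypothetical `ToyOptimizer` and a hypothetical dict-based step counter. The real tf.train optimizers appear to resolve callable hyperparameters in a similar way internally, but that is my reading of the source, not a documented contract.

```python
class ToyOptimizer:
    """Hypothetical optimizer that resolves a callable learning rate on every update."""

    def __init__(self, learning_rate):
        self._lr = learning_rate

    def current_lr(self):
        # Call the schedule if it is callable; otherwise use the plain number
        return self._lr() if callable(self._lr) else self._lr

    def apply_step(self, param, grad):
        # Plain gradient-descent update using the learning rate resolved right now
        return param - self.current_lr() * grad


step = {"value": 0}  # hypothetical mutable step counter shared with the schedule


def schedule():
    # Reads the shared counter each time it is called
    return 0.1 if step["value"] < 3 else 0.01


opt = ToyOptimizer(schedule)
w = 1.0
rates = []
for _ in range(5):
    rates.append(opt.current_lr())
    w = opt.apply_step(w, grad=1.0)
    step["value"] += 1
print(rates, w)
```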
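I followed the example above but used piecewise_constant, and printed learning_rate. The result is indeed a function: only calling it (adding parentheses) returns the learning-rate value.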
```python
from __future__ import absolute_import, division, print_function, unicode_literals
# TensorFlow 1.x (>= 1.10; this post uses r1.15) with eager execution enabled
import tensorflow as tf

tf.enable_eager_execution()

glb_step = tf.Variable(0, trainable=False)
boundaries = [10, 20]
values = [0.01, 0.001, 0.0001]
# In eager mode this returns a callable (functools.partial), not a Tensor
learning_rate = tf.train.piecewise_constant(glb_step, boundaries, values)
optimizer = tf.train.AdamOptimizer(learning_rate)
# Despite the "shape" label, this prints the partial object itself
print("learning_rate shape:%s" % learning_rate)
# Calling the partial evaluates the schedule at the current step
print("learning_rate value:%s" % learning_rate())
```
Output:
```
learning_rate shape:functools.partial(<tensorflow.python.keras.optimizer_v2.learning_rate_schedule.PiecewiseConstantDecay object at 0x000002C42079F320>, [<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=0>])
learning_rate value:tf.Tensor(0.01, shape=(), dtype=float32)
```
2. Once passed in, global_step never changes (so learning_rate never changes either)
First, the relevant example from the TF source (the exponential_decay docstring):
Example: decay every 100000 steps with a base of 0.96:
```python
...
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.compat.v1.train.exponential_decay(starter_learning_rate, global_step,
                                                     100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
    tf.compat.v1.train.GradientDescentOptimizer(learning_rate)
    .minimize(...my loss..., global_step=global_step)
)
```
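The key line is the last one: global_step is passed to the optimizer's minimize() call (the same applies to apply_gradients()). Only then does the optimizer increment the step variable after each update. In my code, that means passing global_step=glb_step.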
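Without that argument the step stays at 0, as this plain-Python sketch shows. The partial keeps a reference to the original variable object. Rebinding the name with glb_step = glb_step + 1 produces a new value (just as Variable + 1 produces a new Tensor in eager mode), and that new value never reaches the schedule. An in-place assign_add does. `ToyStepVariable`, `schedule` and `toy_minimize` are hypothetical stand-ins for tf.Variable, the decay schedule and minimize(global_step=...).

```python
import functools


class ToyStepVariable:
    """Hypothetical stand-in for tf.Variable(0, trainable=False)."""

    def __init__(self, value=0):
        self.value = value

    def assign_add(self, delta):
        # In-place update: every holder of this object sees the new value
        self.value += delta

    def __add__(self, other):
        # Like Variable + 1 in eager mode: returns a NEW value, the variable is untouched
        return self.value + other


def schedule(step):
    return 0.01 if step.value <= 10 else 0.001


def toy_minimize(global_step=None):
    # Mimics minimize()/apply_gradients(): increments global_step only if it is passed in
    if global_step is not None:
        global_step.assign_add(1)


# Pitfall: rebinding the Python name
glb_step = ToyStepVariable(0)
learning_rate = functools.partial(schedule, glb_step)
for _ in range(15):
    toy_minimize()
    glb_step = glb_step + 1  # now a plain int; the partial still holds the old object
stale = learning_rate()

# Fix: let the "optimizer" update the same object in place
glb_step = ToyStepVariable(0)
learning_rate = functools.partial(schedule, glb_step)
for _ in range(15):
    toy_minimize(global_step=glb_step)
fresh = learning_rate()
print(stale, fresh)
```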
Notes on a wild idea
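Before that, I had also tried wrapping the step in a list and incrementing it by hand. When I printed the learning rate manually, the values were correct: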
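```python
glb_step = [tf.Variable(0, trainable=False)]
...
for i in range(15):
    # learning_rate holds a reference to this same list, so replacing its element is visible
    glb_step[0] = glb_step[0] + 1
    t = learning_rate()
    print('%s' % t)
```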
Output:
```
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
# the value switches here, after boundary 10
tf.Tensor(0.001, shape=(), dtype=float32)
tf.Tensor(0.001, shape=(), dtype=float32)
tf.Tensor(0.001, shape=(), dtype=float32)
tf.Tensor(0.001, shape=(), dtype=float32)
tf.Tensor(0.001, shape=(), dtype=float32)
```
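The list trick prints correctly because the partial holds a reference to the list itself, and glb_step[0] = ... mutates that same list. A plain-Python sketch reproduces the printed sequence above: ten values of 0.01 followed by five of 0.001. It uses a hypothetical `piecewise` function over ints instead of Tensors.

```python
import functools


def piecewise(step_holder, boundaries=(10, 20), values=(0.01, 0.001, 0.0001)):
    step = step_holder[0]  # read the current element of the shared list
    for boundary, value in zip(boundaries, values):
        if step <= boundary:
            return value
    return values[-1]


glb_step = [0]
learning_rate = functools.partial(piecewise, glb_step)
printed = []
for i in range(15):
    glb_step[0] = glb_step[0] + 1  # mutates the list the partial already holds
    printed.append(learning_rate())
print(printed)
print(hasattr(glb_step, "dtype"))  # False: a list is not a Variable
```

The last line hints at the error that follows. Anything that expects global_step to have a dtype or to support assign_add will fail on a list.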
However, when this list was passed to the optimizer as global_step, training failed because a list has no dtype attribute:
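```python
for ...:
    ...
    # global_step=glb_step was also passed here, as the traceback shows;
    # note that creating a new AdamOptimizer on every iteration also resets its moment estimates
    tf.train.AdamOptimizer(learning_rate).apply_gradients(...my gradient...)
    glb_step = glb_step + 1
    ...
    pass
```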
Error message:
```
Traceback (most recent call last):
  File "C:\Users\Admin\.conda\envs\py_3.5_tfv1\lib\site-packages\tensorflow_core\python\training\optimizer.py", line 629, in apply_gradients
    apply_updates = state_ops.assign_add(global_step, 1, name=name)
  File "C:\Users\Admin\.conda\envs\py_3.5_tfv1\lib\site-packages\tensorflow_core\python\ops\state_ops.py", line 192, in assign_add
    if ref.dtype._is_ref_dtype:
AttributeError: 'list' object has no attribute 'dtype'
```
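I then changed the code to follow the docstring, passing the tf.Variable itself as global_step to the optimizer. Next, I stepped into tensorflow.python.keras.optimizer_v2.learning_rate_schedule with a debugger: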
```python
def __call__(self, step):
    with ops.name_scope_v2(self.name or "PiecewiseConstant"):
        boundaries = ops.convert_n_to_tensor(self.boundaries)
        values = ops.convert_n_to_tensor(self.values)
        x_recomp = ops.convert_to_tensor(step)
```
The value of x_recomp now increases steadily from one training step to the next.
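For reference, the piecewise_constant docstring treats boundaries as inclusive on the right. The schedule returns values[0] while x <= boundaries[0], values[i] while boundaries[i-1] < x <= boundaries[i], and values[-1] once x > boundaries[-1]. That is why step 10 still yields 0.01 in the earlier output. Below is a plain-Python sketch of the rule using bisect; it is my own formulation, not the TF implementation.

```python
import bisect


def piecewise_constant(x, boundaries, values):
    """Plain-Python sketch of the piecewise_constant rule (boundaries inclusive on the right)."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more element than boundaries")
    # bisect_left returns the index of the first boundary >= x, i.e. the interval containing x
    return values[bisect.bisect_left(boundaries, x)]


for step in (0, 10, 11, 20, 21):
    print(step, piecewise_constant(step, [10, 20], [0.01, 0.001, 0.0001]))
```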
Conclusion
Read the source carefully instead of hacking around blindly...