Using tf.train learning rate decay in TensorFlow Eager mode

老羴羊不膻

Category: Machine Learning. Tags: TensorFlow, machine learning, learning rate, optimization, Eager

Article link: https://blog.csdn.net/weixin_38909544/article/details/103580858

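Introduction to the learning rate decay functions

### Everything below refers to TensorFlow r1.15 ###

For some problems, the commonly used optimizers for training neural networks do not work well with a fixed learning rate. Defining a learning rate decay schedule is one way to help training avoid getting stuck in a poor local optimum.

The tf.train module in TensorFlow provides many learning rate decay functions: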

tf.train.cosine_decay (cosine)
tf.train.exponential_decay (exponential)
tf.train.inverse_time_decay (inverse time)
tf.train.linear_cosine_decay (linear cosine)
tf.train.natural_exp_decay (natural exponential)
tf.train.noisy_linear_cosine_decay (noisy linear cosine)
tf.train.piecewise_constant_decay (piecewise constant)
tf.train.polynomial_decay (polynomial)

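Most of the decay functions above take three parameters, learning_rate, global_step, and decay_steps, to compute decayed_learning_rate. The exception is tf.train.piecewise_constant_decay, which takes x (normally the global step), boundaries, and values.

That is, as training progresses, global_step keeps changing (normally increasing), and the learning rate decays as a function of the ratio global_step / decay_steps.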
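As a concrete illustration of the role of global_step / decay_steps, here is a minimal pure-Python sketch of the formula documented for tf.train.exponential_decay, decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps). It is a model of the arithmetic only, not the TensorFlow op, and the function below is written for this illustration.

```python
import math


def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Pure-Python model of the exponential decay formula (not the TF op)."""
    exponent = global_step / decay_steps
    if staircase:
        # Staircase mode decays in discrete jumps, once every decay_steps steps.
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent


# Learning rate at a few steps, decaying by a factor of 0.96 per 100000 steps.
for step in (0, 50000, 100000, 200000):
    smooth = exponential_decay(0.1, step, 100000, 0.96)
    stepped = exponential_decay(0.1, step, 100000, 0.96, staircase=True)
    print(step, smooth, stepped)
```

With staircase=True the rate stays at its previous value until global_step crosses the next multiple of decay_steps; without it the rate shrinks a little on every step.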

For an example, see "Using and visualizing tf.train.exponential_decay" (by 梦沁清风).

Pitfalls of using learning_rate_decay in Eager mode

1. In Eager mode, every learning rate decay function returns a callable (a functools.partial object), not a Tensor

tensorflow.python.training.learning_rate_decay
@compatibility(eager)
When eager execution is enabled, this function returns a function which in
turn returns the decayed learning rate Tensor. This can be useful for changing
the learning rate value across different invocations of optimizer functions.

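Following the example above and printing learning_rate shows that the return value is a function; only calling it (adding parentheses) returns the actual learning rate value.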

from __future__ import absolute_import, division, print_function, unicode_literals
# Requires TensorFlow 1.x (>= 1.10). Eager execution must be enabled
# before any other TensorFlow call.
import tensorflow as tf
tf.enable_eager_execution()

glb_step = tf.Variable(0, trainable=False)
boundaries = [10, 20]
values = [0.01, 0.001, 0.0001]
learning_rate = tf.train.piecewise_constant(glb_step, boundaries, values)
optimizer = tf.train.AdamOptimizer(learning_rate)
# In Eager mode learning_rate is a callable, so this prints the callable itself...
print("learning_rate shape:%s" % learning_rate)
# ...and calling it returns the current learning rate as a Tensor.
print("learning_rate value:%s" % learning_rate())

Output:

learning_rate shape:
functools.partial(<tensorflow.python.keras.optimizer_v2.learning_rate_schedule.PiecewiseConstantDecay object at 0x000002C42079F320>, [<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=0>])

learning_rate value:
tf.Tensor(0.01, shape=(), dtype=float32)
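To see why the returned object has to be called, here is a minimal pure-Python sketch of the same pattern, with no TensorFlow involved. StepCounter and this piecewise_constant are hypothetical stand-ins written for the illustration, not TensorFlow classes. The boundary handling mirrors the documented behavior: the first value applies while the step is less than or equal to the first boundary.

```python
import bisect
import functools


class StepCounter:
    """Hypothetical stand-in for the global_step tf.Variable."""

    def __init__(self):
        self.value = 0

    def assign_add(self, delta):
        self.value += delta


def piecewise_constant(step, boundaries, values):
    # values[0] while step <= boundaries[0], values[1] while step <= boundaries[1], ...
    return values[bisect.bisect_left(boundaries, step.value)]


glb_step = StepCounter()
# The schedule is bound to the step object, not to its current value.
learning_rate = functools.partial(
    piecewise_constant, glb_step, [10, 20], [0.01, 0.001, 0.0001])

first = learning_rate()    # step is 0, so the first value applies
glb_step.assign_add(11)    # advance past the first boundary
second = learning_rate()   # the same callable now yields the second value
print(first, second)
```

Because the callable reads the step each time it is invoked, whoever holds it always gets the rate for the current step, provided the bound step object is the one being incremented.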

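2. After the schedule is passed to the optimizer, global_step does not change (so learning_rate does not change either)

First, the example from the source docstring: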

```python
# Example: decay every 100000 steps with a base of 0.96:
...
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.compat.v1.train.exponential_decay(
    starter_learning_rate, global_step, 100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
    tf.compat.v1.train.GradientDescentOptimizer(learning_rate)
    .minimize(...my loss..., global_step=global_step)
)
```

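The most important part is the last statement: the optimizer's update call (minimize() here, or apply_gradients()) must receive global_step=glb_step, so that the optimizer itself increments the step variable on every update.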
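Put together for Eager mode, the pattern might look like the sketch below. It assumes the tf.compat.v1 aliases (available in TensorFlow 1.15 and 2.x). The toy parameter w, the squared loss, and the choice of GradientDescentOptimizer are assumptions made only to keep the example small.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.enable_eager_execution()  # required on TF 1.x; eager is already the default on 2.x

glb_step = tf.Variable(0, trainable=False)
learning_rate = tf1.train.piecewise_constant(
    glb_step, [10, 20], [0.01, 0.001, 0.0001])
optimizer = tf1.train.GradientDescentOptimizer(learning_rate)

w = tf.Variable(1.0)  # toy parameter, for illustration only

seen = []
for _ in range(15):
    with tf.GradientTape() as tape:
        loss = tf.square(w)  # toy loss
    grads = tape.gradient(loss, [w])
    # Passing global_step lets the optimizer increment the Variable itself.
    optimizer.apply_gradients(zip(grads, [w]), global_step=glb_step)
    seen.append(float(learning_rate()))

print(int(glb_step.numpy()), seen)
```

The step Variable is never touched by hand: the optimizer increments it, and the callable returned by piecewise_constant reads it again on each invocation.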

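Notes on a failed experiment

Before that, I also tried passing the step in a list and incrementing it by hand. When printing manually, the values looked correct: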

glb_step = [tf.Variable(0, trainable=False)]
...
for i in range(15):
    glb_step[0] = glb_step[0] + 1
    t = learning_rate()
    print('%s'%t)

Output:

tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
tf.Tensor(0.01, shape=(), dtype=float32)
# the value changes here, after boundary 10
tf.Tensor(0.001, shape=(), dtype=float32)
tf.Tensor(0.001, shape=(), dtype=float32)
tf.Tensor(0.001, shape=(), dtype=float32)
tf.Tensor(0.001, shape=(), dtype=float32)
tf.Tensor(0.001, shape=(), dtype=float32)
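A plausible reading of why the list trick appears to work when printing: the callable keeps a reference to the list object, so assigning to glb_step[0] changes what it sees, whereas rebinding the name to a new object would not. The pure-Python sketch below illustrates only that reference behavior. read_step and holder are hypothetical names, and this is an interpretation, not something taken from the TensorFlow source.

```python
import functools


def read_step(holder):
    # Reads whatever is currently stored in the bound list.
    return holder[0]


holder = [0]
bound = functools.partial(read_step, holder)

holder[0] = holder[0] + 1   # in-place item assignment: visible through the partial
after_item_assignment = bound()

holder = [99]               # rebinding the name: the partial still holds the old list
after_rebinding = bound()

print(after_item_assignment, after_rebinding)
```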

However, once the list is handed to the optimizer, it fails with an error saying that a list has no dtype attribute:

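for ...:
    ...
    # glb_step is still the list from above. The traceback below shows it
    # reached assign_add, so it must have been passed as global_step.
    tf.train.AdamOptimizer(learning_rate).apply_gradients(...my gradient..., global_step=glb_step)
    glb_step = glb_step + 1  # never reached: the call above raises first
    ...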

Error message:

Traceback (most recent call last):
  File "C:\Users\Admin\.conda\envs\py_3.5_tfv1\lib\site-packages\tensorflow_core\python\training\optimizer.py", line 629, in apply_gradients
    apply_updates = state_ops.assign_add(global_step, 1, name=name)
  File "C:\Users\Admin\.conda\envs\py_3.5_tfv1\lib\site-packages\tensorflow_core\python\ops\state_ops.py", line 192, in assign_add
    if ref.dtype._is_ref_dtype:
AttributeError: 'list' object has no attribute 'dtype'

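After changing the code to follow the source example (passing the Variable itself as global_step), I stepped into tensorflow.python.keras.optimizer_v2.learning_rate_schedule with the debugger: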

def __call__(self, step):
    with ops.name_scope_v2(self.name or "PiecewiseConstant"):
      boundaries = ops.convert_n_to_tensor(self.boundaries)
      values = ops.convert_n_to_tensor(self.values)
      x_recomp = ops.convert_to_tensor(step)

There, the value of x_recomp increases steadily from step to step.

Conclusion

Read the source code carefully instead of improvising your own hacks...
