Using and understanding global_step

青竹aaa

Column: TensorFlow; Tags: deep learning, artificial intelligence

Article link: https://blog.csdn.net/qq_36575363/article/details/112727440

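global_step simply keeps track of the number of batches seen so far. In other words, global_step records the current training iteration count (this is my tentative understanding; if it is wrong, please correct me in the comments).

global_step is used in moving averages, optimizers, exponential learning-rate decay, and similar places.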
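To make the moving-average case concrete, here is a minimal pure-Python sketch (no TensorFlow needed). It assumes the rule documented for tf.train.ExponentialMovingAverage when a step count is passed as num_updates: the decay actually used is min(decay, (1 + num_updates) / (10 + num_updates)). The helper names ema_decay and ema_update are hypothetical.

```python
def ema_decay(decay, num_updates=None):
    """Decay actually applied when a step count (e.g. global_step) is supplied.

    Assumed rule, following the tf.train.ExponentialMovingAverage docs:
    min(decay, (1 + num_updates) / (10 + num_updates)).
    """
    if num_updates is None:
        return decay
    return min(decay, (1 + num_updates) / (10 + num_updates))


def ema_update(shadow, value, decay, num_updates=None):
    """One moving-average step: shadow = d * shadow + (1 - d) * value."""
    d = ema_decay(decay, num_updates)
    return d * shadow + (1 - d) * value


# Small step counts give a small effective decay, so the average
# follows the variable quickly early in training.
for step in (0, 10, 100, 10000):
    print(step, round(ema_decay(0.99, step), 4))
```

Once the step count is large, the effective decay is capped at the configured value (0.99 here).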

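global_step is normally initialized to 0.

Its main use is with the learning rate in gradient descent: a learning rate that is too large tends to overshoot the optimum and oscillate, while one that is too small converges slowly and may settle in a local optimum.

The formula: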

 decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

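learning_rate = initial learning rate
decay_rate = decay coefficient
decay_steps = decay period in iterations, often set to the number of iterations needed for one full pass over the training data
global_step = current iteration count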

Since learning_rate, decay_rate, and decay_steps are all fixed values in the formula, decayed_learning_rate depends only on how global_step changes.
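A quick worked example of the formula, using the hyperparameters that appear later in this post (learning_rate=0.1, decay_rate=0.96, decay_steps=100). The three constants stay fixed and only global_step varies; this is plain Python arithmetic, not a TensorFlow call.

```python
# Fixed hyperparameters (the values used later in this post).
learning_rate = 0.1
decay_rate = 0.96
decay_steps = 100

# Only global_step changes during training, so it alone drives the decayed rate.
rates = {}
for global_step in (0, 50, 100, 200):
    rates[global_step] = learning_rate * decay_rate ** (global_step / decay_steps)
    print(global_step, round(rates[global_step], 6))
```

At global_step=100 the rate is one factor of 0.96 below the start (0.096); at 200 it is 0.1 × 0.96² = 0.09216.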

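As usual, here is the code (TensorFlow 1.x API; in TensorFlow 2.x the same functions live under tf.compat.v1):

import tensorflow as tf

# The counter starts at 0; trainable=False keeps the optimizer from updating it as a weight
global_step = tf.Variable(0, trainable=False)

# Generate the learning rate with exponential_decay
learning_rate = tf.train.exponential_decay(0.1, global_step, 100, 0.96, staircase=True)

# Use the exponentially decayed learning rate. Passing global_step to minimize()
# increments global_step automatically, so the learning rate is updated accordingly.
# `loss` is your model's loss tensor (defined elsewhere)
learning_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)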

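The learning rate matters a great deal: too large a value can cause oscillation, while too small a value makes learning slow and training long. Exponential decay is therefore used to obtain a suitable learning rate. The initial learning rate is relatively large, so the parameters approach the minimum quickly, which avoids slow learning. As the number of training steps grows, the learning rate decays exponentially, so that an overly large rate does not prevent the parameters from reaching the minimum. Eventually the rate levels off and training settles into a fairly stable state.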

TensorFlow provides an exponential decay function:

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

It effectively computes the following:

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
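A minimal pure-Python sketch of that formula, including the staircase option discussed later in this post. The function name mirrors the TensorFlow one, but this is a hypothetical re-implementation for illustration, not TensorFlow's source.

```python
import math


def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Hypothetical pure-Python version of the formula above.

    decayed = learning_rate * decay_rate ** (global_step / decay_steps);
    with staircase=True the exponent is rounded down to an integer.
    """
    exponent = global_step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent


print(exponential_decay(0.1, 150, 100, 0.96))                  # smooth: 0.1 * 0.96 ** 1.5
print(exponential_decay(0.1, 150, 100, 0.96, staircase=True))  # stepped: 0.1 * 0.96 ** 1
```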

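I used to think global_step was the total number of iterations needed to finish training, yet I kept seeing code that initializes global_step to 0:

global_step = tf.Variable(0)

After looking into it, I found this explanation of global_step on Stack Overflow: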

global_step refer to the number of batches seen by the graph. Everytime a batch is provided, the weights are updated in the direction that minimizes the loss. global_step just keeps track of the number of batches seen so far. When it is passed in the minimize() argument list, the variable is increased by one. Have a look at optimizer.minimize().

You can get the global_step value using tf.train.global_step().

The 0 is the initial value of the global step in this context.

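So once global_step is passed to minimize(), it increases by 1 after each batch is trained; it is not a fixed value. global_step therefore acts like a counter that records the current iteration number, and when it reaches some value A, a particular operation can be triggered.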
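The counter behaviour can be imitated without TensorFlow. In this sketch ToyOptimizer and the threshold A are hypothetical stand-ins: each call to minimize() represents training on one batch and adds 1 to global_step, and an action fires whenever the counter reaches a multiple of A.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer whose minimize() bumps global_step."""

    def __init__(self):
        self.global_step = 0  # starts at 0, like tf.Variable(0)

    def minimize(self, batch):
        # ... the weight update for this batch would happen here ...
        self.global_step += 1  # one batch seen -> counter goes up by one


opt = ToyOptimizer()
A = 3  # hypothetical threshold at which some action fires
fired_at = []
for batch in range(10):
    opt.minimize(batch)
    if opt.global_step % A == 0:
        fired_at.append(opt.global_step)

print(opt.global_step)  # 10 batches seen
print(fired_at)
```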

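Now look again at the (global_step / decay_steps) term from the beginning. decay_steps usually represents the number of iterations needed to use the full training data once, which is the total number of training examples divided by the number of examples per batch. So decay_steps is set to a fixed value, while global_step changes. When staircase=True, (global_step / decay_steps) is rounded down to an integer, so the learning rate becomes a step function.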
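The arithmetic for decay_steps and the rounding behaviour, with hypothetical numbers (50,000 training examples and a batch size of 500, assumed to divide evenly):

```python
num_examples = 50000   # hypothetical training-set size
batch_size = 500       # hypothetical batch size (assumed to divide num_examples)
decay_steps = num_examples // batch_size  # iterations per full pass over the data

steps = [0, 99, 100, 199, 200, 250]
smooth = [s / decay_steps for s in steps]    # staircase=False: fractional exponent
stair = [s // decay_steps for s in steps]    # staircase=True: rounded down

print(decay_steps)
for s, a, b in zip(steps, smooth, stair):
    print(s, a, b)
```

The rounded exponent stays at 0 for steps 0 to 99 and jumps to 1 exactly at step 100.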

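Combined with the idea that global_step is the current iteration count and grows by one per batch, it is easy to see why the result is a step function. Suppose decay_steps=100 and decay_rate=0.96, meaning the learning rate is multiplied by 0.96 after every 100 training steps. (This also clarifies the decay speed: a larger decay_steps makes the learning rate decay more slowly.) With staircase=True, global_step / decay_steps is rounded down to an integer: it stays 0 for steps 0 to 99, becomes 1 for steps 100 to 199, and so on. Within each block of 100 steps the exponent, and hence the learning rate, does not change, so the learning rate stays flat while training continues. Only when global_step reaches a multiple of 100 does the learning rate change, dropping abruptly (a vertical step), which produces the staircase shape. If decay_steps equals the number of iterations per epoch, the learning rate decreases once per full pass over the training data. With staircase=False, the learning rate is updated after every step, so it follows the smooth red curve in the figure.

In summary: with staircase=True, the learning rate is updated every decay_steps steps to decayed_learning_rate = learning_rate * decay_rate ^ floor(global_step / decay_steps), so the exponent takes the values 1, 2, 3, ... in turn. With staircase=False, it is updated after every batch (one batch per step) to decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps), so the exponent grows as 1/100, 2/100, 3/100, ....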
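The two modes can be compared side by side. This sketch recomputes the formula in plain Python for a handful of steps, using the values from this post (initial rate 0.1, decay_rate=0.96, decay_steps=100); lr_at is a hypothetical helper.

```python
import math

learning_rate, decay_rate, decay_steps = 0.1, 0.96, 100


def lr_at(step, staircase):
    """Decayed learning rate at a given step (hypothetical helper)."""
    exponent = step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # flat within each block of decay_steps
    return learning_rate * decay_rate ** exponent


steps = [0, 1, 50, 99, 100, 101, 199, 200]
stair = [lr_at(s, True) for s in steps]
smooth = [lr_at(s, False) for s in steps]
for s, a, b in zip(steps, stair, smooth):
    print(f"step {s:>3}: staircase={a:.6f}  smooth={b:.6f}")
```

The staircase column only changes at steps 100 and 200, while the smooth column decreases at every step.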

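In TensorFlow, generate the learning rate with the exponential decay function:

learning_rate = tf.train.exponential_decay(0.1, global_step, 100, 0.96, staircase=True)

Because staircase=True is set, the learning rate is multiplied by 0.96 after every 100 training steps.

Passing global_step to minimize() makes TensorFlow update global_step automatically, so the learning rate is updated accordingly:

learning_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)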
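Putting the pieces together, the following pure-Python loop imitates what the TensorFlow graph does: each batch uses the learning rate computed from the current global_step, and the counter is then incremented, as minimize(..., global_step=global_step) does after applying the update. No real model or loss is involved; the function is a hypothetical re-implementation of the formula and the 350-batch loop is an illustrative example.

```python
import math


def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Hypothetical pure-Python version of the decay formula."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent


global_step = 0          # like tf.Variable(0, trainable=False)
history = []
for _ in range(350):     # 350 hypothetical training batches
    lr = exponential_decay(0.1, global_step, 100, 0.96, staircase=True)
    history.append(lr)   # rate used for this batch's update
    global_step += 1     # what minimize(..., global_step=...) does after each update

print(global_step)  # 350
print(sorted(set(history), reverse=True))
```

Over 350 batches only four distinct rates appear (exponents 0, 1, 2, 3), which is the staircase behaviour described above.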
