TensorFlow seq2seq implementation details (4): attention in TensorFlow nmt (scaled Luong and normed Bahdanau) and the optimizer

outsider0007

Column: ML&DL Principles; Tags: tensorflow, nmt, seq2seq

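Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.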

Article link: https://blog.csdn.net/qq_37667364/article/details/90301448

1.attention

The TensorFlow nmt tutorial says:

Attention: Bahdanau-style attention often requires bidirectionality on the encoder side to work well; whereas Luong-style attention tends to work well for different settings. For this tutorial code, we recommend using the two improved variants of Luong & Bahdanau-style attentions: scaled_luong & normed bahdanau.

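How scaled_luong is expressed in TensorFlow:

Note that the scale=True argument to tf.contrib.seq2seq.LuongAttention is exactly what distinguishes the scaled_luong setting from plain luong!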
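The following sketch makes the effect of scale=True concrete. It is a minimal pure-Python version of the Luong (multiplicative) score, not TensorFlow's code. It assumes the memory has already been projected (keys = W h_s), so the score reduces to a dot product with the decoder query. The optional scalar g stands in for the trainable scalar that TF 1.x learns when scale=True, which is initialized to 1.0. The function names are hypothetical.

```python
import math


def luong_scores(query, keys, g=None):
    """Multiplicative (Luong) scores: score_j = query . keys[j].

    keys are assumed to be already projected memory vectors (W h_s),
    so h_t^T W h_s reduces to a dot product. If g is given, every score
    is multiplied by it, mimicking scale=True (in TF, g is a trainable
    scalar initialized to 1.0).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if g is not None:
        scores = [g * s for s in scores]
    return scores


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 1.0]]

plain = luong_scores(query, keys)           # plain luong logits
scaled = luong_scores(query, keys, g=0.5)   # scaled_luong with a learned g = 0.5
print(softmax(plain), softmax(scaled))
```

Because g starts at 1.0, scaled_luong initially behaves exactly like luong. What it adds is a learned factor on the logits before the softmax, which lets the model sharpen or flatten the attention distribution.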

normed_bahdanau sets normalize=True on tf.contrib.seq2seq.BahdanauAttention.
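Here is a similar minimal pure-Python sketch of the Bahdanau (additive) score, with and without normalization. It assumes the query and memory are already projected (W1 h_t and W2 h_s). As I understand TF 1.x's BahdanauAttention, normalize=True does two things:

- it replaces the vector v with g * v / ||v||, where g is initialized to sqrt(1/num_units);
- it adds a bias b inside the tanh.

This sketch mirrors that behavior. The function name is hypothetical.

```python
import math


def bahdanau_scores(query, keys, v, normalize=False, g=None, b=None):
    """Additive (Bahdanau) scores: score_j = sum_i v_i * tanh(keys[j][i] + query[i] + b_i).

    query and keys are assumed to be already projected. With normalize=True,
    v is rescaled to g * v / ||v|| (g defaults to sqrt(1/num_units)) and the
    bias b (default zeros) is used; without it, no bias is applied.
    """
    units = len(v)
    if normalize:
        if g is None:
            g = math.sqrt(1.0 / units)
        if b is None:
            b = [0.0] * units
        norm = math.sqrt(sum(x * x for x in v))
        v = [g * x / norm for x in v]
    else:
        b = [0.0] * units
    return [
        sum(vi * math.tanh(k + q + bi) for vi, k, q, bi in zip(v, key, query, b))
        for key in keys
    ]


query = [0.0, 0.0]
keys = [[1.0, 1.0], [-1.0, 0.5]]
for v in ([3.0, 4.0], [30.0, 40.0]):
    print(v, bahdanau_scores(query, keys, v),
          bahdanau_scores(query, keys, v, normalize=True))
```

The printout shows the point of the normalization. Scaling v by 10 scales the plain scores by 10, but leaves the normed scores unchanged, so only the scalar g controls their magnitude.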

2.optimizer

The nmt tutorial says:

Optimizer: while Adam can lead to reasonable results for "unfamiliar" architectures, SGD with scheduling will generally lead to better performance if you can train with SGD.

At first I didn't quite understand what "SGD with scheduling" meant.

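I had previously seen someone on Zhihu say that Adam uses more GPU memory, and this seems to be true: with SGD I can use a somewhat larger batch_size (forgive me, I'm poor and can only use my laptop's built-in GTX 960M). This is expected. Adam keeps two extra moment estimates (first and second moments) for every trainable parameter, while plain SGD keeps no per-parameter state.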
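Below is a minimal pure-Python sketch of why Adam needs more memory. It uses the textbook update rules rather than TensorFlow's implementation, and the function names are hypothetical. SGD updates the parameters with no extra state. Adam carries a first-moment list m and a second-moment list v, each as large as the parameters themselves.

```python
import math


def sgd_step(params, grads, lr):
    """Plain SGD: no per-parameter state is kept."""
    return [p - lr * g for p, g in zip(params, grads)]


def adam_step(params, grads, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam step; state holds m, v (one entry per parameter) and step t."""
    t = state["t"] + 1
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(state["m"], grads)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(state["v"], grads)]
    new_params = []
    for p, mi, vi in zip(params, m, v):
        m_hat = mi / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = vi / (1 - beta2 ** t)  # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params, {"m": m, "v": v, "t": t}


params = [1.0, -2.0]
grads = [0.5, -0.5]
params_sgd = sgd_step(params, grads, lr=0.1)

# Adam needs this extra state: two lists the same size as params.
state = {"m": [0.0] * len(params), "v": [0.0] * len(params), "t": 0}
params_adam, state = adam_step(params, grads, state, lr=0.1)

print("SGD:", params_sgd)
print("Adam:", params_adam, "extra floats:", len(state["m"]) + len(state["v"]))
```

After one step, each Adam parameter has moved by roughly lr, opposite to the sign of its gradient, and state now holds 2 extra floats per parameter. In TensorFlow this corresponds to two slot variables per trainable variable. That is consistent with being able to fit a slightly larger batch_size with SGD.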

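Digging further into the nmt source code, I found that it decays the SGD learning rate, which is presumably what "scheduling" refers to.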

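It provides several decay schemes (luong5, luong10 and luong234), for example:

luong234: once 2/3 of the training steps are done, the learning rate starts being halved, 4 times in total over the remaining third of training.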
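Here is a sketch of the luong234 schedule as I read the nmt source. There is no decay for the first 2/3 of num_train_steps. After that, a staircase exponential decay with factor 0.5 applies, and its step size is one quarter of the remaining steps. The function name luong234_lr is hypothetical, and the max(1, ...) guard is my own addition for very short runs.

```python
def luong234_lr(step, base_lr, num_train_steps):
    """Learning rate at `step` under a luong234-style schedule (sketch)."""
    start_decay_step = int(num_train_steps * 2 / 3)
    decay_times = 4
    # Spread the 4 halvings over the remaining third of training;
    # max(1, ...) avoids division by zero for tiny num_train_steps.
    decay_steps = max(1, int((num_train_steps - start_decay_step) / decay_times))
    if step < start_decay_step:
        return base_lr
    # Staircase decay: the exponent only increases every decay_steps steps.
    return base_lr * 0.5 ** ((step - start_decay_step) // decay_steps)


for step in (0, 7999, 8000, 9000, 10000, 11000, 12000):
    print(step, luong234_lr(step, base_lr=1.0, num_train_steps=12000))
```

With 12000 steps and an initial learning rate of 1.0, the rate stays at 1.0 until step 9000. It is then halved every 1000 steps and reaches 0.0625 at the final step.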

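Looking at the standard hyperparameters that nmt provides:

For the large datasets, the attention is always normed_bahdanau.

The optimizer is always SGD, i.e. tf.train.GradientDescentOptimizer.

The initial learning rate is 1.0.

There are, of course, many other parameters as well.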
