TensorFlow Study Notes - Comparing Several LSTM Implementations

  • tf.nn.rnn_cell.BasicLSTMCell
  • tf.nn.static_rnn
  • tf.nn.dynamic_rnn
  • tf.contrib.cudnn_rnn
  • tf.contrib.rnn.LSTMBlockCell
  • tf.contrib.rnn.LSTMBlockFusedCell
  • tf.contrib.rnn.BasicLSTMCell variants

1. BasicLSTMCell

tf.nn.rnn_cell.BasicLSTMCell is a reference (baseline) implementation. In general, it should not be your first choice.

The tf.nn.rnn_cell.BasicLSTMCell should be considered a reference implementation and used only as a last resort when no other options will work.
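For orientation, a minimal sketch (TF 1.x, hypothetical sizes) of how this reference cell is constructed; the faster options discussed in the following sections are drop-in replacements for it:

```python
import tensorflow as tf  # TF 1.x

num_units = 128   # hypothetical hidden size
batch_size = 32   # hypothetical batch size

# Reference LSTM cell: correct, but slower and more memory-hungry
# than the block/fused/cuDNN variants covered below.
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
initial_state = cell.zero_state(batch_size, dtype=tf.float32)
```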

2. tf.nn.static_rnn vs tf.nn.dynamic_rnn

If you are using an individual RNN cell rather than a fully fused RNN layer, tf.nn.dynamic_rnn should generally be preferred, for two reasons:

1. With large unroll amounts, tf.nn.static_rnn increases the size of the graph and can lead to long compile times.
2. tf.nn.dynamic_rnn handles very long sequences well: it can optionally swap memory from the GPU to the CPU (at some performance cost).

When using one of the cells, rather than the fully fused RNN layers, you have a choice of whether to use tf.nn.static_rnn or tf.nn.dynamic_rnn. There shouldn’t generally be a performance difference at runtime, but large unroll amounts can increase the graph size of the tf.nn.static_rnn and cause long compile times. An additional advantage of tf.nn.dynamic_rnn is that it can optionally swap memory from the GPU to the CPU to enable training of very long sequences. Depending on the model and hardware configuration, this can come at a performance cost. It is also possible to run multiple iterations of tf.nn.dynamic_rnn and the underlying tf.while_loop construct in parallel, although this is rarely useful with RNN models as they are inherently sequential.
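A minimal sketch (TF 1.x, hypothetical shapes and sizes) contrasting the two calls; `swap_memory=True` is the option that lets tf.nn.dynamic_rnn offload activations to host memory for very long sequences:

```python
import tensorflow as tf  # TF 1.x

num_units, batch, time_steps, input_dim = 128, 32, 100, 64  # hypothetical sizes
inputs = tf.placeholder(tf.float32, [batch, time_steps, input_dim])

# dynamic_rnn: a single graph node driven by tf.while_loop; graph size does
# not grow with sequence length, and activations can be swapped to CPU memory.
cell_dyn = tf.nn.rnn_cell.BasicLSTMCell(num_units)
dyn_outputs, dyn_state = tf.nn.dynamic_rnn(
    cell_dyn, inputs, dtype=tf.float32, swap_memory=True, scope="dynamic")

# static_rnn: the cell is unrolled time_steps times into the graph, so large
# unroll amounts inflate graph size and compile time.
cell_static = tf.nn.rnn_cell.BasicLSTMCell(num_units)
inputs_list = tf.unstack(inputs, axis=1)  # list of [batch, input_dim] tensors
static_outputs, static_state = tf.nn.static_rnn(
    cell_static, inputs_list, dtype=tf.float32, scope="static")
```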

3. tf.contrib.cudnn_rnn

1. If the network will only ever run on NVIDIA GPUs, consider tf.contrib.cudnn_rnn: it is usually at least an order of magnitude faster than tf.contrib.rnn.BasicLSTMCell and tf.contrib.rnn.LSTMBlockCell, and it uses 3-4x less memory than tf.contrib.rnn.BasicLSTMCell.
2. If the network needs layer normalization, do not use tf.contrib.cudnn_rnn, since it does not support it.

On NVIDIA GPUs, the use of tf.contrib.cudnn_rnn should always be preferred unless you want layer normalization, which it doesn’t support. It is often at least an order of magnitude faster than tf.contrib.rnn.BasicLSTMCell and tf.contrib.rnn.LSTMBlockCell and uses 3-4x less memory than tf.contrib.rnn.BasicLSTMCell.
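A minimal sketch (TF 1.x with CUDA/cuDNN, hypothetical sizes) of the tf.contrib.cudnn_rnn.CudnnLSTM layer; note that, unlike tf.nn.dynamic_rnn's default, it expects time-major input:

```python
import tensorflow as tf  # TF 1.x built with CUDA/cuDNN

num_layers, num_units = 1, 128               # hypothetical sizes
time_steps, batch, input_dim = 100, 32, 64

# Time-major input: [time, batch, features].
inputs = tf.placeholder(tf.float32, [time_steps, batch, input_dim])

# The whole multi-step layer runs inside fused cuDNN kernels.
lstm = tf.contrib.cudnn_rnn.CudnnLSTM(num_layers=num_layers, num_units=num_units)
outputs, (h, c) = lstm(inputs, training=True)
# outputs: [time, batch, num_units]; h, c: [num_layers, batch, num_units]
```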

4. tf.contrib.rnn.LSTMBlockCell

tf.contrib.rnn.LSTMBlockCell is typically used in reinforcement learning, where the RNN has to be run one time step at a time. It is usually combined with tf.while_loop, which drives the interaction with the environment.

If you need to run one step of the RNN at a time, as might be the case in reinforcement learning with a recurrent policy, then you should use the tf.contrib.rnn.LSTMBlockCell with your own environment interaction loop inside a tf.while_loop construct. Running one step of the RNN at a time and returning to Python is possible, but it will be slower.
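A minimal sketch (TF 1.x) of stepping an LSTMBlockCell once per tf.while_loop iteration; `env_step`, the sizes, and the zero/random tensors are hypothetical stand-ins for a real policy and environment:

```python
import tensorflow as tf  # TF 1.x

num_units, batch, obs_dim, max_steps = 64, 8, 16, 100   # hypothetical sizes
cell = tf.contrib.rnn.LSTMBlockCell(num_units)

def env_step(cell_output):
    # Hypothetical stand-in for environment interaction; a real setup would
    # compute an action from cell_output and query the environment here.
    return tf.random_normal([batch, obs_dim])

def cond(t, obs, state):
    return t < max_steps

def body(t, obs, state):
    output, next_state = cell(obs, state)   # exactly one RNN step per iteration
    next_obs = env_step(output)
    return t + 1, next_obs, next_state

init_obs = tf.zeros([batch, obs_dim])
init_state = cell.zero_state(batch, tf.float32)
_, final_obs, final_state = tf.while_loop(
    cond, body, loop_vars=[tf.constant(0), init_obs, init_state])
```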

5. tf.contrib.rnn.LSTMBlockFusedCell

On CPU-only machines, on GPUs where tf.contrib.cudnn_rnn is not available, and on mobile devices, tf.contrib.rnn.LSTMBlockFusedCell should be used.

On CPUs, mobile devices, and if tf.contrib.cudnn_rnn is not available on your GPU, the fastest and most memory efficient option is tf.contrib.rnn.LSTMBlockFusedCell.
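A minimal sketch (TF 1.x, hypothetical sizes) of the fused cell; like the cuDNN layer, it consumes the whole time-major sequence in one call rather than being driven step by step:

```python
import tensorflow as tf  # TF 1.x

num_units = 128                               # hypothetical sizes
time_steps, batch, input_dim = 100, 32, 64

# Time-major input: [time, batch, features]; the unrolled layer runs as a
# single fused op.
inputs = tf.placeholder(tf.float32, [time_steps, batch, input_dim])
lstm = tf.contrib.rnn.LSTMBlockFusedCell(num_units)
outputs, final_state = lstm(inputs, dtype=tf.float32)
# outputs: [time, batch, num_units]
```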

6. tf.contrib.rnn.BasicLSTMCell variants

The tf.contrib.rnn.BasicLSTMCell variants, such as tf.contrib.rnn.NASCell, tf.contrib.rnn.PhasedLSTMCell, tf.contrib.rnn.UGRNNCell, tf.contrib.rnn.GLSTMCell, tf.contrib.rnn.Conv1DLSTMCell, tf.contrib.rnn.Conv2DLSTMCell, tf.contrib.rnn.LayerNormBasicLSTMCell, etc., all share the drawbacks of tf.contrib.rnn.BasicLSTMCell: poor performance and high memory usage.

For all of the less common cell types like tf.contrib.rnn.NASCell, tf.contrib.rnn.PhasedLSTMCell, tf.contrib.rnn.UGRNNCell, tf.contrib.rnn.GLSTMCell, tf.contrib.rnn.Conv1DLSTMCell, tf.contrib.rnn.Conv2DLSTMCell, tf.contrib.rnn.LayerNormBasicLSTMCell, etc., one should be aware that they are implemented in the graph like tf.contrib.rnn.BasicLSTMCell and as such will suffer from the same poor performance and high memory usage. One should consider whether or not those trade-offs are worth it before using these cells. For example, while layer normalization can speed up convergence, because cuDNN is 20x faster the fastest wall clock time to convergence is usually obtained without it.
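For illustration, a minimal sketch (TF 1.x, hypothetical sizes) showing that these variants are ordinary RNNCells that plug into tf.nn.dynamic_rnn just like BasicLSTMCell, and therefore inherit its step-by-step in-graph execution cost:

```python
import tensorflow as tf  # TF 1.x

num_units, batch, time_steps, input_dim = 128, 32, 100, 64  # hypothetical sizes
inputs = tf.placeholder(tf.float32, [batch, time_steps, input_dim])

# Layer-normalized variant; it runs step by step in the graph like
# BasicLSTMCell, so it is much slower than the cuDNN/fused options.
cell = tf.contrib.rnn.LayerNormBasicLSTMCell(num_units)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```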

7. Improving CNN-LSTM models

CNN-LSTM is a model that combines a convolutional neural network (CNN) with a long short-term memory network (LSTM) to process sequence data. It already performs well on time-series data, but several techniques can improve it further:

1. Deepen the convolutional stack: adding more convolutional layers, and tuning kernel size and count, extracts richer features; adjust these to the dataset and task.
2. Use larger convolution kernels: larger kernels (e.g. 5 or 7) enlarge the receptive field and capture longer temporal dependencies.
3. Add regularization: Dropout or L2 regularization reduces model complexity, helps prevent overfitting, and improves generalization.
4. Use an attention mechanism: attention lets the model focus on the important time steps or features; adding it to a CNN-LSTM increases the weight given to key time steps.
5. Combine with other models: other architectures can be fused with CNN-LSTM, for example a Transformer, to exploit its strengths on sequence data.

These are common CNN-LSTM improvements; which ones are worthwhile depends on the dataset and task.
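As one illustration of points 1-3, a minimal tf.keras sketch (hypothetical shapes and layer sizes) of a CNN-LSTM with a stacked convolutional front end, kernel size 5, and dropout regularization:

```python
import tensorflow as tf  # tf.keras

time_steps, features, num_classes = 100, 8, 3   # hypothetical dataset shape

model = tf.keras.Sequential([
    # Convolutional front end: two stacked Conv1D layers with kernel size 5
    # to extract longer local temporal patterns (points 1 and 2).
    tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu',
                           input_shape=(time_steps, features)),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Dropout(0.3),                # regularization (point 3)
    # Recurrent back end over the extracted feature sequence.
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```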