TensorFlow Study Notes - Comparing Several LSTM Implementations

  • tf.nn.rnn_cell.BasicLSTMCell
  • tf.nn.static_rnn
  • tf.nn.dynamic_rnn
  • tf.contrib.cudnn_rnn
  • tf.contrib.rnn.LSTMBlockCell
  • tf.contrib.rnn.LSTMBlockFusedCell
  • tf.contrib.rnn.BasicLSTMCell variants

1. BasicLSTMCell

tf.nn.rnn_cell.BasicLSTMCell is a reference (baseline) implementation. In general, it should not be your first choice.

The tf.nn.rnn_cell.BasicLSTMCell should be considered a reference implementation and used only as a last resort when no other options will work.
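For orientation, a minimal sketch (TF 1.x, hypothetical sizes) of how this reference cell is constructed; the faster options discussed in the following sections are drop-in replacements for it:

```python
import tensorflow as tf  # TF 1.x

num_units = 128   # hypothetical hidden size
batch_size = 32   # hypothetical batch size

# Reference LSTM cell: correct, but slower and more memory-hungry
# than the block/fused/cuDNN variants covered below.
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
initial_state = cell.zero_state(batch_size, dtype=tf.float32)
```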

2. tf.nn.static_rnn vs tf.nn.dynamic_rnn

If you are using an individual RNN cell rather than a fully fused RNN layer, tf.nn.dynamic_rnn should generally be preferred, for two reasons:

1. With large unroll amounts, tf.nn.static_rnn increases the size of the graph and can lead to long compile times.
2. tf.nn.dynamic_rnn handles very long sequences well: it can optionally swap memory from the GPU to the CPU (at some performance cost).

When using one of the cells, rather than the fully fused RNN layers, you have a choice of whether to use tf.nn.static_rnn or tf.nn.dynamic_rnn. There shouldn’t generally be a performance difference at runtime, but large unroll amounts can increase the graph size of the tf.nn.static_rnn and cause long compile times. An additional advantage of tf.nn.dynamic_rnn is that it can optionally swap memory from the GPU to the CPU to enable training of very long sequences. Depending on the model and hardware configuration, this can come at a performance cost. It is also possible to run multiple iterations of tf.nn.dynamic_rnn and the underlying tf.while_loop construct in parallel, although this is rarely useful with RNN models as they are inherently sequential.
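A minimal sketch (TF 1.x, hypothetical shapes and sizes) contrasting the two calls; `swap_memory=True` is the option that lets tf.nn.dynamic_rnn offload activations to host memory for very long sequences:

```python
import tensorflow as tf  # TF 1.x

num_units, batch, time_steps, input_dim = 128, 32, 100, 64  # hypothetical sizes
inputs = tf.placeholder(tf.float32, [batch, time_steps, input_dim])

# dynamic_rnn: a single graph node driven by tf.while_loop; graph size does
# not grow with sequence length, and activations can be swapped to CPU memory.
cell_dyn = tf.nn.rnn_cell.BasicLSTMCell(num_units)
dyn_outputs, dyn_state = tf.nn.dynamic_rnn(
    cell_dyn, inputs, dtype=tf.float32, swap_memory=True, scope="dynamic")

# static_rnn: the cell is unrolled time_steps times into the graph, so large
# unroll amounts inflate graph size and compile time.
cell_static = tf.nn.rnn_cell.BasicLSTMCell(num_units)
inputs_list = tf.unstack(inputs, axis=1)  # list of [batch, input_dim] tensors
static_outputs, static_state = tf.nn.static_rnn(
    cell_static, inputs_list, dtype=tf.float32, scope="static")
```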

3. tf.contrib.cudnn_rnn

1. If the network will only ever run on NVIDIA GPUs, consider tf.contrib.cudnn_rnn: it is usually at least an order of magnitude faster than tf.contrib.rnn.BasicLSTMCell and tf.contrib.rnn.LSTMBlockCell, and it uses 3-4x less memory than tf.contrib.rnn.BasicLSTMCell.
2. If the network needs layer normalization, do not use tf.contrib.cudnn_rnn, since it does not support it.

On NVIDIA GPUs, the use of tf.contrib.cudnn_rnn should always be preferred unless you want layer normalization, which it doesn’t support. It is often at least an order of magnitude faster than tf.contrib.rnn.BasicLSTMCell and tf.contrib.rnn.LSTMBlockCell and uses 3-4x less memory than tf.contrib.rnn.BasicLSTMCell.
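A minimal sketch (TF 1.x with CUDA/cuDNN, hypothetical sizes) of the tf.contrib.cudnn_rnn.CudnnLSTM layer; note that, unlike tf.nn.dynamic_rnn's default, it expects time-major input:

```python
import tensorflow as tf  # TF 1.x built with CUDA/cuDNN

num_layers, num_units = 1, 128               # hypothetical sizes
time_steps, batch, input_dim = 100, 32, 64

# Time-major input: [time, batch, features].
inputs = tf.placeholder(tf.float32, [time_steps, batch, input_dim])

# The whole multi-step layer runs inside fused cuDNN kernels.
lstm = tf.contrib.cudnn_rnn.CudnnLSTM(num_layers=num_layers, num_units=num_units)
outputs, (h, c) = lstm(inputs, training=True)
# outputs: [time, batch, num_units]; h, c: [num_layers, batch, num_units]
```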

4. tf.contrib.rnn.LSTMBlockCell

tf.contrib.rnn.LSTMBlockCell is typically used in reinforcement learning, where the RNN has to be run one time step at a time. It is usually combined with tf.while_loop, which drives the interaction with the environment.

If you need to run one step of the RNN at a time, as might be the case in reinforcement learning with a recurrent policy, then you should use the tf.contrib.rnn.LSTMBlockCell with your own environment interaction loop inside a tf.while_loop construct. Running one step of the RNN at a time and returning to Python is possible, but it will be slower.
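A minimal sketch (TF 1.x) of stepping an LSTMBlockCell once per tf.while_loop iteration; `env_step`, the sizes, and the zero/random tensors are hypothetical stand-ins for a real policy and environment:

```python
import tensorflow as tf  # TF 1.x

num_units, batch, obs_dim, max_steps = 64, 8, 16, 100   # hypothetical sizes
cell = tf.contrib.rnn.LSTMBlockCell(num_units)

def env_step(cell_output):
    # Hypothetical stand-in for environment interaction; a real setup would
    # compute an action from cell_output and query the environment here.
    return tf.random_normal([batch, obs_dim])

def cond(t, obs, state):
    return t < max_steps

def body(t, obs, state):
    output, next_state = cell(obs, state)   # exactly one RNN step per iteration
    next_obs = env_step(output)
    return t + 1, next_obs, next_state

init_obs = tf.zeros([batch, obs_dim])
init_state = cell.zero_state(batch, tf.float32)
_, final_obs, final_state = tf.while_loop(
    cond, body, loop_vars=[tf.constant(0), init_obs, init_state])
```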

5. tf.contrib.rnn.LSTMBlockFusedCell

On CPU-only machines, on GPUs where tf.contrib.cudnn_rnn is not available, and on mobile devices, tf.contrib.rnn.LSTMBlockFusedCell should be used.

On CPUs, mobile devices, and if tf.contrib.cudnn_rnn is not available on your GPU, the fastest and most memory efficient option is tf.contrib.rnn.LSTMBlockFusedCell.
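A minimal sketch (TF 1.x, hypothetical sizes) of the fused cell; like the cuDNN layer, it consumes the whole time-major sequence in one call rather than being driven step by step:

```python
import tensorflow as tf  # TF 1.x

num_units = 128                               # hypothetical sizes
time_steps, batch, input_dim = 100, 32, 64

# Time-major input: [time, batch, features]; the unrolled layer runs as a
# single fused op.
inputs = tf.placeholder(tf.float32, [time_steps, batch, input_dim])
lstm = tf.contrib.rnn.LSTMBlockFusedCell(num_units)
outputs, final_state = lstm(inputs, dtype=tf.float32)
# outputs: [time, batch, num_units]
```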

6. tf.contrib.rnn.BasicLSTMCell variants

The tf.contrib.rnn.BasicLSTMCell variants, such as tf.contrib.rnn.NASCell, tf.contrib.rnn.PhasedLSTMCell, tf.contrib.rnn.UGRNNCell, tf.contrib.rnn.GLSTMCell, tf.contrib.rnn.Conv1DLSTMCell, tf.contrib.rnn.Conv2DLSTMCell, tf.contrib.rnn.LayerNormBasicLSTMCell, etc., all share the drawbacks of tf.contrib.rnn.BasicLSTMCell: poor performance and high memory usage.

For all of the less common cell types like tf.contrib.rnn.NASCell, tf.contrib.rnn.PhasedLSTMCell, tf.contrib.rnn.UGRNNCell, tf.contrib.rnn.GLSTMCell, tf.contrib.rnn.Conv1DLSTMCell, tf.contrib.rnn.Conv2DLSTMCell, tf.contrib.rnn.LayerNormBasicLSTMCell, etc., one should be aware that they are implemented in the graph like tf.contrib.rnn.BasicLSTMCell and as such will suffer from the same poor performance and high memory usage. One should consider whether or not those trade-offs are worth it before using these cells. For example, while layer normalization can speed up convergence, because cuDNN is 20x faster the fastest wall clock time to convergence is usually obtained without it.
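For illustration, a minimal sketch (TF 1.x, hypothetical sizes) showing that these variants are ordinary RNNCells that plug into tf.nn.dynamic_rnn just like BasicLSTMCell, and therefore inherit its step-by-step in-graph execution cost:

```python
import tensorflow as tf  # TF 1.x

num_units, batch, time_steps, input_dim = 128, 32, 100, 64  # hypothetical sizes
inputs = tf.placeholder(tf.float32, [batch, time_steps, input_dim])

# Layer-normalized variant; it runs step by step in the graph like
# BasicLSTMCell, so it is much slower than the cuDNN/fused options.
cell = tf.contrib.rnn.LayerNormBasicLSTMCell(num_units)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```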

7. Improving CNN-LSTM models

CNN-LSTM is a model that combines a convolutional neural network (CNN) with a long short-term memory network (LSTM) to process sequence data. It already performs well on time-series data, but several techniques can improve it further:

1. Deepen the convolutional stack: adding more convolutional layers, and tuning kernel size and count, extracts richer features; adjust these to the dataset and task.
2. Use larger convolution kernels: larger kernels (e.g. 5 or 7) enlarge the receptive field and capture longer temporal dependencies.
3. Add regularization: Dropout or L2 regularization reduces model complexity, helps prevent overfitting, and improves generalization.
4. Use an attention mechanism: attention lets the model focus on the important time steps or features; adding it to a CNN-LSTM increases the weight given to key time steps.
5. Combine with other models: other architectures can be fused with CNN-LSTM, for example a Transformer, to exploit its strengths on sequence data.

These are common CNN-LSTM improvements; which ones are worthwhile depends on the dataset and task.
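As one illustration of points 1-3, a minimal tf.keras sketch (hypothetical shapes and layer sizes) of a CNN-LSTM with a stacked convolutional front end, kernel size 5, and dropout regularization:

```python
import tensorflow as tf  # tf.keras

time_steps, features, num_classes = 100, 8, 3   # hypothetical dataset shape

model = tf.keras.Sequential([
    # Convolutional front end: two stacked Conv1D layers with kernel size 5
    # to extract longer local temporal patterns (points 1 and 2).
    tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu',
                           input_shape=(time_steps, features)),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Dropout(0.3),                # regularization (point 3)
    # Recurrent back end over the extracted feature sequence.
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```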