Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs

frankzd

Category: Deep Learning. Tags: RNN

Article link: https://blog.csdn.net/frankzd/article/details/80006290

What is an RNN?

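The core idea behind RNNs is to make use of sequential information. In a traditional neural network we usually assume that all inputs (and outputs) are independent of each other. For many practical applications that is a very poor assumption. If we want to predict the next word in a sentence, for example, we had better know what the previous word was.
The R in RNN stands for Recurrent: the network performs the same operation on each element of a sequence in order, and every output depends on the results of the earlier computations. Another way to think about an RNN is that it has a "memory" that captures information about what has been computed so far. In theory an RNN can make use of information from arbitrarily long sequences, but in practice it is limited to looking back only a few steps.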

$x_t$ is the input at time step t. For example, with time steps indexed from 0, $x_1$ could be a one-hot vector corresponding to the second word of a sentence.
$s_t$ is the hidden state at time step t. It’s the “memory” of the network. $s_t$ is calculated from the previous hidden state and the input at the current step: $s_t=f(Ux_t + Ws_{t-1})$ . The function f is usually a nonlinearity such as tanh or ReLU. $s_{-1}$ , which is required to calculate the first hidden state, is typically initialized to all zeros.
$o_t$ is the output at step t. For example, if we wanted to predict the next word in a sentence it would be a vector of probabilities across our vocabulary. $o_t = \mathrm{softmax}(Vs_t)$ .
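
The three definitions above can be sketched as a short forward pass. This is a minimal NumPy illustration, not a reference implementation: the function name `rnn_forward`, the toy sizes, the random initialization, and the choice of tanh for f are all assumptions made for the example.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """Run a vanilla RNN over a sequence of input vectors xs.

    Returns the hidden states s_t and the outputs o_t for every time step.
    """
    hidden_dim = W.shape[0]
    s_prev = np.zeros(hidden_dim)            # s_{-1}, initialized to all zeros
    states, outputs = [], []
    for x_t in xs:
        s_t = np.tanh(U @ x_t + W @ s_prev)  # s_t = f(U x_t + W s_{t-1}), f = tanh
        o_t = softmax(V @ s_t)               # o_t = softmax(V s_t)
        states.append(s_t)
        outputs.append(o_t)
        s_prev = s_t
    return np.array(states), np.array(outputs)

# Toy sizes chosen only for illustration (assumptions, not from the article).
vocab_size, hidden_dim = 8, 4
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_dim, vocab_size))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))

# A three-word "sentence" of word indices, encoded as one-hot vectors x_0, x_1, x_2.
word_ids = [3, 1, 5]
xs = np.eye(vocab_size)[word_ids]

states, outputs = rnn_forward(xs, U, W, V)
```

Here `states` has one row per time step, and each row of `outputs` is a probability distribution over the toy vocabulary, matching the role of $o_t$ when predicting the next word.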
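We can think of the hidden state $s_t$ as the memory of the network. $s_t$ captures information about what happened in the previous time steps. The output $o_t$ at each step is computed solely from the memory at that step.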
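Unlike a traditional neural network, which uses different parameters at each layer, an RNN shares the same parameters (U, V, W above) across all time steps, that is, across every layer of the unrolled network. This reflects the fact that we are repeating the same computation at each step, only with different inputs. It greatly reduces the number of weights we need to store.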
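Every time step in the process above produces an output, but depending on the application not every output is needed. For example, when classifying the sentiment of a whole sentence, we may care only about the output at the final step.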
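
Parameter sharing and optional per-step outputs can be illustrated together. The sketch below applies the same U and W at every step and computes an output only from the final hidden state. The helper `rnn_last_state`, the layer sizes, and the two-class setup are hypothetical choices for illustration.

```python
import numpy as np

def rnn_last_state(xs, U, W):
    """Apply the same U and W at every step and return only the final hidden state."""
    s = np.zeros(W.shape[0])
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)   # identical parameters at every time step
    return s

# Illustrative sizes (assumptions, not from the article).
input_dim, hidden_dim, num_classes = 8, 4, 2
rng = np.random.default_rng(1)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(num_classes, hidden_dim))

# The number of weights depends only on the layer sizes, not on sequence length.
num_params = U.size + W.size + V.size

short_seq = np.eye(input_dim)[[0, 1, 2]]                            # 3 steps
long_seq = np.eye(input_dim)[rng.integers(0, input_dim, size=50)]   # 50 steps

for xs in (short_seq, long_seq):
    s_final = rnn_last_state(xs, U, W)
    logits = V @ s_final
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()               # softmax applied once, at the final step only
```

The same three matrices handle both the 3-step and the 50-step sequence, so `num_params` stays fixed however long the input is, and the intermediate outputs are simply never computed.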