How an RNN Works
At each time step $t$, the RNN cell takes the previous hidden state $a^{\langle t-1 \rangle}$ and the current input $x^{\langle t \rangle}$, and produces the new hidden state and the prediction:

$$a^{\langle t \rangle} = g_1\left(W_{aa}\,a^{\langle t-1 \rangle} + W_{ax}\,x^{\langle t \rangle} + b_a\right)$$

$$\hat{y}^{\langle t \rangle} = g_2\left(W_{ya}\,a^{\langle t \rangle} + b_y\right)$$
If $g_1$ and $g_2$ are tanh and softmax respectively, the equations above become:

$$a^{\langle t \rangle} = \tanh\left(W_{aa}\,a^{\langle t-1 \rangle} + W_{ax}\,x^{\langle t \rangle} + b_a\right)$$

$$\hat{y}^{\langle t \rangle} = \mathrm{softmax}\left(W_{ya}\,a^{\langle t \rangle} + b_y\right)$$
1. RNN cell forward
Code implementation:
import numpy as np

# softmax is not defined in the original snippet; a standard implementation
# over the first axis (the n_y dimension) is used here:
def softmax(z):
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def rnn_cell_forward(x_t, a_prev, parameters):
    """
    Forward pass of a single RNN cell for one time step.

    Assume there are m samples. Shapes:
        x_t:    (n_x, m)   the t-th word of all m samples
        a_prev: (n_a, m)   hidden state from the previous time step
        Wax:    (n_a, n_x)
        Waa:    (n_a, n_a)
        Wya:    (n_y, n_a)
        ba:     (n_a, 1)
        by:     (n_y, 1)

    Example: the data set has 100 sentences, each with 20 words, and each word
    is represented by an 80-dimensional one-hot vector. Then n_x = 80 is the
    word-vector dimension, m = 100 is the number of samples, and n_a, n_y are
    the hidden-state and output dimensions.
    """
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    # a<t> = tanh(Waa a<t-1> + Wax x<t> + ba)
    a_next = np.tanh(np.dot(Wax, x_t) + np.dot(Waa, a_prev) + ba)
    # y_hat<t> = softmax(Wya a<t> + by)
    yt_pred = softmax(np.dot(Wya, a_next) + by)

    # cache the values needed for the backward pass
    cache = (x_t, parameters, a_prev, a_next)
    return a_next, yt_pred, cache
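As a quick sanity check, here is a minimal usage sketch (not from the original post) that calls rnn_cell_forward with randomly initialized parameters. It uses the example dimensions above (n_x = 80, m = 100); the hidden size n_a = 5 and output size n_y = 80 are arbitrary choices for illustration.

# Minimal usage sketch; n_a = 5 and n_y = 80 are illustrative assumptions.
np.random.seed(1)
n_x, n_a, n_y, m = 80, 5, 80, 100

x_t = np.random.randn(n_x, m)        # the t-th word of all m samples
a_prev = np.random.randn(n_a, m)     # previous hidden state
parameters = {
    "Wax": np.random.randn(n_a, n_x),
    "Waa": np.random.randn(n_a, n_a),
    "Wya": np.random.randn(n_y, n_a),
    "ba": np.random.randn(n_a, 1),
    "by": np.random.randn(n_y, 1),
}

a_next, yt_pred, cache = rnn_cell_forward(x_t, a_prev, parameters)
print(a_next.shape)   # (5, 100)
print(yt_pred.shape)  # (80, 100)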
2. RNN forward
An RNN network is simply the RNN cell repeated once per time step.
Code implementation:
import numpy as np

def rnn_forward(x, a0, parameters):
    """
    Forward pass of the full RNN: rnn_cell_forward repeated over T_x time steps.

    Shapes:
        x:      (n_x, m, T_x)  T_x is the length of each sample; with the 100
                               sentences of 20 words each mentioned above, T_x = 20
        a0:     (n_a, m)       initial hidden state
        Wax:    (n_a, n_x)
        Waa:    (n_a, n_a)
        Wya:    (n_y, n_a)
        ba:     (n_a, 1)
        by:     (n_y, 1)
        a:      (n_a, m, T_x)  hidden states for every time step
        y_pred: (n_y, m, T_x)  predictions for every time step
    """
    caches = []

    # read shape information from the input and the parameters
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    # initialize a and y_pred
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    a_next = a0

    # repeat rnn_cell_forward T_x times, once per time step
    for t in range(T_x):
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)

    caches = (caches, x)
    return a, y_pred, caches
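And a similar sketch for the full forward pass, again with assumed sizes; x stacks T_x = 20 time steps of inputs shaped like x_t above.

# Minimal usage sketch for rnn_forward (dimensions are illustrative assumptions).
np.random.seed(1)
n_x, n_a, n_y, m, T_x = 80, 5, 80, 100, 20

x = np.random.randn(n_x, m, T_x)     # 100 sentences, 20 words each, 80-dim vectors
a0 = np.zeros((n_a, m))              # initial hidden state
parameters = {
    "Wax": np.random.randn(n_a, n_x),
    "Waa": np.random.randn(n_a, n_a),
    "Wya": np.random.randn(n_y, n_a),
    "ba": np.random.randn(n_a, 1),
    "by": np.random.randn(n_y, 1),
}

a, y_pred, caches = rnn_forward(x, a0, parameters)
print(a.shape)       # (5, 100, 20)
print(y_pred.shape)  # (80, 100, 20)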
That covers the principle and code of the vanilla RNN; if anything here is inaccurate, corrections are welcome. In the next post we will go through the principle and implementation of the LSTM.