In the previous lecture we covered the principle and code implementation of the vanilla RNN. The vanilla RNN has a drawback: it suffers from vanishing gradients and cannot capture long-term dependencies. The LSTM redesigns the RNN's hidden layer, which mitigates the vanishing-gradient problem and allows information to be carried across many time steps in the network.
1. LSTM_cell_forward
The LSTM only changes the RNN's hidden layer. Each gate outputs values between 0 and 1, so the network can choose how much of the earlier information to keep: a value near 1 keeps it, a value near 0 discards it.
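For reference, one LSTM cell computes the equations below; they map one-to-one onto ft, it, cct, c_next, ot, a_next and yt_pred in the code that follows ($[a_{t-1}, x_t]$ denotes the concatenation of the previous hidden state with the current input, and $\odot$ is element-wise multiplication):

$f_t = \sigma(W_f [a_{t-1}, x_t] + b_f)$  (forget gate)
$i_t = \sigma(W_i [a_{t-1}, x_t] + b_i)$  (update gate)
$\tilde{c}_t = \tanh(W_c [a_{t-1}, x_t] + b_c)$  (candidate memory)
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$  (new memory state)
$o_t = \sigma(W_o [a_{t-1}, x_t] + b_o)$  (output gate)
$a_t = o_t \odot \tanh(c_t)$  (new hidden state)
$\hat{y}_t = \mathrm{softmax}(W_y a_t + b_y)$  (prediction)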
The code implementation is as follows:
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes its input into (0, 1); used by all three gates
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Column-wise softmax; subtracting the column max keeps the exponentials stable
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    """
    Arguments:
    xt -- input at timestep "t", numpy array of shape (n_x, m)
    a_prev -- previous hidden state, numpy array of shape (n_a, m)
    c_prev -- previous memory state, numpy array of shape (n_a, m)
    parameters -- dictionary containing:
        Wf -- weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
        bf -- bias of the forget gate, numpy array of shape (n_a, 1)
        Wi -- weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
        bi -- bias of the update gate, numpy array of shape (n_a, 1)
        Wc -- weight matrix of the candidate "tanh", numpy array of shape (n_a, n_a + n_x)
        bc -- bias of the candidate "tanh", numpy array of shape (n_a, 1)
        Wo -- weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
        bo -- bias of the output gate, numpy array of shape (n_a, 1)
        Wy -- weight matrix relating the hidden state to the output, numpy array of shape (n_y, n_a)
        by -- bias relating the hidden state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    c_next -- next memory state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass
    """
    # Unpack the parameters
    Wf = parameters["Wf"]
    bf = parameters["bf"]
    Wi = parameters["Wi"]
    bi = parameters["bi"]
    Wc = parameters["Wc"]
    bc = parameters["bc"]
    Wo = parameters["Wo"]
    bo = parameters["bo"]
    Wy = parameters["Wy"]
    by = parameters["by"]

    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    # Concatenate a_prev and xt into a single (n_a + n_x, m) matrix
    concat = np.zeros((n_a + n_x, m))
    concat[:n_a, :] = a_prev
    concat[n_a:, :] = xt

    ft = sigmoid(np.dot(Wf, concat) + bf)   # forget gate
    it = sigmoid(np.dot(Wi, concat) + bi)   # update gate
    cct = np.tanh(np.dot(Wc, concat) + bc)  # candidate memory
    c_next = ft * c_prev + it * cct         # new memory state
    ot = sigmoid(np.dot(Wo, concat) + bo)   # output gate
    a_next = ot * np.tanh(c_next)           # new hidden state
    yt_pred = softmax(np.dot(Wy, a_next) + by)

    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)
    return a_next, c_next, yt_pred, cache
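Before moving on, a quick smoke test can confirm the wiring. The snippet below is a minimal sketch on random data, assuming hypothetical sizes n_x = 3, n_a = 5, n_y = 2 and batch size m = 10, and checks only the output shapes:

np.random.seed(1)
n_x, n_a, n_y, m = 3, 5, 2, 10
xt = np.random.randn(n_x, m)
a_prev = np.random.randn(n_a, m)
c_prev = np.random.randn(n_a, m)
parameters = {
    "Wf": np.random.randn(n_a, n_a + n_x), "bf": np.random.randn(n_a, 1),
    "Wi": np.random.randn(n_a, n_a + n_x), "bi": np.random.randn(n_a, 1),
    "Wc": np.random.randn(n_a, n_a + n_x), "bc": np.random.randn(n_a, 1),
    "Wo": np.random.randn(n_a, n_a + n_x), "bo": np.random.randn(n_a, 1),
    "Wy": np.random.randn(n_y, n_a),       "by": np.random.randn(n_y, 1),
}
a_next, c_next, yt_pred, cache = lstm_cell_forward(xt, a_prev, c_prev, parameters)
print(a_next.shape)   # (5, 10)
print(c_next.shape)   # (5, 10)
print(yt_pred.shape)  # (2, 10)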
2. LSTM_forward
import numpy as np

def lstm_forward(x, a0, parameters):
    """
    Arguments:
    x -- input for every timestep, numpy array of shape (n_x, m, T_x)
    a0 -- initial hidden state, numpy array of shape (n_a, m)
    parameters -- same dictionary as in lstm_cell_forward:
        Wf, Wi, Wc, Wo of shape (n_a, n_a + n_x), bf, bi, bc, bo of shape (n_a, 1),
        Wy of shape (n_y, n_a), by of shape (n_y, 1)

    Returns:
    a -- hidden states for every timestep, of shape (n_a, m, T_x)
    y -- predictions for every timestep, of shape (n_y, m, T_x)
    c -- memory states for every timestep, of shape (n_a, m, T_x)
    caches -- tuple (list of per-timestep caches, x) for the backward pass
    """
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wy"].shape

    # Initialize the outputs and the running hidden/memory states
    a = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))
    c = np.zeros((n_a, m, T_x))
    a_next = a0
    c_next = np.zeros(a_next.shape)

    # Run the cell once per timestep, feeding each output state into the next step
    for t in range(T_x):
        a_next, c_next, yt_pred, cache = lstm_cell_forward(x[:, :, t], a_next, c_next, parameters)
        a[:, :, t] = a_next
        y[:, :, t] = yt_pred
        c[:, :, t] = c_next
        caches.append(cache)

    caches = (caches, x)
    return a, y, c, caches
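To run a full sequence, we can reuse the parameters dictionary from the smoke test above; a sketch over a hypothetical sequence of T_x = 7 timesteps:

x = np.random.randn(n_x, m, 7)  # n_x = 3 inputs, m = 10 examples, T_x = 7 timesteps
a0 = np.random.randn(n_a, m)    # initial hidden state
a, y, c, caches = lstm_forward(x, a0, parameters)
print(a.shape)  # (5, 10, 7)
print(y.shape)  # (2, 10, 7)
print(c.shape)  # (5, 10, 7)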
lstm_forward is simply lstm_cell_forward repeated until the sequence length T_x is reached. Compared with the GRU, the LSTM is better at handling long-term dependencies, and it also mitigates the vanishing-gradient problem. Still, the GRU is quite popular today, mainly because its principle is simpler while its performance is not much worse than the LSTM's. In fact, the GRU appeared later than the LSTM.