......
Q1: Image Captioning with Vanilla RNNs (25 points)
First comes rnn_step_forward, which follows directly from the formula:
    next_h = np.tanh(x.dot(Wx) + prev_h.dot(Wh) + b)  # [N, H]
    cache = (x, prev_h, Wx, Wh, b, next_h)
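To make the shapes concrete, here is a small worked instance of the same update. The input values and weights are made up purely for illustration; only the update expression itself comes from the text above.

```python
import numpy as np

# Illustrative values only: N=1 sample, D=2 input features, H=2 hidden units.
x = np.array([[1.0, 2.0]])                 # [N, D]
prev_h = np.zeros((1, 2))                  # [N, H]
Wx = np.array([[0.5, 0.0],
               [0.0, 0.5]])                # [D, H]
Wh = np.eye(2)                             # [H, H]
b = np.zeros(2)                            # [H, ]

# Pre-activation is x.dot(Wx) = [[0.5, 1.0]], since prev_h and b are zero.
next_h = np.tanh(x.dot(Wx) + prev_h.dot(Wh) + b)  # [N, H]
print(next_h.shape)  # (1, 2)
print(next_h)        # approximately [[0.4621 0.7616]]
```

The bias b has shape [H, ] and is broadcast across the N rows, which is why no explicit tiling is needed.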
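For rnn_step_backward, using the derivative of tanh, $\tanh'(z) = 1 - \tanh^2(z)$, and noting that next_h is already $\tanh(z)$,
we get: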
    x, prev_h, Wx, Wh, b, next_h = cache
    dtanh = dnext_h * (1 - next_h * next_h)  # [N, H]
    db = np.sum(dtanh, axis=0)               # [H, ]
    dWh = (prev_h.T).dot(dtanh)              # [H, H]
    dWx = (x.T).dot(dtanh)                   # [D, H]
    dprev_h = dtanh.dot(Wh.T)                # [N, H]
    dx = dtanh.dot(Wx.T)                     # [N, D]
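A quick way to gain confidence in these gradients is to compare them against central finite differences. The sketch below wraps the two snippets above in functions (the wrapper signatures follow the assignment's naming, but treat them as an assumption), and `numerical_gradient` is a hypothetical helper written for this check, not part of the assignment code.

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    next_h = np.tanh(x.dot(Wx) + prev_h.dot(Wh) + b)  # [N, H]
    cache = (x, prev_h, Wx, Wh, b, next_h)
    return next_h, cache

def rnn_step_backward(dnext_h, cache):
    x, prev_h, Wx, Wh, b, next_h = cache
    dtanh = dnext_h * (1 - next_h * next_h)  # [N, H]
    db = np.sum(dtanh, axis=0)               # [H, ]
    dWh = prev_h.T.dot(dtanh)                # [H, H]
    dWx = x.T.dot(dtanh)                     # [D, H]
    dprev_h = dtanh.dot(Wh.T)                # [N, H]
    dx = dtanh.dot(Wx.T)                     # [N, D]
    return dx, dprev_h, dWx, dWh, db

def numerical_gradient(f, arr, dout, eps=1e-5):
    """Hypothetical helper: central-difference gradient of sum(f() * dout) w.r.t. arr."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(*arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        pos = f()
        arr[idx] = old - eps
        neg = f()
        arr[idx] = old  # restore the original value
        grad[idx] = np.sum((pos - neg) * dout) / (2 * eps)
    return grad

# Random test data; the sizes are arbitrary.
rng = np.random.default_rng(0)
N, D, H = 3, 4, 5
x = rng.standard_normal((N, D))
prev_h = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b = rng.standard_normal(H)
dnext_h = rng.standard_normal((N, H))

_, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
analytic = rnn_step_backward(dnext_h, cache)  # dx, dprev_h, dWx, dWh, db
params = (x, prev_h, Wx, Wh, b)               # same order as the gradients

f = lambda: rnn_step_forward(x, prev_h, Wx, Wh, b)[0]
max_errors = [np.max(np.abs(a - numerical_gradient(f, p, dnext_h)))
              for a, p in zip(analytic, params)]
print(max_errors)  # each entry should be tiny
```

If any entry of `max_errors` is large, the corresponding line in the backward pass is the first place to look.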
rnn_forward calls rnn_step_forward once per time step; its execution is illustrated in the figure below (from the course slides):
Code:
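    N, T, D = x.shape
    H = h0.shape[1]
    h = np.zeros((N, T, H))
    prev_h = h0
    for i in range(T):
        # Feed the hidden state of step i-1 into step i
        next_h, _ = rnn_step_forward(x[:, i, :], prev_h, Wx, Wh, b)
        prev_h = next_h
        h[:, i, :] = prev_h
    # Note the order (Wh, Wx): rnn_backward unpacks the cache in the same order
    cache = (x, h0, Wh, Wx, b, h)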
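As a sanity check on the loop above, the following sketch wraps it in a function and compares its output against a hand-unrolled three-step recurrence. The function signatures are assumed from the assignment's naming, and the random inputs are illustrative only.

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    next_h = np.tanh(x.dot(Wx) + prev_h.dot(Wh) + b)  # [N, H]
    cache = (x, prev_h, Wx, Wh, b, next_h)
    return next_h, cache

def rnn_forward(x, h0, Wx, Wh, b):
    N, T, D = x.shape
    H = h0.shape[1]
    h = np.zeros((N, T, H))
    prev_h = h0
    for i in range(T):
        prev_h, _ = rnn_step_forward(x[:, i, :], prev_h, Wx, Wh, b)
        h[:, i, :] = prev_h
    cache = (x, h0, Wh, Wx, b, h)
    return h, cache

# Arbitrary sizes for the check.
rng = np.random.default_rng(1)
N, T, D, H = 2, 3, 4, 5
x = rng.standard_normal((N, T, D))
h0 = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b = rng.standard_normal(H)

h, _ = rnn_forward(x, h0, Wx, Wh, b)

# Hand-unrolled recurrence for T = 3: each step reuses the same Wx, Wh, b.
h1 = np.tanh(x[:, 0, :].dot(Wx) + h0.dot(Wh) + b)
h2 = np.tanh(x[:, 1, :].dot(Wx) + h1.dot(Wh) + b)
h3 = np.tanh(x[:, 2, :].dot(Wx) + h2.dot(Wh) + b)

print(h.shape)                                 # (2, 3, 5)
print(np.allclose(h, np.stack([h1, h2, h3], axis=1)))
```

The unrolled version makes explicit that the weights are shared across time steps, which is why their gradients are summed over all steps in the backward pass.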
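In rnn_backward, note that dh has shape (N, T, H): it collects the upstream gradient arriving at each h output in the figure above. At every time step, that gradient must be added to the gradient flowing back from the following step. After the ordeal of the previous assignments, this is now fairly easy to write: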
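    x, h0, Wh, Wx, b, h = cache
    N, T, D = x.shape
    dprev_h = np.zeros_like(h0)
    dx = np.zeros_like(x)
    dWx = np.zeros_like(Wx)
    dWh = np.zeros_like(Wh)
    db = np.zeros_like(b)
    for i in range(T):
        # Walk backward in time: the step being processed is t = T-i-1
        if i == T-1:
            prev_h = h0  # the first time step takes h0 as its previous state
        else:
            prev_h = h[:, T-i-2, :]
        next_h = h[:, T-i-1, :]
        # Rebuild the per-step cache expected by rnn_step_backward
        cache2 = (x[:, T-i-1, :], prev_h, Wx, Wh, b, next_h)
        # Upstream gradient at this step plus the gradient from the later step
        dnext_h = dh[:, T - i - 1, :] + dprev_h
        dx1, dprev_h, dWx1, dWh1, </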