RNN_Captioning
This week's online-class load was relatively light, so Boom! CS231n Assignment3 makes its triumphant return! The CS231n lectures and assignments are genuinely fun; once you start writing them it gets addictive. I really envy the people who get to take courses at Stanford…
Load Data
The first thing to do in Assignment3 is download the captions portion of the COCO dataset. The assignment provides three .sh scripts that wget compressed archives from Stanford's servers, and then… I watched wget report a speed of 10 KB/s, with an estimated six days to finish… I'm still not sure whether my proxy ever took effect, but the download speed did eventually pick up and I finally got the data. If you need it, feel free to grab it from my Baidu Netdisk share below:
Link: https://pan.baidu.com/s/1zKs8pC3NtAMUf70Wu8A7fg
Extraction code: a0tw
Browsing the images and their captions is actually quite fun!
Vanilla RNN
After looking at the images, we naturally want to write some code and see how to actually train such an RNN. The first step is to implement the two functions rnn_step_forward and rnn_step_backward, which model a single RNN cell: it takes the previous hidden state h_{t-1} and the current input x_t, and produces the current hidden state h_t = tanh(x_t·Wx + h_{t-1}·Wh + b).
Since the computation is straightforward, here's the code!
def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses a tanh
    activation function.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Inputs:
    - x: Input data for this timestep, of shape (N, D).
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    next_h, cache = None, None
    ##############################################################################
    # TODO: Implement a single forward step for the vanilla RNN. Store the next #
    # hidden state and any values you need for the backward pass in the next_h  #
    # and cache variables respectively.                                         #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    next_h = x.dot(Wx) + prev_h.dot(Wh) + b  # affine part: x·Wx + h_{t-1}·Wh + b
    next_h = np.tanh(next_h)                 # squash into (-1, 1)
    cache = (x, Wx, prev_h, Wh, next_h)      # everything the backward pass needs

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                            #
    ##############################################################################
    return next_h, cache
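As a quick sanity check on shapes, here is a throwaway sketch with arbitrary sizes (it assumes numpy is imported as np, which the assignment's rnn_layers.py already does):

import numpy as np

N, D, H = 4, 10, 8            # arbitrary minibatch / input / hidden sizes
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
Wx = np.random.randn(D, H)
Wh = np.random.randn(H, H)
b = np.random.randn(H)

next_h, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
print(next_h.shape)           # (4, 8); all values lie in (-1, 1) thanks to tanh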
def rnn_step_backward(dnext_h, cache):
    """
    Backward pass for a single timestep of a vanilla RNN.

    Inputs:
    - dnext_h: Gradient of loss with respect to next hidden state, of shape (N, H)
    - cache: Cache object from the forward pass

    Returns a tuple of:
    - dx: Gradients of input data, of shape (N, D)
    - dprev_h: Gradients of previous hidden state, of shape (N, H)
    - dWx: Gradients of input-to-hidden weights, of shape (D, H)
    - dWh: Gradients of hidden-to-hidden weights, of shape (H, H)
    - db: Gradients of bias vector, of shape (H,)
    """
    dx, dprev_h, dWx, dWh, db = None, None, None, None, None
    ##############################################################################
    # TODO: Implement the backward pass for a single step of a vanilla RNN.     #
    #                                                                           #
    # HINT: For the tanh function, you can compute the local derivative in terms#
    # of the output value from tanh.                                            #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    x, Wx, prev_h, Wh, next_h = cache
    N = dnext_h.shape[0]
    dy = dnext_h * (1 - next_h * next_h)  # backprop through tanh: 1 - tanh^2
    dx = dy.dot(Wx.T)                     # gradient w.r.t. the input
    dWx = x.T.dot(dy)                     # gradient w.r.t. input-to-hidden weights
    dprev_h = dy.dot(Wh.T)                # gradient w.r.t. previous hidden state
    dWh = prev_h.T.dot(dy)                # gradient w.r.t. hidden-to-hidden weights
    db = np.ones(N).dot(dy)               # column sums, i.e. dy.sum(axis=0)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                            #
    ##############################################################################
    return dx, dprev_h, dWx, dWh, db
One thing worth noting here is the derivative of tanh: letting f(x) = tanh(x), we get f'(x) = 1 - tanh²(x) = (1 + tanh(x))(1 - tanh(x)), which is exactly why the backward pass only needs the cached output next_h.
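To convince yourself that the analytic gradients (including that tanh local derivative) are right, a central-difference check works well. Below is a minimal sketch; num_grad is my own throwaway helper, not something from the assignment code:

import numpy as np

def num_grad(f, x, df, h=1e-5):
    # central-difference gradient of sum(f(x) * df) with respect to x
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        pos = f(x).copy()
        x[ix] = old - h
        neg = f(x).copy()
        x[ix] = old
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

N, D, H = 3, 5, 4
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
Wx, Wh, b = np.random.randn(D, H), np.random.randn(H, H), np.random.randn(H)
dnext_h = np.random.randn(N, H)

_, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
dx, dprev_h, dWx, dWh, db = rnn_step_backward(dnext_h, cache)

f = lambda x: rnn_step_forward(x, prev_h, Wx, Wh, b)[0]
print(np.max(np.abs(dx - num_grad(f, x, dnext_h))))  # should be tiny, ~1e-8 or less

The same pattern (swap which argument the lambda varies) checks dprev_h, dWx, dWh and db.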
With the per-step update rule in hand, we can assemble the full RNN. Since every cell is identical, the idea is simply to iterate the step computation in a loop over the T timesteps, saving each step's results along the way. What we need to implement here are rnn_forward and rnn_backward.
def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The RNN uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the RNN forward, we return the hidden states for all timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D).
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H).
    - cache: Values needed in the backward pass
    """
    h, cache = None, None
    ##############################################################################
    # TODO: Implement forward pass for a vanilla RNN running on a sequence of   #
    # input data. You should use the rnn_step_forward function that you defined #
    # above. You can use a for loop to help compute the forward pass.           #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    cache = []
    N, T, D = x.shape
    H = h0.shape[1]
    x = x.transpose(1, 0, 2)         # (T, N, D): make the time axis first
    h = np.zeros((T, N, H))
    h_prev = h0
    for i in range(T):
        h_next, tmp_cache = rnn_step_forward(x[i], h_prev, Wx, Wh, b)
        cache.append(tmp_cache)      # keep every step's cache for rnn_backward
        h[i] = h_next
        h_prev = h_next
    h = h.transpose(1, 0, 2)         # back to (N, T, H)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                            #
    ##############################################################################
    return h, cache
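With the loop in place, a quick shape check on random data (again just a sketch with made-up sizes) confirms that the time axis comes back in the right place:

import numpy as np

N, T, D, H = 2, 3, 4, 5
x = np.random.randn(N, T, D)
h0 = np.random.randn(N, H)
Wx, Wh, b = np.random.randn(D, H), np.random.randn(H, H), np.random.randn(H)

h, cache = rnn_forward(x, h0, Wx, Wh, b)
print(h.shape)     # (2, 3, 5), i.e. (N, T, H): one hidden state per timestep
print(len(cache))  # 3: one per-step cache, consumed later by rnn_backward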