RNN_Captioning
This week's online-class load was relatively light, so Boom! CS231n Assignment3 makes its triumphant return! The CS231n lectures and assignments are genuinely fun; once you start writing them it gets addictive. I really envy the people who get to take courses at Stanford…
Load Data
The first thing to do in Assignment3 is download the captions portion of the COCO dataset. The assignment provides three .sh scripts that wget compressed archives from Stanford's servers, and then… I watched wget report a speed of 10 KB/s, with an estimated six days to finish… I'm still not sure whether my proxy ever took effect, but the download speed did eventually pick up and I finally got the data. If you need it, feel free to grab it from my Baidu Netdisk share below:
Link: https://pan.baidu.com/s/1zKs8pC3NtAMUf70Wu8A7fg
Extraction code: a0tw
Browsing the images and their captions is actually quite fun!
Vanilla RNN
After looking at the images, we naturally want to write some code and see how to actually train such an RNN. The first step is to implement the two functions rnn_step_forward and rnn_step_backward, which model a single RNN cell: it takes the previous hidden state h_{t-1} and the current input x_t, and produces the current hidden state h_t = tanh(x_t·Wx + h_{t-1}·Wh + b).
Since the computation is straightforward, here's the code!
def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses a tanh
    activation function.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Inputs:
    - x: Input data for this timestep, of shape (N, D).
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    next_h, cache = None, None
    ##############################################################################
    # TODO: Implement a single forward step for the vanilla RNN. Store the next #
    # hidden state and any values you need for the backward pass in the next_h  #
    # and cache variables respectively.                                         #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    next_h = x.dot(Wx) + prev_h.dot(Wh) + b  # affine part: x·Wx + h_{t-1}·Wh + b
    next_h = np.tanh(next_h)                 # squash into (-1, 1)
    cache = (x, Wx, prev_h, Wh, next_h)      # everything the backward pass needs

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                            #
    ##############################################################################
    return next_h, cache
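As a quick sanity check on shapes, here is a throwaway sketch with arbitrary sizes (it assumes numpy is imported as np, which the assignment's rnn_layers.py already does):

import numpy as np

N, D, H = 4, 10, 8            # arbitrary minibatch / input / hidden sizes
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
Wx = np.random.randn(D, H)
Wh = np.random.randn(H, H)
b = np.random.randn(H)

next_h, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
print(next_h.shape)           # (4, 8); all values lie in (-1, 1) thanks to tanh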
def rnn_step_backward(dnext_h, cache):
    """
    Backward pass for a single timestep of a vanilla RNN.

    Inputs:
    - dnext_h: Gradient of loss with respect to next hidden state, of shape (N, H)
    - cache: Cache object from the forward pass

    Returns a tuple of:
    - dx: Gradients of input data, of shape (N, D)
    - dprev_h: Gradients of previous hidden state, of shape (N, H)
    - dWx: Gradients of input-to-hidden weights, of shape (D, H)
    - dWh: Gradients of hidden-to-hidden weights, of shape (H, H)
    - db: Gradients of bias vector, of shape (H,)
    """
    dx, dprev_h, dWx, dWh, db = None, None, None, None, None
    ##############################################################################
    # TODO: Implement the backward pass for a single step of a vanilla RNN.     #
    #                                                                           #
    # HINT: For the tanh function, you can compute the local derivative in terms#
    # of the output value from tanh.                                            #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    x, Wx, prev_h, Wh, next_h = cache
    N = dnext_h.shape[0]
    dy = dnext_h * (1 - next_h * next_h)  # backprop through tanh: 1 - tanh^2
    dx = dy.dot(Wx.T)                     # gradient w.r.t. the input
    dWx = x.T.dot(dy)                     # gradient w.r.t. input-to-hidden weights
    dprev_h = dy.dot(Wh.T)                # gradient w.r.t. previous hidden state
    dWh = prev_h.T.dot(dy)                # gradient w.r.t. hidden-to-hidden weights
    db = np.ones(N).dot(dy)               # column sums, i.e. dy.sum(axis=0)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                            #
    ##############################################################################
    return dx, dprev_h, dWx, dWh, db
One thing worth noting here is the derivative of tanh: letting f(x) = tanh(x), we get f'(x) = 1 - tanh²(x) = (1 + tanh(x))(1 - tanh(x)), which is exactly why the backward pass only needs the cached output next_h.
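To convince yourself that the analytic gradients (including that tanh local derivative) are right, a central-difference check works well. Below is a minimal sketch; num_grad is my own throwaway helper, not something from the assignment code:

import numpy as np

def num_grad(f, x, df, h=1e-5):
    # central-difference gradient of sum(f(x) * df) with respect to x
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        pos = f(x).copy()
        x[ix] = old - h
        neg = f(x).copy()
        x[ix] = old
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

N, D, H = 3, 5, 4
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
Wx, Wh, b = np.random.randn(D, H), np.random.randn(H, H), np.random.randn(H)
dnext_h = np.random.randn(N, H)

_, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
dx, dprev_h, dWx, dWh, db = rnn_step_backward(dnext_h, cache)

f = lambda x: rnn_step_forward(x, prev_h, Wx, Wh, b)[0]
print(np.max(np.abs(dx - num_grad(f, x, dnext_h))))  # should be tiny, ~1e-8 or less

The same pattern (swap which argument the lambda varies) checks dprev_h, dWx, dWh and db.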
With the per-step update rule in hand, we can assemble the full RNN. Since every cell is identical, the idea is simply to iterate the step computation in a loop over the T timesteps, saving each step's results along the way. What we need to implement here are rnn_forward and rnn_backward.
def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The RNN uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the RNN forward, we return the hidden states for all timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D).
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H).
    - cache: Values needed in the backward pass
    """
    h, cache = None, None
    ##############################################################################
    # TODO: Implement forward pass for a vanilla RNN running on a sequence of   #
    # input data. You should use the rnn_step_forward function that you defined #
    # above. You can use a for loop to help compute the forward pass.           #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    cache = []
    N, T, D = x.shape
    H = h0.shape[1]
    x = x.transpose(1, 0, 2)         # (T, N, D): make the time axis first
    h = np.zeros((T, N, H))
    h_prev = h0
    for i in range(T):
        h_next, tmp_cache = rnn_step_forward(x[i], h_prev, Wx, Wh, b)
        cache.append(tmp_cache)      # keep every step's cache for rnn_backward
        h[i] = h_next
        h_prev = h_next
    h = h.transpose(1, 0, 2)         # back to (N, T, H)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                            #
    ##############################################################################
    return h, cache
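With the loop in place, a quick shape check on random data (again just a sketch with made-up sizes) confirms that the time axis comes back in the right place:

import numpy as np

N, T, D, H = 2, 3, 4, 5
x = np.random.randn(N, T, D)
h0 = np.random.randn(N, H)
Wx, Wh, b = np.random.randn(D, H), np.random.randn(H, H), np.random.randn(H)

h, cache = rnn_forward(x, h0, Wx, Wh, b)
print(h.shape)     # (2, 3, 5), i.e. (N, T, H): one hidden state per timestep
print(len(cache))  # 3: one per-step cache, consumed later by rnn_backward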