Notes on CS231n Assignment 3 Q1

This post records my experience with the first part of CS231n Assignment 3, which covers implementing RNNs: the vanilla RNN, word embeddings, and image captioning. During training, a word embedding turns each word into a vector before it is fed into the RNN. The post also discusses the pros and cons of character-level RNNs and shares the problems I ran into while training an RNN to generate image captions, including overfitting.

This week's online coursework was relatively light, so boom! CS231n Assignment 3 makes its big return! The CS231n lectures and assignments are genuinely fun, working through them is addictive, and I really envy Stanford's courses...

Load Data

The first thing to do in Assignment 3 is to download the captions portion of the COCO dataset. The assignment provides three .sh scripts that wget the archives from the Stanford site, and then... I watched wget report about 10 KB/s, with an estimated six days to finish. I'm not sure whether my proxy ever kicked in, but the speed did improve somewhat and I finally got the data down. If you need it, you can grab it from the Baidu Netdisk link below ↓

Link: https://pan.baidu.com/s/1zKs8pC3NtAMUf70Wu8A7fg
Extraction code: a0tw

Taking a look at the images and their captions, they are pretty interesting!
(Sample COCO images with their captions.)

Vanilla RNN

After looking at the images, it's time to do some coding and see how such an RNN is actually trained! The first thing to do is implement the two functions rnn_step_forward and rnn_step_backward, which model a single RNN cell: it takes the previous hidden state h_{t-1} and the current input x_t, and produces the current hidden state h_t.
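Concretely, the single-step update (exactly what the code below computes) is

$$h_t = \tanh(x_t W_x + h_{t-1} W_h + b)$$

where $x_t$ has shape (N, D), $h_{t-1}$ and $h_t$ have shape (N, H), and $W_x$, $W_h$, $b$ are the shared parameters.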
The computation is fairly simple, so here's the code!

import numpy as np


def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses a tanh
    activation function.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Inputs:
    - x: Input data for this timestep, of shape (N, D).
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    next_h, cache = None, None
    ##############################################################################
    # TODO: Implement a single forward step for the vanilla RNN. Store the next  #
    # hidden state and any values you need for the backward pass in the next_h   #
    # and cache variables respectively.                                          #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # h_t = tanh(x_t Wx + h_{t-1} Wh + b)
    next_h = x.dot(Wx) + prev_h.dot(Wh) + b
    next_h = np.tanh(next_h)
    cache = (x, Wx, prev_h, Wh, next_h)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return next_h, cache


def rnn_step_backward(dnext_h, cache):
    """
    Backward pass for a single timestep of a vanilla RNN.

    Inputs:
    - dnext_h: Gradient of loss with respect to next hidden state, of shape (N, H)
    - cache: Cache object from the forward pass

    Returns a tuple of:
    - dx: Gradients of input data, of shape (N, D)
    - dprev_h: Gradients of previous hidden state, of shape (N, H)
    - dWx: Gradients of input-to-hidden weights, of shape (D, H)
    - dWh: Gradients of hidden-to-hidden weights, of shape (H, H)
    - db: Gradients of bias vector, of shape (H,)
    """
    dx, dprev_h, dWx, dWh, db = None, None, None, None, None
    ##############################################################################
    # TODO: Implement the backward pass for a single step of a vanilla RNN.      #
    #                                                                            #
    # HINT: For the tanh function, you can compute the local derivative in terms #
    # of the output value from tanh.                                             #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
    x, Wx, prev_h, Wh, next_h = cache
    N = dnext_h.shape[0]
    # Backprop through tanh: since next_h = tanh(z), d(next_h)/dz = 1 - next_h ** 2
    dy = dnext_h * (1 - next_h * next_h)
    dx = dy.dot(Wx.T)
    dWx = x.T.dot(dy)
    dprev_h = dy.dot(Wh.T)
    dWh = prev_h.T.dot(dy)
    db = np.ones(N).dot(dy)  # same as dy.sum(axis=0): sum over the batch dimension

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return dx, dprev_h, dWx, dWh, db
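Written out, the gradients that rnn_step_backward returns follow from $h_t = \tanh(z_t)$ with $z_t = x_t W_x + h_{t-1} W_h + b$:

$$\frac{\partial L}{\partial z_t} = \frac{\partial L}{\partial h_t} \odot (1 - h_t^2), \qquad \frac{\partial L}{\partial x_t} = \frac{\partial L}{\partial z_t} W_x^\top, \qquad \frac{\partial L}{\partial h_{t-1}} = \frac{\partial L}{\partial z_t} W_h^\top,$$

$$\frac{\partial L}{\partial W_x} = x_t^\top \frac{\partial L}{\partial z_t}, \qquad \frac{\partial L}{\partial W_h} = h_{t-1}^\top \frac{\partial L}{\partial z_t}, \qquad \frac{\partial L}{\partial b} = \sum_{n=1}^{N} \left(\frac{\partial L}{\partial z_t}\right)_n.$$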

One thing worth mentioning here is the derivative of tanh: letting f(x) = tanh(x), we have f'(x) = (1 + tanh(x)) * (1 - tanh(x)) = 1 - tanh(x)^2, which is exactly why the backward step computes dy = dnext_h * (1 - next_h * next_h).
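Before moving on, it's worth sanity-checking that the forward/backward pair is consistent. The assignment notebook does this with its own numerical-gradient helpers; below is a minimal stand-alone sketch using a hand-rolled centered difference (numeric_grad and the toy shapes are just for illustration):

import numpy as np

def numeric_grad(f, x, df, h=1e-5):
    # Centered-difference numeric gradient of f at x, contracted with the upstream gradient df.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        pos = f(x)
        x[idx] = old - h
        neg = f(x)
        x[idx] = old
        grad[idx] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

np.random.seed(0)
N, D, H = 4, 5, 6                      # small toy sizes, just for the check
x = np.random.randn(N, D)
prev_h = np.random.randn(N, H)
Wx, Wh, b = np.random.randn(D, H), np.random.randn(H, H), np.random.randn(H)
dnext_h = np.random.randn(N, H)

_, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
dx, dprev_h, dWx, dWh, db = rnn_step_backward(dnext_h, cache)

dx_num = numeric_grad(lambda x: rnn_step_forward(x, prev_h, Wx, Wh, b)[0], x, dnext_h)
dWh_num = numeric_grad(lambda Wh: rnn_step_forward(x, prev_h, Wx, Wh, b)[0], Wh, dnext_h)
# Both differences should be tiny (roughly 1e-8 or smaller) if forward and backward agree.
print(np.max(np.abs(dx - dx_num)), np.max(np.abs(dWh - dWh_num)))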

With the update rule for a single cell in hand, we can assemble the full RNN. Since every cell is identical, the idea is simply to run the step computation in a loop over timesteps and save the result of every step. The parts to implement are rnn_forward and rnn_backward.

def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The RNN uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the RNN forward, we return the hidden states for all timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D).
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H).
    - cache: Values needed in the backward pass
    """
    h, cache = None, None
    ##############################################################################
    # TODO: Implement forward pass for a vanilla RNN running on a sequence of    #
    # input data. You should use the rnn_step_forward function that you defined  #
    # above. You can use a for loop to help compute the forward pass.            #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
    cache = []
    N, T, D = x.shape
    H = h0.shape[1]
    x = x.transpose(1, 0, 2)
    h = np.zeros((T, N, H))
    h_prev = h0
    for i in range(T):
        h_next, tmp_cache = rnn_step_forward(x[i], h_prev, Wx, Wh, b)
        cache.append(tmp_cache)
        h[i] = h_next
        h_prev = h_next
    h = h.transpose(1, 0, 2)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return h, cache
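The backward pass over the whole sequence mirrors this loop: walk the timesteps in reverse, add the gradient flowing back through the hidden state to the upstream gradient dh[:, t, :], pass the sum to rnn_step_backward, and accumulate the shared weight gradients across timesteps. Here is a sketch of how rnn_backward can be written against the cache layout used above (my own version, not necessarily identical to the reference solution):

def rnn_backward(dh, cache):
    """
    Backward pass for a vanilla RNN over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of all hidden states, of shape (N, T, H)
    - cache: List of per-timestep caches produced by rnn_forward

    Returns a tuple of:
    - dx: Gradient of inputs, of shape (N, T, D)
    - dh0: Gradient of initial hidden state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, H)
    - db: Gradient of biases, of shape (H,)
    """
    N, T, H = dh.shape
    x0, Wx, _, Wh, _ = cache[0]
    D = x0.shape[1]

    dx = np.zeros((N, T, D))
    dWx = np.zeros_like(Wx)
    dWh = np.zeros_like(Wh)
    db = np.zeros(H)
    dprev_h = np.zeros((N, H))  # gradient flowing backwards through the hidden state

    for t in reversed(range(T)):
        # The gradient at step t comes from the loss at step t plus the next step's hidden state.
        dnext_h = dh[:, t, :] + dprev_h
        dx_t, dprev_h, dWx_t, dWh_t, db_t = rnn_step_backward(dnext_h, cache[t])
        dx[:, t, :] = dx_t
        # The weights are shared across timesteps, so their gradients accumulate.
        dWx += dWx_t
        dWh += dWh_t
        db += db_t

    dh0 = dprev_h
    return dx, dh0, dWx, dWh, db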