Stanford CS231n Computer Vision Training Camp ---- two_layer_net exercise

Forward pass: compute scores

Open the file cs231n/classifiers/neural_net.py and look at the method TwoLayerNet.loss. This function is very similar to the loss functions you have written for the SVM and Softmax exercises: It takes the data and weights and computes the class scores, the loss, and the gradients on the parameters.

Implement the first part of the forward pass which uses the weights and biases to compute the scores for all inputs.
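
For reference, the forward pass implemented below is an affine layer, a ReLU nonlinearity, and a second affine layer:

    scores = max(0, X.dot(W1) + b1).dot(W2) + b2

where X has shape (N, D), W1 is (D, H), b1 is (H,), W2 is (H, C), and b2 is (C,), so scores has shape (N, C).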

    # Compute the forward pass
    scores = None
    #############################################################################
    # TODO: Perform the forward pass, computing the class scores for the input. #
    # Store the result in the scores variable, which should be an array of      #
    # shape (N, C).                                                             #
    #############################################################################
    s1 = np.dot(X, W1) + b1            # first-layer affine scores
    s1_act = (s1 > 0) * s1             # ReLU activation
    scores = np.dot(s1_act, W2) + b2   # second-layer affine -> class scores, shape (N, C)
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

Forward pass: compute loss

In the same function, implement the second part that computes the data and regularization loss.
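
Concretely, the quantity computed below is the averaged softmax cross-entropy plus L2 regularization on both weight matrices:

    L = -1/N * sum_i log( exp(scores[i, y[i]]) / sum_j exp(scores[i, j]) )
        + reg * (sum(W1**2) + sum(W2**2))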

    # Compute the loss
    loss = None
    #############################################################################
    # TODO: Finish the forward pass, and compute the loss. This should include  #
    # both the data loss and L2 regularization for W1 and W2. Store the result  #
    # in the variable loss, which should be a scalar. Use the Softmax           #
    # classifier loss.                                                          #
    #############################################################################
    scores -= np.max(scores, axis=1, keepdims=True)   # shift for numerical stability
    scores = np.exp(scores)                           # exponentiate
    scores /= np.sum(scores, axis=1, keepdims=True)   # normalize rows -> softmax probabilities
    loss = -np.log(scores[np.arange(N), y]).sum()     # cross-entropy of the correct classes
    loss /= N
    loss += reg * np.sum(W1 ** 2)
    loss += reg * np.sum(W2 ** 2)
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

Backward pass (the gradient is derived as in the Softmax exercise)

Implement the rest of the function. This will compute the gradient of the loss with respect to the variables W1, b1, W2, and b2. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:
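
Writing p for the softmax probabilities computed in the forward pass and 1[y] for the one-hot labels, the gradients follow the same pattern as in the Softmax exercise, with an extra ReLU gate for the first layer:

    dL/dscores = (p - 1[y]) / N
    dL/dW2     = s1_act.T.dot(dL/dscores) + 2 * reg * W2
    dL/db2     = dL/dscores summed over the rows
    dL/ds1     = dL/dscores.dot(W2.T) * (s1 > 0)      # ReLU mask
    dL/dW1     = X.T.dot(dL/ds1) + 2 * reg * W1
    dL/db1     = dL/ds1 summed over the rows

(The code below applies the 1/N factor when forming each final gradient instead of folding it into ds2; the result is the same.)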

    # Backward pass: compute gradients
    grads = {}
    #############################################################################
    # TODO: Compute the backward pass, computing the derivatives of the weights #
    # and biases. Store the results in the grads dictionary. For example,       #
    # grads['W1'] should store the gradient on W1, and be a matrix of same size #
    #############################################################################
    ds2 = np.copy(scores)                # softmax probabilities from the forward pass
    ds2[np.arange(X.shape[0]), y] -= 1   # dL/dscores = p - one_hot(y) (before the 1/N factor)
    grads['W2'] = np.dot(s1_act.T, ds2) / X.shape[0] + 2 * reg * W2
    grads['b2'] = np.sum(ds2, axis=0) / X.shape[0]

    ds1 = np.dot(ds2, W2.T)              # backprop through the second affine layer
    ds1 = (s1 > 0) * ds1                 # backprop through ReLU: zero the gradient where s1 <= 0
    grads['W1'] = np.dot(X.T, ds1) / X.shape[0] + 2 * reg * W1
    grads['b1'] = np.sum(ds1, axis=0) / X.shape[0]
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################
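
To verify the implementation, the analytic gradients can be compared against numerical ones. A minimal sketch of the notebook's gradient check, assuming the toy net, X, and y built earlier in the notebook, the eval_numerical_gradient helper from cs231n.gradient_check, and the rel_error helper defined at the top of the notebook (the reg value here is illustrative):

    from cs231n.gradient_check import eval_numerical_gradient

    loss, grads = net.loss(X, y, reg=0.05)
    for param_name in grads:
        # f ignores its argument; eval_numerical_gradient perturbs net.params[param_name] in place
        f = lambda W: net.loss(X, y, reg=0.05)[0]
        param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
        print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))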

Train the network

To train the network we will use stochastic gradient descent (SGD), similar to the SVM and Softmax classifiers. Look at the function TwoLayerNet.train and fill in the missing sections to implement the training procedure. This should be very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement TwoLayerNet.predict, as the training process periodically performs prediction to keep track of accuracy over time while the network trains.

Once you have implemented the method, run the code below to train a two-layer network on toy data. You should achieve a training loss less than 0.2.

      #########################################################################
      # TODO: Create a random minibatch of training data and labels, storing  #
      # them in X_batch and y_batch respectively.                             #
      #########################################################################
      idx = np.random.choice(num_train, batch_size)   # sample indices with replacement
      X_batch = X[idx]
      y_batch = y[idx]
      #########################################################################
      #                             END OF YOUR CODE                          #
      #########################################################################
      #########################################################################
      # TODO: Use the gradients in the grads dictionary to update the         #
      # parameters of the network (stored in the dictionary self.params)      #
      # using stochastic gradient descent. You'll need to use the gradients   #
      # stored in the grads dictionary defined above.                         #
      #########################################################################
      for p in ['W1', 'W2', 'b1', 'b2']:
        self.params[p] -= learning_rate * grads[p]    # vanilla SGD step
      #########################################################################
      #                             END OF YOUR CODE                          #
      #########################################################################
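
TwoLayerNet.predict is then just a single forward pass (affine, ReLU, affine) followed by an argmax over the class scores: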
        ###########################################################################
        # TODO: Implement this function; it should be VERY simple!                #
        ###########################################################################
        s1 = np.dot(X, self.params['W1']) + self.params['b1']
        s1 = s1 * (s1 > 0)                                        # ReLU
        s2 = np.dot(s1, self.params['W2']) + self.params['b2']    # class scores
        y_pred = np.argmax(s2, axis=1)                            # predicted label per example
        ###########################################################################
        #                              END OF YOUR CODE                           #
        ###########################################################################
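
With train and predict filled in, the toy-data training cell in the notebook runs roughly like the sketch below. init_toy_model and init_toy_data are the notebook's helper functions, and the hyperparameter values shown are assumptions meant only to illustrate the call, not prescribed settings:

    net = init_toy_model()    # assumed notebook helper: small TwoLayerNet on toy dimensions
    X, y = init_toy_data()    # assumed notebook helper: tiny synthetic dataset
    stats = net.train(X, y, X, y,
                      learning_rate=1e-1, reg=5e-6,
                      num_iters=100, verbose=False)
    print('Final training loss: ', stats['loss_history'][-1])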

from copy import deepcopy
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
best_net = None # store the best model into this 
best_acc = -1
#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained  #
# model in best_net.                                                            #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the  #
# ones we used above; these visualizations will have significant qualitative    #
# differences from the ones we saw above for the poorly tuned network.          #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  #
# write code to sweep through possible combinations of hyperparameters          #
# automatically like we did on the previous exercises.                          #
#################################################################################
learning_rate = [1e-4, 1e-3]
regularization = [0.25, 0.5, 0.75]

for lr in learning_rate:
    for reg in regularization:
        net = TwoLayerNet(input_size, hidden_size, num_classes)
        state = net.train(X_train, y_train, X_val, y_val,
                num_iters=2000, batch_size=200,
                learning_rate=lr, learning_rate_decay=0.95,
                reg=reg, verbose=False)
        val_acc = np.mean(net.predict(X_val) == y_val)
        if val_acc > best_acc:
            best_acc = val_acc
            best_net = deepcopy(net)
print('best val acc: {:.3f}'.format(best_acc))
#################################################################################
#                               END OF YOUR CODE                                #
#################################################################################
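
After the sweep, the selected model is evaluated once on the test split (X_test and y_test come from the notebook's data-loading cell):

    test_acc = np.mean(best_net.predict(X_test) == y_test)
    print('Test accuracy: {:.3f}'.format(test_acc))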

Inline Question

Now that you have trained a Neural Network classifier, you may find that your testing accuracy is much lower than the training accuracy. In what ways can we decrease this gap? Select all that apply.

  1. Train on a larger dataset.
  2. Add more hidden units.
  3. Increase the regularization strength.
  4. None of the above.

Your answer: 1 and 3

Your explanation: Training on a larger dataset and increasing the regularization strength both improve generalization, which narrows the gap between training and test accuracy; adding more hidden units increases model capacity and tends to make overfitting worse.
