Stanford CS231n Assignment 1: neural_network

Introduction

This is finally the last part of Assignment 1, and also the most central piece of deep learning: building and training a neural network. Although this assignment only asks for a two-layer network, it already shows the general architecture and workflow of neural networks; more complex networks essentially just add things like batch normalization, dropout, identity (shortcut) mappings, and so on. So it is worth working through the code carefully and completing it on your own.

Building the neural network

Last time we trained our first model, a linear classifier. Training a neural network works in much the same way: over repeated training steps we compute the loss, run gradient descent, and update the parameters so that the predictions become more accurate. What differs is the structure of the classifier itself.
As shown in the figure, this time we train a two-layer neural network:
[Figure: two-layer network architecture] Its basic structure is: input - fully connected layer - ReLU activation - fully connected layer - softmax - output
The part we need to implement is mainly this middle structure of the network.
Defining the loss function
First initialize W1, b1, W2, b2 and compute the intermediate variables of the network: Z1 = X * W1 + b1. Z1 then goes through the ReLU activation: A1 = max(0, Z1).
A1 is then passed as input to the second layer: Z2 = A1 * W2 + b2, and the output of the second layer goes through a softmax layer.
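To make the flow concrete, here is a minimal NumPy sketch of this forward pass (variable names follow the assignment code; the row-max subtraction inside the softmax is an extra numerical-stability precaution that the assignment code does not include):

import numpy as np

def forward(X, W1, b1, W2, b2):
    # First fully connected layer followed by ReLU
    Z1 = X.dot(W1) + b1                      # shape (N, H)
    A1 = np.maximum(0, Z1)                   # ReLU
    # Second fully connected layer produces the class scores
    scores = A1.dot(W2) + b2                 # shape (N, C)
    # Softmax turns scores into probabilities; subtracting the row max
    # avoids overflow in exp()
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    return scores, probs

# Tiny usage with hypothetical shapes (N=4 samples, D=5 features, H=3, C=2)
X = np.random.randn(4, 5)
W1, b1 = 1e-4 * np.random.randn(5, 3), np.zeros(3)
W2, b2 = 1e-4 * np.random.randn(3, 2), np.zeros(2)
scores, probs = forward(X, W1, b1, W2, b2)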
None of this is really the hard part. The difficulty lies in the gradient descent step, i.e. backpropagation, which starts from dZ2. How do we get dZ2? Take the softmax probabilities, subtract 1 at each sample's correct-class position, and divide by the number of samples; that gives dZ2 (see the small worked example below).
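A tiny worked example of the dZ2 step, using hypothetical probabilities for two samples and three classes:

import numpy as np

probs = np.array([[0.7, 0.2, 0.1],     # softmax probabilities for sample 0
                  [0.1, 0.3, 0.6]])    # softmax probabilities for sample 1
y = np.array([0, 2])                   # correct classes

dZ2 = probs.copy()
dZ2[np.arange(2), y] -= 1              # subtract 1 at each correct class
dZ2 /= 2                               # divide by the number of samples
# dZ2 is now [[-0.15, 0.1, 0.05], [0.05, 0.15, -0.2]]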
The gradients of the remaining variables all follow from backpropagation through the chain rule; they are straightforward, so I will not spell them out here. If this part is unclear, go back and review the CS231n lectures before doing the assignment. Finally, put dW1, db1, dW2, db2 into the dictionary grads and return loss and grads.
Training
Before training, we still need to split the samples into minibatches. If the number of training samples is smaller than the batch size, we have to sample some data repeatedly; if it is larger, repeated sampling is unnecessary (this is controlled by the replace parameter of np.random.choice).
After batching, we loop for the specified number of iterations, training on a minibatch each time: compute the loss and gradients, then update the parameters to drive the loss down. Once per epoch we predict the classes, compare them with the true labels, and record the accuracy on the training and validation sets.
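A small sketch of the sampling step (the data here is hypothetical; the assignment code uses np.random.choice in the same way):

import numpy as np

X = np.random.randn(500, 4)             # hypothetical training data
y = np.random.randint(0, 3, size=500)   # hypothetical labels
num_train, batch_size = X.shape[0], 200

# replace=True lets the same index be drawn more than once; it is only
# needed when num_train < batch_size, otherwise sample without replacement
replace = num_train < batch_size
idx = np.random.choice(num_train, size=batch_size, replace=replace)
X_batch, y_batch = X[idx], y[idx]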
Speaking of prediction, that brings us to the final part: the prediction function.
Prediction
With the model and the loss in place, the prediction function is the simplest part. Use the updated W1, b1, W2, b2 to compute the final output; depending on the setup that could be a definite class label or a score matrix, and here it is a score matrix. Then np.argmax along axis=1 finds the index of the maximum in each row; that index is the predicted class for each sample, and returning it gives us the predictions.
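For example, with hypothetical scores for three samples and three classes:

import numpy as np

scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 3.0, 0.2],
                   [0.1, 0.4, 1.5]])
y_pred = np.argmax(scores, axis=1)   # index of the largest score in each row
# y_pred is [0, 1, 2]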
For further details and explanation, see the code below.

from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt

class TwoLayerNet(object):
    """
  A two-layer fully-connected neural network. The net has an input dimension of
  N, a hidden layer dimension of H, and performs classification over C classes.
  We train the network with a softmax loss function and L2 regularization on the
  weight matrices. The network uses a ReLU nonlinearity after the first fully
  connected layer.

  In other words, the network has the following architecture:

  input - fully connected layer - ReLU - fully connected layer - softmax

  The outputs of the second fully-connected layer are the scores for each class.
  """

    def __init__(self, input_size, hidden_size, output_size, std=1e-4):
        """
    Initialize the model. Weights are initialized to small random values and
    biases are initialized to zero. Weights and biases are stored in the
    variable self.params, which is a dictionary with the following keys:

    W1: First layer weights; has shape (D, H)
    b1: First layer biases; has shape (H,)
    W2: Second layer weights; has shape (H, C)
    b2: Second layer biases; has shape (C,)

    Inputs:
    - input_size: The dimension D of the input data.
    - hidden_size: The number of neurons H in the hidden layer.
    - output_size: The number of classes C.
    """
        self.params = {}
        self.params['W1'] = std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def loss(self, X, y=None, reg=0.0):
        """
    Compute the loss and gradients for a two layer fully connected neural
    network.

    Inputs:
    - X: Input data of shape (N, D). Each X[i] is a training sample.
    - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
      an integer in the range 0 <= y[i] < C. This parameter is optional; if it
      is not passed then we only return scores, and if it is passed then we
      instead return the loss and gradients.
    - reg: Regularization strength.

    Returns:
    If y is None, return a matrix scores of shape (N, C) where scores[i, c] is
    the score for class c on input X[i].

    If y is not None, instead return a tuple of:
    - loss: Loss (data loss and regularization loss) for this batch of training
      samples.
    - grads: Dictionary mapping parameter names to gradients of those parameters
      with respect to the loss function; has the same keys as self.params.
    """
        # Unpack variables from the params dictionary
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        N, D = X.shape

        # Compute the forward pass
        scores = None
        #############################################################################
        # TODO: Perform the forward pass, computing the class scores for the input. #
        # Store the result in the scores variable, which should be an array of      #
        # shape (N, C).                                                             #
        #############################################################################

        Z1 = np.dot(X, W1) + b1
        A1 = np.maximum(0, Z1)  # Relu
        scores = np.dot(A1, W2) + b2

        #############################################################################
        #                              END OF YOUR CODE                             #
        #############################################################################

        # If the targets are not given then jump out, we're done
        if y is None:
            return scores

        # Compute the loss
        loss = None
        #############################################################################
        # TODO: Finish the forward pass, and compute the loss. This should include  #
        # both the data loss and L2 regularization for W1 and W2. Store the result  #
        # in the variable loss, which should be a scalar. Use the Softmax           #
        # classifier loss.                                                          #
        #############################################################################

        num_train = X.shape[0]

        scores = np.exp(scores) / np.sum(np.exp(scores), axis=1).reshape(num_train, -1)
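        # Note: scores has been overwritten with the softmax probabilities (N, C);
        # the backward pass below reuses it as the starting point for dZ2.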
        score_y = scores[np.arange(num_train), y]
        loss = np.sum(-np.log(score_y)) / num_train
        loss += reg * (np.sum(W1 * W1) + np.sum(W2 * W2))

        #############################################################################
        #                              END OF YOUR CODE                             #
        #############################################################################

        # Backward pass: compute gradients
        grads = {}
        #############################################################################
        # TODO: Compute the backward pass, computing the derivatives of the weights #
        # and biases. Store the results in the grads dictionary. For example,       #
        # grads['W1'] should store the gradient on W1, and be a matrix of same size #
        #############################################################################

        scores[np.arange(num_train), y] -= 1
        dZ2 = scores / num_train
        dW2 = np.dot(A1.T, dZ2) + 2 * reg * W2
        db2 = np.sum(dZ2, axis=0, keepdims=True)
        dA1 = np.dot(dZ2, W2.T)
        dZ1 = dA1.copy()
        # dZ1 comes from dA1. A1 itself has no negative entries (ReLU clamps them to 0),
        # but Z1 does, so when computing dZ1 the positions where Z1 < 0 must be zeroed out.
        dZ1[Z1 < 0] = 0

        dW1 = np.dot(X.T, dZ1) + 2 * reg * W1
        db1 = np.sum(dZ1, axis=0, keepdims=True)

        grads['W2'] = dW2
        grads['b2'] = db2
        grads['W1'] = dW1
        grads['b1'] = db1

        #############################################################################
        #                              END OF YOUR CODE                             #
        #############################################################################

        return loss, grads

    def train(self, X, y, X_val, y_val,
              learning_rate=1e-3, learning_rate_decay=0.95,
              reg=5e-6, num_iters=100,
              batch_size=200, verbose=False):
        """
    Train this neural network using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) giving training data.
    - y: A numpy array of shape (N,) giving training labels; y[i] = c means that
      X[i] has label c, where 0 <= c < C.
    - X_val: A numpy array of shape (N_val, D) giving validation data.
    - y_val: A numpy array of shape (N_val,) giving validation labels.
    - learning_rate: Scalar giving learning rate for optimization.
    - learning_rate_decay: Scalar giving factor used to decay the learning rate
      after each epoch.
    - reg: Scalar giving regularization strength.
    - num_iters: Number of steps to take when optimizing.
    - batch_size: Number of training examples to use per step.
    - verbose: boolean; if true print progress during optimization.
    """
        # Sample a minibatch on each iteration
        # X_val, y_val are the validation set
        num_train = X.shape[0]
        iterations_per_epoch = max(num_train // batch_size, 1)

        # Use SGD to optimize the parameters in self.model
        loss_history = []
        train_acc_history = []
        val_acc_history = []

        for it in range(num_iters):
            X_batch = None
            y_batch = None

            #########################################################################
            # TODO: Create a random minibatch of training data and labels, storing  #
            # them in X_batch and y_batch respectively.                             #
            #########################################################################

            if num_train < batch_size:
                temp = np.random.choice(a=num_train, size=batch_size, replace=True)
            else:
                temp = np.random.choice(a=num_train, size=batch_size, replace=False)
            X_batch = X[temp]
            y_batch = y[temp]

            #########################################################################
            #                             END OF YOUR CODE                          #
            #########################################################################

            # Compute loss and gradients using the current minibatch
            loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
            loss_history.append(loss)

            #########################################################################
            # TODO: Use the gradients in the grads dictionary to update the         #
            # parameters of the network (stored in the dictionary self.params)      #
            # using stochastic gradient descent. You'll need to use the gradients   #
            # stored in the grads dictionary defined above.                         #
            #########################################################################
            # Update the weights and biases
            self.params['W1'] -= learning_rate * grads['W1']
            self.params['W2'] -= learning_rate * grads['W2']
            self.params['b1'] -= learning_rate * grads['b1'].reshape(-1)
            self.params['b2'] -= learning_rate * grads['b2'].reshape(-1)

            #########################################################################
            #                             END OF YOUR CODE                          #
            #########################################################################

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

            # Every epoch, check train and val accuracy and decay learning rate.
            if it % iterations_per_epoch == 0:
                # Check accuracy
                train_acc = (self.predict(X_batch) == y_batch).mean()
                val_acc = (self.predict(X_val) == y_val).mean()
                train_acc_history.append(train_acc)
                val_acc_history.append(val_acc)

                # Decay learning rate
                learning_rate *= learning_rate_decay

        return {
            'loss_history': loss_history,
            'train_acc_history': train_acc_history,
            'val_acc_history': val_acc_history,
        }

    def predict(self, X):
        """
    Use the trained weights of this two-layer network to predict labels for
    data points. For each data point we predict scores for each of the C
    classes, and assign each data point to the class with the highest score.

    Inputs:
    - X: A numpy array of shape (N, D) giving N D-dimensional data points to
      classify.

    Returns:
    - y_pred: A numpy array of shape (N,) giving predicted labels for each of
      the elements of X. For all i, y_pred[i] = c means that X[i] is predicted
      to have class c, where 0 <= c < C.
    """
        y_pred = None

        ###########################################################################
        # TODO: Implement this function; it should be VERY simple!                #
        ###########################################################################
        Z1 = np.dot(X, self.params['W1']) + self.params['b1']
        A1 = np.maximum(0, Z1)
        Z2 = np.dot(A1, self.params['W2']) + self.params['b2']
        # The index of the largest value in each row (i.e. across the classes) is the predicted class
        y_pred = np.argmax(Z2, axis=1)

        ###########################################################################
        #                              END OF YOUR CODE                           #
        ###########################################################################

        return y_pred

Summary

That wraps up Assignment 1. It mainly covered training some basic models and defining the loss functions we will keep using later on; implementing them ourselves gives us a reasonable grasp of these fundamentals. Assignment 2 is where the real difficulty starts. Good luck, everyone!
