[CS231n Assignment 2 #01] Fully-connected Neural Network

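  • Assignment page: Assignment 2
  • Goal: We have already implemented a two-layer neural network, but all of its functions lived in a single file. For a simple network that may be convenient enough, but it stops being efficient once we need larger and deeper networks. In this assignment we therefore learn to design a network layer by layer and to make it modular: we implement the different modules in separate files and then combine them into the final network.
  • Official starter code: Assignment 2 code
  • Assignment notebook: FullyConnectedNets.ipynb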
1. Fully-Connected Neural Nets architecture

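In this assignment we implement the fully-connected network in a modular way: for every layer we implement a forward pass, forward(), and a backward pass, backward().

  • forward() receives the inputs, the weights, and any other required parameters. It returns an output and also stores the variables that the backward pass will need later, along these lines: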
def layer_forward(x, w):
  """ Receive inputs x and weights w """
  # Do some computations ...
  z = ...  # ... some intermediate value (placeholder)
  # Do some more computations ...
  out = ...  # the output (placeholder)

  cache = (x, w, z, out) # Values we need to compute gradients

  return out, cache
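  • The backward pass, backward(), receives the upstream gradient together with the previously cached variables, and returns the gradients with respect to the inputs and the weights: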
def layer_backward(dout, cache):
  """
  Receive dout (derivative of loss with respect to outputs) and cache,
  and compute derivative with respect to inputs.
  """
  # Unpack cache values
  x, w, z, out = cache

  # Use values in cache to compute derivatives
  dx = ...  # Derivative of loss with respect to x (placeholder)
  dw = ...  # Derivative of loss with respect to w (placeholder)

  return dx, dw
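To make the template above concrete, here is a minimal sketch of the same forward/backward pattern for a toy layer. The layer itself (out = (x * w) ** 2, elementwise) and the names square_scale_forward / square_scale_backward are made up for illustration and are not part of the assignment code.

```python
import numpy as np

def square_scale_forward(x, w):
    """Toy layer (hypothetical): out = (x * w) ** 2, elementwise."""
    z = x * w            # intermediate value
    out = z ** 2         # the output
    cache = (x, w, z)    # values the backward pass needs
    return out, cache

def square_scale_backward(dout, cache):
    """Chain rule through out = z ** 2 and z = x * w."""
    x, w, z = cache
    dz = dout * 2 * z    # d(out)/d(z) = 2z
    dx = dz * w          # d(z)/d(x) = w
    dw = dz * x          # d(z)/d(w) = x
    return dx, dw

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
out, cache = square_scale_forward(x, w)

# Use an upstream gradient of ones, i.e. loss = sum(out).
dx, dw = square_scale_backward(np.ones_like(out), cache)
print(out)  # [ 0.25  4.   36.  ]
print(dx)   # [ 0.5  4.  24. ]
print(dw)   # [ 1. -8. 36.]
```

The point of the cache is visible here: the backward pass never recomputes z, it simply reuses what the forward pass stored.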
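2. Setting up the assignment environment
  • Download the dataset
  • Install the required packages
  • Note: gnureadline==6.3.3 is not supported on Windows, so simply skip it there. The other packages are not all strictly required either and can be installed selectively, but you do need NumPy, Cython, Future, and so on.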

cd assignment2
pip install -r requirements.txt
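  • Compile the Cython extension: convolutional networks need some efficient operations, so the course staff have already implemented the necessary ones in Cython, for example the im2col routines. All we have to do is compile them first, that is, run setup.py from inside the cs231n directory: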
python setup.py build_ext --inplace
  • Set up the Jupyter notebook environment
# As usual, a bit of setup
from __future__ import print_function
import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
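As a quick illustration of what rel_error measures, the sketch below repeats the helper and evaluates it on a few hand-picked arrays (the arrays are made up for this example). The np.maximum(1e-8, ...) term in the denominator is what keeps the all-zeros case from dividing by zero.

```python
import numpy as np

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

a = np.array([1.0, 2.0, 3.0])

identical = rel_error(a, a.copy())             # exactly 0.0
tiny = rel_error(a, a + 1e-9)                  # small but non-zero
zeros = rel_error(np.zeros(3), np.zeros(3))    # 0.0, no division by zero

print(identical, tiny, zeros)
```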
  • Load the data:
# Load the (preprocessed) CIFAR10 data.

data = get_CIFAR10_data()
for k, v in list(data.items()):
  print(('%s: ' % k, v.shape))
('X_val: ', (1000, 3, 32, 32))
('y_test: ', (1000,))
('y_train: ', (49000,))
('X_test: ', (1000, 3, 32, 32))
('X_train: ', (49000, 3, 32, 32))
('y_val: ', (1000,))
3. Implementing the fully-connected layer (Affine Layer)
3.1 Forward pass
  • Open the file cs231n/layers.py and implement the affine_forward function.
def affine_forward(x, w, b):
    """
    Computes the forward pass for an affine (fully-connected) layer.

    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
    examples, where each example x[i] has shape (d_1, ..., d_k). We will
    reshape each input into a vector of dimension D = d_1 * ... * d_k, and
    then transform it to an output vector of dimension M.

    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    # Reshape each input example into a row vector.
    x_rows = x.reshape(x.shape[0], -1) # [N, D]
    out = np.dot(x_rows, w) + b # [N, M]
    # Cache the original (unreshaped) x so that the backward pass can
    # return dx with shape (N, d_1, ..., d_k).
    cache = (x, w, b)
    return out, cache
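A small shape check helps confirm the reshaping logic. The sketch below uses a compact copy of the layer that caches the input in its original shape, and compares the vectorized result with a per-example loop; the sizes (N = 2, inputs of shape 4 x 5 x 6, M = 3) and the random seed are arbitrary choices for this example.

```python
import numpy as np

def affine_forward(x, w, b):
    # Flatten every example to a row of length D, then apply x_rows @ w + b.
    x_rows = x.reshape(x.shape[0], -1)
    out = x_rows.dot(w) + b
    return out, (x, w, b)

N, M = 2, 3
input_shape = (4, 5, 6)
D = int(np.prod(input_shape))  # 4 * 5 * 6 = 120

rng = np.random.default_rng(0)
x = rng.standard_normal((N,) + input_shape)
w = rng.standard_normal((D, M))
b = rng.standard_normal(M)

out, cache = affine_forward(x, w, b)

# Reference result: process one example at a time.
expected = np.stack([x[i].ravel().dot(w) + b for i in range(N)])

print(out.shape)        # (2, 3)
print(cache[0].shape)   # (2, 4, 5, 6)
```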
3.2 Backward pass
  • Now implement the affine_backward function and test your implementation using numeric gradient checking.
def affine_backward(dout, cache):
    """
    Computes the backward pass for an affine layer.

    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ... d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    x_rows = x.reshape(x.shape[0], -1) # [N, D]
    d_xrows = np.dot(dout, w.T) # [N, D]
    dx = d_xrows.reshape(x.shape) # back to (N, d_1, ..., d_k)
    dw = np.dot(x_rows.T, dout) # [D, M]
    # Note: db is summed over the batch dimension here, not averaged.
    db = np.sum(dout, axis=0) # [M]
    return dx, dw, db
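The numeric gradient check mentioned above can be sketched without the course utilities. The helper numerical_gradient_array below is a hypothetical stand-in written for this example (a centered-difference estimate); it is not the course's eval_numerical_gradient_array, although it plays the same role. The layer functions are compact copies so the snippet runs on its own.

```python
import numpy as np

def affine_forward(x, w, b):
    out = x.reshape(x.shape[0], -1).dot(w) + b
    return out, (x, w, b)

def affine_backward(dout, cache):
    x, w, b = cache
    dx = dout.dot(w.T).reshape(x.shape)
    dw = x.reshape(x.shape[0], -1).T.dot(dout)
    db = dout.sum(axis=0)
    return dx, dw, db

def rel_error(x, y):
    return np.max(np.abs(x - y) / np.maximum(1e-8, np.abs(x) + np.abs(y)))

def numerical_gradient_array(f, x, df, h=1e-5):
    """Centered-difference gradient of sum(f(x) * df) with respect to x."""
    grad = np.zeros_like(x)
    for ix in np.ndindex(*x.shape):
        old = x[ix]
        x[ix] = old + h
        pos = f(x).copy()
        x[ix] = old - h
        neg = f(x).copy()
        x[ix] = old  # restore the original value
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
    return grad

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 2, 3))
w = rng.standard_normal((6, 5))
b = rng.standard_normal(5)
dout = rng.standard_normal((4, 5))

dx_num = numerical_gradient_array(lambda v: affine_forward(v, w, b)[0], x, dout)
dw_num = numerical_gradient_array(lambda v: affine_forward(x, v, b)[0], w, dout)
db_num = numerical_gradient_array(lambda v: affine_forward(x, w, v)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

max_err = max(rel_error(dx_num, dx), rel_error(dw_num, dw), rel_error(db_num, db))
print('max relative error:', max_err)
```

If the analytic gradients are correct, the relative errors should be tiny; a large value for only one of dx, dw, or db usually points at the corresponding line of affine_backward.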
4. ReLU activation
def relu_forward(x):
    """
    Computes the forward pass for a layer of rectified linear units (ReLUs).

    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    out = np.maximum(0, x)
    cache = x
    return out, cache

def relu_backward(dout, cache):
    """
    Computes the backward pass for a layer of rectified linear units (ReLUs).

    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    x = cache
    # The gradient passes through only where the input was positive.
    dx = (x > 0) * dout
    return dx
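A tiny hand-checkable example shows how ReLU gates the gradient. The input values below are made up for illustration; note that with the (x > 0) mask used here, an input of exactly 0 receives zero gradient.

```python
import numpy as np

def relu_forward(x):
    out = np.maximum(0, x)
    return out, x

def relu_backward(dout, cache):
    x = cache
    return (x > 0) * dout

x = np.array([[-1.0, 0.0, 2.0],
              [3.0, -0.5, 0.0]])
out, cache = relu_forward(x)

# Pretend every output received an upstream gradient of 10.
dout = np.full_like(x, 10.0)
dx = relu_backward(dout, cache)

print(out)  # negative inputs are clamped to 0
print(dx)   # gradient is 10 where x > 0 and 0 elsewhere
```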
5. “Sandwich” layers


def affine_relu_forward(x, w, b):
    """
    Convenience layer that performs an affine transform followed by a ReLU

    Inputs:
    - x: Input to the affine layer
    - w, b: Weights for the affine layer

    Returns a tuple of:
    - out: Output from the ReLU
    - cache: Object to give to the backward pass
    """
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    cache = (fc_cache, relu_cache)
    return out, cache

def affine_relu_backward(dout, cache):
    """
    Backward pass for the affine-relu convenience layer
    """
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db
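One way to convince yourself that the sandwich layer chains the two backward passes correctly is a directional finite-difference probe on the weights. The sketch below is self-contained, with compact copies of the layers; the shapes, seed, step size h, and the random probe direction are arbitrary choices for this example.

```python
import numpy as np

def affine_forward(x, w, b):
    return x.reshape(x.shape[0], -1).dot(w) + b, (x, w, b)

def affine_backward(dout, cache):
    x, w, b = cache
    dx = dout.dot(w.T).reshape(x.shape)
    dw = x.reshape(x.shape[0], -1).T.dot(dout)
    return dx, dw, dout.sum(axis=0)

def relu_forward(x):
    return np.maximum(0, x), x

def relu_backward(dout, cache):
    return (cache > 0) * dout

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    return out, (fc_cache, relu_cache)

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    return affine_backward(da, fc_cache)

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 3, 2))
w = rng.standard_normal((6, 5))
b = rng.standard_normal(5)
dout = rng.standard_normal((4, 5))

out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)

# Scalar surrogate loss whose gradient with respect to out is dout.
def loss(w_):
    return np.sum(affine_relu_forward(x, w_, b)[0] * dout)

# Compare the analytic directional derivative with a centered difference.
direction = rng.standard_normal(w.shape)
h = 1e-6
numeric = (loss(w + h * direction) - loss(w - h * direction)) / (2 * h)
analytic = np.sum(dw * direction)
print(numeric, analytic)
```

The same probe works for x and b by perturbing those arguments instead of w.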
6. Loss layers
  • You implemented these loss functions in the last assignment, so we’ll give them to you for free here. You should still make sure you understand how they work by looking at the implementations in cs231n/layers.py.
  • A nice freebie. Even so, it is worth comparing these with the versions you implemented yourself in the previous assignment.

def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx
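To see what svm_loss computes, it helps to run it on scores small enough to check by hand. The function below is a copy of the code above; the two score rows are made up for illustration. In the first row the correct class (index 0) has the lowest score, so both other classes violate the margin (2 and 3, total 5); in the second row the correct class wins by more than the margin of 1, so that row contributes nothing.

```python
import numpy as np

def svm_loss(x, y):
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx

scores = np.array([[1.0, 2.0, 3.0],    # correct class 0 is ranked last
                   [5.0, 1.0, 2.0]])   # correct class 0 wins with margin > 1
labels = np.array([0, 0])

loss, dx = svm_loss(scores, labels)
print(loss)  # (5 + 0) / 2 = 2.5
print(dx)    # first row [-1, 0.5, 0.5], second row all zeros
```

Each violating class pushes its own score down (positive gradient) and pulls the correct-class score up by the same amount, which is why every row of dx sums to zero.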

def softmax_loss(x, y):
    """
    Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    shifted_logits = x - np.max(x, axis=1, keepdims=True)
    Z = np.sum(np.exp(shifted_logits), axis=1, keepdims=True)
    log_probs = shifted_logits - np.log(Z)
    probs = np.exp(log_probs)
    N = 
