Contents
- Assignment Goals
- Layer Implementations
- Optimizer Implementations
- Notes on Problems Encountered
- References
I. Assignment Goals
In a previous assignment I built a two-layer neural network, but its loss computation and backpropagation were both implemented inside a single function, with no modularity. That monolithic design does not scale to more complex architectures, so the goal of this assignment is to split each piece of functionality into its own module, making it straightforward to assemble complex networks.
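Every layer follows the same two-function contract: a forward pass that returns (out, cache), and a backward pass that consumes (dout, cache). As a minimal sketch of how the pieces then compose (the affine_relu pair below is my own illustration, not code from the handout):

def affine_relu_forward(x, w, b):
    # Chain two modular layers; each layer's cache is kept for the backward pass.
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    return out, (fc_cache, relu_cache)

def affine_relu_backward(dout, cache):
    # Unwind the chain in reverse order, routing the gradient through each layer.
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    return affine_backward(da, fc_cache)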
II. Layer Implementations
1. Affine layer (layers.py)
1.1 affine_forward

import numpy as np

def affine_forward(x, w, b):
    """
    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    # Flatten each example to a row of length D = d_1 * ... * d_k, then apply x @ w + b.
    out = np.dot(x.reshape((x.shape[0], -1)), w) + b
    cache = (x, w, b)
    return out, cache
The forward pass is straightforward: out = x * w + b, with shapes (N, M) = (N, D) * (D, M) + (M,). The input is first reshaped to (N, D), and adding b relies on NumPy's broadcasting mechanism.
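As a quick sanity check (shapes chosen arbitrarily for illustration), a 4-D input is flattened to (N, D) internally:

x = np.random.randn(2, 3, 4, 5)   # N = 2, D = 3 * 4 * 5 = 60
w = np.random.randn(60, 7)        # D = 60, M = 7
b = np.random.randn(7)

out, cache = affine_forward(x, w, b)
print(out.shape)  # (2, 7)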
1.2 affine_backward

def affine_backward(dout, cache):
    """
    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ..., d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d_1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    # Gradients of out = x_flat @ w + b with respect to each input.
    dw = np.dot(x.reshape((x.shape[0], -1)).T, dout)
    db = dout.sum(axis=0)
    dx = np.dot(dout, w.T).reshape(x.shape)
    return dx, dw, db
The main thing to watch in the backward pass is the shapes:
- dw = x.T * dout: (D, M) = (D, N) * (N, M)
- db = sum of dout over the rows (axis=0): (M,)
- dx = dout * w.T, reshaped back to x's original shape: (N, D) = (N, M) * (M, D)
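These formulas can be verified with a centered-difference gradient check. The num_grad helper below is my own minimal version, not the assignment's gradient-check utility:

def num_grad(f, x, dout, h=1e-5):
    # Centered finite differences of an array-valued function, weighted by dout.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h; pos = f(x).copy()
        x[i] = old - h; neg = f(x).copy()
        x[i] = old
        grad[i] = np.sum((pos - neg) * dout) / (2 * h)
        it.iternext()
    return grad

x, w, b = np.random.randn(4, 6), np.random.randn(6, 3), np.random.randn(3)
dout = np.random.randn(4, 3)
_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
dx_num = num_grad(lambda x: affine_forward(x, w, b)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))  # expect a very small value, around 1e-9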
2. ReLU layer (layers.py)
2.1 relu_forward

def relu_forward(x):
    """
    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    # Multiplying by the boolean mask zeroes out all non-positive entries.
    out = x * (x > 0)
    cache = x
    return out, cache
(x > 0) is a boolean mask; multiplying by it keeps the entries of x that are positive and zeroes out the rest.
2.2 relu_backward

def relu_backward(dout, cache):
    """
    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    x = cache
    # Gradient flows only through the entries where the forward input was positive.
    dx = dout * (x > 0)
    return dx
Likewise, only the positions where the input was positive receive the backpropagated gradient; everywhere else the gradient is zero.
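A tiny worked example (values chosen for illustration) shows the same mask being applied in both directions:

x = np.array([[-1.0, 2.0], [3.0, -4.0]])
out, cache = relu_forward(x)
print(out)                          # [[0. 2.] [3. 0.]]

dout = np.ones_like(x)
print(relu_backward(dout, cache))   # [[0. 1.] [1. 0.]]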
3. Loss layers: Softmax and SVM (layers.py)

def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    # Hinge loss: margin = max(0, score - correct_class_score + 1), zeroed for the correct class.
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    # Each positive margin contributes +1 to its own column and -1 to the correct class column.
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx
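A quick check on toy scores (numbers mine, for illustration) confirms the margin rule and the averaging over N:

scores = np.array([[3.0, 1.0, 0.5],
                   [1.0, 4.0, 3.5]])
labels = np.array([0, 1])

loss, dx = svm_loss(scores, labels)
# Row 0: both margins are negative, so it contributes nothing.
# Row 1: class 2 gives max(0, 3.5 - 4 + 1) = 0.5, so loss = 0.5 / 2 = 0.25.
print(loss)   # 0.25
print(dx[1])  # [ 0.  -0.5  0.5]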
def softmax_loss(x, y):
    """
    Computes the loss and gradient for softmax classification.
    I