Contents
- Assignment Goals
- Layer Implementations
- Optimizer Implementations
- Notes on Problems Encountered
- References
I. Assignment Goals
In a previous assignment I built a two-layer neural network, but its loss computation and backpropagation were both implemented inside a single function, with no modularity. That monolithic design does not scale to more complex architectures, so the goal of this assignment is to split each piece of functionality into its own module, making it straightforward to assemble complex networks.
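Every layer follows the same two-function contract: a forward pass that returns (out, cache), and a backward pass that consumes (dout, cache). As a minimal sketch of how the pieces then compose (the affine_relu pair below is my own illustration, not code from the handout):

def affine_relu_forward(x, w, b):
    # Chain two modular layers; each layer's cache is kept for the backward pass.
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    return out, (fc_cache, relu_cache)

def affine_relu_backward(dout, cache):
    # Unwind the chain in reverse order, routing the gradient through each layer.
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    return affine_backward(da, fc_cache)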
II. Layer Implementations
1. Affine layer (layers.py)
1.1 affine_forward

import numpy as np

def affine_forward(x, w, b):
    """
    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    # Flatten each example to a row of length D = d_1 * ... * d_k, then apply x @ w + b.
    out = np.dot(x.reshape((x.shape[0], -1)), w) + b
    cache = (x, w, b)
    return out, cache
The forward pass is straightforward: out = x * w + b, with shapes (N, M) = (N, D) * (D, M) + (M,). The input is first reshaped to (N, D), and adding b relies on NumPy's broadcasting mechanism.
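As a quick sanity check (shapes chosen arbitrarily for illustration), a 4-D input is flattened to (N, D) internally:

x = np.random.randn(2, 3, 4, 5)   # N = 2, D = 3 * 4 * 5 = 60
w = np.random.randn(60, 7)        # D = 60, M = 7
b = np.random.randn(7)

out, cache = affine_forward(x, w, b)
print(out.shape)  # (2, 7)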
1.2 affine_backward

def affine_backward(dout, cache):
    """
    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ..., d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d_1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    # Gradients of out = x_flat @ w + b with respect to each input.
    dw = np.dot(x.reshape((x.shape[0], -1)).T, dout)
    db = dout.sum(axis=0)
    dx = np.dot(dout, w.T).reshape(x.shape)
    return dx, dw, db
The main thing to watch in the backward pass is the shapes:
- dw = x.T * dout: (D, M) = (D, N) * (N, M)
- db = sum of dout over the rows (axis=0): (M,)
- dx = dout * w.T, reshaped back to x's original shape: (N, D) = (N, M) * (M, D)
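These formulas can be verified with a centered-difference gradient check. The num_grad helper below is my own minimal version, not the assignment's gradient-check utility:

def num_grad(f, x, dout, h=1e-5):
    # Centered finite differences of an array-valued function, weighted by dout.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h; pos = f(x).copy()
        x[i] = old - h; neg = f(x).copy()
        x[i] = old
        grad[i] = np.sum((pos - neg) * dout) / (2 * h)
        it.iternext()
    return grad

x, w, b = np.random.randn(4, 6), np.random.randn(6, 3), np.random.randn(3)
dout = np.random.randn(4, 3)
_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
dx_num = num_grad(lambda x: affine_forward(x, w, b)[0], x, dout)
print(np.max(np.abs(dx - dx_num)))  # expect a very small value, around 1e-9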
2. ReLU layer (layers.py)
2.1 relu_forward

def relu_forward(x):
    """
    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    # Multiplying by the boolean mask zeroes out all non-positive entries.
    out = x * (x > 0)
    cache = x
    return out, cache
(x > 0) is a boolean mask; multiplying by it keeps the entries of x that are positive and zeroes out the rest.
2.2 relu_backward

def relu_backward(dout, cache):
    """
    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    x = cache
    # Gradient flows only through the entries where the forward input was positive.
    dx = dout * (x > 0)
    return dx
Likewise, only the positions where the input was positive receive the backpropagated gradient; everywhere else the gradient is zero.
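A tiny worked example (values chosen for illustration) shows the same mask being applied in both directions:

x = np.array([[-1.0, 2.0], [3.0, -4.0]])
out, cache = relu_forward(x)
print(out)                          # [[0. 2.] [3. 0.]]

dout = np.ones_like(x)
print(relu_backward(dout, cache))   # [[0. 1.] [1. 0.]]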
3. Loss layers: Softmax and SVM (layers.py)

def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    # Hinge loss: margin = max(0, score - correct_class_score + 1), zeroed for the correct class.
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    # Each positive margin contributes +1 to its own column and -1 to the correct class column.
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx
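A quick check on toy scores (numbers mine, for illustration) confirms the margin rule and the averaging over N:

scores = np.array([[3.0, 1.0, 0.5],
                   [1.0, 4.0, 3.5]])
labels = np.array([0, 1])

loss, dx = svm_loss(scores, labels)
# Row 0: both margins are negative, so it contributes nothing.
# Row 1: class 2 gives max(0, 3.5 - 4 + 1) = 0.5, so loss = 0.5 / 2 = 0.25.
print(loss)   # 0.25
print(dx[1])  # [ 0.  -0.5  0.5]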
def softmax_loss(x, y):
    """
    Computes the loss and gradient for softmax classification.
    I