Assignment Two (II): Fully-Connected Neural Networks
Main topics: modular layer design, and several optimization update rules.
Part 1: Modular Design
In A1 (Assignment 1) we implemented a fully-connected two-layer neural network. That implementation was not very modular, because the loss and gradients were computed inside a single monolithic function. This is manageable for a simple two-layer network, but it becomes impractical as we move to larger models.
Ideally we want to build networks with a more modular design, so that different layer types can be implemented in isolation and then composed into models with different architectures.
In this exercise we implement fully-connected networks in a more modular way: for each layer we implement a forward function and a backward function.
1) Forward function: receives the inputs, weights, and any other parameters, and returns both the output and a cache object that stores the data needed for the backward pass.
def layer_forward(x, w):
    """ Receive inputs x and weights w """
    # Do some computations ...
    z = # ... some intermediate value
    # Do some more computations ...
    out = # the output
    cache = (x, w, z, out)  # Values we need to compute gradients
    return out, cache
2) Backward function: receives the upstream derivative and the cache object, and returns the gradients with respect to the inputs and the weights.
def layer_backward(dout, cache):
    """
    Receive dout (derivative of loss with respect to outputs) and cache,
    and compute derivative with respect to inputs.
    """
    # Unpack cache values
    x, w, z, out = cache
    # Use values in cache to compute derivatives
    dx = # Derivative of loss with respect to x
    dw = # Derivative of loss with respect to w
    return dx, dw
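To see why this pairing is useful, here is a minimal sketch (my own illustration, not part of the assignment code) of how the layers implemented in this section could be composed into a small two-layer network: the forward functions are chained and each cache is kept, then the backward functions are applied in reverse order. Regularization is omitted, and the layer names refer forward to the functions defined in sections 1.1-1.6 below.

def two_layer_net(x, y, w1, b1, w2, b2):
    # Forward pass: chain the modular layers, keeping each cache
    h, cache1 = affine_relu_forward(x, w1, b1)   # affine + ReLU "sandwich" layer (1.5)
    scores, cache2 = affine_forward(h, w2, b2)   # second affine layer (1.1)
    loss, dscores = svm_loss(scores, y)          # loss layer (1.6): returns loss and upstream gradient
    # Backward pass: apply the backward functions in reverse order, reusing the caches
    dh, dw2, db2 = affine_backward(dscores, cache2)
    dx, dw1, db1 = affine_relu_backward(dh, cache1)
    return loss, {'W1': dw1, 'b1': db1, 'W2': dw2, 'b2': db2}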
1.1 Affine layer: forward
def affine_forward(x, w, b):
    out = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    N, D = x.shape[0], x.size // x.shape[0]
    out = np.dot(x.reshape(N, D), w) + b
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    cache = (x, w, b)
    return out, cache
Code analysis:
Computes the forward pass for an affine (fully-connected) layer.
The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N examples, where each example x[i] has shape (d_1, ..., d_k).
Each input is reshaped into a vector of dimension D = d_1 * ... * d_k and then transformed into an output vector of dimension M.
Inputs:
- x: a numpy array containing input data, of shape (N, d_1, ..., d_k)
- w: a numpy array of weights, of shape (D, M)
- b: a numpy array of biases, of shape (M,)
Returns a tuple of:
- out: output, of shape (N, M)
- cache: (x, w, b)
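In other words, for each example the layer computes out[i] = x[i].reshape(D) @ w + b. A quick shape check (my own illustration; the shapes mirror the test below) would look like this:

import numpy as np

# Hypothetical shapes for illustration: D = 4 * 5 * 6 = 120, M = 3
x = np.random.randn(2, 4, 5, 6)   # minibatch of N = 2 examples
w = np.random.randn(120, 3)       # weights of shape (D, M)
b = np.random.randn(3)            # biases of shape (M,)
out, cache = affine_forward(x, w, b)
print(out.shape)                  # (2, 3), i.e. (N, M)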
Test_1.1 affine_forward
The relative error should be less than 1e-9.
# Test the affine_forward function
num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3
input_size = num_inputs * np.prod(input_shape)
weight_size = output_dim * np.prod(input_shape)
x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)
out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],
[ 3.25553199, 3.5141327, 3.77273342]])
# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))
Output:
Testing affine_forward function:
difference: 9.769849468192957e-10
1.2 Affine layer: backward
def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    N, D = x.shape[0], w.shape[0]
    dx = np.dot(dout, w.T).reshape(x.shape)
    dw = np.dot(x.reshape(N, D).T, dout)
    db = np.sum(dout, axis=0)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return dx, dw, db
Code analysis:
Computes the backward pass for an affine layer.
Inputs:
- dout: upstream derivative, of shape (N, M)
- cache: a tuple of:
  - x: input data, of shape (N, d_1, ..., d_k)
  - w: weights, of shape (D, M)
  - b: biases, of shape (M,)
Returns a tuple of:
- dx: gradient with respect to x; dw: gradient with respect to w; db: gradient with respect to b
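As a sanity check on the code above (my own note, not from the original notebook): writing X = x.reshape(N, D), the forward pass computes out = X @ w + b, so the chain rule gives the three gradients used in affine_backward:

dX = dout @ w.T          # shape (N, D); reshaped back to x.shape to obtain dx
dw = X.T @ dout          # shape (D, M)
db = dout.sum(axis=0)    # shape (M,); the bias is broadcast over all N rows in the forward pass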
Test_1.2 affine_backward
# Test the affine_backward function
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)
dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)
_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
# The error should be around e-10 or less
print('Testing affine_backward function:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
Output:
Testing affine_backward function:
dx error: 5.399100368651805e-11
dw error: 9.904211865398145e-11
db error: 2.4122867568119087e-11
1.3 ReLU activation: forward
Computes the forward pass for a ReLU activation layer.
def relu_forward(x):
    out = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    out = np.maximum(0, x)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    cache = x
    return out, cache
Test_1.3 relu_forward
# Test the relu_forward function
x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)
out, _ = relu_forward(x)
correct_out = np.array([[ 0., 0., 0., 0., ],
[ 0., 0., 0.04545455, 0.13636364,],
[ 0.22727273, 0.31818182, 0.40909091, 0.5, ]])
# Compare your output with ours. The error should be on the order of e-8
print('Testing relu_forward function:')
print('difference: ', rel_error(out, correct_out))
Output:
Testing relu_forward function:
difference: 4.999999798022158e-08
1.4 ReLU activation: backward
def relu_backward(dout, cache):
    dx, x = None, cache
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    dx = dout * (x > 0)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return dx
Code analysis:
Computes the backward pass for a ReLU layer.
Inputs:
- dout: upstream derivative, of any shape
- cache: the input x, of the same shape as dout
Returns:
- dx: gradient with respect to x
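A tiny hand-worked example (my own illustration) makes the gating explicit: the upstream gradient passes through wherever the forward input was positive and is zeroed elsewhere.

import numpy as np

x = np.array([[-1.0,  2.0],
              [ 3.0, -4.0]])
dout = np.array([[10.0, 20.0],
                 [30.0, 40.0]])
_, cache = relu_forward(x)
dx = relu_backward(dout, cache)
print(dx)   # gradient is [[0., 20.], [30., 0.]]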
Test_1.4 relu_backward
np.random.seed(231)
x = np.random.randn(10, 10)
dout = np.random.randn(*x.shape)
dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)
_, cache = relu_forward(x)
dx = relu_backward(dout, cache)
# The error should be on the order of e-12
print('Testing relu_backward function:')
print('dx error: ', rel_error(dx_num, dx))
Output:
Testing relu_backward function:
dx error: 3.2756349136310288e-12
1.5 “Sandwich” layers
A simple convenience layer that chains an affine transform with a ReLU activation.
def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    cache = (fc_cache, relu_cache)
    return out, cache

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db
Test_1.5 affine_relu
from cs231n.layer_utils import affine_relu_forward, affine_relu_backward
np.random.seed(231)
x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)
out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)
# Relative error should be around e-10 or less
print('Testing affine_relu_forward and affine_relu_backward:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
Output:
Testing affine_relu_forward and affine_relu_backward:
dx error: 2.299579177309368e-11
dw error: 8.162011105764925e-11
db error: 7.826724021458994e-12
1.6 Loss layers: Softmax and SVM
Computing svm_loss
def svm_loss(x, y):
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx
Code analysis:
Computes the loss and gradient for multiclass SVM (hinge loss) classification.
Inputs:
- x: input scores, of shape (N, C), where x[i, j] is the score for the j-th class for the i-th input.
- y: a vector of labels, of shape (N,), where y[i] is the label for x[i] and 0 <= y[i] < C.
Returns a tuple of:
- loss: a scalar giving the loss
- dx: gradient of the loss with respect to x
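To make the hinge loss concrete, here is a tiny worked example (my own illustration, not from the notebook). With a single example whose scores are [3.2, 5.1, -1.7] and whose correct class is 0, the margins are max(0, 5.1 - 3.2 + 1) = 2.9 and max(0, -1.7 - 3.2 + 1) = 0, so the loss is 2.9. The gradient puts +1 on every class with a positive margin and minus the count of such classes on the correct class, all divided by N:

import numpy as np

x = np.array([[3.2, 5.1, -1.7]])   # scores for one example (N = 1, C = 3)
y = np.array([0])                  # the correct class is 0
loss, dx = svm_loss(x, y)
print(loss)   # 2.9 (up to floating-point rounding)
print(dx)     # [[-1.  1.  0.]]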
Computing softmax_loss
def softmax_loss(x, y):