FullyConnectedNets.ipynb
In the previous assignment we implemented a two-layer fully connected network, but it was not modular: it computed the gradients for the entire network inside a single function. That approach is still manageable for a shallow network, but it becomes unworkable for deep networks, so we need a modular design.
First, each layer implements a forward() and a backward() function.
forward() computes the layer's output and is structured like this:
def layer_forward(x, w):
    """ Receive input x and weights w """
    # Do some computation ...
    z = # ... some intermediate value
    # Do some more computation ...
    out = # the output
    cache = (x, w, z, out)  # save the values needed to compute gradients
    return out, cache
backward() computes the gradients and is structured like this:
def layer_backward(dout, cache):
    """
    Receive the upstream derivative ∂loss/∂outputs and the values saved
    in cache for computing gradients, then compute ∂loss/∂x and ∂loss/∂w.
    """
    # Unpack the values saved in cache
    x, w, z, out = cache
    # Use the cached values to compute the gradients
    dx = # assign
    dw = # assign
    return dx, dw
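The payoff of this design is that layers compose mechanically: the network's forward pass just chains layer_forward() calls and collects their caches, and the backward pass walks those caches in reverse. Below is a minimal sketch of the idea (illustrative only; the single-weight layer_forward/layer_backward signatures come from the templates above, not from the assignment's actual API):
def network_forward(x, weights):
    # Chain the layers, saving each cache for the backward pass
    caches = []
    out = x
    for w in weights:
        out, cache = layer_forward(out, w)
        caches.append(cache)
    return out, caches

def network_backward(dout, caches):
    # Walk the layers in reverse, propagating the upstream derivative
    dws = []
    for cache in reversed(caches):
        dout, dw = layer_backward(dout, cache)
        dws.append(dw)
    return dout, dws[::-1]  # weight gradients in the original layer order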
The code below may raise errors the first time you run it. If you know how to fix them, go ahead; if not, you can refer to this article.
# As before, set things up for the code that follows
from __future__ import print_function
import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set the default figure size
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# Automatically reload external modules
%load_ext autoreload
%autoreload 2
def rel_error(x, y):
    """ Return the relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
Load the preprocessed (mean-subtracted) CIFAR-10 data. Internally this works much like before; it is just wrapped in a single function now.
data = get_CIFAR10_data()
for k, v in list(data.items()):
    # Print the shape of each array
    print(('%s: ' % k, v.shape))
'''('X_train: ', (49000, 3, 32, 32))
('y_train: ', (49000,))
('X_val: ', (1000, 3, 32, 32))
('y_val: ', (1000,))
('X_test: ', (1000, 3, 32, 32))
('y_test: ', (1000,))
'''
Forward and backward for the affine layer
Complete affine_forward() in cs231n/layers.py before running the code below.
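If you get stuck, note that the affine layer just flattens each example into a row vector and applies a matrix multiply plus a bias. A minimal sketch of affine_forward() under that reading, assuming x has shape (N, d_1, ..., d_k), w has shape (D, M) with D = d_1 * ... * d_k, and b has shape (M,):
def affine_forward(x, w, b):
    # Flatten each of the N examples into a row vector of length D
    x_flat = x.reshape(x.shape[0], -1)
    out = x_flat.dot(w) + b  # shape (N, M)
    cache = (x, w, b)        # saved for the backward pass
    return out, cache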
# Test whether affine_forward() is implemented correctly
num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3
# np.prod() computes the product of all the elements in an array
input_size = num_inputs * np.prod(input_shape)
weight_size = output_dim * np.prod(input_shape)
# reshape takes integers, so unpack the shape tuple with *input_shape
x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)
out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],
[ 3.25553199, 3.5141327, 3.77273342]])
# Compare your output with the reference output; the error should be below 1e-9.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))
# Testing affine_forward function:
# difference: 9.769849468192957e-10
Complete affine_backward() in cs231n/layers.py before running the code below.
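Since out = x_flat · w + b, the chain rule gives dx_flat = dout · wᵀ, dw = x_flatᵀ · dout, and db is dout summed over the batch. A sketch of affine_backward() consistent with the forward sketch above:
def affine_backward(dout, cache):
    x, w, b = cache
    x_flat = x.reshape(x.shape[0], -1)   # same flattening as the forward pass
    dx = dout.dot(w.T).reshape(x.shape)  # restore the original input shape
    dw = x_flat.T.dot(dout)              # shape (D, M)
    db = np.sum(dout, axis=0)            # shape (M,)
    return dx, dw, db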
# Fix the random seed so results are reproducible
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)
# Compute numerical gradients
dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)
# Compute analytic gradients (your implementation)
_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
# The relative errors should all be below 1e-10
print('Testing affine_backward function:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
# Testing affine_backward function:
# dx error: 5.399100368651805e-11
# dw error: 9.904211865398145e-11
# db error: 2.4122867568119087e-11
Forward and backward for the ReLU layer
Complete relu_forward() and relu_backward() in cs231n/layers.py before running the code below.
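ReLU computes the elementwise max(0, x), so the backward pass simply zeroes the upstream derivative wherever the input was non-positive. A minimal sketch of both functions:
def relu_forward(x):
    out = np.maximum(0, x)  # elementwise: negative entries become zero
    cache = x
    return out, cache

def relu_backward(dout, cache):
    x = cache
    dx = dout * (x > 0)     # gradient flows only where the input was positive
    return dx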
# Test relu_forward()
x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)
# Call the function to compute the output
out, _ = relu_forward(x)
correct_out = np.array([[ 0., 0., 0., 0., ],
[ 0., 0., 0.04545455, 0.13636364,],
[ 0.22727273, 0.31818182, 0.40909091, 0.5, ]])
# The relative error should be around 5e-8
print('Testing relu_forward function:')
print('difference: ', rel_error(out, correct_out))
# Testing relu_forward function:
# difference: 4.999999798022158e-08
# Test relu_backward()
np.random.seed(231)
x = np.random.randn(10, 10)