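Dropout is a technique for regularizing neural networks by randomly setting some features to zero during the forward pass. In this exercise you will implement a dropout layer and modify your fully-connected network to optionally use dropout.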
In[1]:
# As usual, a bit of setup
from __future__ import print_function
import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver
#%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
In[2]:
# Load the (preprocessed) CIFAR10 data.
data = get_CIFAR10_data()
for k, v in data.items():
    print('%s: ' % k, v.shape)
Dropout forward pass:
Main idea: ignore a random subset of neurons during training, and use all neurons at test time.
Complete the following function in cs231n/layers.py:
def dropout_forward(x, dropout_param):
    """
    Performs the forward pass for (inverted) dropout.
    Inputs:
    - x: Input data, of any shape
    - dropout_param: A dictionary with the following keys:
      - p: Dropout parameter. We keep each neuron output with probability p.
      - mode: 'test' or 'train'. If the mode is train, then perform dropout;
        if the mode is test, then just return the input.
      - seed: Seed for the random number generator. Passing seed makes this
        function deterministic, which is needed for gradient checking but not
        in real networks.
    Outputs:
    - out: Array of the same shape as x.
    - cache: tuple (dropout_param, mask). In training mode, mask is the dropout
      mask that was used to multiply the input; in test mode, mask is None.
    NOTE: Please implement **inverted** dropout, not the vanilla version of dropout.
    See http://cs231n.github.io/neural-networks-2/#reg for more details.
    NOTE 2: Keep in mind that p is the probability of **keeping** a neuron
    output; this might be contrary to some sources, where it is referred to
    as the probability of dropping a neuron output.
    """
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])
    mask = None
    out = None
    if mode == 'train':
        #######################################################################
        # TODO: Implement training phase forward pass for inverted dropout.   #
        # Store the dropout mask in the mask variable.                        #
        #######################################################################
        # Draw uniform samples in [0, 1) with the shape of x and keep the
        # entries below p. Dividing by p keeps the expected output the same
        # at train time and at test time.
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == 'test':
        #######################################################################
        # TODO: Implement the test phase forward pass for inverted dropout.   #
        #######################################################################
        out = x
        #######################################################################
        #                            END OF YOUR CODE                         #
        #######################################################################
    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)
    return out, cache
In[3]:
np.random.seed(231)
x = np.random.randn(500, 500) + 10
for p in [0.25, 0.4, 0.7]:
    out, _ = dropout_forward(x, {'mode': 'train', 'p': p})
    out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})
    print('Running tests with p = ', p)
    print('Mean of input: ', x.mean())
    print('Mean of train-time output: ', out.mean())
    print('Mean of test-time output: ', out_test.mean())
    print('Fraction of train-time output set to zero: ', (out == 0).mean())
    print('Fraction of test-time output set to zero: ', (out_test == 0).mean())
    print()
Dropout backward pass:
def dropout_backward(dout, cache):
    """
    Perform the backward pass for (inverted) dropout.
    Inputs:
    - dout: Upstream derivatives, of any shape
    - cache: (dropout_param, mask) from dropout_forward.
    """
    dropout_param, mask = cache
    mode = dropout_param['mode']
    dx = None
    if mode == 'train':
        #######################################################################
        # TODO: Implement training phase backward pass for inverted dropout   #
        #######################################################################
        # The forward pass is out = x * mask, so the gradient is dout * mask.
        dx = dout * mask
        #######################################################################
        #                          END OF YOUR CODE                           #
        #######################################################################
    elif mode == 'test':
        dx = dout
    return dx
In[4]:
np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)
dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}
out, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)
# Error should be around e-10 or less
print('dx relative error: ', rel_error(dx, dx_num))
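Inline Question 1:
What happens if we do not divide the values passed through inverted dropout by p in the dropout layer? Why does that happen?
If we do not divide by p, the expected train-time output of the layer is p times its input, while at test time, where we make a single forward pass without dropping any neurons, the layer outputs its input unchanged. The test-time activations are then larger by a factor of 1/p than the ones the following layers were trained on, so the network is evaluated at a scale it never saw during training. At test time we want an approximation of the expected output produced during training; dividing by p during training provides this without any extra work at test time.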
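The scale mismatch described in this answer can be checked numerically. The following is a minimal sketch, separate from the assignment code: it uses NumPy's `default_rng` with an arbitrary seed and the same input distribution as the forward-pass test, and compares the train-time mean with and without the 1/p correction against the test-time mean.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility only
x = rng.standard_normal((500, 500)) + 10  # same input distribution as the test cell
p = 0.25                                  # keep probability, as in dropout_forward

keep = rng.random(x.shape) < p            # True where a unit is kept

unscaled = x * keep                       # dropout without the 1/p correction
inverted = x * keep / p                   # inverted dropout

print('Test-time mean (identity): ', x.mean())
print('Train mean without 1/p:    ', unscaled.mean())
print('Train mean with 1/p:       ', inverted.mean())
```

Without the correction the train-time mean should come out near p times the input mean, while the inverted version should stay close to the input mean, which is what the test-time pass returns.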
Dropout with fully-connected nets
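In the file cs231n/classifiers/fc_net.py, modify the implementation to use dropout. Specifically, if the constructor of the network receives a value other than 1 for the dropout parameter, the network should add a dropout layer immediately after every ReLU nonlinearity. Once you are done, run the check below to verify the implementation numerically.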
def affine_norm_relu_forward(x, w, b, gamma, beta, bn_param, normalization, dropout, do_param):
    bn_cache, do_cache = None, None
    # affine layer
    out, fc_cache = affine_forward(x, w, b)
    # batch/layer norm
    if normalization == 'batchnorm':
        out, bn_cache = batchnorm_forward(out, gamma, beta, bn_param)
    elif normalization == 'layernorm':
        out, bn_cache = layernorm_forward(out, gamma, beta, bn_param)
    # relu
    out, relu_cache = relu_forward(out)
    # dropout
    if dropout:
        out, do_cache = dropout_forward(out, do_param)
    return out, (fc_cache, bn_cache, relu_cache, do_cache)
def affine_norm_relu_backward(dout, cache, normalization, dropout):
    fc_cache, bn_cache, relu_cache, do_cache = cache
    # dropout
    if dropout:
        dout = dropout_backward(dout, do_cache)
    # relu
    dout = relu_backward(dout, relu_cache)
    # batch/layer norm
    dgamma, dbeta = None, None
    if normalization == 'batchnorm':
        dout, dgamma, dbeta = batchnorm_backward_alt(dout, bn_cache)
    elif normalization == 'layernorm':
        dout, dgamma, dbeta = layernorm_backward(dout, bn_cache)
    # affine layer
    dx, dw, db = affine_backward(dout, fc_cache)
    return dx, dw, db, dgamma, dbeta
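The two helpers above expect a boolean `dropout` flag and a `do_param` dictionary. As a rough sketch of how a constructor could derive both from its `dropout` and `seed` arguments, consider the function below. The name `make_dropout_config` is hypothetical and is not part of fc_net.py; the sketch only assumes the convention stated above, namely that a value of 1 means no dropout.

```python
def make_dropout_config(dropout=1, seed=None):
    """Hypothetical helper: turn constructor arguments into (flag, param dict).

    dropout is the keep probability; a value of 1 disables dropout.
    """
    use_dropout = dropout != 1
    dropout_param = {}
    if use_dropout:
        # Networks start in train mode; the mode would be switched to 'test'
        # before computing predictions.
        dropout_param = {'mode': 'train', 'p': dropout}
        if seed is not None:
            # A fixed seed makes the mask deterministic for gradient checks.
            dropout_param['seed'] = seed
    return use_dropout, dropout_param


print(make_dropout_config(1))
print(make_dropout_config(0.75, seed=123))
```

The resulting pair could then be passed as the `dropout` and `do_param` arguments of `affine_norm_relu_forward`.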
In[5]:
np.random.seed(231)
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))
for dropout in [1, 0.75, 0.5]:
    print('Running check with dropout = ', dropout)
    model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                              weight_scale=5e-2, dtype=np.float64,
                              dropout=dropout, seed=123)
    loss, grads = model.loss(X, y)
    print('Initial loss: ', loss)
    # Relative errors should be around e-6 or less; Note that it's fine
    # if for dropout=1 you have W2 error be on the order of e-5.
    for name in sorted(grads):
        f = lambda _: model.loss(X, y)[0]
        grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
        print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))
    print()
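Regularization experiment
As an experiment, we will train a pair of two-layer networks on 500 training examples: one without dropout, and one with a keep probability of 0.25. We will then visualize the training and validation accuracies of the two networks.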
In[6]:
# Train two identical nets, one with dropout and one without
np.random.seed(231)
num_train = 500
small_data = {
    'X_train': data['X_train'][:num_train],
    'y_train': data['y_train'][:num_train],
    'X_val': data['X_val'],
    'y_val': data['y_val'],
}
solvers = {}
dropout_choices = [1, 0.25]
for dropout in dropout_choices:
    model = FullyConnectedNet([500], dropout=dropout)
    print(dropout)
    solver = Solver(model, small_data,
                    num_epochs=25, batch_size=100,
                    update_rule='adam',
                    optim_config={
                        'learning_rate': 5e-4,
                    },
                    verbose=True, print_every=100)
    solver.train()
    solvers[dropout] = solver
In[7]:
# Plot train and validation accuracies of the two models
train_accs = []
val_accs = []
for dropout in dropout_choices:
    solver = solvers[dropout]
    train_accs.append(solver.train_acc_history[-1])
    val_accs.append(solver.val_acc_history[-1])
plt.subplot(3, 1, 1)
for dropout in dropout_choices:
    plt.plot(solvers[dropout].train_acc_history, 'o', label='%.2f dropout' % dropout)
plt.title('Train accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')
plt.subplot(3, 1, 2)
for dropout in dropout_choices:
    plt.plot(solvers[dropout].val_acc_history, 'o', label='%.2f dropout' % dropout)
plt.title('Val accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')
plt.gcf().set_size_inches(15, 15)
plt.show()
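Inline Question 2:
Compare the validation and training accuracies with and without dropout. What do your results suggest about dropout as a regularizer?
The results show that the model without dropout overfits. During training, the accuracy without dropout is very high (~0.99 at epoch 25), whereas with dropout it is lower (~0.93 at epoch 25). This suggests that dropout constrains the model to learn a simpler function, which works against overfitting. On the validation set, the network with dropout does slightly better. Taken together, the results indicate that dropout regularizes the model effectively and reduces overfitting.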
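One way to put a number on the overfitting discussed here is the gap between the final training accuracy and the final validation accuracy. The helper below is a small sketch; the name `final_accuracy_gap` is hypothetical, and it only assumes that the two histories are sequences with one accuracy per epoch, like `train_acc_history` and `val_acc_history` on the solvers above.

```python
def final_accuracy_gap(train_acc_history, val_acc_history):
    """Return (final train accuracy, final val accuracy, train minus val)."""
    train_final = train_acc_history[-1]
    val_final = val_acc_history[-1]
    return train_final, val_final, train_final - val_final


# In the notebook this could be called once per trained solver, for example:
# for dropout in dropout_choices:
#     s = solvers[dropout]
#     print(dropout, final_accuracy_gap(s.train_acc_history, s.val_acc_history))
```

A larger gap for the network without dropout would be consistent with the reading that it overfits more.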
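Inline Question 3:
Suppose we are training a deep fully-connected network for image classification, with dropout. If we decide to decrease the size of the hidden layers (that is, the number of nodes in each layer), how should we modify p?
If we decide to decrease the size of the hidden layers, we do not need to modify p, because the number of dropped neurons scales in proportion to the hidden layer size. For example, suppose a hidden layer has n = 1024 neurons and we use p = 0.5. The expected number of dropped neurons is then (1 - p) * n = 0.5 * 1024 = 512. If we reduce the hidden layer to n = 512 neurons and keep p = 0.5, the expected number of dropped neurons is (1 - p) * n = 0.5 * 512 = 256. The dropped fraction is the same in both cases, so we do not need to modify p when we change the hidden layer size.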
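The arithmetic in this answer can be reproduced with a few lines of Python. This is only an illustration of the expected counts, with p taken as the keep probability as in `dropout_forward`; the function name `expected_counts` is made up for this sketch.

```python
def expected_counts(n, p):
    """Expected (kept, dropped) neurons for a layer of n units and keep probability p."""
    kept = p * n
    dropped = (1 - p) * n
    return kept, dropped


p = 0.5
for n in (1024, 512):
    kept, dropped = expected_counts(n, p)
    # The dropped fraction depends only on p, not on the layer size n.
    print('n = %d: kept %.0f, dropped %.0f, dropped fraction %.2f'
          % (n, kept, dropped, dropped / n))
```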