CS231n-assignment2-Dropout

Dropout is a technique for regularizing neural networks by randomly setting some features to zero during the forward pass. In this exercise, you will implement a dropout layer and modify your fully-connected network to optionally use dropout.
In[1]:

# As usual, a bit of setup
from __future__ import print_function
import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver

#%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

In[2]:

# Load the (preprocessed) CIFAR10 data.

data = get_CIFAR10_data()
for k, v in data.items():
  print('%s: ' % k, v.shape)


Dropout forward pass:

Main idea: randomly ignore a subset of neurons at training time, and use all neurons at test time.

Complete the following function in cs231n/layers.py:

def dropout_forward(x, dropout_param):
    """
    Performs the forward pass for (inverted) dropout.

    Inputs:
    - x: Input data, of any shape
    - dropout_param: A dictionary with the following keys:
      - p: Dropout parameter. We keep each neuron output with probability p.
      - mode: 'test' or 'train'. If the mode is train, then perform dropout;
        if the mode is test, then just return the input.
      - seed: Seed for the random number generator. Passing seed makes this
        function deterministic, which is needed for gradient checking but not
        in real networks.

    Outputs:
    - out: Array of the same shape as x.
    - cache: tuple (dropout_param, mask). In training mode, mask is the dropout
      mask that was used to multiply the input; in test mode, mask is None.

    NOTE: Please implement **inverted** dropout, not the vanilla version of dropout.
    See http://cs231n.github.io/neural-networks-2/#reg for more details.

    NOTE 2: Keep in mind that p is the probability of **keeping** a neuron
    output; this might be contrary to some sources, where it is referred to
    as the probability of dropping a neuron output.
    """
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])

    mask = None
    out = None

    if mode == 'train':
        #######################################################################
        # TODO: Implement training phase forward pass for inverted dropout.   #
        # Store the dropout mask in the mask variable.                        #
        #######################################################################
        # Uniform(0, 1) mask the same shape as x; dividing by p keeps the expected
        # activation equal at train and test time (inverted dropout).
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == 'test':
        #######################################################################
        # TODO: Implement the test phase forward pass for inverted dropout.   #
        #######################################################################
        out = x
        #######################################################################
        #                            END OF YOUR CODE                         #
        #######################################################################

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)

    return out, cache

In[3]:

np.random.seed(231)
x = np.random.randn(500, 500) + 10

for p in [0.25, 0.4, 0.7]:
  out, _ = dropout_forward(x, {'mode': 'train', 'p': p})
  out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})

  print('Running tests with p = ', p)
  print('Mean of input: ', x.mean())
  print('Mean of train-time output: ', out.mean())
  print('Mean of test-time output: ', out_test.mean())
  print('Fraction of train-time output set to zero: ', (out == 0).mean())
  print('Fraction of test-time output set to zero: ', (out_test == 0).mean())
  print()


Dropout backward pass:

def dropout_backward(dout, cache):
    """
    Perform the backward pass for (inverted) dropout.

    Inputs:
    - dout: Upstream derivatives, of any shape
    - cache: (dropout_param, mask) from dropout_forward.
    """
    dropout_param, mask = cache
    mode = dropout_param['mode']

    dx = None
    if mode == 'train':
        #######################################################################
        # TODO: Implement training phase backward pass for inverted dropout   #
        #######################################################################
        dx = dout * mask
        #######################################################################
        #                          END OF YOUR CODE                           #
        #######################################################################
    elif mode == 'test':
        dx = dout
    return dx

In[4]:

np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)

dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}
out, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda xx: dropout_forward(xx, dropout_param)[0], x, dout)

# Error should be around e-10 or less
print('dx relative error: ', rel_error(dx, dx_num))


Inline Question 1:
What happens if we do not divide the values being passed through inverted dropout by p in the dropout layer? Why does that happen?

If we do not divide the values by p, then the test-time activations no longer match the expected value of the training-time outputs: at test time every neuron is active, so each unit receives on average 1/p times more input than it saw during training, and activations can grow much larger than the network was trained on. Because test time uses a single forward pass with no dropped neurons, we need that pass to approximate the expected output of the training phase, which is exactly what dividing by p (inverted dropout) guarantees.
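To make the mismatch concrete, here is a small check (my own illustration, not part of the assignment) comparing the mean train-time activation with and without the division by p. Without the scaling, the train-time mean is roughly p times the test-time mean; with inverted dropout the two agree.

import numpy as np

np.random.seed(0)
x = np.random.randn(500, 500) + 10       # positive-mean inputs, as in the check above
p = 0.5                                  # keep probability

mask = np.random.rand(*x.shape) < p
vanilla_out = x * mask                   # "vanilla" dropout: no scaling at train time
inverted_out = x * mask / p              # inverted dropout: scale at train time

print('test-time mean     :', x.mean())            # ~10
print('vanilla train mean :', vanilla_out.mean())   # ~p * 10 = 5, mismatched
print('inverted train mean:', inverted_out.mean())  # ~10, matches test time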

Dropout with fully-connected nets
In the file cs231n/classifiers/fc_net.py, modify your implementation to use dropout. Specifically, if the constructor of the network receives a value other than 1 for the dropout parameter, then the net should add dropout immediately after every ReLU nonlinearity. After doing so, run the following to numerically gradient-check your implementation. The helper functions below wrap affine, optional batch/layer normalization, ReLU, and optional dropout into a single layer:

def affine_norm_relu_forward(x, w, b, gamma, beta, bn_param, normalization, dropout, do_param):
    bn_cache, do_cache = None, None
    # affine layer
    out, fc_cache = affine_forward(x, w, b)
    # batch/layer norm
    if normalization == 'batchnorm':
        out, bn_cache = batchnorm_forward(out, gamma, beta, bn_param)
    elif normalization == 'layernorm':
        out, bn_cache = layernorm_forward(out, gamma, beta, bn_param)
    # relu
    out, relu_cache = relu_forward(out)
    # dropout
    if dropout:
        out, do_cache = dropout_forward(out, do_param)
    return out, (fc_cache, bn_cache, relu_cache, do_cache)

def affine_norm_relu_backward(dout, cache, normalization, dropout):
    fc_cache, bn_cache, relu_cache, do_cache = cache
    # dropout
    if dropout:
        dout = dropout_backward(dout, do_cache)
    # relu
    dout = relu_backward(dout, relu_cache)
    # batch/layer norm
    dgamma, dbeta = None, None
    if normalization == 'batchnorm':
        dout, dgamma, dbeta = batchnorm_backward_alt(dout, bn_cache)
    elif normalization == 'layernorm':
        dout, dgamma, dbeta = layernorm_backward(dout, bn_cache)
    # affine layer
    dx, dw, db = affine_backward(dout, fc_cache)
    return dx, dw, db, dgamma, dbeta
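For context, here is a minimal sketch of how the loss method of FullyConnectedNet might route activations through these helpers. The attribute names (self.use_dropout, self.dropout_param, self.normalization, self.num_layers, self.bn_params) follow the assignment scaffold, but the exact wiring below is an assumption, not the reference solution.

def fully_connected_forward_sketch(self, X, mode):
    # Assumed scaffold attributes: self.use_dropout, self.dropout_param,
    # self.normalization, self.num_layers, self.bn_params, self.params.
    if self.use_dropout:
        self.dropout_param['mode'] = mode        # 'train' or 'test'

    caches = []
    out = X
    for i in range(1, self.num_layers):
        W, b = self.params['W%d' % i], self.params['b%d' % i]
        gamma = self.params.get('gamma%d' % i)   # None when normalization is off
        beta = self.params.get('beta%d' % i)
        bn_param = self.bn_params[i - 1] if self.normalization else None
        out, cache = affine_norm_relu_forward(out, W, b, gamma, beta, bn_param,
                                              self.normalization,
                                              self.use_dropout, self.dropout_param)
        caches.append(cache)

    # Final layer: plain affine with no normalization, ReLU, or dropout.
    scores, last_cache = affine_forward(out,
                                        self.params['W%d' % self.num_layers],
                                        self.params['b%d' % self.num_layers])
    caches.append(last_cache)
    return scores, caches

The backward pass would mirror this: run affine_backward on the last layer, then loop through the hidden layers in reverse, calling affine_norm_relu_backward with each stored cache.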

In[5]:

np.random.seed(231)
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))

for dropout in [1, 0.75, 0.5]:
  print('Running check with dropout = ', dropout)
  model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                            weight_scale=5e-2, dtype=np.float64,
                            dropout=dropout, seed=123)

  loss, grads = model.loss(X, y)
  print('Initial loss: ', loss)
  
  # Relative errors should be around e-6 or less; Note that it's fine
  # if for dropout=1 you have W2 error be on the order of e-5.
  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
    print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))
  print()


Regularization experiment

As an experiment, we will train a pair of two-layer networks on 500 training examples: one that uses no dropout, and one that uses dropout with a keep probability of 0.25. We will then visualize the training and validation accuracies of the two networks over time.
In[6]:

# Train two identical nets, one with dropout and one without
np.random.seed(231)
num_train = 500
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

solvers = {}
dropout_choices = [1, 0.25]
for dropout in dropout_choices:
  model = FullyConnectedNet([500], dropout=dropout)
  print(dropout)

  solver = Solver(model, small_data,
                  num_epochs=25, batch_size=100,
                  update_rule='adam',
                  optim_config={
                    'learning_rate': 5e-4,
                  },
                  verbose=True, print_every=100)
  solver.train()
  solvers[dropout] = solver

In[7]:

# Plot train and validation accuracies of the two models

train_accs = []
val_accs = []
for dropout in dropout_choices:
  solver = solvers[dropout]
  train_accs.append(solver.train_acc_history[-1])
  val_accs.append(solver.val_acc_history[-1])

plt.subplot(3, 1, 1)
for dropout in dropout_choices:
  plt.plot(solvers[dropout].train_acc_history, 'o', label='%.2f dropout' % dropout)
plt.title('Train accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')
  
plt.subplot(3, 1, 2)
for dropout in dropout_choices:
  plt.plot(solvers[dropout].val_acc_history, 'o', label='%.2f dropout' % dropout)
plt.title('Val accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')

plt.gcf().set_size_inches(15, 15)
plt.show()


Inline Question 2:

Compare the validation and training accuracies with and without dropout. What do your results suggest about dropout as a regularizer?

The results show that the model without dropout overfits. On the training set, the network trained without dropout reaches a very high accuracy (~0.99 at epoch 25), while the network with dropout reaches a lower one (~0.93 at epoch 25). This indicates that with dropout we learn a simpler model and are therefore less prone to overfitting. On the validation set, the dropout network achieves slightly better accuracy. Together, these results suggest that dropout effectively regularizes the model and reduces overfitting.

Inline Question 3:

Suppose we are training a deep fully-connected network for image classification with dropout. If we decide to decrease the size of the hidden layers (that is, the number of nodes in each layer), how should we modify p, if at all?

If we decrease the size of the hidden layers, we do not need to modify p, because the number of dropped neurons scales proportionally with the layer size. For example, suppose a hidden layer has n = 1024 neurons and we use p = 0.5; the expected number of dropped neurons is (1 - p) * n = 0.5 * 1024 = 512, i.e. half the layer. If we shrink the hidden layer to n = 512 neurons with the same p = 0.5, the expected number of dropped neurons becomes (1 - p) * n = 0.5 * 512 = 256, which is still half the layer. So, by this argument, p does not need to change when the hidden layer size changes.
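A quick numerical sanity check of this proportionality argument (my own illustration, not part of the assignment):

import numpy as np

np.random.seed(0)
p = 0.5                               # keep probability
for n in [1024, 512, 256]:
    mask = np.random.rand(n) < p      # True = kept, False = dropped
    print('n = %4d, fraction dropped = %.3f' % (n, 1 - mask.mean()))
# The dropped fraction stays near 1 - p = 0.5 for every layer width,
# so by this argument p does not need to change with the layer size.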
