CS231n Course Assignment Two (Part 4): Dropout (0829)

Assignment Two (Part 4): Dropout

1. Principle

In the forward pass, immediately after an activation layer, some of the output activations are randomly set to zero. This regularizes the neural network and effectively mitigates overfitting.

2. Implementation

2.1 Dropout forward
import numpy as np

def dropout_forward(x, dropout_param):
    p, mode = dropout_param["p"], dropout_param["mode"]
    if "seed" in dropout_param:
        np.random.seed(dropout_param["seed"])
    mask = None
    out = None

    if mode == "train":
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # np.random.rand draws uniform samples in [0, 1); entries whose draw is
        # less than p are kept, the rest are zeroed out.
        # Dropout is not applied at test time, but we want the test-time output
        # to have roughly the same mean as the train-time output, so we divide
        # the mask by p (plain dropout would scale the expected activation by p).
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    elif mode == "test":
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # Inverted dropout: no rescaling is needed at test time.
        out = x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)
    return out, cache

Code analysis:

Inputs:
- x: input data
- dropout_param:
  - p: keep probability; the larger p is, the more inputs are kept
  - mode: either 'train' or 'test'. In 'train' mode dropout is applied; in 'test' mode the input is returned unchanged
  - seed: seeds the random number generator that draws the dropout mask; only used to make gradient checking deterministic
Outputs:
- out: output array, with the same shape as the input
- cache: (dropout_param, mask)
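
A minimal usage sketch (assuming the dropout_forward above is importable, e.g. from cs231n/layers.py), mirroring the notebook test below: the train-time mean stays close to the test-time mean, and roughly a fraction 1 - p of the outputs are zeroed.

import numpy as np

x = np.random.randn(500, 500) + 10  # inputs centered around 10, as in the notebook

for p in [0.25, 0.4, 0.7]:
    out_train, _ = dropout_forward(x, {"mode": "train", "p": p})
    out_test, _ = dropout_forward(x, {"mode": "test", "p": p})
    print("p =", p)
    print("  mean of train-time output:", out_train.mean())             # close to x.mean()
    print("  mean of test-time output: ", out_test.mean())              # exactly x.mean()
    print("  fraction zeroed at train time:", (out_train == 0).mean())  # roughly 1 - p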

Test 2.1
Output:

Running tests with p =  0.25
Mean of input:  10.000207878477502
Mean of train-time output:  10.014059116977283
Mean of test-time output:  10.000207878477502
Fraction of train-time output set to zero:  0.749784
Fraction of test-time output set to zero:  0.0

Running tests with p =  0.4
Mean of input:  10.000207878477502
Mean of train-time output:  9.977917658761159
Mean of test-time output:  10.000207878477502
Fraction of train-time output set to zero:  0.600796
Fraction of test-time output set to zero:  0.0

Running tests with p =  0.7
Mean of input:  10.000207878477502
Mean of train-time output:  9.987811912159426
Mean of test-time output:  10.000207878477502
Fraction of train-time output set to zero:  0.30074
Fraction of test-time output set to zero:  0.0
2.2 Dropout backward
def dropout_backward(dout, cache):
    dropout_param, mask = cache
    mode = dropout_param["mode"]

    dx = None
    if mode == "train":
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # Gradient only flows through the kept units; the /p scaling stored in
        # the mask is applied to the upstream gradient as well.
        dx = dout * mask
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    elif mode == "test":
        dx = dout
    return dx

Test 2.2
Output: dx relative error: 5.44560814873387e-11
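
This relative error comes from the notebook's numerical gradient check. A self-contained sketch of the same idea (a centered-difference check; the notebook itself uses the assignment's eval_numerical_gradient_array helper, which this snippet only approximates):

import numpy as np

np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)
dropout_param = {"mode": "train", "p": 0.25, "seed": 123}  # seed keeps the mask fixed across calls

out, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)

# Centered-difference numerical gradient of sum(out * dout) with respect to x.
h = 1e-5
dx_num = np.zeros_like(x)
for ix in np.ndindex(*x.shape):
    old = x[ix]
    x[ix] = old + h
    pos = np.sum(dropout_forward(x, dropout_param)[0] * dout)
    x[ix] = old - h
    neg = np.sum(dropout_forward(x, dropout_param)[0] * dout)
    x[ix] = old
    dx_num[ix] = (pos - neg) / (2 * h)

rel_error = np.max(np.abs(dx - dx_num) / np.maximum(1e-8, np.abs(dx) + np.abs(dx_num)))
print("dx relative error:", rel_error)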

3. Fully-connected nets with Dropout

np.random.seed(231)
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))

for dropout in [1, 0.75, 0.5]:
  print('Running check with dropout = ', dropout)
  model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                            weight_scale=5e-2, dtype=np.float64,
                            dropout=dropout, seed=123)

  loss, grads = model.loss(X, y)
  print('Initial loss: ', loss)
  
  # Relative errors should be around e-6 or less; Note that it's fine
  # if for dropout=1 you have W2 error be on the order of e-5.
  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
    print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))
  print()

Output:

Running check with dropout =  1
Initial loss:  2.3004790897684924
W1 relative error: 1.48e-07
W2 relative error: 2.21e-05
W3 relative error: 3.53e-07
b1 relative error: 5.38e-09
b2 relative error: 2.09e-09
b3 relative error: 5.80e-11

Running check with dropout =  0.75
Initial loss:  2.302371489704412
W1 relative error: 1.90e-07
W2 relative error: 4.76e-06
W3 relative error: 2.60e-08
b1 relative error: 4.73e-09
b2 relative error: 1.82e-09
b3 relative error: 1.70e-10

Running check with dropout =  0.5
Initial loss:  2.3042759220785896
W1 relative error: 3.11e-07
W2 relative error: 1.84e-08
W3 relative error: 5.35e-08
b1 relative error: 5.37e-09
b2 relative error: 2.99e-09
b3 relative error: 1.13e-10
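
For context, here is a rough sketch of how dropout is threaded through a FullyConnectedNet-style loss() (an illustrative outline only, not the assignment's exact code; it assumes the affine_relu_forward/backward, affine_forward/backward, dropout_forward/backward and softmax_loss helpers from the assignment, that self.use_dropout and self.dropout_param were set up in __init__, and it omits the L2 regularization terms for brevity):

def loss_sketch(self, X, y=None):
    mode = "test" if y is None else "train"
    if self.use_dropout:
        self.dropout_param["mode"] = mode

    out, caches, drop_caches = X, [], []
    # Hidden layers: affine -> ReLU -> dropout
    for i in range(1, self.num_layers):
        W, b = self.params["W%d" % i], self.params["b%d" % i]
        out, cache = affine_relu_forward(out, W, b)
        caches.append(cache)
        if self.use_dropout:
            out, dcache = dropout_forward(out, self.dropout_param)
            drop_caches.append(dcache)

    # Output layer: affine only
    W, b = self.params["W%d" % self.num_layers], self.params["b%d" % self.num_layers]
    scores, last_cache = affine_forward(out, W, b)
    if mode == "test":
        return scores

    loss, dscores = softmax_loss(scores, y)
    grads = {}
    dout, grads["W%d" % self.num_layers], grads["b%d" % self.num_layers] = affine_backward(dscores, last_cache)
    for i in range(self.num_layers - 1, 0, -1):
        if self.use_dropout:
            dout = dropout_backward(dout, drop_caches[i - 1])
        dout, grads["W%d" % i], grads["b%d" % i] = affine_relu_backward(dout, caches[i - 1])
    return loss, grads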

4. Regularization experiment

Train a pair of two-layer networks on 500 training examples: one without dropout and one with a dropout keep probability of 0.25. Then visualize the training and validation accuracy of the two networks over time.

# Train two identical nets, one with dropout and one without
np.random.seed(231)
num_train = 500
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

solvers = {}
dropout_choices = [1, 0.25]
for dropout in dropout_choices:
  model = FullyConnectedNet([500], dropout=dropout)
  print(dropout)

  solver = Solver(model, small_data,
                  num_epochs=25, batch_size=100,
                  update_rule='adam',
                  optim_config={
                    'learning_rate': 5e-4,
                  },
                  verbose=True, print_every=100)
  solver.train()
  solvers[dropout] = solver
  print()

Visualization output:

[Figure: training and validation accuracy per epoch for dropout = 1 vs. dropout = 0.25]
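
The figure can be reproduced with a short matplotlib snippet over the solvers dict (a sketch, assuming each Solver stores train_acc_history and val_acc_history as the assignment's Solver class does):

import matplotlib.pyplot as plt

plt.subplot(2, 1, 1)
for dropout in dropout_choices:
    plt.plot(solvers[dropout].train_acc_history, 'o-', label='%.2f dropout' % dropout)
plt.title('Train accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')

plt.subplot(2, 1, 2)
for dropout in dropout_choices:
    plt.plot(solvers[dropout].val_acc_history, 'o-', label='%.2f dropout' % dropout)
plt.title('Val accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')

plt.gcf().set_size_inches(15, 15)
plt.show()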

5. Inline question answers

Inline Question 1:

Question: What happens if we do not divide the values being passed through inverse dropout by p in the dropout layer? Why does that happen?

Answer: Without the division by p, each train-time activation has expected value p·x instead of x, while the test-time output is still x. Downstream layers are then trained on activations roughly p times smaller than what they see at test time, so the train-time and test-time statistics no longer match (one would have to compensate by scaling the outputs by p at test time, as in vanilla, non-inverted dropout).
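
A tiny numerical illustration (hypothetical snippet, not part of the assignment): dropping the division by p shrinks the train-time mean by roughly a factor of p.

import numpy as np

np.random.seed(0)
x = np.random.randn(500, 500) + 10
p = 0.25

mask = np.random.rand(*x.shape) < p
print((x * mask).mean())      # no division by p: ~ p * x.mean(), about 2.5
print((x * mask / p).mean())  # inverted dropout: about x.mean(), i.e. ~10
print(x.mean())               # test-time output: ~10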

Inline Question 2:

Question: Compare the validation and training accuracies with and without dropout – what do your results suggest about dropout as a regularizer?

Answer: Dropout works as a regularizer: it narrows the gap between training and validation accuracy and helps prevent overfitting. However, if the keep probability p is too small, too much signal is discarded and the network can underfit.

Inline Question 3:

Question: Suppose we are training a deep fully-connected network for image classification, with dropout after hidden layers (parameterized by keep probability p). If we are concerned about overfitting, how should we modify p (if at all) when we decide to decrease the size of the hidden layers (that is, the number of nodes in each layer)?

Answer: Shrinking the hidden layers already reduces the model's capacity and hence its tendency to overfit, so the keep probability p should be increased (i.e. less dropout applied), or at least not decreased; keeping heavy dropout on an already small network risks underfitting.
