CS231n Course Assignment Two (Part 4): Dropout (0829)

Assignment Two (Part 4): Dropout

1. Principle

In the forward pass, immediately after an activation layer, some of the output activations are randomly set to zero. This regularizes the neural network and effectively mitigates overfitting.

2. Implementation

2.1 Dropout forward
import numpy as np

def dropout_forward(x, dropout_param):
    p, mode = dropout_param["p"], dropout_param["mode"]
    if "seed" in dropout_param:
        np.random.seed(dropout_param["seed"])
    mask = None
    out = None

    if mode == "train":
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # np.random.rand draws uniform samples in [0, 1); entries whose draw is
        # less than p are kept, the rest are zeroed out.
        # Dropout is not applied at test time, but we want the test-time output
        # to have roughly the same mean as the train-time output, so we divide
        # the mask by p (plain dropout would scale the expected activation by p).
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    elif mode == "test":
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # Inverted dropout: no rescaling is needed at test time.
        out = x
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)
    return out, cache

Code analysis:

Inputs:
- x: input data
- dropout_param:
  - p: keep probability; the larger p is, the more inputs are kept
  - mode: either 'train' or 'test'. In 'train' mode dropout is applied; in 'test' mode the input is returned unchanged
  - seed: seeds the random number generator that draws the dropout mask; only used to make gradient checking deterministic
Outputs:
- out: output array, with the same shape as the input
- cache: (dropout_param, mask)
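
A minimal usage sketch (assuming the dropout_forward above is importable, e.g. from cs231n/layers.py), mirroring the notebook test below: the train-time mean stays close to the test-time mean, and roughly a fraction 1 - p of the outputs are zeroed.

import numpy as np

x = np.random.randn(500, 500) + 10  # inputs centered around 10, as in the notebook

for p in [0.25, 0.4, 0.7]:
    out_train, _ = dropout_forward(x, {"mode": "train", "p": p})
    out_test, _ = dropout_forward(x, {"mode": "test", "p": p})
    print("p =", p)
    print("  mean of train-time output:", out_train.mean())             # close to x.mean()
    print("  mean of test-time output: ", out_test.mean())              # exactly x.mean()
    print("  fraction zeroed at train time:", (out_train == 0).mean())  # roughly 1 - p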

Test 2.1
Output:

Running tests with p =  0.25
Mean of input:  10.000207878477502
Mean of train-time output:  10.014059116977283
Mean of test-time output:  10.000207878477502
Fraction of train-time output set to zero:  0.749784
Fraction of test-time output set to zero:  0.0

Running tests with p =  0.4
Mean of input:  10.000207878477502
Mean of train-time output:  9.977917658761159
Mean of test-time output:  10.000207878477502
Fraction of train-time output set to zero:  0.600796
Fraction of test-time output set to zero:  0.0

Running tests with p =  0.7
Mean of input:  10.000207878477502
Mean of train-time output:  9.987811912159426
Mean of test-time output:  10.000207878477502
Fraction of train-time output set to zero:  0.30074
Fraction of test-time output set to zero:  0.0
2.2 Dropout backward
def dropout_backward(dout, cache):
    dropout_param, mask = cache
    mode = dropout_param["mode"]

    dx = None
    if mode == "train":
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # Gradient only flows through the kept units; the /p scaling stored in
        # the mask is applied to the upstream gradient as well.
        dx = dout * mask
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    elif mode == "test":
        dx = dout
    return dx

Test 2.2
Output: dx relative error: 5.44560814873387e-11
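
This relative error comes from the notebook's numerical gradient check. A self-contained sketch of the same idea (a centered-difference check; the notebook itself uses the assignment's eval_numerical_gradient_array helper, which this snippet only approximates):

import numpy as np

np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)
dropout_param = {"mode": "train", "p": 0.25, "seed": 123}  # seed keeps the mask fixed across calls

out, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)

# Centered-difference numerical gradient of sum(out * dout) with respect to x.
h = 1e-5
dx_num = np.zeros_like(x)
for ix in np.ndindex(*x.shape):
    old = x[ix]
    x[ix] = old + h
    pos = np.sum(dropout_forward(x, dropout_param)[0] * dout)
    x[ix] = old - h
    neg = np.sum(dropout_forward(x, dropout_param)[0] * dout)
    x[ix] = old
    dx_num[ix] = (pos - neg) / (2 * h)

rel_error = np.max(np.abs(dx - dx_num) / np.maximum(1e-8, np.abs(dx) + np.abs(dx_num)))
print("dx relative error:", rel_error)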

3. Fully-connected nets with Dropout

np.random.seed(231)
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))

for dropout in [1, 0.75, 0.5]:
  print('Running check with dropout = ', dropout)
  model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                            weight_scale=5e-2, dtype=np.float64,
                            dropout=dropout, seed=123)

  loss, grads = model.loss(X, y)
  print('Initial loss: ', loss)
  
  # Relative errors should be around e-6 or less; Note that it's fine
  # if for dropout=1 you have W2 error be on the order of e-5.
  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
    print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))
  print()

Output:

Running check with dropout =  1
Initial loss:  2.3004790897684924
W1 relative error: 1.48e-07
W2 relative error: 2.21e-05
W3 relative error: 3.53e-07
b1 relative error: 5.38e-09
b2 relative error: 2.09e-09
b3 relative error: 5.80e-11

Running check with dropout =  0.75
Initial loss:  2.302371489704412
W1 relative error: 1.90e-07
W2 relative error: 4.76e-06
W3 relative error: 2.60e-08
b1 relative error: 4.73e-09
b2 relative error: 1.82e-09
b3 relative error: 1.70e-10

Running check with dropout =  0.5
Initial loss:  2.3042759220785896
W1 relative error: 3.11e-07
W2 relative error: 1.84e-08
W3 relative error: 5.35e-08
b1 relative error: 5.37e-09
b2 relative error: 2.99e-09
b3 relative error: 1.13e-10
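
For context, here is a rough sketch of how dropout is threaded through a FullyConnectedNet-style loss() (an illustrative outline only, not the assignment's exact code; it assumes the affine_relu_forward/backward, affine_forward/backward, dropout_forward/backward and softmax_loss helpers from the assignment, that self.use_dropout and self.dropout_param were set up in __init__, and it omits the L2 regularization terms for brevity):

def loss_sketch(self, X, y=None):
    mode = "test" if y is None else "train"
    if self.use_dropout:
        self.dropout_param["mode"] = mode

    out, caches, drop_caches = X, [], []
    # Hidden layers: affine -> ReLU -> dropout
    for i in range(1, self.num_layers):
        W, b = self.params["W%d" % i], self.params["b%d" % i]
        out, cache = affine_relu_forward(out, W, b)
        caches.append(cache)
        if self.use_dropout:
            out, dcache = dropout_forward(out, self.dropout_param)
            drop_caches.append(dcache)

    # Output layer: affine only
    W, b = self.params["W%d" % self.num_layers], self.params["b%d" % self.num_layers]
    scores, last_cache = affine_forward(out, W, b)
    if mode == "test":
        return scores

    loss, dscores = softmax_loss(scores, y)
    grads = {}
    dout, grads["W%d" % self.num_layers], grads["b%d" % self.num_layers] = affine_backward(dscores, last_cache)
    for i in range(self.num_layers - 1, 0, -1):
        if self.use_dropout:
            dout = dropout_backward(dout, drop_caches[i - 1])
        dout, grads["W%d" % i], grads["b%d" % i] = affine_relu_backward(dout, caches[i - 1])
    return loss, grads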

4. Regularization experiment

Train a pair of two-layer networks on 500 training examples: one without dropout and one with a dropout keep probability of 0.25. Then visualize the training and validation accuracy of the two networks over time.

# Train two identical nets, one with dropout and one without
np.random.seed(231)
num_train = 500
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

solvers = {}
dropout_choices = [1, 0.25]
for dropout in dropout_choices:
  model = FullyConnectedNet([500], dropout=dropout)
  print(dropout)

  solver = Solver(model, small_data,
                  num_epochs=25, batch_size=100,
                  update_rule='adam',
                  optim_config={
                    'learning_rate': 5e-4,
                  },
                  verbose=True, print_every=100)
  solver.train()
  solvers[dropout] = solver
  print()

Visualization output:

[Figure: training and validation accuracy per epoch for dropout = 1 vs. dropout = 0.25]
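
The figure can be reproduced with a short matplotlib snippet over the solvers dict (a sketch, assuming each Solver stores train_acc_history and val_acc_history as the assignment's Solver class does):

import matplotlib.pyplot as plt

plt.subplot(2, 1, 1)
for dropout in dropout_choices:
    plt.plot(solvers[dropout].train_acc_history, 'o-', label='%.2f dropout' % dropout)
plt.title('Train accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')

plt.subplot(2, 1, 2)
for dropout in dropout_choices:
    plt.plot(solvers[dropout].val_acc_history, 'o-', label='%.2f dropout' % dropout)
plt.title('Val accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')

plt.gcf().set_size_inches(15, 15)
plt.show()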

5. Inline question answers

Inline Question 1:

Question: What happens if we do not divide the values being passed through inverse dropout by p in the dropout layer? Why does that happen?

Answer: Without the division by p, each train-time activation has expected value p·x instead of x, while the test-time output is still x. Downstream layers are then trained on activations roughly p times smaller than what they see at test time, so the train-time and test-time statistics no longer match (one would have to compensate by scaling the outputs by p at test time, as in vanilla, non-inverted dropout).
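
A tiny numerical illustration (hypothetical snippet, not part of the assignment): dropping the division by p shrinks the train-time mean by roughly a factor of p.

import numpy as np

np.random.seed(0)
x = np.random.randn(500, 500) + 10
p = 0.25

mask = np.random.rand(*x.shape) < p
print((x * mask).mean())      # no division by p: ~ p * x.mean(), about 2.5
print((x * mask / p).mean())  # inverted dropout: about x.mean(), i.e. ~10
print(x.mean())               # test-time output: ~10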

Inline Question 2:

Question: Compare the validation and training accuracies with and without dropout – what do your results suggest about dropout as a regularizer?

Answer: Dropout works as a regularizer: it narrows the gap between training and validation accuracy and helps prevent overfitting. However, if the keep probability p is too small, too much signal is discarded and the network can underfit.

Inline Question 3:

Question: Suppose we are training a deep fully-connected network for image classification, with dropout after hidden layers (parameterized by keep probability p). If we are concerned about overfitting, how should we modify p (if at all) when we decide to decrease the size of the hidden layers (that is, the number of nodes in each layer)?

Answer: Shrinking the hidden layers already reduces the model's capacity and hence its tendency to overfit, so the keep probability p should be increased (i.e. less dropout applied), or at least not decreased; keeping heavy dropout on an already small network risks underfitting.
