Forward
这一次的作业非常简单,代码也是极其的少,建议阅读下课程对于dropout实现的 说明,我们要实现的是inverted dropout,说通俗点就是因为测试的时候不做dropout,所以我们在训练过dropout层的时候除掉p,以防样本均值和测试的时候不同。
def dropout_forward(x, dropout_param):
"""
Performs the forward pass for (inverted) dropout.
Inputs:
- x: Input data, of any shape
- dropout_param: A dictionary with the following keys:
- p: Dropout parameter. We keep each neuron output with probability p.
- mode: 'test' or 'train'. If the mode is train, then perform dropout;
if the mode is test, then just return the input.
- seed: Seed for the random number generator. Passing seed makes this
function deterministic, which is needed for gradient checking but not
in real networks.
Outputs:
- out: Array of the same shape as x.
- cache: tuple (dropout_param, mask). In training mode, mask is the dropout
mask that was used to multiply the input; in test mode, mask is None.
NOTE: Please implement **inverted** dropout, not the vanilla version of dropout.
See http://cs231n.github.io/neural-networks-2/#reg for more details.
NOTE 2: Keep in mind that p is the probability of **keep** a neuron
output; this might be contrary to some sources, where it is referred to
as the probability of dropping a neuron output.
"""
p, mode = dropout_param['p'], dropout_param['mode']
if 'seed' in dropout_param:
np.random.seed(dropout_param['seed'])
mask = None
out = None
if mode == 'train':
#######################################################################
# TODO: Implement training phase forward pass for inverted dropout. #
# Store the dropout mask in the mask variable. #
#######################################################################
mask = (np.random.rand(*x.shape) < p) / p
out = x * mask
#######################################################################
# END OF YOUR CODE #
#######################################################################
elif mode == 'test':
#######################################################################
# TODO: Implement the test phase forward pass for inverted dropout. #
#######################################################################
out = x
#######################################################################
# END OF YOUR CODE #
#######################################################################
cache = (dropout_param, mask)
out = out.astype(x.dtype, copy=False)
return out, cache
Backward
反向传播也是非常简单,由于只涉及一个乘法操作,我们直接给dout乘上mask就好了。
def dropout_backward(dout, cache):
"""
Perform the backward pass for (inverted) dropout.
Inputs:
- dout: Upstream derivatives, of any shape
- cache: (dropout_param, mask) from dropout_forward.
"""
dropout_param, mask = cache
mode = dropout_param['mode']
dx = None
if mode == 'train':
#######################################################################
# TODO: Implement training phase backward pass for inverted dropout #
#######################################################################
dx = dout * mask
#######################################################################
# END OF YOUR CODE #
#######################################################################
elif mode == 'test':
dx = dout
return dx
接下来作业给我们展示了应用dropout之后带来的类似正则化的效果,同时抛出一个小问题:
Suppose we are training a deep fully-connected network for image classification, with dropout after hidden layers (parameterized by keep probability p). How should we modify p, if at all, if we decide to decrease the size of the hidden layers (that is, the number of nodes in each layer)?
答案也是很显而易见了,如果隐藏层size被减小,那我们当然得调大一点p的值,不然连接岂不是太少了。
至此,Q3就结束了,用时3分钟做完,舒服至极哈哈哈哈哈。