Code: https://github.com/LiuZhe6/CS231N
To keep a neural network from overfitting the training data, we can use dropout. The core idea: during training, randomly zero out a fraction of the hidden-layer activations (some variants apply the same idea to the weights instead).
Dropout
Forward pass
The assignment asks for inverted dropout. The key trick: at training time, divide the mask by the keep probability p, so that the test-time forward pass needs no modification at all.
dropout_forward() in layers.py
if mode == 'train':
    #######################################################################
    # Implement training phase forward pass for inverted dropout.         #
    # Store the dropout mask in the mask variable.                        #
    #######################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # Keep each unit with probability p and scale the survivors by 1/p,
    # so the expected activation matches the test-time forward pass.
    mask = (np.random.rand(*x.shape) < p) / p
    out = x * mask

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    #######################################################################
    #                           END OF YOUR CODE                          #
    #######################################################################
elif mode == 'test':
    #######################################################################
    # Implement the test phase forward pass for inverted dropout.         #
    #######################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # With inverted dropout, the test-time forward pass is the identity.
    out = x

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    #######################################################################
    #                           END OF YOUR CODE                          #
    #######################################################################
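As a quick sanity check (a standalone sketch, independent of the assignment file; the array size and p = 0.6 are arbitrary choices), we can verify that the 1/p scaling keeps the mean activation roughly equal between the training-mode and test-mode forward passes:

import numpy as np

np.random.seed(0)
x = np.random.randn(500, 500) + 10.0   # large sample so the means are stable
p = 0.6                                # keep probability

# Training mode: keep each unit with probability p, scale survivors by 1/p.
mask = (np.random.rand(*x.shape) < p) / p
out_train = x * mask

# Test mode is the identity, so the two means should nearly agree.
print(out_train.mean(), x.mean())      # both close to 10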
Backward pass
The mask was saved in the cache during the forward pass, so the gradient is simply the upstream gradient multiplied by that same mask.
dropout_backward() in layers.py
if mode == 'train':
    #######################################################################
    # Implement training phase backward pass for inverted dropout         #
    #######################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # out = x * mask, so dx is just dout routed through the same mask.
    dx = mask * dout

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    #######################################################################
    #                           END OF YOUR CODE                          #
    #######################################################################
elif mode == 'test':
    dx = dout
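To convince ourselves that dx = mask * dout is correct, here is a small numeric check (again a standalone sketch; the mask is held fixed, just as the cache fixes it) comparing the analytic gradient against a centered finite difference:

import numpy as np

np.random.seed(1)
x = np.random.randn(4, 5)
dout = np.random.randn(*x.shape)
p, h = 0.5, 1e-6
mask = (np.random.rand(*x.shape) < p) / p   # fixed mask, as stored in the cache

# Analytic gradient from the backward pass above.
dx = mask * dout

# Numeric gradient of the scalar loss sum(x * mask * dout) w.r.t. one entry.
i, j = 1, 2
xp, xm = x.copy(), x.copy()
xp[i, j] += h
xm[i, j] -= h
num = (np.sum(xp * mask * dout) - np.sum(xm * mask * dout)) / (2 * h)
print(dx[i, j], num)   # should match to ~1e-9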
Inline Questions
Inline Question 1
What happens if we do not divide the values being passed through inverse dropout by p in the dropout layer? Why does that happen?
Answer
If we skip the division by p at training time, each activation is kept with probability p but never rescaled, so its expected value during training is only p times its test-time value. The network learns under those shrunken activations, and the test-time identity pass no longer matches them; to compensate we would have to multiply the test-time outputs by p ourselves, which is exactly vanilla (non-inverted) dropout.
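A two-line experiment (standalone sketch; sizes and p are arbitrary) makes the shrinkage visible: without the 1/p division, the mean activation in training mode is roughly p times the test-mode mean:

import numpy as np

np.random.seed(2)
x = np.random.randn(500, 500) + 10.0
p = 0.6

out_no_scale = x * (np.random.rand(*x.shape) < p)   # mask applied without /p
print(out_no_scale.mean() / x.mean())               # close to p = 0.6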
Inline Question 2
Compare the validation and training accuracies with and without dropout – what do your results suggest about dropout as a regularizer?
Answer
In the experiment, the network with dropout clearly has its training accuracy held in check, which narrows the gap between training and validation accuracy. This is exactly the behavior expected of a regularizer: it suppresses overfitting.
Inline Question 3
Suppose we are training a deep fully-connected network for image classification, with dropout after hidden layers (parameterized by keep probability p). If we are concerned about overfitting, how should we modify p (if at all) when we decide to decrease the size of the hidden layers (that is, the number of nodes in each layer)?
Answer
If overfitting is the concern, note that shrinking the hidden layers already reduces the network's capacity, so less regularization is needed. We should therefore increase the keep probability p so that fewer units are dropped; otherwise the already-small network may be pushed into underfitting. For example, with 100 units per layer and p = 0.5, only about 50 units are active on average; raising p to 0.9 keeps about 90.