The objective function must prevent the network from collapsing to a single constant output, e.g. always predicting 0.
- When the training set and test set come from different distributions, make sure the validation set and the test set at least share the same distribution.
- High bias? Use a larger, deeper network.
- High variance? Get more training data, or add regularization.
Making the network larger and deeper while also enlarging the training set is almost always worth trying.
- Regularization:
- L2: sum of squares; for a matrix this is the Frobenius norm. In neural networks it is also known as weight decay.
- L1: sum of absolute values.
- Dropout
- During training, keep each unit in a layer with probability p (0 < p < 1) and randomly drop the rest; the layer's output a is then rescaled as a = a/p to preserve its expected magnitude. By doing this you are assuring that the result of the cost will still have the same expected value as without drop-out. (This technique is also called inverted dropout.)
- Use a higher dropout rate for layers with many parameters and a lower rate for layers with few parameters.
- Do not apply dropout to the input layer.
- Dropout is used only during training; turn it off at test time.
- Apply dropout during both forward and backward propagation.
- The dropped neurons don’t contribute to the training in both the forward and backward propagations of the iteration.
- At each iteration, you train a different model that uses only a subset of your neurons. With dropout, your neurons thus become less sensitive to the activation of one other specific neuron, because that other neuron might be shut down at any time.
- Data augmentation to enlarge the dataset: horizontal flips, rotations, distortions.
- Early stopping: stop training when the validation error starts to rise. At that point the weights are still relatively small, which helps avoid overfitting.
- Note that regularization hurts training set performance! This is because it limits the ability of the network to overfit to the training set. But since it ultimately gives better test accuracy, it is helping your system.
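A minimal NumPy sketch of the L2 / Frobenius-norm penalty described above; the function name `l2_cost_term` and the regularization strength `lambd` are illustrative choices, not names from the notes:

```python
import numpy as np

def l2_cost_term(weights, lambd, m):
    """L2 penalty added to the cost over a mini-batch of size m:
    (lambd / (2*m)) * sum over layers of ||W||_F^2
    (the squared Frobenius norm, i.e. the sum of squared entries).
    """
    return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[1.0, 1.0]])
# ||W1||_F^2 = 30, ||W2||_F^2 = 2 -> penalty = (0.1 / (2*5)) * 32 = 0.32
print(l2_cost_term([W1, W2], lambd=0.1, m=5))
```

The corresponding gradient contribution is `(lambd/m) * W` per layer, which shrinks each weight a little on every update, hence the name "weight decay".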
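The inverted-dropout steps above can be sketched as follows; this is a minimal NumPy version, assuming hypothetical helper names `dropout_forward`/`dropout_backward` and a keep probability `keep_prob`:

```python
import numpy as np

def dropout_forward(a, keep_prob, rng):
    """Inverted dropout on activations `a` (training only).

    Each unit is kept with probability `keep_prob`; surviving
    activations are divided by `keep_prob` so the output keeps
    the same expected value as without dropout.
    """
    mask = rng.random(a.shape) < keep_prob  # 1 = keep, 0 = drop
    a = a * mask                            # shut down dropped units
    a = a / keep_prob                       # rescale survivors
    return a, mask                          # mask is reused in backprop

def dropout_backward(da, mask, keep_prob):
    """Dropped units contribute nothing to the gradient either."""
    return da * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 3))
a_drop, mask = dropout_forward(a, keep_prob=0.8, rng=rng)
# At test time, skip both functions entirely: no masking, no rescaling.
```

Reusing the same `mask` in the backward pass is what makes the dropped neurons invisible to both the forward and the backward propagation of that iteration.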
Why does R