Background: the vanishing gradient problem
Training a deep neural network is traditionally difficult because of the vanishing gradient problem: the error signal computed at the output layer shrinks dramatically as it is backpropagated through the network. As the number of hidden layers grows, the amount of error information that reaches the earlier layers drops sharply, so the gradients in the layers close to the input approach zero. Consequently, hidden layers near the output have their weights updated normally, while hidden layers near the input are updated very slowly or not at all, and those early layers effectively fail to learn from the errors measured on the training data.
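As a minimal sketch of this effect (not from the original text), the NumPy snippet below builds a deep fully connected network with sigmoid activations, runs one backpropagation pass for a squared-error loss, and prints the gradient norm of each layer's weight matrix. The depth, width, initialization, and loss are illustrative assumptions; the point is that the printed norms shrink steadily as we move from the output layer back toward the input layer.

```python
# Minimal sketch (NumPy only) of vanishing gradients in a deep sigmoid MLP.
# Layer sizes, depth, initialization, and the squared-error loss are
# illustrative assumptions, not taken from the text above.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth, width = 10, 32                       # 10 hidden layers, 32 units each
x = rng.standard_normal((width, 1))         # a single input vector
y = rng.standard_normal((1, 1))             # a single scalar target

# Weight matrices: `depth` hidden layers plus one output layer.
Ws = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]
Ws.append(rng.standard_normal((1, width)) / np.sqrt(width))

# Forward pass, caching activations for backpropagation.
activations = [x]
for W in Ws:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass for the squared-error loss L = 0.5 * (a_out - y)^2.
delta = (activations[-1] - y) * activations[-1] * (1 - activations[-1])
grad_norms = []
for i in reversed(range(len(Ws))):
    grad_W = delta @ activations[i].T       # dL/dW_i
    grad_norms.append(np.linalg.norm(grad_W))
    if i > 0:
        # Propagate the error to the previous layer; each step multiplies by
        # W^T and by sigmoid'(z) <= 0.25, which is what shrinks the signal.
        delta = (Ws[i].T @ delta) * activations[i] * (1 - activations[i])

# Layer len(Ws) is the output layer, layer 1 is closest to the input.
for layer, g in zip(range(len(Ws), 0, -1), grad_norms):
    print(f"layer {layer:2d} weight-gradient norm: {g:.2e}")
```

Running the sketch typically shows the gradient norm dropping by several orders of magnitude between the output layer and the first hidden layer, which is exactly why the early layers barely move under gradient descent.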
To address this difficulty, an innovative approach emerged: greedy layer-wise pretraining.