As networks grow deeper, the vanishing-gradient problem introduced by back-propagation (BP) becomes increasingly severe.
See the book "Neural Networks and Deep Learning".
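To make this concrete, here is a minimal NumPy sketch (the depth, width, and 0.1 weight scale are illustrative assumptions, not values from any source) that backpropagates a gradient through a stack of sigmoid layers and prints its norm shrinking layer by layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth, width = 30, 64
weights = [rng.standard_normal((width, width)) * 0.1 for _ in range(depth)]

# Forward pass: cache each layer's activation for the backward pass.
acts = [rng.standard_normal(width)]
for W in weights:
    acts.append(sigmoid(W @ acts[-1]))

# Backward pass: the local Jacobian is W.T @ diag(a * (1 - a)), and
# sigmoid'(z) = a * (1 - a) <= 0.25, so each layer scales the gradient down.
grad = np.ones(width)
for layer in range(depth - 1, -1, -1):
    a = acts[layer + 1]
    grad = weights[layer].T @ (grad * a * (1.0 - a))
    if layer % 5 == 0:
        print(f"layer {layer:2d}: ||grad|| = {np.linalg.norm(grad):.3e}")
```

Because the sigmoid derivative is bounded by 0.25, the gradient norm decays roughly geometrically with depth, which is the effect described above.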
How the LSTM mitigates vanishing gradients
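The key is the cell state's additive update: c_t = f_t * c_{t-1} + i_t * g_t, so the gradient along the cell path is scaled by the forget gate rather than by a long chain of weight and sigmoid-derivative products. Below is a minimal NumPy sketch of one LSTM step (the function name lstm_step and the gate layout are illustrative; biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, Wf, Wi, Wo, Wg):
    """One LSTM step over input x with hidden state h and cell state c."""
    z = np.concatenate([x, h])
    f = sigmoid(Wf @ z)   # forget gate
    i = sigmoid(Wi @ z)   # input gate
    o = sigmoid(Wo @ z)   # output gate
    g = np.tanh(Wg @ z)   # candidate cell values
    # Additive cell update: dc_t/dc_{t-1} = diag(f) (holding the gates fixed),
    # so error can flow across many steps whenever f stays close to 1.
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

n_in, n_h = 8, 16
Wf, Wi, Wo, Wg = (rng.standard_normal((n_h, n_in + n_h)) * 0.1 for _ in range(4))
h, c = np.zeros(n_h), np.zeros(n_h)
for _ in range(5):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, Wf, Wi, Wo, Wg)
```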
The paper "Deep Residual Learning for Image Recognition" notes:
Driven by the significance of depth, a question arises: Is learning better networks as easy as stacking more layers? An obstacle to answering this question was the notorious problem of vanishing/exploding gradients [1, 9], which hamper convergence from the beginning. This problem, however, has been largely addressed by normalized initialization [23, 9, 37, 13] and intermediate normalization layers [16], which enable networks with tens of layers to start converging for stochastic gradient descent (SGD) with back-propagation [22].
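The paper's answer to the remaining degradation problem is the residual block, y = F(x) + x. Here is a minimal NumPy sketch of the idea (the plain matrix-multiply branch, layer shapes, and 0.05 weight scale are illustrative simplifications of the paper's convolutional blocks):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x): the identity shortcut gives the gradient a direct
    path back (dy/dx contains an identity term), so stacking more blocks
    does not force the signal through an ever-longer multiplicative chain."""
    out = W2 @ relu(W1 @ x)   # the residual branch F(x)
    return relu(out + x)      # add the skip connection back in

width = 64
x = rng.standard_normal(width)
for _ in range(20):           # stack many blocks; the identity path persists
    W1 = rng.standard_normal((width, width)) * 0.05
    W2 = rng.standard_normal((width, width)) * 0.05
    x = residual_block(x, W1, W2)
```

Because the shortcut is an identity, each block only needs to learn the residual F(x) = y - x, which is easier to optimize when the desired mapping is close to the identity.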