What to Do When Neural Network Training Fails (Part 1)
When the gradient is 0
- Local minima
- Saddle point
These two cases are collectively called critical points.
How do we tell whether a critical point is a local minimum or a saddle point?
Given $L(\theta)$, near $\theta=\theta'$ the loss can be approximated by a second-order Taylor expansion:
$$L(\theta)\approx L(\theta')+(\theta-\theta')^T g+\frac{1}{2}(\theta-\theta')^T H(\theta-\theta')$$
where $g$ is the gradient and $H$ is the Hessian matrix of $L$ at $\theta'$.
At a critical point $g=0$, so
$$L(\theta)\approx L(\theta')+\frac{1}{2}(\theta-\theta')^T H(\theta-\theta')$$
- $H$ positive definite (all eigenvalues $>0$): local minimum
- $H$ negative definite (all eigenvalues $<0$): local maximum
- $H$ indefinite (both positive and negative eigenvalues): saddle point
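The classification above can be sketched numerically. This is a minimal illustration (not from the source): it takes the Hessian of the toy function $f(x,y)=x^2-y^2$ at the origin, where the gradient is zero, and classifies the critical point from the signs of the eigenvalues.

```python
import numpy as np

def classify_critical_point(H, tol=1e-8):
    """Classify a critical point from the eigenvalues of the Hessian H."""
    eigvals = np.linalg.eigvalsh(H)  # H is symmetric, so eigvalsh applies
    if np.all(eigvals > tol):
        return "local minimum"       # H positive definite
    if np.all(eigvals < -tol):
        return "local maximum"       # H negative definite
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point"        # H indefinite
    return "inconclusive"            # some eigenvalues are (near) zero

# Hessian of f(x, y) = x^2 - y^2 at the origin: eigenvalues 2 and -2
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(classify_critical_point(H))   # saddle point
```

Note the "inconclusive" branch: when some eigenvalues are zero, the second-order expansion alone cannot decide the type of the critical point.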
At a saddle point, $H$ also tells us an update direction along which the loss keeps decreasing:
Let $u$ be an eigenvector of $H$ with eigenvalue $\lambda<0$. Then $u^T H u=\lambda\|u\|^2<0$, so taking $\theta-\theta'=u$ gives
$$L(\theta)\approx L(\theta')+\frac{1}{2}\lambda\|u\|^2<L(\theta')$$
That is, moving along $u$ decreases the loss; update $\theta=\theta'+u$.
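The escape direction above can be demonstrated on the same toy saddle, $f(x,y)=x^2-y^2$ at $\theta'=(0,0)$ (my own illustration, not from the source): pick the eigenvector of $H$ with a negative eigenvalue and step along it.

```python
import numpy as np

f = lambda theta: theta[0]**2 - theta[1]**2  # toy loss with a saddle at 0

theta_prime = np.zeros(2)            # critical point: gradient is 0 here
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])          # Hessian of f at theta_prime

eigvals, eigvecs = np.linalg.eigh(H) # eigenvalues in ascending order
u = eigvecs[:, np.argmin(eigvals)]   # eigenvector with lambda < 0

theta = theta_prime + u              # the update theta = theta' + u
print(f(theta) < f(theta_prime))     # True: the loss decreased
```

In practice this is rarely done for large networks, since computing the full Hessian and its eigendecomposition is far too expensive; it mainly shows that a saddle point, unlike a local minimum, always leaves a direction of escape.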
Saddle point vs. local minima
Empirically, local minima are not very common: in a high-dimensional parameter space it is rare for all eigenvalues of $H$ to be positive, so most critical points encountered in training are saddle points.