Saddle Points and Local Minima
When optimizing the loss function, training can stall: the loss stops decreasing even though it has not yet fallen to the desired value. At that point the gradient is close to 0. This is often a local minimum, but it may just as well be a saddle point, where the gradient is likewise close to 0.
At a local minimum (local minima) gradient descent can make no further progress, while at a saddle point (saddle point) the loss can still be reduced. So how do we tell whether a critical point is a saddle point?
Distinguishing saddle points from local minima
Math
Taylor approximation
Around a point $\theta'$ near $\theta$, the loss function $L(\theta)$ can be approximated by its Taylor series (Taylor Series Approximation):
$$L(\theta) = L(\theta') + (\theta - \theta')^T g + \frac{1}{2}(\theta - \theta')^T H (\theta - \theta')$$
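As a quick sanity check of the expansion (the 1-D toy loss and the expansion point are illustrative choices, not from the original notes), one can compare $L(\theta)$ against its second-order approximation near $\theta'$:

```python
import math

# Toy 1-D loss L(t) = t^4, so g = L'(t) and H = L''(t) are easy to write down.
L = lambda t: t ** 4
t0 = 1.0                    # expand around theta' = 1
g = 4 * t0 ** 3             # L'(1)  = 4
H = 12 * t0 ** 2            # L''(1) = 12

t = 1.1                     # a point near theta'
approx = L(t0) + (t - t0) * g + 0.5 * (t - t0) ** 2 * H
print(L(t), approx)         # 1.4641 vs ~1.46: close for small |t - t0|
```

The second-order terms capture almost all of the change; the residual is the cubic-and-higher part of the series, which shrinks as $\theta$ approaches $\theta'$.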
where $g$ is the gradient, a vector:

$$g = \nabla L(\theta'), \quad g_i = \frac{\partial L(\theta')}{\partial \theta_i}$$
and $H$ is the Hessian matrix:

$$H_{ij} = \frac{\partial^2}{\partial \theta_i \partial \theta_j} L(\theta')$$
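To make the definitions concrete, here is a minimal numpy sketch (the toy function and step sizes are illustrative assumptions) that estimates $g$ and $H$ by central finite differences:

```python
import numpy as np

def grad(f, theta, h=1e-5):
    """Central-difference estimate of g_i = dL(theta')/d(theta_i)."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g

def hessian(f, theta, h=1e-4):
    """Central-difference estimate of H_ij = d^2 L / d(theta_i) d(theta_j)."""
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h * h)
    return H

# Toy loss L(theta) = theta_0^2 - theta_1^2, which has a saddle at the origin.
f = lambda t: t[0] ** 2 - t[1] ** 2
theta0 = np.array([0.0, 0.0])
print(grad(f, theta0))     # ~ [0, 0]: the origin is a critical point
print(hessian(f, theta0))  # ~ [[2, 0], [0, -2]]: eigenvalues of both signs
```

The zero gradient confirms a critical point; the Hessian then carries the information needed to decide which kind, as the next section shows.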
Classifying the critical point
At a critical point the gradient $g$ is zero, so the linear term vanishes:

$$L(\theta) \approx L(\theta') + \frac{1}{2} (\theta - \theta')^T H (\theta - \theta')$$
The sign of the quadratic form $v^T H v$, with $v = \theta - \theta'$, decides the case:

- For all $v$: $v^T H v > 0$ ⟹ around $\theta'$: $L(\theta) > L(\theta')$ ⟹ Local minima
  ==H is positive definite== ⟹ all eigen values are positive
- For all $v$: $v^T H v < 0$ ⟹ around $\theta'$: $L(\theta) < L(\theta')$ ⟹ Local maxima
  ==H is negative definite== ⟹ all eigen values are negative
- Sometimes $v^T H v > 0$, sometimes $v^T H v < 0$ ⟹ Saddle point
  ==H is indefinite== ⟹ eigen values of both signs
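The case analysis above reduces to checking the signs of $H$'s eigenvalues. A minimal sketch (the example Hessians and the tolerance are illustrative assumptions):

```python
import numpy as np

def classify_critical_point(H, tol=1e-8):
    """Classify a critical point from the eigenvalues of its Hessian H."""
    eig = np.linalg.eigvalsh(H)  # H is symmetric, so eigvalsh applies
    if np.all(eig > tol):
        return "local minimum"   # H positive definite: v^T H v > 0 for all v
    if np.all(eig < -tol):
        return "local maximum"   # H negative definite: v^T H v < 0 for all v
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"    # mixed signs: v^T H v changes sign with v
    return "indeterminate"       # near-zero eigenvalues: need higher-order terms

# L = x^2 + y^2 at the origin: H = diag(2, 2)
print(classify_critical_point(np.diag([2.0, 2.0])))   # → local minimum
# L = x^2 - y^2 at the origin: H = diag(2, -2)
print(classify_critical_point(np.diag([2.0, -2.0])))  # → saddle point
```

The "indeterminate" branch matters in practice: when some eigenvalues are (numerically) zero, the quadratic term alone cannot decide, and the second-order Taylor approximation is insufficient.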