Optimality Conditions for Unconstrained Problems
First-order necessary condition:
$$\nabla f(x^*)=0$$
Proof of necessity:
If $x^*$ is a local optimum (here, a local minimizer), start from the definition of the gradient, taken one coordinate direction $e_i$ at a time:
$$\frac{\partial f(x^*)}{\partial x_i}=\lim_{t\to 0}\frac{f(x^*+t e_i)-f(x^*)}{t}$$
Take one component $x_i^*$ and vary only that coordinate of $x$. Because $f(x)\ge f(x^*)$ for all $x$ near $x^*$, the difference quotient satisfies $\frac{f(x)-f(x^*)}{x_i-x_i^*}\ge 0$ when $x_i>x_i^*$, and $\frac{f(x)-f(x^*)}{x_i-x_i^*}\le 0$ when $x_i<x_i^*$.
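Letting $x_i\to x_i^*$ from either side, differentiability of $f$ at $x^*$ means both one-sided limits equal $\frac{\partial f(x^*)}{\partial x_i}$, which is therefore both $\ge 0$ and $\le 0$. Hence $\frac{\partial f(x^*)}{\partial x_i}=0$ for every $i$, i.e. $\nabla f(x^*)=0$. $\square$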
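The first-order condition can be sanity-checked numerically. The sketch below is only an illustration: the test function `f`, its minimizer `x_star`, and the helper `numerical_gradient` are hypothetical names chosen for this example, and the gradient is approximated by central differences rather than computed exactly.

```python
def f(x):
    # Hypothetical test function with known minimizer x* = (1, -3).
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2


def numerical_gradient(func, x, h=1e-6):
    """Approximate each partial derivative by a central difference quotient."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # step to the right of x along coordinate i
        backward[i] -= h  # step to the left of x along coordinate i
        grad.append((func(forward) - func(backward)) / (2.0 * h))
    return grad


x_star = [1.0, -3.0]
grad_at_min = numerical_gradient(f, x_star)          # close to (0, 0)
grad_elsewhere = numerical_gradient(f, [2.0, -3.0])  # first entry close to 2

print("gradient at x*:", grad_at_min)
print("gradient at (2, -3):", grad_elsewhere)
```

At the minimizer every component of the approximate gradient is numerically zero, while at a nearby non-optimal point it is not.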
Remark:
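A vanishing gradient alone does not guarantee an optimum: for example, $f(x)=x^3$ has $f'(0)=0$, yet $x=0$ is neither a minimum nor a maximum.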
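A two-dimensional saddle point makes the same remark concrete. The function $f(x,y)=x^2-y^2$ below is an assumed example, not one taken from the text: its gradient vanishes at the origin, yet $f$ decreases along the $y$ axis and increases along the $x$ axis.

```python
def f(x, y):
    # Hypothetical saddle function: curves up along x, down along y.
    return x * x - y * y


def grad_f(x, y):
    # Analytic gradient of f.
    return (2.0 * x, -2.0 * y)


stationary = (0.0, 0.0)
gradient = grad_f(*stationary)   # both components are zero
center_value = f(*stationary)
lower_value = f(0.0, 0.1)        # moving along y decreases f
higher_value = f(0.1, 0.0)       # moving along x increases f

print("gradient at origin:", gradient)
print("f(0, 0.1) < f(0, 0) < f(0.1, 0):",
      lower_value < center_value < higher_value)
```

So the origin satisfies the first-order condition without being a local minimizer, which is why a second-order condition is needed.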
Second-order necessary condition: at a local minimizer $x^*$ the Hessian is positive semidefinite, $\nabla^2 f(x^*)\succeq 0$, where
$$H = \nabla^2 f(x) = \left[\begin{matrix}\frac{\partial^2 f(x)}{\partial x_1\partial x_1}&\cdots&\frac{\partial^2 f(x)}{\partial x_1\partial x_n} \\ \vdots&\ddots&\vdots\\ \frac{\partial^2 f(x)}{\partial x_n\partial x_1}&\cdots&\frac{\partial^2 f(x)}{\partial x_n\partial x_n}\end{matrix}\right]$$
If the sign condition holds at every $x$, it also gives convexity:
$$\begin{cases} H\succeq 0& \text{convex}\\ H\succ 0& \text{strictly convex} \end{cases}$$
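For a $2\times 2$ symmetric Hessian the definiteness test reduces to the sign of the smallest eigenvalue, which has a closed form. The following sketch assumes that setting; `eigenvalues_2x2_symmetric` and `classify` are hypothetical helper names, and the three Hessians are those of the assumed examples $x^2+xy+y^2$, $(x+y)^2$ and $x^2-y^2$.

```python
import math


def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


def classify(hessian_entries, tol=1e-12):
    """Classify a 2x2 symmetric Hessian given as the tuple (a, b, d)."""
    lam_min, _ = eigenvalues_2x2_symmetric(*hessian_entries)
    if lam_min > tol:
        return "positive definite"
    if lam_min >= -tol:
        return "positive semidefinite"
    return "not positive semidefinite"


bowl = classify((2.0, 1.0, 2.0))     # Hessian of x^2 + x*y + y^2
valley = classify((2.0, 2.0, 2.0))   # Hessian of (x + y)^2
saddle = classify((2.0, 0.0, -2.0))  # Hessian of x^2 - y^2

print(bowl, "|", valley, "|", saddle)
```

In higher dimensions the same check would need a general symmetric eigenvalue routine; the closed form here is specific to the $2\times 2$ case.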
Proof of necessity:
By the second-order Taylor expansion around $x^*$:
$$f(x)=f(x^*)+\nabla f(x^*)^T (x-x^*) + \frac{1}{2}(x-x^*)^T \nabla^2f(x^*)(x-x^*)+o(\|x^*-x\|^2)$$
By the first-order necessary condition, $\nabla f(x^*)=0$, and since $x^*$ is a minimizer, for $x$ near $x^*$:
$$0\le f(x)-f(x^*)= \frac{1}{2}(x-x^*)^T \nabla^2f(x^*)(x-x^*)+o(\|x^*-x\|^2)$$
Setting $x=x^*+td$ for an arbitrary direction $d$, dividing by $t^2$ and letting $t\to 0$ gives $d^T\nabla^2 f(x^*)d\ge 0$, that is, $\nabla^2 f(x^*)\succeq 0$.
If $\nabla^2 f(x^*)$ is in fact symmetric positive definite, its orthogonal eigendecomposition (the unitary decomposition of a real symmetric matrix) is:
$$\nabla^2 f(x^*)=P\Lambda P^{T}$$
Let $\tilde d=P^{T}(x-x^*)$, so that $\|\tilde d\|=\|x-x^*\|$ because $P$ is orthogonal, and let $\lambda_{\min}$ be the smallest eigenvalue on the diagonal of $\Lambda$. Then:
$$\frac{1}{2}(x-x^*)^T \nabla^2f(x^*)(x-x^*)=\frac{1}{2}\tilde d^T \Lambda \tilde d\ge \frac{1}{2}\lambda_{\min} \|\tilde d\|^2=\frac{1}{2}\lambda_{\min} \|x-x^*\|^2$$
Therefore:
$$f(x)\ge f(x^*)+\frac{1}{2}\lambda_{\min} \|x-x^*\|^2+o(\|x^*-x\|^2)$$
so, with $\lambda_{\min}>0$, $f(x)>f(x^*)$ for every $x\ne x^*$ sufficiently close to $x^*$.
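The quadratic lower bound in terms of $\lambda_{\min}$ can be checked on a small example. The sketch below assumes the purely quadratic function $f(x)=\frac{1}{2}x^TAx$ with the hypothetical matrix $A$ shown in the code, for which $x^*=0$, the Hessian is $A$ everywhere, and the Taylor remainder is exactly zero. The helper names are illustrative only.

```python
import math

# Hypothetical example: f(x) = 0.5 * x^T A x with A = [[2, 1], [1, 2]],
# so x* = (0, 0), f(x*) = 0 and the Hessian equals A at every point.
A = ((2.0, 1.0), (1.0, 2.0))


def f(x, y):
    return 0.5 * (A[0][0] * x * x + 2.0 * A[0][1] * x * y + A[1][1] * y * y)


def smallest_eigenvalue(a, b, d):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    return (a + d) / 2.0 - math.hypot((a - d) / 2.0, b)


lam_min = smallest_eigenvalue(A[0][0], A[0][1], A[1][1])


def bound_gap(x, y):
    """f(x) - f(x*) - 0.5 * lam_min * ||x - x*||^2, which should be >= 0."""
    return f(x, y) - 0.5 * lam_min * (x * x + y * y)


# Evaluate the gap on a grid of points around x* = (0, 0).
grid = [i / 4.0 for i in range(-8, 9)]
worst_gap = min(bound_gap(x, y) for x in grid for y in grid)

print("lambda_min:", lam_min)
print("smallest gap on the grid:", worst_gap)
```

The gap never goes negative on the grid, and it vanishes along the direction $y=-x$, the eigenvector belonging to $\lambda_{\min}$, so the bound cannot be improved for this example.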