Pure Newton's Method
For a twice continuously differentiable function $f$, we want to solve
$$\min \{ f(\boldsymbol{x}) : \boldsymbol{x}\in \mathbb{R}^n \}$$
By the multivariate Taylor formula, expanding around the current iterate $\boldsymbol{x}_k$,
$$f(\boldsymbol{x})=f(\boldsymbol{x}_k)+\nabla f(\boldsymbol{x}_k)^T(\boldsymbol{x}-\boldsymbol{x}_k)+\frac{1}{2}(\boldsymbol{x}-\boldsymbol{x}_k)^T\nabla^2f(\boldsymbol{x}_k)(\boldsymbol{x}-\boldsymbol{x}_k)+o(\Vert \boldsymbol{x}-\boldsymbol{x}_k\Vert^2)$$
Dropping the higher-order term gives a quadratic approximation of $f(\boldsymbol{x})$ near $\boldsymbol{x}_k$:
$$f(\boldsymbol{x})\approx f(\boldsymbol{x}_k)+\nabla f(\boldsymbol{x}_k)^T(\boldsymbol{x}-\boldsymbol{x}_k)+\frac{1}{2}(\boldsymbol{x}-\boldsymbol{x}_k)^T\nabla^2f(\boldsymbol{x}_k)(\boldsymbol{x}-\boldsymbol{x}_k)$$
The next iterate is the point that minimizes this quadratic model:
$$\boldsymbol{x}_{k+1}=\arg \min_{\boldsymbol{x} \in \mathbb{R}^n}\left\{ f(\boldsymbol{x}_k)+\nabla f(\boldsymbol{x}_k)^T(\boldsymbol{x}-\boldsymbol{x}_k)+\frac{1}{2}(\boldsymbol{x}-\boldsymbol{x}_k)^T\nabla^2f(\boldsymbol{x}_k)(\boldsymbol{x}-\boldsymbol{x}_k)\right\}$$
Here we assume $\nabla^2 f(\boldsymbol{x}_k) \succ 0$.
Under this assumption the quadratic model is strictly convex, so its minimizer is its unique stationary point.
Setting the gradient of the model to zero at $\boldsymbol{x}=\boldsymbol{x}_{k+1}$ gives
$$\nabla f(\boldsymbol{x}_k)+\nabla^2f(\boldsymbol{x}_k)(\boldsymbol{x}_{k+1}-\boldsymbol{x}_k)=0 \quad\Rightarrow\quad \boldsymbol{x}_{k+1}=\boldsymbol{x}_k-(\nabla^2f(\boldsymbol{x}_k))^{-1}\nabla f(\boldsymbol{x}_k)$$
The vector $-(\nabla^2f(\boldsymbol{x}_k))^{-1}\nabla f(\boldsymbol{x}_k)$ is called the Newton direction.
This update rule is known as the pure Newton's method.
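The update rule can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: it is restricted to two variables so that the linear system $\nabla^2 f(\boldsymbol{x}_k)\,\boldsymbol{p} = -\nabla f(\boldsymbol{x}_k)$ can be solved in closed form with Cramer's rule, and the names `newton_step` and `pure_newton`, the stopping tolerance, and the quadratic test function are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

Vec = Tuple[float, float]
Mat = Tuple[Tuple[float, float], Tuple[float, float]]


def newton_step(x: Vec, g: Vec, h: Mat) -> Vec:
    """Return x + p, where p solves H p = -g (2x2 case, Cramer's rule)."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("singular Hessian: Newton step undefined")
    p0 = (-g[0] * d + g[1] * b) / det
    p1 = (-g[1] * a + g[0] * c) / det
    return (x[0] + p0, x[1] + p1)


def pure_newton(grad_f: Callable[[Vec], Vec], hess_f: Callable[[Vec], Mat],
                x0: Vec, tol: float = 1e-10, max_iter: int = 50) -> List[Vec]:
    """Iterate x_{k+1} = x_k - H(x_k)^{-1} grad f(x_k); return all iterates."""
    history = [x0]
    x = x0
    for _ in range(max_iter):
        g = grad_f(x)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 <= tol:  # gradient small enough
            break
        x = newton_step(x, g, hess_f(x))
        history.append(x)
    return history


# Hypothetical test problem: f(x, y) = x^2 + x*y + 2*y^2 - x (strictly convex).
grad_f = lambda v: (2 * v[0] + v[1] - 1.0, v[0] + 4 * v[1])
hess_f = lambda v: ((2.0, 1.0), (1.0, 4.0))

history = pure_newton(grad_f, hess_f, (3.0, -2.0))
print(history[-1])  # close to the exact minimizer (4/7, -1/7)
```

Because the quadratic model is exact for a quadratic objective, a single Newton step lands on the minimizer here. For larger problems one would normally hand the linear system to a numerical solver rather than form an explicit inverse.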
Drawbacks
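First, forming the Hessian is expensive, and every step additionally requires inverting it (or solving a linear system with it).
Second, even if the Hessian is positive definite everywhere, convergence is not guaranteed.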
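For example, take $f(x)=\sqrt{1+x^2}$. Pure Newton's method converges when the starting point satisfies $\left|x_0\right|<1$ and fails to converge when $\left| x_0\right| \ge 1$: the iterates blow up for $\left|x_0\right|>1$ and oscillate between $1$ and $-1$ for $\left|x_0\right|=1$.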
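To see why the threshold is $1$, note that for $f(x)=\sqrt{1+x^2}$ we have $f'(x)=x/\sqrt{1+x^2}$ and $f''(x)=(1+x^2)^{-3/2}$, so the Newton update simplifies to $x_{k+1}=x_k-x_k(1+x_k^2)=-x_k^3$. The short script below is a sketch of that behaviour; the function name `newton_1d`, the starting points, and the number of steps are choices made for this illustration.

```python
import math


def newton_1d(x0: float, steps: int) -> list:
    """Run pure Newton on f(x) = sqrt(1 + x^2) and return all iterates."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        d1 = x / math.sqrt(1.0 + x * x)   # f'(x)
        d2 = (1.0 + x * x) ** -1.5        # f''(x), positive for every x
        x = x - d1 / d2                   # equals -x**3 up to rounding
        xs.append(x)
    return xs


inside = newton_1d(0.5, 5)    # |x0| < 1: iterates shrink toward 0
outside = newton_1d(1.5, 5)   # |x0| > 1: iterates grow without bound
print(inside)
print(outside)
```

The second derivative is positive at every point, yet the iteration started at $1.5$ moves away from the minimizer $x^*=0$ at each step.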
Convergence
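Lemma 1
Let $A^T=A$. Then $\Vert A\Vert = \max_i \left|\lambda_i(A)\right|$, where $\Vert\cdot\Vert$ denotes the spectral norm. In particular, $\Vert A\Vert = \lambda_{\max}(A)$ when $A\succeq 0$.
Proof: Let $Ax=\lambda x$ with $x\neq 0$. Since $A^T=A$, we also have $A^T x=\lambda x$, and therefore
$$A^TAx=A^T\lambda x=\lambda^2 x$$
So the eigenvalues of $A^TA$ are the squares of the eigenvalues of $A$, and
$$\Vert A\Vert =\sqrt{\lambda_{\max}(A^TA)}=\max_i \left|\lambda_i(A)\right|$$
which equals $\lambda_{\max}(A)$ when all eigenvalues of $A$ are nonnegative.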
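A quick numerical check of the lemma on $2\times 2$ symmetric matrices may help. It also shows that, for an indefinite matrix, $\sqrt{\lambda_{\max}(A^TA)}$ matches the largest eigenvalue in absolute value, and coincides with $\lambda_{\max}(A)$ only when the eigenvalues are nonnegative. The helper names `sym_eigs` and `spectral_norm_sym` and the two sample matrices are assumptions made for this sketch; it relies on the closed-form eigenvalues of a symmetric $2\times 2$ matrix.

```python
import math


def sym_eigs(a: float, b: float, c: float):
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius


def spectral_norm_sym(a: float, b: float, c: float) -> float:
    """sqrt(lambda_max(A^T A)) for the symmetric A = [[a, b], [b, c]]."""
    # For symmetric A, A^T A = A^2 = [[p, q], [q, r]] with:
    p = a * a + b * b
    q = b * (a + c)
    r = b * b + c * c
    return math.sqrt(sym_eigs(p, q, r)[1])


indefinite = (1.0, 2.0, -3.0)   # one negative and one positive eigenvalue
definite = (2.0, 1.0, 4.0)      # both eigenvalues positive

for name, m in (("indefinite", indefinite), ("definite", definite)):
    lo, hi = sym_eigs(*m)
    print(name, "norm:", spectral_norm_sym(*m),
          "lambda_max:", hi, "max |lambda|:", max(abs(lo), abs(hi)))
```

In the convergence proofs this lemma is applied to Hessians that are positive definite, which is the case where the norm and the largest eigenvalue agree.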
Theorem 1
Assume $f$ is twice continuously differentiable, and
1) there exists $m>0$ such that $\nabla^2 f(\boldsymbol{x})\succeq mI$ for all $\boldsymbol{x}\in \mathbb{R}^n$;
2) there exists $L>0$ such that, for all $\boldsymbol{x},\boldsymbol{y}\in \mathbb{R}^n$,
$$\Vert \nabla^2 f(\boldsymbol{x}) -\nabla^2 f(\boldsymbol{y})\Vert \le L \Vert \boldsymbol{x} - \boldsymbol{y} \Vert$$
Let $\{\boldsymbol{x}_k \}_{k\ge 0}$ be the sequence generated by pure Newton's method, and let $\boldsymbol{x}^{*}$ be the unique minimizer of $f$ over $\mathbb{R}^{n}$.
Then
$$\Vert \boldsymbol{x}_{k+1} -\boldsymbol{x}^{*}\Vert \le \frac{L}{2m}\Vert \boldsymbol{x}_k - \boldsymbol{x}^{*}\Vert^2 \quad (k=0,1,\cdots)$$
Moreover, if $\Vert \boldsymbol{x}_0 - \boldsymbol{x}^{*}\Vert \le \frac{m}{L}$, then
$$\Vert \boldsymbol{x}_{k} - \boldsymbol{x}^{*} \Vert \le \frac{2m}{L}\left(\frac{1}{2}\right)^{2^{k}} \quad (k=0,1,\cdots)$$
Proof:
Since $\boldsymbol{x}^{*}$ is an unconstrained minimizer, $\nabla f(\boldsymbol{x}^{*})=0$. Therefore
$$\begin{aligned} \boldsymbol{x}_{k+1}- \boldsymbol{x}^{*} &= \boldsymbol{x}_k-(\nabla^2 f( \boldsymbol{x}_{k}))^{-1}\nabla f( \boldsymbol{x}_k)- \boldsymbol{x}^{*}\\ &= \boldsymbol{x}_k-\boldsymbol{x}^{*}-(\nabla^2 f( \boldsymbol{x}_{k}))^{-1}(\nabla f( \boldsymbol{x}_k)-\nabla f(\boldsymbol{x}^{*}))\\ &= \boldsymbol{x}_k-\boldsymbol{x}^{*}+(\nabla^2 f( \boldsymbol{x}_{k}))^{-1}\int_{0}^{1} \nabla^2f(\boldsymbol{x}_{k}+t(\boldsymbol{x}^{*}-\boldsymbol{x}_{k}))(\boldsymbol{x}^{*}-\boldsymbol{x}_{k})\,\mathrm{d}t\\ &=(\nabla^2 f( \boldsymbol{x}_{k}))^{-1}\int_{0}^{1} \left[\nabla^2f(\boldsymbol{x}_{k}+t(\boldsymbol{x}^{*}-\boldsymbol{x}_{k}))-\nabla^2 f( \boldsymbol{x}_{k})\right](\boldsymbol{x}^{*}-\boldsymbol{x}_{k})\,\mathrm{d}t \end{aligned}$$
The last step uses $\boldsymbol{x}_k-\boldsymbol{x}^{*}=-(\nabla^2 f(\boldsymbol{x}_k))^{-1}\int_0^1 \nabla^2 f(\boldsymbol{x}_k)(\boldsymbol{x}^{*}-\boldsymbol{x}_k)\,\mathrm{d}t$.
By assumption 1), every eigenvalue $\lambda$ of $\nabla^2 f(\boldsymbol{x})$ satisfies $\lambda\ge m>0$, so every eigenvalue $1/\lambda$ of the inverse is at most $1/m$:
$$\nabla^2 f(\boldsymbol{x})\succeq mI \Rightarrow \lambda\ge m \Rightarrow \frac{1}{\lambda}\le \frac{1}{m}\Rightarrow \Vert (\nabla^2 f(\boldsymbol{x}))^{-1} \Vert \le \frac{1}{m}$$
By assumption 2),
$$\begin{aligned} &\quad \left\Vert \int_{0}^{1} \left[\nabla^2f(\boldsymbol{x}_{k}+t(\boldsymbol{x}^{*}-\boldsymbol{x}_{k}))-\nabla^2 f( \boldsymbol{x}_{k})\right](\boldsymbol{x}^{*}-\boldsymbol{x}_{k})\,\mathrm{d}t \right\Vert\\ &\le \int_{0}^{1} \left\Vert \left[\nabla^2f(\boldsymbol{x}_{k}+t(\boldsymbol{x}^{*}-\boldsymbol{x}_{k}))-\nabla^2 f( \boldsymbol{x}_{k})\right](\boldsymbol{x}^{*}-\boldsymbol{x}_{k}) \right\Vert \mathrm{d}t \\ &\le \int_{0}^{1} \Vert \nabla^2f(\boldsymbol{x}_{k}+t(\boldsymbol{x}^{*}-\boldsymbol{x}_{k}))-\nabla^2 f( \boldsymbol{x}_{k})\Vert\,\Vert\boldsymbol{x}^{*}-\boldsymbol{x}_{k} \Vert \,\mathrm{d}t \\ &\le \int_{0}^{1} L\Vert \boldsymbol{x}_{k}+t(\boldsymbol{x}^{*}-\boldsymbol{x}_{k})-\boldsymbol{x}_{k}\Vert\,\Vert\boldsymbol{x}^{*}-\boldsymbol{x}_{k} \Vert \,\mathrm{d}t \\ &= \int_{0}^{1} L\Vert t(\boldsymbol{x}^{*}-\boldsymbol{x}_{k})\Vert\,\Vert\boldsymbol{x}^{*}-\boldsymbol{x}_{k} \Vert \,\mathrm{d}t \\ &= \int_{0}^{1} Lt\,\Vert \boldsymbol{x}^{*}-\boldsymbol{x}_{k}\Vert^2 \,\mathrm{d}t \\ &=\frac{L}{2}\Vert\boldsymbol{x}_{k}-\boldsymbol{x}^{*} \Vert ^2 \end{aligned}$$
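In one dimension the integral above equals $f'(y)-f'(x)-f''(x)(y-x)$ with $y$ in the role of $\boldsymbol{x}^{*}$ and $x$ in the role of $\boldsymbol{x}_k$, so the bound can be spot-checked numerically. The sketch below does this for an assumed test function $f(x)=x^2/2+\log\cosh x$, whose second derivative $1+\operatorname{sech}^2 x$ has Lipschitz constant $\sup|f'''|=4/(3\sqrt{3})$. The grid of points and the names `d1`, `d2`, and `LIP` are choices made for this example.

```python
import math


def d1(x: float) -> float:
    """f'(x) for the assumed test function f(x) = x^2/2 + log(cosh(x))."""
    return x + math.tanh(x)


def d2(x: float) -> float:
    """f''(x) = 1 + sech(x)^2."""
    return 1.0 + 1.0 / math.cosh(x) ** 2


LIP = 4.0 / (3.0 * math.sqrt(3.0))  # sup |f'''(x)|, a Lipschitz constant of f''

worst = 0.0
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    for step in (-1.0, -0.5, 0.5, 1.0):
        y = x + step
        lhs = abs(d1(y) - d1(x) - d2(x) * step)  # |integral term| in 1D
        rhs = 0.5 * LIP * step ** 2              # (L/2) * |y - x|^2
        worst = max(worst, lhs / rhs)

print("largest lhs/rhs ratio on the grid:", worst)
```

A ratio that never exceeds $1$ on the grid is consistent with the inequality; it is a sanity check on sample points, not a proof.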
Combining the two bounds,
$$\Vert \boldsymbol{x}_{k+1}- \boldsymbol{x}^{*} \Vert \le \frac{L}{2m}\Vert\boldsymbol{x}_{k}-\boldsymbol{x}^{*} \Vert ^2$$
Next we prove by induction that
$$\Vert \boldsymbol{x}_{k} - \boldsymbol{x}^{*} \Vert \le \frac{2m}{L}\left(\frac{1}{2}\right)^{2^{k}}$$
For $k=0$, the assumption on $\boldsymbol{x}_0$ gives
$$\Vert \boldsymbol{x}_0 - \boldsymbol{x}^{*}\Vert \le \frac{m}{L}=\frac{2m}{L}\left(\frac{1}{2}\right)^{2^{0}}$$
so the claim holds.
Suppose the claim holds for $k$. Then for $k+1$,
$$\Vert \boldsymbol{x}_{k+1}- \boldsymbol{x}^{*} \Vert \le \frac{L}{2m}\Vert\boldsymbol{x}_{k}-\boldsymbol{x}^{*} \Vert ^2\le \frac{L}{2m} \left(\frac{2m}{L}\left(\frac{1}{2}\right)^{2^{k}}\right)^2=\frac{2m}{L}\left(\frac{1}{2}\right)^{2^{k+1}}$$
so the claim holds for $k+1$ as well.
We conclude that pure Newton's method converges quadratically when the starting point is close enough to the minimizer.
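Both bounds of Theorem 1 can be checked on a small example. The sketch below again uses the assumed one-dimensional test function $f(x)=x^2/2+\log\cosh x$, for which $f''(x)=1+\operatorname{sech}^2 x\ge 1$ (so $m=1$ is a valid constant), $L=4/(3\sqrt{3})$ is a Lipschitz constant of $f''$, and the minimizer is $x^*=0$. The starting point $x_0=1$ satisfies $|x_0-x^*|\le m/L$. All names and constants in the code are choices made for this illustration.

```python
import math


def grad(x: float) -> float:
    """f'(x) for the assumed test function f(x) = x^2/2 + log(cosh(x))."""
    return x + math.tanh(x)


def hess(x: float) -> float:
    """f''(x) = 1 + sech(x)^2, always in (1, 2]."""
    return 1.0 + 1.0 / math.cosh(x) ** 2


M = 1.0                             # lower bound on f''
LIP = 4.0 / (3.0 * math.sqrt(3.0))  # sup |f'''|, a Lipschitz constant of f''

x = 1.0                # |x0 - x*| = 1 <= M / LIP, with x* = 0
errors = [abs(x)]
for _ in range(4):
    x = x - grad(x) / hess(x)       # pure Newton step in 1D
    errors.append(abs(x))

for k in range(4):
    one_step = LIP / (2.0 * M) * errors[k] ** 2          # (L/2m) e_k^2
    closed = 2.0 * M / LIP * 0.5 ** (2 ** (k + 1))       # (2m/L) (1/2)^(2^(k+1))
    print(k + 1, errors[k + 1], "<=", one_step, "and <=", closed)
```

Only four steps are taken because the error reaches the level of floating-point rounding soon after, at which point comparing it against a squared bound is no longer meaningful.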
Theorem 2
Assume $f$ is twice continuously differentiable, and let $\boldsymbol{x}^{*}$ be a minimizer.
Suppose there exists $L>0$ such that, for all $\boldsymbol{x},\boldsymbol{y}\in N_{\delta}(\boldsymbol{x}^{*})$ (a neighborhood of $\boldsymbol{x}^{*}$),
$$\Vert \nabla^2 f(\boldsymbol{x}) -\nabla^2 f(\boldsymbol{y})\Vert \le L \Vert \boldsymbol{x} - \boldsymbol{y} \Vert$$
If $\nabla f(\boldsymbol{x}^{*})=0$ and $\nabla^2f(\boldsymbol{x}^{*})\succ0$, then
1) if the starting point is close enough to $\boldsymbol{x}^{*}$, the sequence $\{\boldsymbol{x}_{k}\}$ converges to $\boldsymbol{x}^{*}$;
2) $\{\boldsymbol{x}_{k}\}$ converges to $\boldsymbol{x}^{*}$ Q-quadratically;
3) $\{\Vert \nabla f(\boldsymbol{x}_k)\Vert \}$ converges to $0$ Q-quadratically.
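Proof:
Since $\nabla^2f(\boldsymbol{x}^{*})$ is nonsingular and $f$ is twice continuously differentiable, there exists $r>0$ such that every $\boldsymbol{x}$ with $\Vert \boldsymbol{x}- \boldsymbol{x}^{*}\Vert<r$ satisfies
$$\Vert (\nabla^2f(\boldsymbol{x}))^{-1}\Vert \le 2\Vert (\nabla^2f(\boldsymbol{x}^{*}))^{-1}\Vert$$
(I did not fully follow this step myself. A plausible justification: $\nabla^2 f$ is continuous and matrix inversion is continuous at nonsingular matrices, so $\Vert (\nabla^2f(\boldsymbol{x}))^{-1}\Vert \to \Vert (\nabla^2f(\boldsymbol{x}^{*}))^{-1}\Vert>0$ as $\boldsymbol{x}\to\boldsymbol{x}^{*}$, and it therefore stays below twice that limit on a small enough ball.)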
Then, arguing as in Theorem 1 with this bound in place of $1/m$, whenever $\Vert\boldsymbol{x}_k-\boldsymbol{x}^{*}\Vert<\min\{\delta,r\}$ we have
$$\Vert \boldsymbol{x}_{k+1}- \boldsymbol{x}^{*} \Vert \le 2\Vert (\nabla^2f(\boldsymbol{x}^{*}))^{-1}\Vert\cdot\frac{L}{2}\Vert\boldsymbol{x}_{k}-\boldsymbol{x}^{*} \Vert ^2=L\Vert (\nabla^2f(\boldsymbol{x}^{*}))^{-1}\Vert \,\Vert\boldsymbol{x}_{k}-\boldsymbol{x}^{*} \Vert ^2$$
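Therefore, when $\boldsymbol{x}_0$ satisfies
$$\Vert \boldsymbol{x}_0-\boldsymbol{x}^{*}\Vert \le \min \left\{\delta, r, \frac{1}{2 L\Vert (\nabla^{2} f(\boldsymbol{x}^{*}))^{-1}\Vert }\right\} \stackrel{\text{def}}{=} \hat{\delta}$$
the iterates $\{\boldsymbol{x}_{k}\}$ are guaranteed to stay in $N_{\hat{\delta}}(\boldsymbol{x}^{*})$. (I did not fully follow this step either. It appears to follow by induction: if $\Vert\boldsymbol{x}_k-\boldsymbol{x}^{*}\Vert\le\hat{\delta}$, the bound above gives $\Vert\boldsymbol{x}_{k+1}-\boldsymbol{x}^{*}\Vert\le L\Vert (\nabla^2f(\boldsymbol{x}^{*}))^{-1}\Vert\,\hat{\delta}\,\Vert\boldsymbol{x}_k-\boldsymbol{x}^{*}\Vert\le\frac{1}{2}\Vert\boldsymbol{x}_k-\boldsymbol{x}^{*}\Vert$, so the error at least halves at every step.)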
Hence $\{\boldsymbol{x}_{k}\}$ converges to $\boldsymbol{x}^{*}$ Q-quadratically.
From the Newton equation
$$\nabla f(\boldsymbol{x}_k)+\nabla^2f(\boldsymbol{x}_k)(\boldsymbol{x}_{k+1}-\boldsymbol{x}_k)=0$$
we obtain
$$\begin{aligned} \Vert \nabla f(\boldsymbol{x}_{k+1}) \Vert &= \Vert \nabla f(\boldsymbol{x}_{k+1}) -(\nabla f(\boldsymbol{x}_k)+\nabla^2f(\boldsymbol{x}_k)(\boldsymbol{x}_{k+1}-\boldsymbol{x}_k))\Vert\\ &=\left\Vert \int_{0}^{1} \nabla^2f(\boldsymbol{x}_{k}+t(\boldsymbol{x}_{k+1}-\boldsymbol{x}_k))(\boldsymbol{x}_{k+1}-\boldsymbol{x}_k)\,\mathrm{d}t -\nabla^2f(\boldsymbol{x}_k)(\boldsymbol{x}_{k+1}-\boldsymbol{x}_k)\right\Vert\\ &=\left\Vert \int_{0}^{1} \left[\nabla^2f(\boldsymbol{x}_{k}+t(\boldsymbol{x}_{k+1}-\boldsymbol{x}_k))-\nabla^2f(\boldsymbol{x}_k)\right](\boldsymbol{x}_{k+1}-\boldsymbol{x}_k)\,\mathrm{d}t \right\Vert\\ &\le \frac{L}{2}\Vert \boldsymbol{x}_{k+1}-\boldsymbol{x}_k \Vert^2\\ &=\frac{L}{2}\Vert -(\nabla^2f(\boldsymbol{x}_k))^{-1}\nabla f(\boldsymbol{x}_k)\Vert^2\\ &\le \frac{L}{2} \Vert (\nabla^2f(\boldsymbol{x}_k))^{-1}\Vert^2 \,\Vert \nabla f(\boldsymbol{x}_k) \Vert^2\\ &\le \frac{L}{2}\cdot 4\Vert (\nabla^2f(\boldsymbol{x}^{*}))^{-1}\Vert^2 \,\Vert \nabla f(\boldsymbol{x}_k) \Vert^2\\ &=2L\Vert (\nabla^2f(\boldsymbol{x}^{*}))^{-1}\Vert^2 \,\Vert \nabla f(\boldsymbol{x}_k) \Vert^2 \end{aligned}$$
So $\{\Vert \nabla f(\boldsymbol{x}_k)\Vert \}$ converges to $0$ Q-quadratically.
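Q-quadratic convergence of the gradient norms means that the ratio $\Vert\nabla f(\boldsymbol{x}_{k+1})\Vert / \Vert\nabla f(\boldsymbol{x}_k)\Vert^2$ stays bounded. The sketch below tracks that ratio for an assumed one-dimensional example, $f(x)=e^x-x$, which has $f'(x)=e^x-1$, $f''(x)=e^x$, minimizer $x^*=0$, and $f''(x^*)=1>0$. The starting point $0.5$ and the number of steps are choices made for this illustration.

```python
import math


def grad(x: float) -> float:
    """f'(x) for the assumed example f(x) = exp(x) - x, minimized at 0."""
    return math.expm1(x)   # exp(x) - 1, accurate for small x


def hess(x: float) -> float:
    """f''(x) = exp(x), equal to 1 at the minimizer."""
    return math.exp(x)


x = 0.5
grad_norms = [abs(grad(x))]
for _ in range(4):
    x = x - grad(x) / hess(x)          # pure Newton step in 1D
    grad_norms.append(abs(grad(x)))

# Ratio |f'(x_{k+1})| / |f'(x_k)|^2; bounded ratios indicate Q-quadratic decay.
ratios = [grad_norms[k + 1] / grad_norms[k] ** 2 for k in range(4)]
print(grad_norms)
print(ratios)
```

The gradient norms shrink at every step while the ratios remain bounded, which is the behaviour claimed in part 3) of Theorem 2.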