Newton's Method (Newton Descent) -- Optimization Methods
I. Classification
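Depending on whether the step size $t$ is fixed at 1, Newton's method comes in two variants: pure ($t=1$) and damped ($t$ not necessarily equal to 1, usually chosen by a line search).
This article also introduces the inexact Newton method.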
1) Objective of the optimization problem:
$$\min_x f(x)$$
2) Estimate of $f(x_k+p_k)$ (second-order Taylor expansion, i.e. the quadratic approximation):
$$f(x_k+p_k) \approx f(x_k)+ \nabla f(x_k)^Tp_k+\frac{1}{2} p_k^T \nabla^2 f(x_k)\,p_k$$
3) Choose $p_k$ to minimize this Taylor estimate by setting its derivative with respect to $p_k$ to zero. When the Hessian is positive definite, this stationary point is the unique minimizer:
$$\nabla f(x_k) + \nabla^2 f(x_k)\,p_k=0,\quad \nabla^2 f(x_k) \succ 0 \;\Rightarrow\; p_k=-\nabla^2 f(x_k)^{-1}\nabla f(x_k)$$
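As a small illustration of step 3), the sketch below computes $p_k$ with NumPy by solving $\nabla^2 f(x_k)\,p_k=-\nabla f(x_k)$ instead of forming the inverse. It then checks that the gradient of the quadratic model vanishes at $p_k$. The gradient `g` and Hessian `H` are made-up example values, and the Cholesky call serves only as a positive-definiteness check.

```python
import numpy as np

def newton_direction(g, H):
    # np.linalg.cholesky raises LinAlgError unless H is positive definite,
    # i.e. unless the quadratic model has a unique minimizer.
    np.linalg.cholesky(H)
    # Solve H p = -g instead of computing H^{-1} explicitly.
    return np.linalg.solve(H, -g)

def quadratic_model(f0, g, H, p):
    # f(x_k) + g^T p + 1/2 p^T H p
    return f0 + g @ p + 0.5 * p @ H @ p

g = np.array([1.0, -2.0])                 # example gradient at x_k
H = np.array([[4.0, 1.0], [1.0, 3.0]])    # example positive definite Hessian at x_k
p = newton_direction(g, H)
print(np.allclose(g + H @ p, 0.0))        # model gradient is zero at p -> True
```

Solving the linear system is the usual choice in practice. It is cheaper and numerically more stable than computing $\nabla^2 f(x_k)^{-1}$.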
II. Pure Newton method
1) Set $x_{k+1}=x_k+t_kp_k$,
where $t_k=1$ and $p_k=-\nabla^2 f(x_k)^{-1}\nabla f(x_k)$, so $x_{k+1}=x_k-\nabla^2 f(x_k)^{-1}\nabla f(x_k)$.
2) Algorithm:
$$\begin{align*}
&\text{setting initial value } x_0,\ k=0\\
&while\quad k < max\_iteration\_times \quad do\\
&\quad \quad \text{compute } \nabla f(x_k),\ \nabla^2 f(x_k)\\
&\quad \quad x_{k+1}=x_k-\nabla^2 f(x_k)^{-1}\nabla f(x_k)\\
&\quad \quad if\quad |f(x_k)-f(x_{k+1})| < \epsilon \quad or \quad \|x_k-x_{k+1}\| < \epsilon \quad then\quad break\\
&\quad \quad k=k+1\\
&end\quad while
\end{align*}$$
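The pseudocode above translates into the following minimal NumPy sketch. It assumes the caller supplies callables `f`, `grad` and `hess` (hypothetical names) and solves the Newton system with `np.linalg.solve` rather than inverting the Hessian. The test function $f(x)=e^{x_1}-x_1+(x_2-2)^2$, with minimizer $(0,2)$, is only an example.

```python
import numpy as np

def pure_newton(f, grad, hess, x0, eps=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        p = np.linalg.solve(hess(x), -grad(x))   # p_k = -H(x_k)^{-1} grad f(x_k)
        x_new = x + p                            # pure Newton: t_k = 1
        if abs(f(x) - f(x_new)) < eps or np.linalg.norm(x_new - x) < eps:
            return x_new, k
        x = x_new
    return x, max_iter

# Example problem (illustrative): minimizer at (0, 2)
f = lambda x: np.exp(x[0]) - x[0] + (x[1] - 2.0) ** 2
grad = lambda x: np.array([np.exp(x[0]) - 1.0, 2.0 * (x[1] - 2.0)])
hess = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])

x_min, iters = pure_newton(f, grad, hess, [1.0, 0.0])
print(x_min, iters)
```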
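3) Pros and cons:
The pure Newton method converges fast. Near a minimizer $x^*$ with $\nabla^2 f(x^*)\succ 0$ and a Lipschitz-continuous Hessian, convergence is quadratic. It has two drawbacks. First, it is sensitive to the choice of the initial point: with $t_k=1$ nothing guarantees that $f$ decreases, so a poor start can diverge. Second, it requires $f$ to be twice differentiable with $\nabla^2 f(x_k)$ invertible at every iterate, and positive definite if $p_k$ is to be a descent direction.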
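To make the sensitivity to the initial point concrete, here is a sketch on the one-dimensional example $f(x)=\sqrt{1+x^2}$, an illustration chosen for this note. Its minimizer is $x^*=0$ and $f''(x)=(1+x^2)^{-3/2}>0$ everywhere. Even so, the pure Newton update simplifies to $x_{k+1}=x_k-x_k(1+x_k^2)=-x_k^3$, so it converges when $|x_0|<1$ and diverges when $|x_0|>1$.

```python
import math

def newton_1d(fp, fpp, x0, iters):
    xs = [x0]
    for _ in range(iters):
        x = xs[-1]
        xs.append(x - fp(x) / fpp(x))   # pure Newton step with t_k = 1
    return xs

fp = lambda x: x / math.sqrt(1.0 + x * x)     # f'(x)  for f(x) = sqrt(1 + x^2)
fpp = lambda x: (1.0 + x * x) ** -1.5         # f''(x), positive everywhere

good = newton_1d(fp, fpp, 0.5, 4)   # |x0| < 1: iterates shrink toward 0
bad = newton_1d(fp, fpp, 1.1, 4)    # |x0| > 1: iterates blow up
print(good)
print(bad)
```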
III. Damped Newton method
1) Set $x_{k+1}=x_k+t_kp_k$,
where $t_k\in(0,1]$ is not necessarily 1 (it is chosen by a line search) and $p_k=-\nabla^2 f(x_k)^{-1}\nabla f(x_k)$, so $x_{k+1}=x_k-t_k \nabla^2 f(x_k)^{-1}\nabla f(x_k)$.
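2) Newton decrement (stopping criterion):
The Newton decrement measures how far the current iterate is from the optimum, as predicted by the quadratic model. It is commonly used in convergence analysis and as a stopping criterion.
Derivation of the Newton decrement: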
Here $\hat f$ denotes the quadratic approximation of $f$. The third line uses the symmetry of the Hessian, $\nabla^2 f(x_k)^{-T}=\nabla^2 f(x_k)^{-1}$:
$$\begin{align*}
\text{quadratic approximation: } \hat f(x_k+p_k)&=f(x_k)+\nabla f(x_k)^Tp_k+\frac{1}{2} p_k^T\nabla^2 f(x_k)\,p_k\\
\min_{p_k} \hat f(x_k+p_k) \Rightarrow p_k&=-\nabla^2 f(x_k)^{-1}\nabla f(x_k)\\
\Rightarrow \min_{p_k} \hat f(x_k+p_k)&=f(x_k)-\nabla f(x_k)^T\nabla^2 f(x_k)^{-1}\nabla f(x_k)+\frac{1}{2}\nabla f(x_k)^T\nabla^2 f(x_k)^{-T}\nabla^2 f(x_k)\nabla^2 f(x_k)^{-1}\nabla f(x_k)\\
&=f(x_k)-\frac{1}{2}\nabla f(x_k)^T\nabla^2 f(x_k)^{-1}\nabla f(x_k)\\
\Rightarrow f(x_k)-\min_{p_k} \hat f(x_k+p_k)&=\frac{1}{2}\nabla f(x_k)^T\nabla^2 f(x_k)^{-1}\nabla f(x_k)=\frac{1}{2}\lambda(x_k)^2\\
\Rightarrow \lambda(x_k)&=\big(\nabla f(x_k)^T\nabla^2 f(x_k)^{-1}\nabla f(x_k)\big)^{1/2}=\big(d(x_k)^T\nabla^2 f(x_k)\,d(x_k)\big)^{1/2}=\big(\nabla f(x_k)^Td(x_k)\big)^{1/2}\\
d(x_k)&=\nabla^2 f(x_k)^{-1}\nabla f(x_k)=-p_k
\end{align*}$$
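Therefore the Newton decrement
$$\lambda(x_k)=\big(\nabla f(x_k)^T\nabla^2 f(x_k)^{-1}\nabla f(x_k)\big)^{1/2}$$
can serve as the stopping criterion. $\frac{1}{2}\lambda(x_k)^2$ is the decrease in $f$ predicted by the quadratic model, i.e. an estimate of $f(x_k)-f(x^*)$, so the search stops once $\frac{1}{2}\lambda(x_k)^2\le\epsilon$.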
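Here is a quick numerical check of the derivation. It is a sketch that assumes $f$ is a strictly convex quadratic $f(x)=\frac12x^TAx-b^Tx$, where `A`, `b` and the point `x` are arbitrary example values. For such an $f$ the quadratic model is exact, so $\frac12\lambda(x)^2$ should equal $f(x)-f(x^*)$. For a general $f$ it is only an estimate.

```python
import numpy as np

def newton_decrement(g, H):
    d = np.linalg.solve(H, g)          # d(x_k) = H^{-1} grad f(x_k)
    return float(np.sqrt(g @ d))       # lambda(x_k) = (grad^T H^{-1} grad)^{1/2}

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite Hessian
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x

x = np.array([2.0, 3.0])
lam = newton_decrement(A @ x - b, A)     # gradient of f is A x - b
x_star = np.linalg.solve(A, b)
print(0.5 * lam**2, f(x) - f(x_star))    # the two numbers agree
```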
3) Armijo-rule backtracking line search (finding $t_k$):
$$\alpha \in (0,1/2),\quad f(x_k+t_kp_k) \leq f(x_k)+\alpha t_k \nabla f(x_k)^Tp_k \;\Leftrightarrow\; f(x_k)-f(x_k+t_kp_k) \ge -\alpha t_k \nabla f(x_k)^Tp_k$$
Starting from $t_k=1$, $t_k$ is multiplied by $\beta\in(0,1)$ until the condition holds. For the Newton direction, $\nabla f(x_k)^Tp_k=-\lambda(x_k)^2<0$, so the condition demands an actual decrease of at least a fraction $\alpha$ of the decrease predicted by the linear model.
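The Armijo backtracking rule can be sketched as follows. `alpha=0.25` and `beta=0.5` are illustrative defaults, and the cap `max_steps` is a safeguard added here rather than part of the rule. The usage example is hypothetical: a full Newton step for $f(x)=\sqrt{1+x^2}$ at $x=1.1$ overshoots, and backtracking halves it once.

```python
import numpy as np

def backtracking(f, x, g, p, alpha=0.25, beta=0.5, max_steps=60):
    # Shrink t until f(x_k) - f(x_k + t p_k) >= -alpha * t * grad f(x_k)^T p_k.
    assert 0 < alpha < 0.5 and 0 < beta < 1
    fx, slope = f(x), g @ p         # slope < 0 for a descent direction
    t = 1.0
    for _ in range(max_steps):
        if fx - f(x + t * p) >= -alpha * t * slope:
            return t
        t *= beta
    return t

f = lambda x: float(np.sqrt(1.0 + x @ x))
x = np.array([1.1])
g = x / np.sqrt(1.0 + x @ x)        # gradient of sqrt(1 + x^2)
p = -x * (1.0 + x @ x)              # Newton step -f'(x) / f''(x)
t = backtracking(f, x, g, p)
print(t)                            # -> 0.5 (the full step t = 1 is rejected)
```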
4) Algorithm:
$$\begin{align*}
&\text{setting initial value } x_0,\ \alpha \in (0,1/2),\ \beta \in (0,1),\ \epsilon>0,\ k=0\\
&while\quad \frac{1}{2}\lambda(x_k)^2=\frac{1}{2}\nabla f(x_k)^T\nabla^2 f(x_k)^{-1}\nabla f(x_k) > \epsilon \quad do\\
&\quad \quad p_k=-\nabla^2 f(x_k)^{-1}\nabla f(x_k)\\
&\quad \quad t_k=1\\
&\quad \quad while \quad f(x_k)-f(x_k+t_kp_k) < -\alpha t_k \nabla f(x_k)^Tp_k \quad do\\
&\quad \quad \quad \quad t_k=\beta t_k\\
&\quad \quad end \quad while\\
&\quad \quad x_{k+1}=x_k+t_kp_k,\quad k=k+1\\
&end \quad while
\end{align*}$$
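Putting 3) and 4) together, a minimal NumPy sketch of the damped Newton method might look like this. The defaults for `alpha`, `beta` and `eps` are illustrative. The `t > 1e-12` guard in the inner loop is an added safeguard against floating-point stalls, not part of the pseudocode. The test function $f(x)=e^{x_1+3x_2-0.1}+e^{x_1-3x_2-0.1}+e^{-x_1-0.1}$, whose minimizer is $(-\tfrac12\ln 2,\,0)$, is just an example.

```python
import numpy as np

def damped_newton(f, grad, hess, x0, alpha=0.25, beta=0.5, eps=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        p = np.linalg.solve(hess(x), -g)     # Newton direction p_k
        lam2 = -(g @ p)                      # lambda(x_k)^2 = -grad^T p_k
        if lam2 / 2 <= eps:                  # stop on the Newton decrement
            return x, k
        t = 1.0
        # Armijo backtracking: shrink t until the sufficient-decrease test passes
        while f(x) - f(x + t * p) < -alpha * t * (g @ p) and t > 1e-12:
            t *= beta
        x = x + t * p
    return x, max_iter

def _terms(x):
    return (np.exp(x[0] + 3 * x[1] - 0.1), np.exp(x[0] - 3 * x[1] - 0.1),
            np.exp(-x[0] - 0.1))

def f(x):
    a, b, c = _terms(x)
    return a + b + c

def grad(x):
    a, b, c = _terms(x)
    return np.array([a + b - c, 3 * a - 3 * b])

def hess(x):
    a, b, c = _terms(x)
    return np.array([[a + b + c, 3 * a - 3 * b], [3 * a - 3 * b, 9 * a + 9 * b]])

x_min, iters = damped_newton(f, grad, hess, [-1.0, 1.0])
print(x_min, iters)
```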
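5) Usage notes: the damped Newton method needs the gradient and the Hessian of the objective, plus a linear solve with the Hessian at every iteration, which can be expensive in high dimensions. Under suitable conditions, such as the assumptions in 6) below, it converges from any starting point and converges quadratically once it is close to the optimum.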
6) Convergence analysis:
(1) Assumptions (restrictions) on $f(x)$, $\nabla f(x)$, $\nabla^2 f(x)$, $x \in \mathbb{R}^n$:
$$\begin{align*}
&f(x) \text{ is twice continuously differentiable}\\
&\text{Lipschitz-continuous gradient: }\nabla^2 f(x) \preceq MI \;\Leftrightarrow\; \|\nabla f(y)-\nabla f(x)\| \leq M\|y-x\| \quad (\text{for convex } f)\\
&\text{Lipschitz-continuous Hessian: }\|\nabla^2 f(y)-\nabla^2 f(x)\| \leq L\|y-x\|\\
&\text{strong convexity: } \nabla^2 f(x) \succeq mI \;\Leftrightarrow\; \lambda_{\min}(\nabla^2 f(x)) \ge m \;\Rightarrow\; \|\nabla^2 f(x)^{-1}\|_2 \le \frac{1}{m}
\end{align*}$$
(2) Under these assumptions, an upper bound on the number of iterations needed to reach the stopping condition $f(x_k)-f(x^*) \leq \epsilon$ is
$$\frac{M^2L^2/m^5}{\alpha \beta \min (1,9(1-2\alpha)^2)}\big(f(x_0)-f(x^*)\big)+\log_2\log_2\frac{2m^3/L^2}{\epsilon}$$
The first term bounds the iterations of the damped phase, in which every step decreases $f$ by at least a fixed amount. The second term bounds the quadratically convergent phase, in which the full step $t_k=1$ is accepted.
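The bound is easy to evaluate once the constants are known. The helper below is a sketch with hypothetical inputs. In practice $M$, $L$, $m$ and $f(x_0)-f(x^*)$ are rarely known, so the bound is mainly of theoretical interest. Clamping the second term to 0 when $(2m^3/L^2)/\epsilon\le 2$ is a choice made here to keep the double logarithm defined.

```python
import math

def damped_newton_iteration_bound(M, L, m, alpha, beta, gap0, eps):
    # First term: damped phase, with gap0 = f(x_0) - f(x^*)
    damped = (M**2 * L**2 / m**5) / (alpha * beta * min(1.0, 9.0 * (1.0 - 2.0 * alpha) ** 2)) * gap0
    # Second term: quadratically convergent phase
    ratio = (2.0 * m**3 / L**2) / eps
    quadratic = math.log2(math.log2(ratio)) if ratio > 2.0 else 0.0
    return damped + quadratic

# Hypothetical constants: M = L = m = 1, alpha = 0.25, beta = 0.5, f(x0) - f* = 1
bound = damped_newton_iteration_bound(1.0, 1.0, 1.0, 0.25, 0.5, 1.0, 1e-10)
print(bound)
```

With these constants the damped-phase term is $1/(0.25\cdot0.5)=8$. The quadratic phase adds only about 5 more iterations, even for $\epsilon=10^{-10}$.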
(3) Under these assumptions, the pure Newton step ($t_k=1$) satisfies $\|x_{k+1}-x^*\| \leq \frac{L}{2m}\|x_k-x^*\|^2$:
Proof:
$$\begin{align*}
-\nabla f(x_k)&=0-\nabla f(x_k)=\nabla f(x^*)-\nabla f(x_k)\\
\nabla f(x^*)-\nabla f(x_k)&=\int_0^1\nabla^2 f(x_k+t(x^*-x_k))(x^*-x_k)\,dt\\
x_{k+1}-x^*&=x_k-\nabla^2 f(x_k)^{-1}\nabla f(x_k)-x^*\\
&=x_k-x^*+\nabla^2 f(x_k)^{-1}\big(\nabla f(x^*)-\nabla f(x_k)\big)\\
&=x_k-x^*+\nabla^2 f(x_k)^{-1}\int_0^1\nabla^2 f(x_k+t(x^*-x_k))(x^*-x_k)\,dt\\
&=\nabla^2 f(x_k)^{-1}\int_0^1\big[\nabla^2 f(x_k+t(x^*-x_k))-\nabla^2 f(x_k)\big](x^*-x_k)\,dt\\
\|x_{k+1}-x^*\| &=\Big\|\nabla^2 f(x_k)^{-1}\int_0^1\big[\nabla^2 f(x_k+t(x^*-x_k))-\nabla^2 f(x_k)\big](x^*-x_k)\,dt\Big\|\\
& \leq \|\nabla^2 f(x_k)^{-1}\|\, \Big\|\int_0^1\big[\nabla^2 f(x_k+t(x^*-x_k))-\nabla^2 f(x_k)\big](x^*-x_k)\,dt\Big\|\\
& \leq \frac{1}{m}\Big\|\int_0^1\big[\nabla^2 f(x_k+t(x^*-x_k))-\nabla^2 f(x_k)\big](x^*-x_k)\,dt\Big\|\\
& \leq \frac{1}{m}\int_0^1\|\nabla^2 f(x_k+t(x^*-x_k))-\nabla^2 f(x_k)\|\, \|x^*-x_k\|\,dt\\
& \leq \frac{1}{m}\int_0^1 Lt \|x^*-x_k\|^2\,dt\\
&= \frac{L}{2m}\|x_k-x^*\|^2
\end{align*}$$
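The bound can be checked numerically on a one-dimensional example chosen for illustration: $f(x)=e^x-2x$ with $x^*=\ln 2$. Take the interval $[0,1.5]$, which contains all the iterates below and their segments to $x^*$. On it, $f''(x)=e^x\ge 1=m$ and $|f'''(x)|\le e^{1.5}$, so $L=e^{1.5}$ is a valid Lipschitz constant for $f''$ there. The sketch runs pure Newton from $x_0=1$ and compares each error $e_{k+1}$ with $\frac{L}{2m}e_k^2$.

```python
import math

x_star = math.log(2.0)     # minimizer of f(x) = e^x - 2x
m = 1.0                    # f''(x) = e^x >= 1 on [0, 1.5]
L = math.exp(1.5)          # |f'''(x)| = e^x <= e^1.5 on [0, 1.5]

x = 1.0
errors = [abs(x - x_star)]
for _ in range(5):
    x = x - (math.exp(x) - 2.0) / math.exp(x)   # pure Newton step x - f'(x)/f''(x)
    errors.append(abs(x - x_star))

# Compare only while the error is well above rounding level.
checks = [(e_next, L / (2 * m) * e_k**2)
          for e_k, e_next in zip(errors, errors[1:]) if e_k > 1e-5]
for actual, bound in checks:
    print(f"{actual:.3e} <= {bound:.3e}")
```

The errors shrink roughly like $e_{k+1}\approx e_k^2/2$ here. The number of correct digits roughly doubles per step, well inside the bound.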
(4) Under these assumptions, the following inequality holds:
$$\frac{m}{2}\|x-x^*\|^2_2 \leq f(x)-f(x^*) \leq \frac{1}{2m}\|\nabla f(x)\|^2_2$$
Proof. Use Taylor's theorem with the Lagrange remainder: $z$ and $z'$ are points on the segment between $x$ and $x^*$, and $\nabla f(x^*)=0$.
$$\begin{align*}
\text{left inequality: given }& \nabla^2 f(z)\succeq mI,\\
f(x) &=f(x^*)+\nabla f(x^*)^T(x-x^*)+\frac{1}{2}(x-x^*)^T\nabla^2 f(z)(x-x^*)\\
&\ge f(x^*)+\frac{m}{2}\|x^*-x\|_2^2\\
\text{right inequality: given }&\Big\|\sqrt{\tfrac{m}{2}}\,(x^*-x)+\tfrac{1}{\sqrt{2m}}\nabla f(x)\Big\|_2^2=\frac{m}{2}\|x^*-x\|_2^2+\frac{1}{2m}\|\nabla f(x)\|_2^2+\nabla f(x)^T(x^*-x) \ge 0,\\
f(x^*) &=f(x)+\nabla f(x)^T(x^*-x)+\frac{1}{2}(x^*-x)^T\nabla^2 f(z')(x^*-x)\\
& \ge f(x)+\nabla f(x)^T(x^*-x)+\frac{m}{2}\|x^*-x\|^2_2\\
&\ge f(x)-\frac{1}{2m}\|\nabla f(x)\|_2^2
\end{align*}$$
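Here is a sketch that spot-checks both inequalities at random points. It assumes the example function $f(x)=\frac12x^TAx+\sum_i\log\cosh x_i$, which is not from the original. Its Hessian $A+\operatorname{diag}(\operatorname{sech}^2x_i)\succeq \lambda_{\min}(A)I$, so $m=\lambda_{\min}(A)$ is a valid strong-convexity constant, and $x^*=0$ because $\nabla f(0)=0$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])
m = np.linalg.eigvalsh(A).min()            # strong convexity constant (Hessian >= A >= m I)

f = lambda x: 0.5 * x @ A @ x + np.sum(np.log(np.cosh(x)))
grad = lambda x: A @ x + np.tanh(x)
x_star = np.zeros(2)                       # grad(0) = 0 and f is strictly convex

results = []
for _ in range(1000):
    x = rng.normal(scale=3.0, size=2)
    gap = f(x) - f(x_star)
    lower = 0.5 * m * np.sum((x - x_star) ** 2)
    upper = np.sum(grad(x) ** 2) / (2 * m)
    # small tolerance absorbs floating-point rounding
    results.append(lower <= gap + 1e-9 and gap <= upper + 1e-9)
print(all(results))
```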
IV. Inexact Newton method (a family that covers several specific solvers)
The pure and damped methods above obtain $p_k$ by solving $\nabla^2 f(x_k)p_k+\nabla f(x_k)=0$ exactly, i.e. $p_k=-\nabla^2 f(x_k)^{-1}\nabla f(x_k)$. The inexact method instead solves this linear system iteratively. It accepts an approximate $p_k$ whose residual $r_k=\nabla^2 f(x_k)p_k+\nabla f(x_k)$ is small but not necessarily zero, typically requiring $\|r_k\|\le\eta_k\|\nabla f(x_k)\|$ for a forcing term $\eta_k\in(0,1)$.
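One common member of this family is Newton-CG, which solves the linear system with the conjugate gradient method and stops early. The sketch below assumes that choice, with the forcing term $\eta_k=\min(0.5,\sqrt{\|\nabla f(x_k)\|})$ (a standard choice). For simplicity it takes full steps $t_k=1$ and is demonstrated only on a positive definite quadratic with example values, where no line search is needed. The names `truncated_cg` and `inexact_newton` are hypothetical.

```python
import numpy as np

def truncated_cg(H, g, eta):
    # Approximately solve H p = -g; stop once ||r|| <= eta * ||g||, with r = H p + g.
    p = np.zeros_like(g)
    r = g.copy()                         # residual at p = 0
    d = -r
    tol = eta * np.linalg.norm(g)
    for j in range(len(g)):              # CG needs at most n steps in exact arithmetic
        if np.linalg.norm(r) <= tol:
            break
        Hd = H @ d
        curv = d @ Hd
        if curv <= 0:                    # nonpositive curvature: H is not positive definite
            return -g if j == 0 else p   # fall back to steepest descent / current p
        a = (r @ r) / curv
        p = p + a * d
        r_new = r + a * Hd
        d = -r_new + (r_new @ r_new) / (r @ r) * d
        r = r_new
    return p

def inexact_newton(grad, hess, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:
            break
        eta = min(0.5, np.sqrt(gnorm))            # forcing term eta_k
        x = x + truncated_cg(hess(x), g, eta)     # full step t_k = 1 for simplicity
    return x

A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
# f(x) = 1/2 x^T A x - b^T x, so grad f = A x - b and the Hessian is A
x_min = inexact_newton(lambda x: A @ x - b, lambda x: A, np.zeros(4))
print(x_min)
```

A loose $\eta_k$ far from the optimum saves CG iterations. Letting $\eta_k$ shrink with $\|\nabla f(x_k)\|$ near the optimum preserves fast local convergence. For a non-quadratic $f$, the full step would normally be combined with the Armijo line search from section III.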