牛顿法&拟牛顿法
设无约束优化问题:
min
f
(
x
)
,
x
∈
R
n
\min f(x),{\kern 1pt} \,x \in {R^n}
minf(x),x∈Rn
1 牛顿法
基本思想,通过泰勒二阶展开,通过对泰勒展开求导,并令其等于0,从而求得极小值。
将
f
(
x
)
f(x)
f(x)在
x
k
x_k
xk处进行泰勒展开:
f
(
x
)
≈
f
(
x
k
)
+
Δ
f
(
x
k
)
(
x
−
x
k
)
+
1
2
(
x
−
x
k
)
T
Δ
2
f
(
x
k
)
(
x
−
x
k
)
+
0
(
∣
∣
x
−
x
k
∣
∣
2
)
f(x) \approx f({x_k}) + {\rm{\Delta }}f({x_k})(x - {x_k}) + \frac{1}{2}{\left( {x - {x_k}} \right)^T}{{\rm{\Delta }}^2}f({x_k})(x - {x_k}) + 0(||x - {x_k}|{|^2})
f(x)≈f(xk)+Δf(xk)(x−xk)+21(x−xk)TΔ2f(xk)(x−xk)+0(∣∣x−xk∣∣2)
对泰勒式及进行求导:
∇
f
(
x
)
≈
∇
f
(
x
k
)
+
∇
2
f
(
x
k
)
(
x
−
x
k
)
\nabla f(x) \approx \nabla f({x_k}) + {\nabla ^2}f({x_k})(x - {x_k})
∇f(x)≈∇f(xk)+∇2f(xk)(x−xk)
令上次为0,便可求得极小值位置:
∇
f
(
x
k
)
=
−
∇
2
f
(
x
k
)
(
x
−
x
k
)
\nabla f({x_k}){\rm{ = }} - {\nabla ^2}f({x_k})(x - {x_k})
∇f(xk)=−∇2f(xk)(x−xk)
得:
x
=
x
k
−
∇
2
f
(
x
k
)
−
1
∇
f
(
x
k
)
x{\rm{ = }}{x_k} - {\nabla ^2}f{({x_k})^{{\rm{ - 1}}}}\nabla f({x_k})
x=xk−∇2f(xk)−1∇f(xk)
我们便可得下降到最小值的方向:
−
∇
2
f
(
x
k
)
−
1
∇
f
(
x
k
)
-{\nabla ^2}f{({x_k})^{{\rm{ - 1}}}}\nabla f({x_k})
−∇2f(xk)−1∇f(xk)
牛顿法流程:
设定初始点:
x
0
x_0
x0,迭代步数
k
=
0
k=0
k=0
开始迭代:
1)计算梯度,Hessian矩阵,从而得到牛顿迭代方向:
∇
f
(
x
k
)
2
,
f
(
x
k
)
,
d
k
=
−
∇
2
f
(
x
k
)
−
1
∇
f
(
x
k
)
\nabla f{({x_k})^2},\ \ f({x_k}),\ \ {d_k} = - {\nabla ^2}f{({x_k})^{{\rm{ - 1}}}}\nabla f({x_k})
∇f(xk)2, f(xk), dk=−∇2f(xk)−1∇f(xk)
2)判断当前迭代步长是否足够小或达到迭代次数,是则停止,否,继续下一步
3)更新:
x
=
x
k
+
d
k
k
+
+
\begin{array}{l} x{\rm{ = }}{x_k} + {d_k}\\ k ++ \end{array}
x=xk+dkk++
转步骤1)
Tips:
此外,牛顿法,是局部收敛的。因为是基于求解极值的方法,因而会局限于局部极小值,因而要求初始点选取在全局最优解附近。
牛顿法有一次收敛性。但由于
∇
2
f
(
x
k
)
{\nabla ^2}f({x_k})
∇2f(xk)不能保证一定正定,会出现秩为0的情况,此时,牛顿法会失效。因而基于修正hessian矩阵的算法出现,如LM算法。
2 拟牛顿法
拟牛顿法是一种基于牛顿法的变形,提供一种思想,包含多种解决方案。
基本思想:牛顿法虽收敛速度快,但要计算Hessian矩阵及其逆矩阵,且Hessian矩阵不一定正定,为了减少计算量,拟牛顿法被提出,其基本思想是使用目标函数的一阶信息近似牛顿法的Hessian矩阵。
拟牛顿法的条件:使用k时刻的一阶二阶近似出
k
+
1
k+1
k+1时刻的二阶Hessian矩阵。
我们将目标函数
f
(
x
)
f(x)
f(x)在
x
k
+
1
x_{k+1}
xk+1处进行泰勒展开:
f
(
x
)
≈
f
(
x
k
+
1
)
+
Δ
f
(
x
k
+
1
)
(
x
−
x
k
+
1
)
+
1
2
(
x
−
x
k
+
1
)
T
Δ
2
f
(
x
k
+
1
)
(
x
−
x
k
+
1
)
+
0
(
∣
∣
x
−
x
k
+
!
∣
∣
2
)
f(x) \approx f({x_{k + 1}}) + {\rm{\Delta }}f({x_{k + 1}})(x - {x_{k + 1}}) + \frac{1}{2}{\left( {x - {x_{k + 1}}} \right)^T}{{\rm{\Delta }}^2}f({x_{k + 1}})(x - {x_{k + 1}}) + 0(||x - {x_{k + !}}|{|^2})
f(x)≈f(xk+1)+Δf(xk+1)(x−xk+1)+21(x−xk+1)TΔ2f(xk+1)(x−xk+1)+0(∣∣x−xk+!∣∣2)
对泰勒式及进行求导:
∇
f
(
x
)
≈
∇
f
(
x
k
+
1
)
+
∇
2
f
(
x
k
+
1
)
(
x
−
x
k
+
1
)
\nabla f(x) \approx \nabla f({x_{k + 1}}) + {\nabla ^2}f({x_{k + 1}})(x - {x_{k + 1}})
∇f(x)≈∇f(xk+1)+∇2f(xk+1)(x−xk+1)
令
x
=
x
k
,
g
k
=
∇
f
(
x
k
+
1
)
x = {x_k},\,{g_k} = \nabla f({x_{k + 1}})
x=xk,gk=∇f(xk+1),则得:
g
k
−
g
k
+
1
≈
∇
2
f
(
x
k
+
1
)
(
x
k
−
x
k
+
1
)
{g_k} - {g_{k + 1}} \approx {\nabla ^2}f({x_{k + 1}})({x_k} - {x_{k + 1}})
gk−gk+1≈∇2f(xk+1)(xk−xk+1)
x
k
−
x
k
+
1
≈
∇
2
f
(
x
k
+
1
)
(
g
k
−
g
k
+
1
)
{x_k} - {x_{k + 1}} \approx {\nabla ^2}f({x_{k + 1}})({g_k} - {g_{k + 1}})
xk−xk+1≈∇2f(xk+1)(gk−gk+1)
令
q
k
=
g
k
−
g
k
+
1
,
p
k
=
x
k
−
x
k
+
1
,
H
k
+
1
=
(
∇
2
f
(
x
k
+
1
)
)
−
1
{q_k} = {g_k} - {g_{k + 1}},\,\,{p_k} = {x_k} - {x_{k + 1}},\,\,{H_{k + 1}} = \left({\nabla ^2}f({x_{k + 1}})\right)^{-1}
qk=gk−gk+1,pk=xk−xk+1,Hk+1=(∇2f(xk+1))−1,则得拟牛顿法的条件:
p
k
≈
H
k
+
1
q
k
{p_k} \approx {H_{k + 1}}{q_k}
pk≈Hk+1qk
Tips:
此处H表示的是Hessian矩阵的逆。
关于这个公式的理解,可以将x想为时间t,即,两帧(两次迭代之间
k
−
>
k
+
1
k->k+1
k−>k+1)的速度变化,等于
k
+
1
k+1
k+1时刻的加速度*两帧之间的时差。
此外,如果我们对目标函数在
x
k
x_k
xk处展开,再将
x
k
+
1
x_{k+1}
xk+1代入,可以得到 。这里,引入思考,泰勒展开式只保证在展开点的一定邻域内有效,当
x
k
x_k
xk与
x
k
+
1
x_{k+1}
xk+1相差较大时,以上拟牛顿法的条件是否也有一定误差。
牛顿法是使 H k + 1 = H k + Δ H k {H_{k + 1}} = {H_k} + \Delta {H_k} Hk+1=Hk+ΔHk来拟合 H k + 1 {H_{k + 1}} Hk+1。
2.1 对称秩1校正
这个思想是,保证 的半正定及对称性,我们令:
Δ
H
k
=
v
k
v
k
T
,
v
k
≠
0
\Delta {H_k} = {v_k}v_k^T,{v_k} \ne 0
ΔHk=vkvkT,vk=0
由于
v
k
v
k
T
{v_k}v_k^T
vkvkT使得
Δ
H
k
\Delta {H_k}
ΔHk的秩为1,故而此方法名为秩1校正。
代入拟牛顿法条件
p
k
≈
H
k
+
1
q
k
{p_k} \approx {H_{k + 1}}{q_k}
pk≈Hk+1qk
p
k
≈
H
k
+
1
q
k
=
(
H
k
+
Δ
H
k
)
q
k
p
k
=
(
H
k
+
v
k
v
k
T
)
q
k
=
H
k
q
k
+
v
k
v
k
T
q
k
\begin{array}{l} {p_k} \approx {H_{k + 1}}{q_k} = \left( {{H_k} + \Delta {H_k}} \right){q_k}\\ {p_k} = \left( {{H_k} + {v_k}v_k^T} \right){q_k} = {H_k}{q_k} + {v_k}v_k^T{q_k} \end{array}
pk≈Hk+1qk=(Hk+ΔHk)qkpk=(Hk+vkvkT)qk=Hkqk+vkvkTqk
从而得:
v
k
v
k
T
q
k
=
p
k
−
H
k
q
k
{v_k}v_k^T{q_k} = {p_k} - {H_k}{q_k}
vkvkTqk=pk−Hkqk
由于
q
k
{q_k}
qk为向量,上式可以看成
A
x
=
b
Ax=b
Ax=b,
A
A
A为对称对阵,求
A
A
A的问题。
等式两侧都乘以
(
p
k
−
H
k
q
k
)
T
{\left( {{p_k} - {H_k}{q_k}} \right)^T}
(pk−Hkqk)T,得:
v
k
v
k
T
q
k
(
p
k
−
H
k
q
k
)
T
=
(
p
k
−
H
k
q
k
)
(
p
k
−
H
k
q
k
)
T
{v_k}v_k^T{q_k}{\left( {{p_k} - {H_k}{q_k}} \right)^T} = \left( {{p_k} - {H_k}{q_k}} \right){\left( {{p_k} - {H_k}{q_k}} \right)^T}
vkvkTqk(pk−Hkqk)T=(pk−Hkqk)(pk−Hkqk)T
进而得:
v
k
v
k
T
=
(
p
k
−
H
k
q
k
)
(
p
k
−
H
k
q
k
)
T
q
k
(
p
k
−
H
k
q
k
)
T
{v_k}v_k^T = \frac{{\left( {{p_k} - {H_k}{q_k}} \right){{\left( {{p_k} - {H_k}{q_k}} \right)}^T}}}{{{q_k}{{\left( {{p_k} - {H_k}{q_k}} \right)}^T}}}
vkvkT=qk(pk−Hkqk)T(pk−Hkqk)(pk−Hkqk)T
因而我们可以发现,秩1校正成立的条件 q k ( p k − H k q k ) T > 0 {q_k}{\left( {{p_k} - {H_k}{q_k}} \right)^T}>0 qk(pk−Hkqk)T>0
2.2 DFP
基本思想,我们可以从秩1校正中发现
Δ
H
k
\Delta {H_k}
ΔHk的是由
p
k
p_k
pk和
H
k
q
k
H_kq_k
Hkqk组成的,因而我么可以设计以下方法来完成校正:a和b为系数。
Δ
H
k
=
a
(
p
k
p
k
T
)
+
b
(
H
k
q
k
)
(
H
k
q
k
)
T
=
a
(
P
k
P
k
T
)
+
b
H
k
q
k
q
k
T
H
k
T
{\rm{\Delta }}{H_k} = a\left( {{p_k}p_k^T} \right) + b({H_k}{q_k}){({H_k}{q_k})^T} = a\left( {{P_k}P_k^T} \right) + b{H_k}{q_k}q_k^TH_k^T
ΔHk=a(pkpkT)+b(Hkqk)(Hkqk)T=a(PkPkT)+bHkqkqkTHkT
代入拟牛顿法条件
p
k
≈
H
k
+
1
q
k
{p_k} \approx {H_{k + 1}}{q_k}
pk≈Hk+1qk
p
k
≈
H
k
+
1
q
k
=
(
H
k
+
Δ
H
k
)
q
k
p
k
=
(
H
k
+
a
(
p
k
p
k
T
)
+
b
H
k
q
k
q
k
T
H
k
T
)
q
k
=
H
k
q
k
+
a
p
k
p
k
T
q
k
+
b
H
k
q
k
q
k
T
H
k
T
q
k
\begin{array}{l} {p_k} \approx {H_{k + 1}}{q_k} = \left( {{H_k} + \Delta {H_k}} \right){q_k}\\ {p_k} = \left( {{H_k} + a\left( {{p_k}p_k^T} \right) + b{H_k}{q_k}q_k^TH_k^T} \right){q_k} = {H_k}{q_k} + a{p_k}p_k^T{q_k} + b{H_k}{q_k}q_k^TH_k^T{q_k} \end{array}
pk≈Hk+1qk=(Hk+ΔHk)qkpk=(Hk+a(pkpkT)+bHkqkqkTHkT)qk=Hkqk+apkpkTqk+bHkqkqkTHkTqk
目标转为求系数a和b。根据乘法的结合率,我们可以知道:
p
k
T
q
k
p_k^T{q_k}
pkTqk和
q
k
T
H
k
T
q
k
q_k^TH_k^T{q_k}
qkTHkTqk为常数,因此,可得:
(
H
k
+
b
q
k
T
H
k
T
q
k
H
k
)
q
k
+
(
a
p
k
T
q
k
−
1
)
p
k
=
0
⇒
(
1
+
b
q
k
T
H
k
T
q
k
)
H
k
q
k
+
(
a
p
k
T
q
k
−
1
)
p
k
=
0
\begin{array}{l} \left( {{H_k} + bq_k^TH_k^T{q_k}{H_k}} \right){q_k} + \left( {ap_k^T{q_k} - 1} \right){p_k} = 0\\ \Rightarrow \left( {1 + bq_k^TH_k^T{q_k}} \right){H_k}{q_k} + \left( {ap_k^T{q_k} - 1} \right){p_k} = 0 \end{array}
(Hk+bqkTHkTqkHk)qk+(apkTqk−1)pk=0⇒(1+bqkTHkTqk)Hkqk+(apkTqk−1)pk=0
设
p
k
p_k
pk和
H
k
q
k
H_kq_k
Hkqk线性无关,则要使上式
=
0
=0
=0成立,则需:
{
1
+
b
q
k
T
H
k
T
q
k
=
0
a
p
k
T
q
k
−
1
=
0
⇒
{
a
=
1
p
k
T
q
k
b
=
−
1
q
k
T
H
k
q
k
\left\{ {\begin{array}{c} {1 + bq_k^TH_k^T{q_k} = 0}\\ {ap_k^T{q_k} - 1 = 0} \end{array}} \right. \Rightarrow \left\{ {\begin{array}{c} {a = \frac{1}{{p_k^T{q_k}}}}\\ {b = - \frac{1}{{q_k^T{H_k}{q_k}}}} \end{array}} \right.
{1+bqkTHkTqk=0apkTqk−1=0⇒{a=pkTqk1b=−qkTHkqk1
从而可得:
Δ
H
k
=
p
k
p
k
T
p
k
T
q
k
−
H
k
q
k
⋅
q
k
T
H
k
q
k
T
H
k
q
k
{\rm{\Delta }}{H_k} = \frac{{{p_k}p_k^T}}{{p_k^T{q_k}}} - \frac{{{H_k}{q_k} \cdot q_k^T{H_k}}}{{q_k^T{H_k}{q_k}}}
ΔHk=pkTqkpkpkT−qkTHkqkHkqk⋅qkTHk
H
H
H为对称矩阵
H
T
=
H
H^T=H
HT=H
DFP流程:
设定初始点:
x
0
x_0
x0,迭代步数
k
=
0
k=0
k=0, H为单位矩阵
开始迭代:
1)计算梯度,信息矩阵,从而得到牛顿迭代方向:
∇
f
(
x
k
)
,
∇
2
f
(
x
k
)
,
d
k
=
−
H
k
⋅
∇
f
(
x
k
)
\nabla f({x_k}),\ \ {\nabla ^2}f({x_k}),\ \ {d_k} = - H_k \cdot \nabla f({x_k})
∇f(xk), ∇2f(xk), dk=−Hk⋅∇f(xk)
2)判断当前迭代步长是否足够小或达到迭代次数,是则停止,否,继续下一步
3)更新:
x
k
+
1
=
x
k
+
d
k
;
g
k
+
1
=
Δ
f
(
x
k
+
1
)
,
p
k
=
x
k
+
1
−
x
k
,
q
k
=
g
k
+
1
−
g
k
;
H
k
+
1
=
H
k
+
p
k
p
k
T
p
k
T
q
k
−
H
k
q
k
⋅
q
k
T
H
k
q
k
T
H
k
q
k
k
+
+
;
\begin{array}{l} x_{k + 1}{\rm{ = }}{x_k} + {d_k}; {g_{k + 1}} = {\rm{\Delta }}f({x_{k + 1}}),\ \ {p_k} = {x_{k + 1}} - {x_k},\ \ {q_k} = {g_{k + 1}} - {g_k};\\ {H_{k + 1}} = {H_k} + \frac{{{p_k}p_k^T}}{{p_k^T{q_k}}} - \frac{{{H_k}{q_k} \cdot q_k^T{H_k}}}{{q_k^T{H_k}{q_k}}}\\ \ k + + ;\\ \end{array}
xk+1=xk+dk;gk+1=Δf(xk+1), pk=xk+1−xk, qk=gk+1−gk;Hk+1=Hk+pkTqkpkpkT−qkTHkqkHkqk⋅qkTHk k++;
转步骤1)
示例代码:
#include <iostream>
#include <cmath>
using namespace std;
// 目标函数
double objectiveFunction(double x)
{
return pow(x, 2) - 4 * x + 4;
}
// 目标函数的一阶导数
double derivativeFunction(double x)
{
return 2 * x - 4;
}
// 拟牛顿法(BFGS算法)求解函数
double bfgsMethod(double initialGuess, double tolerance)
{
double x = initialGuess;
double dx = 0.0;
double H = 1.0; // 初始Hessian逆矩阵
float s = 0.1;
do {
double g = derivativeFunction(x);
double step = -H * g;
x = x + s*step;
double dg = derivativeFunction(x) - g;
// 更新Hessian逆矩阵
double deltaX = step;
double deltaY = dg;
H += (deltaX * deltaX) / (deltaX * deltaY) - (H * deltaY * deltaY * H) / (deltaY * H * deltaY);
dx = abs(step);
} while (dx > tolerance);
return x;
}
int main()
{
double initialGuess = 1.0; // 初始猜测值
double tolerance = 0.0001; // 精度
double result = bfgsMethod(initialGuess, tolerance);
cout << "Solution: " << result << endl;
return 0;
}
2.3 BFGS
基本思想:由BFGS拟合Hessian阵来完成迭代。
x
k
−
x
k
+
1
≈
∇
2
f
(
x
k
+
1
)
−
1
(
g
k
−
g
k
+
1
)
.
↦
g
k
−
g
k
≈
∇
2
f
(
x
k
+
1
)
(
x
k
−
x
k
+
1
)
{x_k} - {x_{k + 1}} \approx {\nabla ^2}f{({x_{k + 1}})^{ - 1}}({g_k} - {g_{k + 1}}).\,\,\,\quad \mapsto \quad {g_k} - {g_k} \approx {\nabla ^2}f({x_{k + 1}})({x_k} - x_{k + 1})
xk−xk+1≈∇2f(xk+1)−1(gk−gk+1).↦gk−gk≈∇2f(xk+1)(xk−xk+1)
即
p
k
≈
H
k
+
1
q
k
↦
q
k
≈
B
k
+
1
p
k
{p_k} \approx {H_{k + 1}}{q_k}\,\quad \mapsto \quad {q_k} \approx {B_{k + 1}}{p_k}
pk≈Hk+1qk↦qk≈Bk+1pk
求
Δ
B
k
\Delta {B_k}
ΔBk的思路,即可以把
Δ
H
k
=
p
k
p
k
T
p
k
T
q
k
−
H
k
q
k
⋅
q
k
T
H
k
q
k
T
H
k
q
k
{\rm{\Delta }}{H_k} = \frac{{{p_k}p_k^T}}{{p_k^T{q_k}}} - \frac{{{H_k}{q_k} \cdot q_k^T{H_k}}}{{q_k^T{H_k}{q_k}}}
ΔHk=pkTqkpkpkT−qkTHkqkHkqk⋅qkTHk中的
p
k
{p_k}
pk换成
q
k
q_k
qk,将
H
k
q
k
{H_k}{q_k}
Hkqk替换为
B
k
p
k
{B_k}{p_k}
Bkpk,最后得BFGS校正公式:
Δ
B
k
=
q
k
q
k
T
q
k
T
p
k
−
B
k
p
k
⋅
p
k
T
B
k
p
k
T
B
k
p
k
{\rm{\Delta }}{B_k} = \frac{{{q_k}q_k^T}}{{q_k^Tp_k}} - \frac{{{B_k}{p_k} \cdot p_k^T{B_k}}}{{p_k^T{B_k}{p_k}}}
ΔBk=qkTpkqkqkT−pkTBkpkBkpk⋅pkTBk
该公式是较为有效率的一个校正公式。
此外,我们对
Δ
B
k
\Delta {B_k}
ΔBk(DFP)的校正公式进行求逆,可以得到另一个的校正公式:
Δ
B
k
=
(
I
−
q
k
p
k
T
q
k
T
p
k
)
B
k
(
I
−
p
k
q
k
T
q
k
T
p
k
)
+
q
k
q
k
T
q
k
T
p
k
{\rm{\Delta }}{B_k} = (I - \frac{{{q_k}p_k^T}}{{q_k^T{p_k}}}){B_k}(I - \frac{{{p_k}q_k^T}}{{q_k^T{p_k}}}) + \frac{{{q_k}q_k^T}}{{q_k^Tp_k}}
ΔBk=(I−qkTpkqkpkT)Bk(I−qkTpkpkqkT)+qkTpkqkqkT
示例代码
#include <iostream>
#include <cmath>
using namespace std;
// 目标函数
double objectiveFunction(double x)
{
return pow(x, 2) - 4 * x + 4;
}
// 目标函数的一阶导数
double derivativeFunction(double x)
{
return 2 * x - 4;
}
// 拟牛顿法(BFGS算法)求解函数
double bfgsMethod(double initialGuess, double tolerance)
{
double x = initialGuess;
double dx = 0.0;
double B = 1.0; // 初始Hessian逆矩阵
float s = 0.1;
do {
double g = derivativeFunction(x);
double step = -g / B;
x = x + s*step;
double dg = derivativeFunction(x) - g;
// 更新Hessian逆矩阵
double deltaX = step;
double deltaY = dg;
#if 1
B += (1-(deltaY * deltaX) / (deltaY * deltaX)) * B *
(1-(deltaX * deltaY) / (deltaY * deltaX))
+ (deltaY * deltaY) / (deltaY * deltaX);
#else
B += (deltaY * deltaY) / (deltaY * deltaX)
- (B * deltaX * deltaX * B) / (deltaX * B * deltaX);
#endif
dx = abs(step);
} while (dx > tolerance);
return x;
}
int main()
{
double initialGuess = 1.0; // 初始猜测值
double tolerance = 0.0001; // 精度
double result = bfgsMethod(initialGuess, tolerance);
cout << "Solution: " << result << endl;
return 0;
}
2维加步长确定:
#include <iostream>
#include <cmath>
#include <Eigen/Dense>
#include <Eigen/Core>
using namespace Eigen;
using namespace std;
// Objective function to minimize (example: quadratic function)
double objectiveFunction(const VectorXd& x) {
return x[0]*x[0] + 2*x[1]*x[1];
}
// Gradient of the objective function
VectorXd gradient(const VectorXd& x) {
VectorXd grad(2);
grad[0] = 2*x[0];
grad[1] = 4*x[1];
return grad;
}
// BFGS optimization algorithm
void bfgsOptimization(VectorXd& x0, double epsilon, int maxIterations) {
int n = x0.size();
MatrixXd B = MatrixXd::Identity(n, n); // Initial approximation of the inverse Hessian matrix
VectorXd x = x0;
for (int iter = 0; iter < maxIterations; ++iter) {
VectorXd grad = gradient(x);
double alpha = 1; // Line search step size
// Line search to determine the step size alpha
//步长太大,缩一半
#if 1
//if (f(x_k+1) > f(x_k) - 0.5*f'(x_k)*d_k)
if (objectiveFunction(x - alpha * B * grad) > objectiveFunction(x) - 0.1 * alpha * grad.transpose() * B * grad) {
alpha *= 0.1;
}
VectorXd s = -alpha * B.inverse() * grad;
#else
if (objectiveFunction(x - alpha * grad) > objectiveFunction(x) - 0.1 * alpha * grad.transpose() * grad) {
alpha *= 0.1;
}
VectorXd s = -alpha * grad;
#endif
VectorXd xNext = x + s;
VectorXd gradNext = gradient(xNext);
VectorXd y = gradNext - grad;
// BFGS update formula
B = B + (y * y.transpose()) / (y.transpose() * s) - (B * s * s.transpose() * B) / (s.transpose() * B * s);
x = xNext;
cout << "iter: " << iter+1 << " x: " << x.transpose() << " cost: " << objectiveFunction(x) << endl;
// Check for convergence
if (grad.norm() < epsilon) {
std::cout << "Converged after " << iter + 1 << " iterations." << std::endl;
break;
}
}
std::cout << "Optimal solution: " << x.transpose() << std::endl;
std::cout << "Optimal value: " << objectiveFunction(x) << std::endl;
}
int main() {
VectorXd x0(2);
x0 << 1.0, 1.0; // Initial guess
double epsilon = 1e-6; // Tolerance for convergence
int maxIterations = 100; // Maximum number of iterations
bfgsOptimization(x0, epsilon, maxIterations);
return 0;
拟牛顿法与线搜索
由于拟牛顿法使用的是近似的hessian矩阵,因此步长的选择非常重要,通常与线搜索方法结合。
#include <iostream>
#include <cmath>
#include <Eigen/Dense>
using namespace Eigen;
using namespace std;
// 定义目标函数 f(x)
double f(Vector2d x) {
return x[0] * x[0] + 2 * x[0] * x[1] + x[1] * x[1];
}
// 计算目标函数在 x 处的梯度
Vector2d grad_f(Vector2d x) {
Vector2d res;
res[0] = 2 * x[0] + 2 * x[1];
res[1] = 2 * x[0] + 2 * x[1];
return res;
}
// BFGS 算法
Vector2d BFGS(Vector2d x0, double tol, int maxiter) {
Vector2d x = x0;
Matrix2d B = Matrix2d::Identity();
int k = 0;
while (k < maxiter) {
Vector2d g = grad_f(x);
if (g.norm() < tol) {
break;
}
Vector2d p = -B * g;
double alpha = 1.0;
double rho = 0.9;
double c = 0.0001;
double phi = f(x);
Vector2d grad_phi = grad_f(x);
while (f(x + alpha * p) > phi + c * alpha * p.dot(grad_phi)) {
alpha *= rho;
}
Vector2d s = alpha * p;
Vector2d y = grad_f(x + s) - g;
double rho_inv = 1 / y.dot(s);
Matrix2d term1 = Matrix2d::Identity() - rho_inv * s * y.transpose();
Matrix2d term2 = rho_inv * s * s.transpose();
B = term1 * B * term1.transpose() + term2;
x += s;
k++;
}
return x;
}
int main() {
Vector2d x0(1, 2); // 初始参数值
double tol = 1e-6; // 迭代停止容差
int maxiter = 100; // 最大迭代次数
Vector2d res = BFGS(x0, tol, maxiter);
cout << "Optimal parameters: (" << res[0] << ", " << res[1] << ")" << endl;
cout << "Optimal function value: " << f(res) << endl;
return 0;
}