Newton's Method and Quasi-Newton Methods


Consider the unconstrained optimization problem:

$$ \min f(x), \quad x \in \mathbb{R}^n $$

1 Newton's Method

Basic idea: expand the objective in a second-order Taylor series, differentiate the expansion, and set the derivative to zero to locate the minimizer.

Expand $f(x)$ in a Taylor series at $x_k$:

$$ f(x) \approx f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{1}{2}(x - x_k)^T \nabla^2 f(x_k)(x - x_k) + o(\|x - x_k\|^2) $$

Differentiating the expansion:

$$ \nabla f(x) \approx \nabla f(x_k) + \nabla^2 f(x_k)(x - x_k) $$

Setting this to zero gives the location of the minimizer:

$$ \nabla f(x_k) = -\nabla^2 f(x_k)(x - x_k) $$

Solving for $x$:

$$ x = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k) $$

This yields the descent direction toward the minimizer: $-\nabla^2 f(x_k)^{-1} \nabla f(x_k)$.

Newton's method:
Set an initial point $x_0$ and an iteration counter $k = 0$.
Iterate:
1) Compute the gradient and the Hessian, which give the Newton direction:

$$ \nabla f(x_k), \quad \nabla^2 f(x_k), \quad d_k = -\nabla^2 f(x_k)^{-1} \nabla f(x_k) $$

2) If the current step is sufficiently small or the maximum number of iterations has been reached, stop; otherwise continue.

3) Update:

$$ x_{k+1} = x_k + d_k, \quad k \leftarrow k + 1 $$

Go to step 1).
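As a minimal sketch of this loop in one dimension (the test function matches the one used in the later examples; the starting point, tolerance, and iteration cap are illustrative assumptions):

#include <cmath>
#include <iostream>

// Derivatives of the test objective f(x) = x^2 - 4x + 4.
double fGrad(double x) { return 2 * x - 4; } // f'(x)
double fHess(double)   { return 2.0; }       // f''(x), constant here

int main()
{
    double x = 10.0; // assumed initial point x_0
    for (int k = 0; k < 100; ++k) {
        double d = -fGrad(x) / fHess(x); // Newton direction d_k = -f''(x_k)^{-1} f'(x_k)
        x += d;                          // x_{k+1} = x_k + d_k
        if (std::fabs(d) < 1e-6)         // stop once the step is sufficiently small
            break;
    }
    std::cout << "minimizer: " << x << std::endl; // expect x = 2
    return 0;
}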

Tips:
Newton's method is only locally convergent. Because it is based on solving for a stationary point, it can be trapped at a local minimum, so the initial point should be chosen near the desired (ideally global) optimum.
Newton's method converges quadratically. However, $\nabla^2 f(x_k)$ is not guaranteed to be positive definite and may even be singular, in which case the Newton step is undefined and the method fails. Algorithms that modify the Hessian were developed for this case, e.g. the Levenberg-Marquardt (LM) algorithm.
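A common form of such a correction, sketched here as the generic damped-Newton idea rather than the full LM algorithm, adds a damping term $\lambda I$ so that the modified matrix is positive definite:

$$ d_k = -\left(\nabla^2 f(x_k) + \lambda I\right)^{-1} \nabla f(x_k), \quad \lambda > 0 $$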

2 Quasi-Newton Methods

Quasi-Newton methods are variants of Newton's method: a single underlying idea with several concrete realizations.
Basic idea: Newton's method converges fast, but it must compute the Hessian and its inverse, and the Hessian is not guaranteed to be positive definite. Quasi-Newton methods were proposed to reduce this cost: they approximate the Hessian of Newton's method using only first-order information about the objective.

The quasi-Newton condition: use first-order information at steps $k$ and $k+1$ to approximate the second-order (Hessian) information at step $k+1$.
Expand the objective $f(x)$ in a Taylor series at $x_{k+1}$:

$$ f(x) \approx f(x_{k+1}) + \nabla f(x_{k+1})^T (x - x_{k+1}) + \frac{1}{2}(x - x_{k+1})^T \nabla^2 f(x_{k+1})(x - x_{k+1}) + o(\|x - x_{k+1}\|^2) $$

Differentiating the expansion:

$$ \nabla f(x) \approx \nabla f(x_{k+1}) + \nabla^2 f(x_{k+1})(x - x_{k+1}) $$

Let $x = x_k$ and write $g_k = \nabla f(x_k)$, $g_{k+1} = \nabla f(x_{k+1})$. Then:

$$ g_k - g_{k+1} \approx \nabla^2 f(x_{k+1})(x_k - x_{k+1}), \qquad x_k - x_{k+1} \approx \nabla^2 f(x_{k+1})^{-1}(g_k - g_{k+1}) $$

Let $q_k = g_k - g_{k+1}$, $p_k = x_k - x_{k+1}$, and $H_{k+1} = \left(\nabla^2 f(x_{k+1})\right)^{-1}$. This gives the quasi-Newton condition:

$$ p_k \approx H_{k+1} q_k $$

Tips:
Here $H$ denotes the inverse of the Hessian.
One way to understand this condition: think of $x$ as time $t$. Then the change in velocity between two frames (two iterations, $k \to k+1$) equals the acceleration at time $k+1$ multiplied by the time elapsed between the frames.
Also, if we expand the objective at $x_k$ instead and substitute $x = x_{k+1}$, we obtain the analogous condition with $\nabla^2 f(x_k)$ in place of $\nabla^2 f(x_{k+1})$. This invites a thought: a Taylor expansion is only valid in some neighborhood of the expansion point, so when $x_k$ and $x_{k+1}$ are far apart, the quasi-Newton condition above also carries some error.
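For a quadratic objective, on the other hand, the condition holds exactly, since the Hessian is constant and the remainder term vanishes. A minimal numerical check of this, with an assumed quadratic $f(x) = \frac{1}{2}x^T A x$ (so $\nabla f(x) = Ax$) and two illustrative iterates:

#include <iostream>
#include <Eigen/Dense>

using namespace Eigen;

int main()
{
    // f(x) = 0.5 * x^T A x, so grad f(x) = A x and the Hessian is A everywhere.
    Matrix2d A;
    A << 2, 0,
         0, 4;

    Vector2d xk(1.0, 1.0), xk1(0.5, 0.25); // assumed iterates x_k, x_{k+1}
    Vector2d pk = xk - xk1;                // p_k = x_k - x_{k+1}
    Vector2d qk = A * xk - A * xk1;        // q_k = g_k - g_{k+1}

    Matrix2d H = A.inverse();              // H_{k+1} = (Hessian)^{-1}
    std::cout << "p_k - H q_k = " << (pk - H * qk).transpose() << std::endl; // ~0
    return 0;
}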

A quasi-Newton method approximates $H_{k+1}$ with an update of the form $H_{k+1} = H_k + \Delta H_k$.

2.1 Symmetric Rank-1 Correction (SR1)

The idea: to guarantee the positive semidefiniteness and symmetry of $\Delta H_k$, we set:

$$ \Delta H_k = v_k v_k^T, \quad v_k \neq 0 $$

Since $v_k v_k^T$ has rank 1, so does $\Delta H_k$, hence the name rank-1 correction.
Substituting into the quasi-Newton condition $p_k \approx H_{k+1} q_k$:

$$ p_k \approx H_{k+1} q_k = (H_k + \Delta H_k) q_k = (H_k + v_k v_k^T) q_k = H_k q_k + v_k v_k^T q_k $$

This gives: $v_k v_k^T q_k = p_k - H_k q_k$.
Since $q_k$ is a vector, this can be viewed as an $Ax = b$ problem in which the symmetric matrix $A$ is the unknown.
Multiplying both sides on the right by $(p_k - H_k q_k)^T$:

$$ v_k v_k^T q_k (p_k - H_k q_k)^T = (p_k - H_k q_k)(p_k - H_k q_k)^T $$

and therefore:

$$ v_k v_k^T = \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)} $$

We can thus see that the rank-1 correction is valid (and $\Delta H_k$ positive semidefinite) only when $q_k^T (p_k - H_k q_k) > 0$.
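A minimal Eigen sketch of one SR1 update of $H$; the helper name is ours, and skipping the update when the denominator is tiny is a common practical safeguard rather than part of the derivation above:

#include <cmath>
#include <iostream>
#include <Eigen/Dense>

using namespace Eigen;

// One symmetric rank-1 update of the inverse-Hessian approximation H.
MatrixXd sr1Update(const MatrixXd& H, const VectorXd& p, const VectorXd& q)
{
    VectorXd r = p - H * q;      // residual p_k - H_k q_k
    double denom = q.dot(r);     // scalar q_k^T (p_k - H_k q_k)
    if (std::fabs(denom) < 1e-8) // guard against division by (almost) zero
        return H;                // skip the update when it is not well defined
    return H + (r * r.transpose()) / denom; // H_{k+1} = H_k + r r^T / denom
}

int main()
{
    MatrixXd H = MatrixXd::Identity(2, 2); // assumed current H_k
    VectorXd p(2), q(2);                   // assumed p_k, q_k
    p << 0.5, -0.25;
    q << 0.25, -0.125;
    std::cout << sr1Update(H, p, q) << std::endl;
    return 0;
}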

2.2 DFP

Basic idea: from the rank-1 correction we can see that $\Delta H_k$ is built from $p_k$ and $H_k q_k$, so we can design the following correction, with coefficients $a$ and $b$:

$$ \Delta H_k = a\, p_k p_k^T + b\, (H_k q_k)(H_k q_k)^T = a\, p_k p_k^T + b\, H_k q_k q_k^T H_k^T $$

Substituting into the quasi-Newton condition $p_k \approx H_{k+1} q_k$:

$$ p_k \approx H_{k+1} q_k = (H_k + \Delta H_k) q_k = H_k q_k + a\, p_k p_k^T q_k + b\, H_k q_k q_k^T H_k^T q_k $$

The goal is now to find the coefficients $a$ and $b$. By associativity of multiplication, $p_k^T q_k$ and $q_k^T H_k^T q_k$ are scalars, so the condition can be regrouped as:

$$ \left(H_k + b\, q_k^T H_k^T q_k H_k\right) q_k + \left(a\, p_k^T q_k - 1\right) p_k = 0 \;\Rightarrow\; \left(1 + b\, q_k^T H_k^T q_k\right) H_k q_k + \left(a\, p_k^T q_k - 1\right) p_k = 0 $$

If $p_k$ and $H_k q_k$ are linearly independent, this can only hold if both coefficients vanish:

$$ \begin{cases} 1 + b\, q_k^T H_k^T q_k = 0 \\ a\, p_k^T q_k - 1 = 0 \end{cases} \;\Rightarrow\; \begin{cases} a = \dfrac{1}{p_k^T q_k} \\ b = -\dfrac{1}{q_k^T H_k q_k} \end{cases} $$

This yields the DFP update:

$$ \Delta H_k = \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k \cdot q_k^T H_k}{q_k^T H_k q_k} $$

where we used that $H$ is symmetric, $H^T = H$.

DFP:
Set an initial point $x_0$, an iteration counter $k = 0$, and $H_0 = I$ (the identity matrix).
Iterate:
1) Compute the gradient and, using the current approximation $H_k$ of the inverse Hessian, the quasi-Newton direction:

$$ \nabla f(x_k), \quad d_k = -H_k \nabla f(x_k) $$

2) If the current step is sufficiently small or the maximum number of iterations has been reached, stop; otherwise continue.

3) Update:

$$ x_{k+1} = x_k + d_k; \quad g_{k+1} = \nabla f(x_{k+1}), \quad p_k = x_{k+1} - x_k, \quad q_k = g_{k+1} - g_k; $$
$$ H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k \cdot q_k^T H_k}{q_k^T H_k q_k}; \quad k \leftarrow k + 1 $$

Go to step 1).
Example code:

#include <iostream>
#include <cmath>

using namespace std;

// Objective function
double objectiveFunction(double x)
{
    return pow(x, 2) - 4 * x + 4;
}

// First derivative of the objective
double derivativeFunction(double x)
{
    return 2 * x - 4;
}

// Quasi-Newton (DFP) solver in one dimension
double dfpMethod(double initialGuess, double tolerance)
{
    double x = initialGuess;
    double dx = 0.0;
    double H = 1.0; // initial approximation of the inverse Hessian
    double s = 0.1; // fixed step-size factor
    do {
        double g = derivativeFunction(x);
        double step = -H * g; // quasi-Newton direction d_k = -H_k g_k
        x = x + s * step;

        double dg = derivativeFunction(x) - g;

        // Update the inverse-Hessian approximation (scalar DFP formula)
        double deltaX = s * step; // p_k: the change actually applied to x
        double deltaY = dg;       // q_k: the resulting change in the gradient
        H += (deltaX * deltaX) / (deltaX * deltaY)
           - (H * deltaY * deltaY * H) / (deltaY * H * deltaY);

        dx = fabs(deltaX); // stop once the applied step is small enough
    } while (dx > tolerance);

    return x;
}

int main()
{
    double initialGuess = 1.0; // initial guess
    double tolerance = 0.0001; // convergence tolerance

    double result = dfpMethod(initialGuess, tolerance);

    cout << "Solution: " << result << endl;

    return 0;
}
2.3 BFGS

Basic idea: BFGS approximates the Hessian matrix itself, $B \approx \nabla^2 f$, rather than its inverse, and iterates with that.

$$ x_k - x_{k+1} \approx \nabla^2 f(x_{k+1})^{-1}(g_k - g_{k+1}) \quad\mapsto\quad g_k - g_{k+1} \approx \nabla^2 f(x_{k+1})(x_k - x_{k+1}) $$
$$ p_k \approx H_{k+1} q_k \quad\mapsto\quad q_k \approx B_{k+1} p_k $$

To obtain $\Delta B_k$, take the DFP formula $\Delta H_k = \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k \cdot q_k^T H_k}{q_k^T H_k q_k}$, swap $p_k$ and $q_k$, and replace $H_k q_k$ with $B_k p_k$, giving the BFGS correction formula:

$$ \Delta B_k = \frac{q_k q_k^T}{q_k^T p_k} - \frac{B_k p_k \cdot p_k^T B_k}{p_k^T B_k p_k} $$

This is one of the most effective correction formulas in practice.

Moreover, inverting the DFP correction formula for $H$ gives another correction formula, this time for the full matrix $B_{k+1}$:

$$ B_{k+1} = \left(I - \frac{q_k p_k^T}{q_k^T p_k}\right) B_k \left(I - \frac{p_k q_k^T}{q_k^T p_k}\right) + \frac{q_k q_k^T}{q_k^T p_k} $$
Example code:

#include <iostream>
#include <cmath>

using namespace std;

// Objective function
double objectiveFunction(double x)
{
    return pow(x, 2) - 4 * x + 4;
}

// First derivative of the objective
double derivativeFunction(double x)
{
    return 2 * x - 4;
}

// Quasi-Newton (BFGS) solver in one dimension
double bfgsMethod(double initialGuess, double tolerance)
{
    double x = initialGuess;
    double dx = 0.0;
    double B = 1.0; // initial approximation of the Hessian (not its inverse)
    double s = 0.1; // fixed step-size factor
    do {
        double g = derivativeFunction(x);
        double step = -g / B; // direction d_k = -B_k^{-1} g_k
        x = x + s * step;

        double dg = derivativeFunction(x) - g;

        // Update the Hessian approximation (scalar BFGS formulas)
        double deltaX = s * step; // p_k: the change actually applied to x
        double deltaY = dg;       // q_k: the resulting change in the gradient
#if 1
        // Inverted-DFP form: B_{k+1} = (1 - qp/(qp)) B (1 - pq/(qp)) + qq/(qp)
        B = (1 - (deltaY * deltaX) / (deltaY * deltaX)) * B
          * (1 - (deltaX * deltaY) / (deltaY * deltaX))
          + (deltaY * deltaY) / (deltaY * deltaX);
#else
        // Additive BFGS form: B_{k+1} = B_k + qq/(qp) - B pp B/(p B p)
        B += (deltaY * deltaY) / (deltaY * deltaX)
           - (B * deltaX * deltaX * B) / (deltaX * B * deltaX);
#endif

        dx = fabs(deltaX); // stop once the applied step is small enough
    } while (dx > tolerance);

    return x;
}

int main()
{
    double initialGuess = 1.0; // initial guess
    double tolerance = 0.0001; // convergence tolerance

    double result = bfgsMethod(initialGuess, tolerance);

    cout << "Solution: " << result << endl;

    return 0;
}

A 2-D version with step-size determination:

#include <iostream>
#include <cmath>
#include <Eigen/Dense>
#include <Eigen/Core>

using namespace Eigen;
using namespace std;

// Objective function to minimize (example: quadratic function)
double objectiveFunction(const VectorXd& x) {
    return x[0]*x[0] + 2*x[1]*x[1];
}

// Gradient of the objective function
VectorXd gradient(const VectorXd& x) {
    VectorXd grad(2);
    grad[0] = 2*x[0];
    grad[1] = 4*x[1];
    return grad;
}

// BFGS optimization algorithm (B-form: maintains the Hessian approximation)
void bfgsOptimization(VectorXd& x0, double epsilon, int maxIterations) {
    int n = x0.size();
    MatrixXd B = MatrixXd::Identity(n, n);  // initial approximation of the Hessian
    VectorXd x = x0;

    for (int iter = 0; iter < maxIterations; ++iter) {
        VectorXd grad = gradient(x);
        double alpha = 1.0;  // line-search step size

        // Crude line search: if the step is too large, shrink it
#if 1
        // Sufficient-decrease test with the quasi-Newton direction d = B^{-1} g:
        // if f(x - alpha*d) > f(x) - 0.1*alpha*g^T d, shrink alpha
        VectorXd d = B.inverse() * grad;
        if (objectiveFunction(x - alpha * d) > objectiveFunction(x) - 0.1 * alpha * grad.dot(d)) {
            alpha *= 0.1;
        }
        VectorXd s = -alpha * d;
#else
        // Plain gradient-descent fallback with the same shrink rule
        if (objectiveFunction(x - alpha * grad) > objectiveFunction(x) - 0.1 * alpha * grad.dot(grad)) {
            alpha *= 0.1;
        }
        VectorXd s = -alpha * grad;
#endif

        VectorXd xNext = x + s;
        VectorXd gradNext = gradient(xNext);
        VectorXd y = gradNext - grad;

        // BFGS update formula (B-form)
        B = B + (y * y.transpose()) / y.dot(s) - (B * s * s.transpose() * B) / s.dot(B * s);

        x = xNext;

        cout << "iter: " << iter+1 << " x: " << x.transpose() << " cost: " << objectiveFunction(x) << endl;
        // Check for convergence
        if (grad.norm() < epsilon) {
            cout << "Converged after " << iter + 1 << " iterations." << endl;
            break;
        }
    }

    cout << "Optimal solution: " << x.transpose() << endl;
    cout << "Optimal value: " << objectiveFunction(x) << endl;
}

int main() {
    VectorXd x0(2);
    x0 << 1.0, 1.0;  // initial guess

    double epsilon = 1e-6;  // tolerance for convergence
    int maxIterations = 100;  // maximum number of iterations

    bfgsOptimization(x0, epsilon, maxIterations);

    return 0;
}

2.4 Quasi-Newton Methods and Line Search

Because quasi-Newton methods work with an approximate Hessian, the choice of step size is particularly important, so they are usually combined with a line-search method.

#include <iostream>
#include <cmath>
#include <Eigen/Dense>

using namespace Eigen;
using namespace std;

// Objective function f(x)
double f(Vector2d x) {
    return x[0] * x[0] + 2 * x[0] * x[1] + x[1] * x[1];
}

// Gradient of the objective at x
Vector2d grad_f(Vector2d x) {
    Vector2d res;
    res[0] = 2 * x[0] + 2 * x[1];
    res[1] = 2 * x[0] + 2 * x[1];
    return res;
}

// BFGS algorithm (H-form: maintains the inverse-Hessian approximation)
Vector2d BFGS(Vector2d x0, double tol, int maxiter) {
    Vector2d x = x0;
    Matrix2d H = Matrix2d::Identity(); // inverse-Hessian approximation
    int k = 0;
    while (k < maxiter) {
        Vector2d g = grad_f(x);
        if (g.norm() < tol) {
            break;
        }
        Vector2d p = -H * g; // quasi-Newton direction
        // Backtracking line search (Armijo sufficient-decrease condition)
        double alpha = 1.0;
        double rho = 0.9;    // step-shrink factor
        double c = 0.0001;   // sufficient-decrease constant
        double phi = f(x);
        Vector2d grad_phi = grad_f(x);
        while (f(x + alpha * p) > phi + c * alpha * p.dot(grad_phi)) {
            alpha *= rho;
        }
        Vector2d s = alpha * p; // accepted step
        Vector2d y = grad_f(x + s) - g;
        // BFGS update of the inverse Hessian:
        // H <- (I - s y^T / y^T s) H (I - y s^T / y^T s) + s s^T / y^T s
        double ys = y.dot(s); // curvature; should be > 0
        Matrix2d term1 = Matrix2d::Identity() - (s * y.transpose()) / ys;
        Matrix2d term2 = (s * s.transpose()) / ys;
        H = term1 * H * term1.transpose() + term2;
        x += s;
        k++;
    }
    return x;
}

int main() {
    Vector2d x0(1, 2); // initial parameter values
    double tol = 1e-6; // stopping tolerance
    int maxiter = 100; // maximum number of iterations
    Vector2d res = BFGS(x0, tol, maxiter);
    cout << "Optimal parameters: (" << res[0] << ", " << res[1] << ")" << endl;
    cout << "Optimal function value: " << f(res) << endl;
    return 0;
}

Reference:
https://www.bilibili.com/video/BV1e64y1Y7Sr/
