Quasi-Newton Method (BFGS) - Python Implementation

This article introduces the principles of the BFGS algorithm: using first-order information to construct an approximation of the Hessian matrix and combining it with the Armijo Rule to achieve superlinear convergence. The algorithm updates the matrix iteratively while preserving symmetry and positive definiteness. A Python code example demonstrating the BFGS implementation is also provided.

1 Algorithm Characteristics:

Using only first-order information of the function $f(\vec{x})$, construct an approximation of its Hessian matrix. Combined with the Armijo Rule, this achieves superlinear convergence in the optimization process.
Algorithm derivation:
For notational convenience, introduce two symbols $B$ and $D$ to denote the approximate Hessian matrix and its inverse, respectively:
$$\begin{cases} B \approx H \\ D \approx H^{-1} \end{cases}$$
Note that both $B$ and $D$ are symmetric matrices.
Consider the $k$-th iteration, and denote the differences in the optimization variable and in the gradient between the $k$-th and the $(k-1)$-th iterations as $\vec{s}_k$ and $\vec{y}_k$, respectively:
$$\begin{cases} \vec{s}_k = \vec{x}_k - \vec{x}_{k-1} \\ \vec{y}_k = \nabla f(\vec{x}_k) - \nabla f(\vec{x}_{k-1}) \end{cases}$$
Introduce the secant condition (analogous to the mean value theorem):
$$\vec{y}_k = B_k \cdot \vec{s}_k$$
Therefore:
$$D_k \cdot \vec{y}_k = \vec{s}_k$$
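For intuition, when $f$ is a quadratic with Hessian $H$, the gradient is affine in $x$, so $\nabla f(\vec{x}_k) - \nabla f(\vec{x}_{k-1}) = H(\vec{x}_k - \vec{x}_{k-1})$ holds exactly and the secant condition is not an approximation. A minimal numerical check (the quadratic below is the same objective used in Section 3; its constant term does not affect the gradient):

import numpy

H = numpy.array([[10.0, 0.0], [0.0, 4.0]])   # Hessian of f(x) = 5*x1^2 + 2*x2^2 + 3*x1 - 10*x2 + 4
b = numpy.array([3.0, -10.0])
grad = lambda x: H @ x + b                   # gradient of the quadratic

x_prev = numpy.array([1.0, 2.0])             # arbitrary illustrative points
x_curr = numpy.array([-0.5, 3.0])
s = x_curr - x_prev
y = grad(x_curr) - grad(x_prev)
print(numpy.allclose(y, H @ s))              # True: the secant condition y_k = H s_k is exact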
Between two consecutive iterations, the Frobenius norm is used to measure the change in the matrix. To improve the stability of the approximate Hessian, we require it to change conservatively: subject to the constraints, the change in $B$ (or $D$) between consecutive iterations should be as small as possible. This leads to the following optimization problem (vector arrows are omitted from here on):
$$\begin{aligned} \min \quad & \left\| D_k - D_{k-1} \right\|_F^2 \\ \mathrm{s.t.} \quad & D_k \cdot y_k = s_k \\ & D_k \text{ is symmetric} \end{aligned}$$
Here $D_k$ is the optimization variable, representing the inverse of the approximate Hessian at the $k$-th iteration. The optimal solution of this problem has the form
$$D_k = (I - \rho_k s_k y_k^T)\, D_{k-1}\, (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T$$
where $\rho_k = (y_k^T s_k)^{-1}$.
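As a concrete reading of this update, the following sketch applies it once in NumPy; the inputs are placeholder column vectors, and in the full implementation of Section 3 this corresponds to the __update_D_by_BFGS method:

import numpy

def bfgs_inverse_update(D_prev, s, y):
    # One BFGS update of the approximate inverse Hessian; s, y are (n, 1) column vectors with y^T s > 0
    rho = 1.0 / (y.T @ s)[0, 0]
    I = numpy.identity(D_prev.shape[0])
    return (I - rho * s @ y.T) @ D_prev @ (I - rho * y @ s.T) + rho * s @ s.T

D0 = numpy.identity(2)                       # common choice for the initial D
s = numpy.array([[1.0], [0.5]])              # illustrative values
y = numpy.array([[10.0], [2.0]])
D1 = bfgs_inverse_update(D0, s, y)
print(numpy.allclose(D1 @ y, s))             # True: the secant condition D_k y_k = s_k holds by construction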

2 Derivation of the Update Formula:

The BFGS algorithm uses a matrix $B_k$ to approximate the Hessian matrix $H_k$. Its update formula likewise assumes two additive correction terms:
$$B_{k+1} = B_k + P_k + Q_k$$
Of course, it must satisfy the quasi-Newton (secant) condition:
$$B_{k+1}\, \delta_k = y_k$$
So:
$$B_{k+1}\, \delta_k = B_k \delta_k + P_k \delta_k + Q_k \delta_k = y_k$$

Consider choosing $P_k$ and $Q_k$ to satisfy the following two conditions:
$$\begin{cases} P_k \delta_k = y_k \\ Q_k \delta_k = -B_k \delta_k \end{cases}$$

A solution satisfying these conditions is:
$$\begin{cases} P_k = \dfrac{y_k y_k^T}{y_k^T \delta_k} \\[2mm] Q_k = -\dfrac{B_k \delta_k \delta_k^T B_k}{\delta_k^T B_k \delta_k} \end{cases}$$

Similarly, it can be shown that if $B_0$ is symmetric positive definite, then every matrix $B_k$ produced during the iteration is also symmetric positive definite. Since $B_k$ approximates $H_k$, each update of the search direction would still require inverting $B_k$. To avoid this, we can apply the following Sherman-Morrison formula twice:
$$(A + uu^T)^{-1} = A^{-1} - \frac{A^{-1} u u^T A^{-1}}{1 + u^T A^{-1} u}$$
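A quick numerical sanity check of the formula (the random $A$ and $u$ below are purely illustrative; $A$ only needs to be invertible with $1 + u^T A^{-1} u \neq 0$):

import numpy

rng = numpy.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * numpy.identity(4)          # symmetric positive definite, hence invertible
u = rng.standard_normal((4, 1))

A_inv = numpy.linalg.inv(A)
lhs = numpy.linalg.inv(A + u @ u.T)
rhs = A_inv - (A_inv @ u @ u.T @ A_inv) / (1.0 + (u.T @ A_inv @ u)[0, 0])
print(numpy.allclose(lhs, rhs))              # True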
to obtain the BFGS iteration formula for $G_k$ (the approximation of $H_k^{-1}$, i.e. the matrix denoted $D_k$ in Section 1, with $\delta_k$ playing the role of $s_k$):
$$G_{k+1} = \left(I - \frac{\delta_k y_k^T}{\delta_k^T y_k}\right) G_k \left(I - \frac{\delta_k y_k^T}{\delta_k^T y_k}\right)^T + \frac{\delta_k \delta_k^T}{\delta_k^T y_k}$$
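The equivalence of the two forms can also be checked numerically: starting from a symmetric positive definite $B_k$ with $G_k = B_k^{-1}$, updating $B_k$ with $P_k + Q_k$ and then inverting should give the same matrix as applying the $G_{k+1}$ formula directly. The random test data below is illustrative; $\delta_k^T y_k > 0$ is ensured by taking $y_k = A\delta_k$ for a positive definite $A$:

import numpy

rng = numpy.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * numpy.identity(n)          # plays the role of the true Hessian (SPD)
delta = rng.standard_normal((n, 1))          # step delta_k
y = A @ delta                                # y_k = A delta_k, so delta_k^T y_k > 0

M2 = rng.standard_normal((n, n))
B = M2 @ M2.T + n * numpy.identity(n)        # current approximation B_k (SPD)
G = numpy.linalg.inv(B)                      # G_k = B_k^{-1}

# update B_k via the rank-one corrections P_k and Q_k
P = (y @ y.T) / (y.T @ delta)[0, 0]
Q = -(B @ delta @ delta.T @ B) / (delta.T @ B @ delta)[0, 0]
B_next = B + P + Q

# update G_k directly
rho = 1.0 / (delta.T @ y)[0, 0]
I = numpy.identity(n)
G_next = (I - rho * delta @ y.T) @ G @ (I - rho * delta @ y.T).T + rho * delta @ delta.T

print(numpy.allclose(numpy.linalg.inv(B_next), G_next))   # True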

3 Code Implementation:

# BFGS algorithm implementation
# Step size is determined by the Armijo Rule

import numpy
from matplotlib import pyplot as plt


# Zeroth-order information of the objective function (function value)
def func(x1, x2):
    funcVal = 5 * x1 ** 2 + 2 * x2 ** 2 + 3 * x1 - 10 * x2 + 4
    return funcVal
# First-order information of the objective function (gradient)
def grad(x1, x2):
    gradVal = numpy.array([[10 * x1 + 3], [4 * x2 - 10]])
    return gradVal


class BFGS(object):

    def __init__(self, seed=None, epsilon=1.e-6, maxIter=300):
        self.__seed = seed                         # starting point of the iteration
        self.__epsilon = epsilon                   # convergence tolerance
        self.__maxIter = maxIter                   # maximum number of iterations

        self.__xPath = list()                      # path of the optimization variable
        self.__fPath = list()                      # path of the objective function value


    def solve(self):
        self.__init_path()

        xCurr = self.__init_seed(self.__seed)
        fCurr = func(xCurr[0, 0], xCurr[1, 0])
        gCurr = grad(xCurr[0, 0], xCurr[1, 0])
        DCurr = self.__init_D(xCurr.shape[0])      # initialize matrix D ~ identity matrix here
        self.__save_path(xCurr, fCurr)

        for i in range(self.__maxIter):
            if self.__is_converged(gCurr):
                self.__print_MSG()
                break

            dCurr = -numpy.matmul(DCurr, gCurr)
            alpha = self.__calc_alpha_by_ArmijoRule(xCurr, fCurr, gCurr, dCurr)

            xNext = xCurr + alpha * dCurr
            fNext = func(xNext[0, 0], xNext[1, 0])
            gNext = grad(xNext[0, 0], xNext[1, 0])
            DNext = self.__update_D_by_BFGS(xCurr, gCurr, xNext, gNext, DCurr)

            xCurr, fCurr, gCurr, DCurr = xNext, fNext, gNext, DNext
            self.__save_path(xCurr, fCurr)
        else:
            if self.__is_converged(gCurr):
                self.__print_MSG()
            else:
                print("BFGS not converged after {} steps!".format(self.__maxIter))


    def show(self):
        if not self.__xPath:
            self.solve()

        fig = plt.figure(figsize=(10, 4))
        ax1 = plt.subplot(1, 2, 1)
        ax2 = plt.subplot(1, 2, 2)

        ax1.plot(numpy.arange(len(self.__fPath)), self.__fPath, "k.")
        ax1.plot(0, self.__fPath[0], "go", label="starting point")
        ax1.plot(len(self.__fPath) - 1, self.__fPath[-1], "r*", label="solution")
        ax1.set(xlabel="$iterCnt$", ylabel="$iterVal$")
        ax1.legend()

        x1 = numpy.linspace(-100, 100, 500)
        x2 = numpy.linspace(-100, 100, 500)
        x1, x2 = numpy.meshgrid(x1, x2)
        f = func(x1, x2)
        ax2.contour(x1, x2, f, levels=36)
        x1Path = list(item[0] for item in self.__xPath)
        x2Path = list(item[1] for item in self.__xPath)
        ax2.plot(x1Path, x2Path, "k--", lw=2)
        ax2.plot(x1Path[0], x2Path[0], "go", label="starting point")
        ax2.plot(x1Path[-1], x2Path[-1], "r*", label="solution")
        ax2.set(xlabel="$x_1$", ylabel="$x_2$")
        ax2.legend()

        fig.tight_layout()
        fig.savefig("bfgs.png", dpi=300)
        plt.close()


    def __print_MSG(self):
        print("Iteration steps: {}".format(len(self.__xPath) - 1))
        print("Seed: {}".format(self.__xPath[0].reshape(-1)))
        print("Solution: {}".format(self.__xPath[-1].reshape(-1)))


    def __is_converged(self, gCurr):
        if numpy.linalg.norm(gCurr) <= self.__epsilon:
            return True
        return False


    def __update_D_by_BFGS(self, xCurr, gCurr, xNext, gNext, DCurr):
        sk = xNext - xCurr
        yk = gNext - gCurr
        rk = 1 / numpy.matmul(yk.T, sk)[0, 0]

        term1 = rk * numpy.matmul(sk, yk.T)
        term2 = rk * numpy.matmul(yk, sk.T)
        I = numpy.identity(term1.shape[0])
        term3 = numpy.matmul(I - term1, DCurr)
        term4 = numpy.matmul(term3, I - term2)
        term5 = rk * numpy.matmul(sk, sk.T)

        Dk = term4 + term5
        return Dk


    def __calc_alpha_by_ArmijoRule(self, xCurr, fCurr, gCurr, dCurr, c=1.e-4, v=0.5):
        i = 0
        alpha = v ** i
        xNext = xCurr + alpha * dCurr
        fNext = func(xNext[0, 0], xNext[1, 0])

        while True:
            if fNext <= fCurr + c * alpha * numpy.matmul(dCurr.T, gCurr)[0, 0]: break
            i += 1
            alpha = v ** i
            xNext = xCurr + alpha * dCurr
            fNext = func(xNext[0, 0], xNext[1, 0])

        return alpha


    def __init_D(self, n):
        D = numpy.identity(n)
        return D


    def __init_seed(self, seed):
        if seed is None:
            seed = numpy.random.uniform(-100, 100, 2)
        seed = numpy.array(seed).reshape((2, 1))
        # Fixed starting point used for the results shown below:
        # seed = numpy.array([[35.67422137], [-78.98629502]])
        print("seed", seed)
        return seed


    def __init_path(self):
        self.__xPath.clear()
        self.__fPath.clear()


    def __save_path(self, xCurr, fCurr):
        self.__xPath.append(xCurr)
        self.__fPath.append(fCurr)


if __name__ == "__main__":
    obj = BFGS()
    obj.solve()
    obj.show()    # figure is saved to bfgs.png
    

The final results are as follows:

Iteration results:
seed [[ 35.67422137],[-78.98629502]]
Iteration steps: 9
Seed: [ 35.67422137 -78.98629502]
Solution: [-0.3 2.5]
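As a sanity check (assuming SciPy is available; this is not part of the original script), the same objective can be minimized with scipy.optimize.minimize using its built-in BFGS solver, starting from the same point, which also converges to approximately (-0.3, 2.5):

import numpy
from scipy.optimize import minimize

f = lambda x: 5 * x[0] ** 2 + 2 * x[1] ** 2 + 3 * x[0] - 10 * x[1] + 4
g = lambda x: numpy.array([10 * x[0] + 3, 4 * x[1] - 10])

res = minimize(f, x0=numpy.array([35.67422137, -78.98629502]), jac=g, method="BFGS")
print(res.x)                                 # approximately [-0.3  2.5]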

