Variable Metric Methods (algorithm analysis + Python code walkthrough)

This article introduces the DFP algorithm, a variable metric optimization method, and the three principles it is built on: the quasi-Newton property, quadratic termination, and stability. DFP approximates the inverse of the Hessian by a sequence of symmetric positive definite matrices and is globally convergent for strictly convex functions. Because of its memory cost and limited numerical stability, the more stable BFGS algorithm is also mentioned at the end.

Preface:

Newton's method and the damped Newton method converge quickly, but each step requires the inverse of the Hessian, d^{k}=-(\nabla ^{2}f(x^{k}))^{-1}\nabla f(x^{k}), which is expensive to compute. Suppose instead we have an n\times n positive definite matrix H_{k} that approximates the inverse Hessian, H_{k}\approx (\nabla ^{2}f(x^{k}))^{-1}, and use the search direction p^{k}=-H_{k}g_{k}. The resulting family of methods is called variable metric methods, and H_{k} is called the metric (scale) matrix.

Principles for constructing the metric matrix:

1. The quasi-Newton property:

Expand f in a second-order Taylor series about the point x^{k+1}:

f(x)=f(x^{k+1})+(\nabla f(x^{k+1}))^{T}(x-x^{k+1})+\frac{1}{2}(x-x^{k+1})^{T}\nabla ^{2}f(x^{k+1})(x-x^{k+1})+o(\| x-x^{k+1} \|^{2})

Dropping the higher-order term:

f(x)\approx f(x^{k+1})+(\nabla f(x^{k+1}))^{T}(x-x^{k+1})+\frac{1}{2}(x-x^{k+1})^{T}\nabla ^{2}f(x^{k+1})(x-x^{k+1})

Differentiating both sides:

\nabla f(x)\approx \nabla f(x^{k+1})+\nabla ^{2}f(x^{k+1})(x-x^{k+1})

Setting x=x^{k} gives

\nabla f(x^{k})\approx \nabla f(x^{k+1})+\nabla ^{2}f(x^{k+1})(x^{k}-x^{k+1})

Writing g_{k}=\nabla f(x^{k}) and defining

\Delta g_{k}=g_{k+1}-g_{k}=\nabla f(x^{k+1})-\nabla f(x^{k})

\Delta x^{k}=x^{k+1}-x^{k}

the approximation above becomes:

\Delta g_{k}\approx -\nabla ^{2}f(x^{k+1})(x^{k}-x^{k+1})=\nabla ^{2}f(x^{k+1})\Delta x^{k}

\Rightarrow (\nabla ^{2}f(x^{k+1}))^{-1}\Delta g_{k}=\Delta x^{k}

The Hessian \nabla ^{2}f(x^{k+1}) is symmetric positive definite, so its inverse exists, and this is exactly the relation the approximation H_{k+1}\approx (\nabla ^{2}f(x^{k+1}))^{-1} should preserve:

\Delta x^{k}= H_{k+1} \Delta g_{k}

This is the quasi-Newton condition. (Note: the H_{k+1} satisfying it is not unique.)
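For a quadratic objective the Taylor remainder vanishes, so the quasi-Newton condition holds exactly with the true inverse Hessian. A minimal numerical sketch (the matrix A, the vector b, and the two test points are arbitrary assumptions for illustration):

```python
import numpy as np

# A quadratic f(x) = 1/2 x^T A x + b^T x has gradient g(x) = A x + b,
# so Δg = A Δx exactly, and therefore A^{-1} Δg = Δx with no error.
A = np.array([[4.0, 0.0], [0.0, 2.0]])   # assumed symmetric positive definite
b = np.array([-4.0, 0.0])

grad = lambda x: A @ x + b

x_k  = np.array([2.0, 1.0])              # two arbitrary iterates
x_k1 = np.array([0.5, 0.3])

dx = x_k1 - x_k                          # Δx^k
dg = grad(x_k1) - grad(x_k)              # Δg_k

# quasi-Newton condition: A^{-1} Δg = Δx
print(np.allclose(np.linalg.solve(A, dg), dx))   # True
```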

2. Quadratic termination:

For f(x)=\frac{1}{2}x^{T}Ax+b^{T}x+c with A symmetric positive definite, the algorithm applied to this n-variable positive definite quadratic should reach the minimizer in at most n iterations. This requires that:

the search directions p^{1},p^{2},\ldots ,p^{n} it constructs form a set of A-conjugate vectors, and H_{n+1}=A^{-1}.

3. Stability:

Ignoring rounding errors, if at every iterate x^{k} the algorithm can choose a step length such that the function value decreases at the next iterate x^{k+1}, the algorithm is called stable.

If p^{k}=-H_{k}g_{k} is a descent direction, the algorithm is guaranteed to be stable, which leads to the requirement:

\Rightarrow H_{k} must be symmetric positive definite.
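The link between positive definiteness and descent is direct: g_{k}^{T}p^{k}=-g_{k}^{T}H_{k}g_{k}<0 whenever H_{k} is symmetric positive definite and g_{k}\neq 0. A quick sketch with an arbitrary SPD matrix and gradient (both assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
H = M @ M.T + 4 * np.eye(4)             # symmetric positive definite by construction
g = np.array([1.0, -2.0, 0.5, 3.0])     # an arbitrary nonzero gradient

p = -H @ g                              # search direction p = -H g
# the directional derivative of f along p is g^T p = -g^T H g < 0,
# so p is a descent direction
print(g @ p < 0)                        # True
```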

Summary:

Putting the three principles together (the quasi-Newton property, quadratic termination, and stability), H_{k} must satisfy the three corresponding conditions. The method was first proposed by Davidon in 1959 and refined by Fletcher and Powell in 1963, hence the name DFP algorithm; it remains one of the most widely used unconstrained optimization algorithms.

The DFP algorithm:

Constructing H_{k}:

We want H_{k+1} to be symmetric positive definite and to satisfy the quasi-Newton condition \Delta x^{k}= H_{k+1} \Delta g_{k}. Try a rank-two correction:

H_{k+1}=H_{k}+\alpha _{k}u^{k}(u^{k})^{T}+\beta _{k}v^{k}(v^{k})^{T}

where \alpha _{k},\beta _{k} are scalars and u^{k},v^{k} are n-dimensional column vectors.

Substituting into the quasi-Newton condition (right-multiplying every term by \Delta g_{k}) gives:

\alpha _{k}u^{k}(u^{k})^{T}\Delta g_{k}+\beta _{k}v^{k}(v^{k})^{T}\Delta g_{k}=H_{k+1}\Delta g_{k}-H_{k}\Delta g_{k}=\Delta x^{k}-H_{k}\Delta g_{k}

Since u^{k},v^{k} are not unique, a simple choice is to split this equation as:

\left\{\begin{matrix} \alpha _{k}u^{k}(u^{k})^{T}\Delta g_{k}=\Delta x^{k}\\ \beta _{k}v^{k}(v^{k})^{T}\Delta g_{k}=-H_{k}\Delta g_{k} \end{matrix}\right.

Because (u^{k})^{T}\Delta g_{k} and (v^{k})^{T}\Delta g_{k} are scalars, this is equivalent to:

\Rightarrow \left\{\begin{matrix} \alpha _{k}(u^{k})^{T}\Delta g_{k}\, u^{k}=\Delta x^{k}\\ \beta _{k}(v^{k})^{T}\Delta g_{k}\, v^{k}=-H_{k}\Delta g_{k} \end{matrix}\right.

Take \alpha _{k}(u^{k})^{T}\Delta g_{k}=1 with u^{k}=\Delta x^{k}, and \beta _{k}(v^{k})^{T}\Delta g_{k}=-1 with v^{k}=H_{k}\Delta g_{k}; then

\Rightarrow \left\{\begin{matrix} \alpha _{k}=\frac{1}{(\Delta x^{k})^{T}\Delta g_{k}}\\ \beta _{k}=-\frac{1}{(\Delta g_{k})^{T}H_{k}\Delta g_{k}}\end{matrix}\right.

\Rightarrow H_{k+1}=H_{k}+\frac{\Delta x^{k}(\Delta x^{k})^{T}}{(\Delta x^{k})^{T}\Delta g_{k}}-\frac{H_{k}\Delta g_{k}(\Delta g_{k})^{T}H_{k}}{(\Delta g_{k})^{T}H_{k}\Delta g_{k}}
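The update formula packages naturally into a small function. A sketch, tested here on the Δx and Δg values from the first iteration of the worked example later in this article (where H_{1}=I):

```python
import numpy as np

def dfp_update(H, dx, dg):
    """One DFP rank-two update of the inverse-Hessian approximation."""
    Hdg = H @ dg
    return (H
            + np.outer(dx, dx) / (dx @ dg)
            - np.outer(Hdg, Hdg) / (dg @ Hdg))

dx = np.array([-10/9, -5/9])     # Δx^1 from the worked example
dg = np.array([-40/9, -10/9])    # Δg_1 from the worked example
H2 = dfp_update(np.eye(2), dx, dg)

print(np.allclose(H2 @ dg, dx))                         # quasi-Newton condition: True
print(np.allclose(306 * H2, [[86, -38], [-38, 305]]))   # matches the hand result: True
```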

DFP theorem:

  1. If g_{i}\neq 0 for i=1,2,\ldots ,n, then the matrices H_{i} (i=1,2,\ldots ,n) constructed by the DFP algorithm are symmetric positive definite.
  2. Suppose the DFP algorithm is applied to \min f(x)=\frac{1}{2}x^{T}Ax+b^{T}x+c, where A is an n\times n symmetric positive definite matrix, producing search directions p^{1},p^{2},\ldots ,p^{n} and metric matrices H_{1},H_{2},\ldots ,H_{n}. Then:
  • (p^{i})^{T}g_{j}=0,\quad 1\leq i< j\leq n
  • H_{j}Ap^{i}=p^{i},\quad 1\leq i< j\leq n
  • (p^{i})^{T}Ap^{j}=0,\quad 1\leq i< j\leq n
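Property 2 can be spot-checked with the quantities from the worked example later in this article, where A=\mathrm{diag}(4,2), p^{1}=(-4,-2)^{T}, and H_{2} is computed by hand:

```python
import numpy as np

A  = np.array([[4.0, 0.0], [0.0, 2.0]])                # Hessian of the example f
p1 = np.array([-4.0, -2.0])                            # first search direction
H2 = np.array([[86.0, -38.0], [-38.0, 305.0]]) / 306   # hand-computed H_2

print(np.allclose(H2 @ A @ p1, p1))   # H_2 A p^1 = p^1: True
```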

DFP algorithm steps:

Step 1: Choose an initial point x^{1}, an initial matrix H_{1}=I_{n}, and a tolerance \varepsilon >0.

Step 2: If \| g_{1}\| \leq \varepsilon, stop with x^{*}=x^{1}; otherwise go to step 3.

Step 3: Set p^{1}=-H_{1}g_{1}, k=1.

Step 4: Find the optimal step length \lambda _{k} by exact line search and set x^{k+1}=x^{k}+\lambda _{k}p^{k}.

Step 5: If \| g_{k+1}\| \leq \varepsilon, stop with x^{*}=x^{k+1}; otherwise go to step 6.

Step 6: If k=n, restart: set x^{1}=x^{k+1}, p^{1}=-g_{k+1}, k=1, and go to step 4; otherwise go to step 7.

Step 7: Set \Delta x^{k}=x^{k+1}-x^{k}, \Delta g_{k}=g_{k+1}-g_{k}, compute

H_{k+1}=H_{k}+\frac{\Delta x^{k}(\Delta x^{k})^{T}}{(\Delta x^{k})^{T}\Delta g_{k}}-\frac{H_{k}\Delta g_{k}(\Delta g_{k})^{T}H_{k}}{(\Delta g_{k})^{T}H_{k}\Delta g_{k}}

and p^{k+1}=-H_{k+1}g_{k+1}, set k=k+1, and go to step 4.
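The steps above can be sketched as a loop for the quadratic case, where the exact line search has the closed form \lambda =-g^{T}p/(p^{T}Ap). This is a minimal sketch under that assumption; it omits the restart in step 6, since a positive definite quadratic terminates in at most n iterations anyway:

```python
import numpy as np

def dfp_quadratic(A, b, x, eps=1e-3, max_iter=100):
    """DFP on f(x) = 1/2 x^T A x + b^T x with exact line search."""
    n = len(x)
    H = np.eye(n)                       # step 1: H_1 = I
    g = A @ x + b
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:    # steps 2/5: stopping test
            break
        p = -H @ g                      # search direction p = -H g
        lam = -(g @ p) / (p @ A @ p)    # step 4: exact step for a quadratic
        dx = lam * p
        x = x + dx
        g_new = A @ x + b
        dg = g_new - g
        Hdg = H @ dg                    # step 7: DFP rank-two update
        H = H + np.outer(dx, dx) / (dx @ dg) - np.outer(Hdg, Hdg) / (dg @ Hdg)
        g = g_new
    return x

# f(x) = 2x1^2 + x2^2 - 4x1 + 2  →  A = [[4,0],[0,2]], b = [-4,0]
A = np.array([[4.0, 0.0], [0.0, 2.0]])
b = np.array([-4.0, 0.0])
x_star = dfp_quadratic(A, b, np.array([2.0, 1.0]))
print(x_star)   # ≈ [1, 0]
```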

Worked example:

Since the derivation above is fairly abstract, let us work through an example to build intuition for the algorithm.

Example: use the DFP algorithm to minimize f(x)=2x_{1}^{2}+x_{2}^{2}-4x_{1}+2, starting from the initial point (2,1)^{T}, with tolerance \varepsilon =0.001.

Solution: the gradient of f is g(x)=\binom{4x_{1}-4}{2x_{2}}.

First iteration:

g_{1}=\binom{4}{2},\quad H_{1}=\bigl(\begin{smallmatrix} 1 & 0\\0 &1 \end{smallmatrix}\bigr),\quad p^{1}=-H_{1}g_{1}=-\bigl(\begin{smallmatrix} 1 &0 \\0 &1 \end{smallmatrix}\bigr)\binom{4}{2}=\binom{-4}{-2}

x^{2}=x^{1}+\lambda p^{1}=\binom{2-4\lambda }{1-2\lambda }

\phi (\lambda )=2(2-4\lambda )^{2}+(1-2\lambda )^{2}-4(2-4\lambda )+2

Setting \phi '(\lambda )=0 gives \lambda_{1} =\frac{5}{18}, so

\therefore x^{2}=\binom{\frac{8}{9}}{\frac{4}{9}},\quad g_{2}=g(x^{2})=\binom{-\frac{4}{9}}{\frac{8}{9}}

The tolerance is not yet met, so we continue iterating.

Second iteration:

\Delta x^{1}=x^{2}-x^{1}=\lambda _{1}p^{1}=\binom{-\frac{10}{9}}{-\frac{5}{9}},\quad \Delta g_{1}=g_{2}-g_{1}=\binom{-\frac{40}{9}}{-\frac{10}{9}}

H_{2}=H_{1}+\frac{\Delta x^{1}(\Delta x^{1})^{T}}{(\Delta x^{1})^{T}\Delta g_{1}}-\frac{H_{1}\Delta g_{1}(\Delta g_{1})^{T}H_{1}}{(\Delta g_{1})^{T}H_{1}\Delta g_{1}}=\frac{1}{306}\bigl(\begin{smallmatrix} 86 &-38 \\-38 &305 \end{smallmatrix}\bigr)

p^{2}=-H_{2}g_{2}=\frac{12}{51}\binom{1}{-4}

x^{3}=x^{2}+\lambda p^{2}=\binom{\frac{8}{9}+\frac{12}{51}\lambda }{\frac{4}{9}-\frac{48}{51}\lambda }

\phi (\lambda )=f\left(\frac{8}{9}+\frac{12}{51}\lambda ,\ \frac{4}{9}-\frac{48}{51}\lambda \right)

Setting \phi '(\lambda )=0 gives \lambda_{2} =\frac{17}{36}, so

\therefore x^{3}=\binom{1}{0},\quad g_{3}=g(x^{3})=\binom{0}{0}

The algorithm terminates with the optimal solution x^{*}=x^{3}=\binom{1}{0}.
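As the DFP theorem predicts, the two search directions produced above are conjugate with respect to the Hessian A=\mathrm{diag}(4,2) of f; a quick check:

```python
import numpy as np

A  = np.array([[4.0, 0.0], [0.0, 2.0]])   # Hessian of f
p1 = np.array([-4.0, -2.0])
p2 = (12 / 51) * np.array([1.0, -4.0])

print(abs(p1 @ A @ p2) < 1e-12)           # (p^1)^T A p^2 = 0: True
```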

Code:

'''
Variable metric method (DFP algorithm)
2023.11.5
'''
import numpy as np
from sympy import symbols, diff, solve

x1 = symbols("x1")
x2 = symbols("x2")
λ = symbols("λ")
f = 2 * x1 ** 2 + x2 ** 2 - 4 * x1 + 2


def DFP(x1_init, x2_init, ε):
    # first-order partial derivatives of f
    grad_1 = diff(f, x1)
    grad_2 = diff(f, x2)

    x1_curr = x1_init
    x2_curr = x2_init

    first_grad_1_value = grad_1.subs({x1: x1_curr, x2: x2_curr})
    first_grad_2_value = grad_2.subs({x1: x1_curr, x2: x2_curr})
    g1 = np.array([first_grad_1_value, first_grad_2_value], dtype=float)
    norm_result_1 = np.linalg.norm(g1, ord=2, axis=0)
    if norm_result_1 <= ε:
        print("Algorithm stopped; optimal point at this tolerance: [%f,%f]" % (float(x1_curr), float(x2_curr)))
        print("Optimal value: {}".format(f.subs({x1: x1_curr, x2: x2_curr})))
    else:
        H1 = np.array([[1, 0], [0, 1]])
        p1 = -np.dot(H1, g1)
        x1_new = x1_curr + λ * p1[0]
        x2_new = x2_curr + λ * p1[1]
        f_new = f.subs({x1: x1_new, x2: x2_new})
        grad_3 = diff(f_new, λ)
        λ_value = solve(grad_3, λ)[0]

        x1_curr = x1_new.subs(λ, λ_value)
        x2_curr = x2_new.subs(λ, λ_value)

        print("Iteration {}:".format(1), "current point [%f,%f]" % (float(x1_curr), float(x2_curr)))

        second_grad_1_value = grad_1.subs({x1: x1_curr, x2: x2_curr})
        second_grad_2_value = grad_2.subs({x1: x1_curr, x2: x2_curr})

        g2 = np.array([second_grad_1_value, second_grad_2_value], dtype=float)
        norm_result_2 = np.linalg.norm(g2, ord=2, axis=0)
        if norm_result_2 <= ε:
            print("Algorithm stopped; optimal point at this tolerance: [%f,%f]" % (float(x1_curr), float(x2_curr)))
            print("Optimal value: {}".format(f.subs({x1: x1_curr, x2: x2_curr})))
        else:
            DeltaX = np.array([λ_value * p1[0], λ_value * p1[1]], dtype=float)
            DeltaG = g2 - g1

            # DFP rank-two update; DeltaX and DeltaG are 1-D arrays, so the
            # outer products must be built with np.outer rather than np.dot
            HdG = np.dot(H1, DeltaG)
            H2 = (H1 + np.outer(DeltaX, DeltaX) / np.dot(DeltaX, DeltaG)
                  - np.outer(HdG, HdG) / np.dot(DeltaG, HdG))
            p2 = -np.dot(H2, g2)
            x1_new = x1_curr + λ * p2[0]
            x2_new = x2_curr + λ * p2[1]
            f_new = f.subs({x1: x1_new, x2: x2_new})
            grad_3 = diff(f_new, λ)
            λ_value = solve(grad_3, λ)[0]

            x1_curr = x1_new.subs(λ, λ_value)
            x2_curr = x2_new.subs(λ, λ_value)
            print("Iteration {}:".format(2), "current point [%f,%f]" % (float(x1_curr), float(x2_curr)))
            print("Optimal value: {}".format(f.subs({x1: x1_curr, x2: x2_curr})))

            grad_1_value = grad_1.subs({x1: x1_curr, x2: x2_curr})
            grad_2_value = grad_2.subs({x1: x1_curr, x2: x2_curr})

            g3 = np.array([grad_1_value, grad_2_value], dtype=float)
            norm_result_3 = np.linalg.norm(g3, ord=2, axis=0)
            if norm_result_3 <= ε:
                print("Algorithm stopped; optimal point at this tolerance: [%f,%f]" % (float(x1_curr), float(x2_curr)))
                print("Optimal value: {}".format(f.subs({x1: x1_curr, x2: x2_curr})))
    return x1_curr, x2_curr


result = DFP(2, 1, 0.001)

The output of this code is:

Iteration 1: current point [0.888889,0.444444]
Iteration 2: current point [1.000000,0.000000]
Optimal value: 0
Algorithm stopped; optimal point at this tolerance: [1.000000,0.000000]
Optimal value: 0

(Note: because SymPy carries the step lengths as exact rationals, the minimizer (1,0)^{T} is reached exactly after two iterations, matching the hand computation.)

Summary:

1) If every H_{k} remains symmetric positive definite, the DFP algorithm has quadratic termination.

2) For strictly convex functions, the DFP algorithm is globally convergent.

3) The symmetry and positive definiteness of H_{k} are inherited from one iteration to the next.

4) Its numerical stability is not ideal.

5) It uses a large amount of computer memory.

For these reasons, Broyden, Fletcher, Goldfarb and Shanno proposed the BFGS variable metric algorithm in 1970; the BFGS method is more stable than DFP and converges superlinearly.
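For comparison, here is a sketch of the standard BFGS update of the inverse-Hessian approximation, applied to the same Δx and Δg as the first DFP iteration of the worked example:

```python
import numpy as np

def bfgs_update(H, dx, dg):
    """BFGS rank-two update of the inverse-Hessian approximation."""
    rho = 1.0 / (dg @ dx)
    V = np.eye(len(dx)) - rho * np.outer(dx, dg)
    return V @ H @ V.T + rho * np.outer(dx, dx)

dx = np.array([-10/9, -5/9])
dg = np.array([-40/9, -10/9])
H2 = bfgs_update(np.eye(2), dx, dg)

# BFGS also satisfies the quasi-Newton condition H_{k+1} Δg = Δx
print(np.allclose(H2 @ dg, dx))   # True
```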


(If there are any errors in this article, corrections are welcome.)
