Machine Learning Interview Essentials: Quasi-Newton Methods (DFP and BFGS)

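Newton's method is known for its fast convergence. However, it requires computing second-order partial derivatives, and the Hessian matrix of the objective function may fail to be positive definite. Quasi-Newton methods were proposed to overcome these drawbacks. Their basic idea is to approximate the inverse of the Hessian matrix in Newton's method with a matrix that involves no second derivatives.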

The iteration formula of Newton's method

$$x^{(k+1)}=x^{(k)}+\lambda d^{(k)}, \qquad d^{(k)}=-\nabla^{2}f(x^{(k)})^{-1}\nabla f(x^{(k)})$$
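
As a small illustration of the Newton iteration, the sketch below applies one step with $\lambda=1$ to a two-dimensional quadratic $f(x)=\frac{1}{2}x^{T}Ax-b^{T}x$. The matrix `A`, the vector `b`, and the helper names `solve_2x2`, `grad`, and `newton_step` are assumptions made up for this example, not part of the original article.

```python
# Hypothetical example problem: f(x) = 0.5 * x^T A x - b^T x,
# whose gradient is A x - b and whose Hessian is the constant matrix A.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def solve_2x2(a, rhs):
    """Solve the 2x2 linear system a @ x = rhs with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det
    x1 = (a[0][0] * rhs[1] - rhs[0] * a[1][0]) / det
    return [x0, x1]


def grad(x):
    """Gradient of the example quadratic: A x - b."""
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


def newton_step(x, lam=1.0):
    """One iteration x + lam * d with d = -Hessian^{-1} * gradient."""
    g = grad(x)
    # Solving A d = -g avoids forming the inverse Hessian explicitly.
    d = solve_2x2(A, [-g[0], -g[1]])
    return [x[0] + lam * d[0], x[1] + lam * d[1]]


x_new = newton_step([10.0, -7.0])
print(x_new)        # approximately (1/11, 7/11), the minimizer of this quadratic
print(grad(x_new))  # both components are close to zero
```

For a quadratic objective a single full Newton step reaches the minimizer, which is why the gradient at `x_new` vanishes up to rounding error.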

To construct a matrix $H_{k}$ that approximates $\nabla^{2}f(x^{(k)})^{-1}$, we first examine how $\nabla^{2}f(x^{(k)})^{-1}$ relates to first derivatives. Expand $f(x)$ in a Taylor series about the point $x^{(k+1)}$, keeping terms up to second order:

$$f(x)\approx f(x^{(k+1)})+\nabla f(x^{(k+1)})^{T}(x-x^{(k+1)})+\frac{1}{2}(x-x^{(k+1)})^{T}\nabla^{2}f(x^{(k+1)})(x-x^{(k+1)})$$

Differentiating both sides shows that, near $x^{(k+1)}$,

$$\nabla f(x)\approx\nabla f(x^{(k+1)})+\nabla^{2}f(x^{(k+1)})(x-x^{(k+1)})$$

Setting $x=x^{(k)}$ gives

$$\nabla f(x^{(k)})\approx\nabla f(x^{(k+1)})+\nabla^{2}f(x^{(k+1)})(x^{(k)}-x^{(k+1)})$$

Define

$$p^{(k)}=x^{(k+1)}-x^{(k)}, \qquad q^{(k)}=\nabla f(x^{(k+1)})-\nabla f(x^{(k)})$$

so that

$$q^{(k)}\approx\nabla^{2}f(x^{(k+1)})p^{(k)}$$

If the Hessian matrix $\nabla^{2}f(x^{(k+1)})$ is invertible, then

$$p^{(k)}\approx\nabla^{2}f(x^{(k+1)})^{-1}q^{(k)}$$

Once $p^{(k)}$ and $q^{(k)}$ have been computed, this relation lets us estimate the inverse of the Hessian. We can therefore replace the inverse Hessian with a matrix $H_{k+1}$ that contains no second derivatives, requiring

$$p^{(k)}=H_{k+1}q^{(k)}$$

This requirement is the quasi-Newton condition on which quasi-Newton methods are built. What remains is to determine the matrix $H_{k+1}$.
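
To make the relation $q^{(k)}\approx\nabla^{2}f(x^{(k+1)})p^{(k)}$ concrete, the following sketch compares both sides numerically for two nearby points. The test function $f(x)=e^{x_1}+x_1^{2}+x_2^{2}$, the two points, and the function names are assumptions chosen only for illustration.

```python
import math


def grad(x):
    """Gradient of the hypothetical function f(x) = exp(x1) + x1^2 + x2^2."""
    return [math.exp(x[0]) + 2.0 * x[0], 2.0 * x[1]]


def hessian(x):
    """Hessian of the same function (diagonal for this particular f)."""
    return [[math.exp(x[0]) + 2.0, 0.0], [0.0, 2.0]]


x_k = [0.50, 1.00]    # assumed current iterate
x_k1 = [0.51, 0.98]   # assumed next iterate, close to x_k

# p is the difference of iterates, q the difference of gradients.
p = [a - b for a, b in zip(x_k1, x_k)]
q = [a - b for a, b in zip(grad(x_k1), grad(x_k))]

# Right-hand side of the approximation: Hessian at x_k1 applied to p.
H = hessian(x_k1)
Hp = [sum(H[i][j] * p[j] for j in range(2)) for i in range(2)]

# Largest componentwise gap between q and (Hessian * p).
max_gap = max(abs(a - b) for a, b in zip(q, Hp))
print(q, Hp, max_gap)
```

The gap shrinks as the two points move closer together, since the neglected Taylor terms are of higher order in $p^{(k)}$. For a quadratic objective the relation holds exactly.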

The DFP algorithm, also known as the variable metric method

$$H_{k+1}=H_{k}+\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{H_{k}q^{(k)}q^{(k)T}H_{k}}{q^{(k)T}H_{k}q^{(k)}}$$

This update satisfies $p^{(k)}=H_{k+1}q^{(k)}$.
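
A quick numerical check can confirm that the DFP update satisfies $p^{(k)}=H_{k+1}q^{(k)}$. The sketch below is a minimal pure-Python version that assumes $H_{k}$ is symmetric, so that $q^{(k)T}H_{k}=(H_{k}q^{(k)})^{T}$. The helper names and the sample vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def dfp_update(H, p, q):
    """DFP update of the inverse-Hessian approximation H (assumed symmetric)."""
    Hq = matvec(H, q)
    pq = dot(p, q)      # p^T q, must be nonzero
    qHq = dot(q, Hq)    # q^T H q, must be nonzero
    n = len(p)
    # H + p p^T / (p^T q) - (H q)(H q)^T / (q^T H q)
    return [[H[i][j] + p[i] * p[j] / pq - Hq[i] * Hq[j] / qHq
             for j in range(n)] for i in range(n)]


# Sample data chosen for illustration only.
H = [[1.0, 0.0], [0.0, 1.0]]
p = [1.0, 2.0]
q = [3.0, 1.0]

H_next = dfp_update(H, p, q)
print(matvec(H_next, q))  # agrees with p up to rounding error
```

Multiplying the formula by $q^{(k)}$ shows why: the second term contributes $p^{(k)}$ and the third term cancels $H_{k}q^{(k)}$.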
The DFP method proceeds as follows:

  1. Choose an initial point $x^{(1)}$ and a tolerance $\epsilon>0$.
  2. Set $H_{1}=I_{n}$ (the identity matrix) and $k=1$, and compute the gradient at $x^{(1)}$: $g_{1}=\nabla f(x^{(1)})$.
  3. Set $d^{(k)}=-H_{k}g_{k}$.
  4. Starting from $x^{(k)}$, search along $d^{(k)}$ for the step size $\lambda_{k}$ that satisfies $f(x^{(k)}+\lambda_{k}d^{(k)})=\min_{\lambda\geq 0}f(x^{(k)}+\lambda d^{(k)})$, then update $x^{(k+1)}=x^{(k)}+\lambda_{k}d^{(k)}$.
  5. Check the convergence criterion: if $\|\nabla f(x^{(k+1)})\|\leq\epsilon$, stop iterating and return the point $\hat{x}=x^{(k+1)}$; otherwise go to step 6.
  6. If $k=n$, set $x^{(1)}=x^{(k+1)}$ and return to step 2; otherwise go to step 7.
  7. Set $g_{k+1}=\nabla f(x^{(k+1)})$, $p^{(k)}=x^{(k+1)}-x^{(k)}$, and $q^{(k)}=g_{k+1}-g_{k}$, then compute $H_{k+1}=H_{k}+\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{H_{k}q^{(k)}q^{(k)T}H_{k}}{q^{(k)T}H_{k}q^{(k)}}$. Set $k=k+1$ and return to step 3.
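
The steps above can be sketched in pure Python as follows. This is only one possible reading of the procedure: the exact line search of step 4 is approximated here by a golden-section search over an expanding bracket (an assumption, since the article does not say how the one-dimensional minimization is carried out), and the helper names, the `max_iter` safeguard, and the example quadratic are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def line_search(f, x, d, tol=1e-10):
    """Approximate argmin over lam >= 0 of f(x + lam*d) by golden-section search."""
    phi = lambda lam: f([xi + lam * di for xi, di in zip(x, d)])
    hi = 1.0
    while hi < 1e6 and phi(2.0 * hi) < phi(hi):  # expand the bracket
        hi *= 2.0
    a, b = 0.0, 2.0 * hi
    r = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c, e = b - r * (b - a), a + r * (b - a)
        if phi(c) < phi(e):
            b = e
        else:
            a = c
    return 0.5 * (a + b)


def dfp_update(H, p, q):
    """DFP update; H is assumed symmetric."""
    Hq, pq = matvec(H, q), dot(p, q)
    qHq, n = dot(q, Hq), len(p)
    return [[H[i][j] + p[i] * p[j] / pq - Hq[i] * Hq[j] / qHq
             for j in range(n)] for i in range(n)]


def dfp_minimize(f, grad, x, eps=1e-6, max_iter=200):
    n = len(x)
    H, k, g = identity(n), 1, grad(x)                  # step 2
    for _ in range(max_iter):
        d = [-v for v in matvec(H, g)]                 # step 3
        lam = line_search(f, x, d)                     # step 4
        x_new = [xi + lam * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        if math.sqrt(dot(g_new, g_new)) <= eps:        # step 5
            return x_new
        if k == n:                                     # step 6: restart with H = I
            H, k = identity(n), 1
        else:                                          # step 7: DFP update
            p = [u - v for u, v in zip(x_new, x)]
            q = [u - v for u, v in zip(g_new, g)]
            H, k = dfp_update(H, p, q), k + 1
        x, g = x_new, g_new
    return x


# Hypothetical test problem: a convex quadratic with minimizer (1/11, 7/11).
f = lambda x: 2.0 * x[0] ** 2 + x[0] * x[1] + 1.5 * x[1] ** 2 - x[0] - 2.0 * x[1]
grad = lambda x: [4.0 * x[0] + x[1] - 1.0, x[0] + 3.0 * x[1] - 2.0]

x_min = dfp_minimize(f, grad, [0.0, 0.0])
print(x_min)
```

The golden-section search presumes that $f$ is unimodal along the search direction, which holds for the convex quadratic used here but not for every objective. Practical implementations often use an inexact line search instead.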

BFGS
$$H_{k+1}^{BFGS}=H_{k}+\left(1+\frac{q^{(k)T}H_{k}q^{(k)}}{p^{(k)T}q^{(k)}}\right)\frac{p^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}-\frac{p^{(k)}q^{(k)T}H_{k}+H_{k}q^{(k)}p^{(k)T}}{p^{(k)T}q^{(k)}}$$
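
The BFGS formula can be checked against $p^{(k)}=H_{k+1}q^{(k)}$ in the same way as any other inverse-Hessian update. The sketch below is a minimal pure-Python version that assumes $H_{k}$ is symmetric. The helper names and sample vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]


def bfgs_update(H, p, q):
    """BFGS update of the inverse-Hessian approximation H (assumed symmetric)."""
    Hq = matvec(H, q)
    pq = dot(p, q)                        # p^T q, must be nonzero
    scale = (1.0 + dot(q, Hq) / pq) / pq  # coefficient of p p^T
    n = len(p)
    # With symmetric H, p q^T H + H q p^T has entries p_i*(Hq)_j + (Hq)_i*p_j.
    return [[H[i][j] + scale * p[i] * p[j]
             - (p[i] * Hq[j] + Hq[i] * p[j]) / pq
             for j in range(n)] for i in range(n)]


# Sample data chosen for illustration only.
H = [[1.0, 0.0], [0.0, 1.0]]
p = [1.0, 2.0]
q = [3.0, 1.0]

H_bfgs = bfgs_update(H, p, q)
print(matvec(H_bfgs, q))  # agrees with p up to rounding error
```

To use BFGS in place of DFP, only the update of $H_{k+1}$ changes; the search direction $d^{(k)}=-H_{k}g_{k}$ and the line search stay the same.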
