拟牛顿法(Quasi-Newton Methods)是求解非线性优化问题最有效的方法之一,于20世纪50年代由美国Argonne国家实验室的物理学家W. C. Davidon所提出来。Davidon设计的这种
算法在当时看来是非
线性优化领域最具创造性的发明之一。不久R. Fletcher和M. J. D. Powell证实了这种新的算法远比其他方法快速和可靠,使得非线性优化这门学科在一夜之间突飞猛进。在之后的20年里,拟牛顿方法得到了蓬勃发展,出现了大量的变形公式以及数以百计的相关论文。
拟牛顿法和
最速下降法(Steepest Descent Methods)一样只要求每一步迭代时知道目标函数的梯度。通过测量梯度的变化,构造一个目标函数的模型使之足以产生
超线性收敛性。这类方法大大优于最速下降法,尤其对于困难的问题。另外,因为拟牛顿法不需要
二阶导数的信息,所以有时比牛顿法(Newton's Method)更为有效。如今,优化软件中包含了大量的拟牛顿
算法用来解决无约束,约束,和大规模的优化问题。
拟牛顿法的基本思想如下。首先构造
目标函数在当前迭代
的二次模型:
![](https://i-blog.csdnimg.cn/blog_migrate/9a3f5de7422dbd0441fd17d6661aa2a0.png)
![](https://i-blog.csdnimg.cn/blog_migrate/9a6d035b540a32fcd640739bcd4dd223.png)
![](https://i-blog.csdnimg.cn/blog_migrate/fc4f1fab8a3bbce91ce6fb0a8a52d4f8.png)
![](https://i-blog.csdnimg.cn/blog_migrate/c03915e42f066ad57ecd4ffdfb8bae8d.png)
![](https://i-blog.csdnimg.cn/blog_migrate/5cb3212f24a08ed328ba9631f89263a7.png)
![](https://i-blog.csdnimg.cn/blog_migrate/b9ec138942eb62f67659202601308068.png)
![](https://i-blog.csdnimg.cn/blog_migrate/c03915e42f066ad57ecd4ffdfb8bae8d.png)
![](https://i-blog.csdnimg.cn/blog_migrate/c03915e42f066ad57ecd4ffdfb8bae8d.png)
![](https://i-blog.csdnimg.cn/blog_migrate/c4055dd11783ef6d8167c91d1832a21f.png)
![](https://i-blog.csdnimg.cn/blog_migrate/e77b6850deca867d015871852d45a45a.png)
我们尽可能地利用上一步的信息来选取
。具体地,我们要求
,从而得到
![](https://i-blog.csdnimg.cn/blog_migrate/c03915e42f066ad57ecd4ffdfb8bae8d.png)
![](https://i-blog.csdnimg.cn/blog_migrate/4e343d2952de35c0c6984fee1b6e7232.png)
![](https://i-blog.csdnimg.cn/blog_migrate/362fb71dad6cfb84a71dfe1ab9cfea7a.png)
DFP方法
记
,
,
,DFP公式为
![](https://i-blog.csdnimg.cn/blog_migrate/1c45a738cc87534fddd78e569e9214f4.png)
![](https://i-blog.csdnimg.cn/blog_migrate/d738baca1ffc5a1743d49e62edfb5aa4.png)
![](https://i-blog.csdnimg.cn/blog_migrate/6c43dc15d3ab5a747ea23bb1d6124130.png)
![](https://i-blog.csdnimg.cn/blog_migrate/d785bcd191b27fa388978a1534160f68.png)
其实现如下所示:
#!/usr/bin/python
# -*- coding:utf8 -*-
import random
import numpy as np
import math
#f(x,y) = (x-2)^2+(y-1)^2 + 1
def solution(grad_func) :
rate = 0.3
#Gk必须保证为正定矩阵
#Gk = np.eye(2)
Gk = np.diag([random.uniform(0.0001, 100), random.uniform(0.0001, 100)])
x = random.uniform(-10000,10000)
y = random.uniform(-10000,10000)
point = np.array([[x, y]]).transpose()
# numpy.array中是初始化是按行的
grad_last = grad_func(point[0][0], point[1][0]).transpose()
if reduce(lambda a,b: math.sqrt(a*a+b*b), [grad_last[i][0] for i in xrange(grad_last.shape[0])]) < 0.000001 :
return get_point_coordinate(point)
for index in xrange(0, 10000) :
pk = -1 * Gk.dot(grad_last)
# 寻找最优的rate
#rate = grad_last.transpose().dot(pk)[0][0] / (pk.transpose().dot(pk)[0][0]) * (-0.5);print "rate", rate
point = point + rate * pk
grad = grad_func(point[0][0], point[1][0]).transpose()
print "grad", grad.transpose();#print "Gk", Gk
if reduce(lambda a,b: math.sqrt(a*a+b*b), [grad[i][0] for i in xrange(grad.shape[0])]) < 0.000001 : break
delta_k = rate * pk
y_k = (grad - grad_last)
Pk = delta_k.dot(delta_k.transpose()) / (delta_k.transpose().dot(y_k))
Qk= Gk.dot(y_k).dot(y_k.transpose()).dot(Gk) / (y_k.transpose().dot(Gk).dot(y_k)) * (-1.)
Gk += Pk + Qk
grad_last = grad
print "times of iterate : %s" % index
return get_point_coordinate(point)
def get_point_coordinate(point):
return point[0][0], point[1][0]
if __name__ == "__main__" :
x, y = solution(lambda a,b: np.array([[2*(a-2), 2*(b-1)]]))
print "minimum point of f(x,y) = (x-2)^2+(y-1)^2 + 1 : (%s,%s)" % (x, y)
执行结果为:
grad [[-165661.89194611 -109405.68739563]]
grad [[-100881.92784944 -98677.83692405]]
grad [[-86031.40619151 24003.06422296]]
grad [[-58462.97031009 16589.21632892]]
grad [[-40924.16152279 11612.14537593]]
grad [[-28646.91295682 8128.50214772]]
grad [[-20052.83906991 5689.95150292]]
grad [[-14036.98734894 3982.96605204]]
grad [[-9825.89114426 2788.07623643]]
grad [[-6878.12380098 1951.6533655 ]]
grad [[-4814.68666069 1366.15735585]]
grad [[-3370.28066248 956.3101491 ]]
grad [[-2359.19646374 669.41710437]]
grad [[-1651.43752462 468.59197306]]
grad [[-1156.00626723 328.01438114]]
grad [[-809.20438706 229.6100668 ]]
grad [[-566.44307094 160.72704676]]
grad [[-396.51014966 112.50893273]]
grad [[-277.55710476 78.75625291]]
grad [[-194.28997333 55.12937704]]
grad [[-136.00298133 38.59056393]]
grad [[-95.20208693 27.01339475]]
grad [[-66.64146085 18.90937632]]
grad [[-46.6490226 13.23656343]]
grad [[-32.65431582 9.2655944 ]]
grad [[-22.85802107 6.48591608]]
grad [[-16.00061475 4.54014126]]
grad [[-11.20043033 3.17809888]]
grad [[-7.84030123 2.22466922]]
grad [[-5.48821086 1.55726845]]
grad [[-3.8417476 1.09008792]]
grad [[-2.68922332 0.76306154]]
grad [[-1.88245632 0.53414308]]
grad [[-1.31771943 0.37390015]]
grad [[-0.9224036 0.26173011]]
grad [[-0.64568252 0.18321108]]
grad [[-0.45197776 0.12824775]]
grad [[-0.31638443 0.08977343]]
grad [[-0.2214691 0.0628414]]
grad [[-0.15502837 0.04398898]]
grad [[-0.10851986 0.03079229]]
grad [[-0.0759639 0.0215546]]
grad [[-0.05317473 0.01508822]]
grad [[-0.03722231 0.01056175]]
grad [[-0.02605562 0.00739323]]
grad [[-0.01823893 0.00517526]]
grad [[-0.01276725 0.00362268]]
grad [[-0.00893708 0.00253588]]
grad [[-0.00625595 0.00177511]]
grad [[-0.00437917 0.00124258]]
grad [[-0.00306542 0.00086981]]
grad [[-0.00214579 0.00060886]]
grad [[-0.00150205 0.0004262 ]]
grad [[-0.00105144 0.00029834]]
grad [[-0.00073601 0.00020884]]
grad [[-0.0005152 0.00014619]]
grad [[-0.00036064 0.00010233]]
grad [[ -2.52450311e-04 7.16322521e-05]]
grad [[ -1.76715217e-04 5.01425765e-05]]
grad [[ -1.23700652e-04 3.50998035e-05]]
grad [[ -8.65904565e-05 2.45698625e-05]]
grad [[ -6.06133196e-05 1.71989037e-05]]
grad [[ -4.24293237e-05 1.20392326e-05]]
grad [[ -2.97005266e-05 8.42746283e-06]]
grad [[ -2.07903686e-05 5.89922398e-06]]
grad [[ -1.45532580e-05 4.12945679e-06]]
grad [[ -1.01872806e-05 2.89061975e-06]]
grad [[ -7.13109643e-06 2.02343382e-06]]
grad [[ -4.99176750e-06 1.41640368e-06]]
grad [[ -3.49423725e-06 9.91482574e-07]]
grad [[ -2.44596608e-06 6.94037802e-07]]
grad [[ -1.71217625e-06 4.85826461e-07]]
grad [[ -1.19852338e-06 3.40078523e-07]]
grad [[ -8.38966364e-07 2.38054966e-07]]
times of iterate : 73
minimum point of f(x,y) = (x-2)^2+(y-1)^2 + 1 : (1.99999958052,1.00000011903)
可以看到最终的解与实际解(2, 1)非常接近!