The Levenberg–Marquardt Algorithm

  These are my study notes on Levenberg–Marquardt, laying the groundwork for understanding sparse bundle adjustment later. The notes cover:


  • Review of Gauss–Newton and motivation for the LM algorithm
  • Computing the damping factor (i.e., the iteration step)
  • The complete algorithm flow and code examples

1. Review of Gauss–Newton, introducing the LM algorithm

 Following the earlier post Gauss-Newton算法学习 (a study of the Gauss–Newton algorithm), suppose we study a nonlinear least-squares problem of the form

$$\min_x F(x) = \frac{1}{2}\lVert r(x)\rVert^2 = \frac{1}{2}\sum_i r_i(x)^2,$$

  where r(x) is the residual of the problem, a nonlinear function of x. Recall the Gauss–Newton iteration: with $J$ the Jacobian of $r$ at the current $x$, the step solves

$$(J^T J)\,h_{gn} = -J^T r, \qquad x \leftarrow x + h_{gn}.$$
  The Levenberg–Marquardt algorithm is a modification of Gauss–Newton that differs slightly in the iteration step: a damping (penalty) factor $\mu \ge 0$ is added to the normal equations,

$$(J^T J + \mu I)\,h_{lm} = -J^T r.$$
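  Where this step comes from (a standard derivation, written out here for completeness): linearize the residual around the current $x$ and minimize a damped quadratic model,

$$r(x + h) \approx r(x) + J h,$$
$$h_{lm} = \arg\min_h \; \tfrac{1}{2}\lVert r(x) + J h\rVert^2 + \tfrac{\mu}{2}\lVert h\rVert^2.$$

  Setting the gradient with respect to $h$ to zero gives $J^T(r + Jh) + \mu h = 0$, which is exactly $(J^TJ + \mu I)\,h_{lm} = -J^T r$.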

  The damping factor shapes the step in three ways:

  a.   For all μ > 0, the coefficient matrix $J^TJ + \mu I$ is positive definite, so $h_{lm}$ is guaranteed to be a descent direction.

  b.   If μ is very large, then $h_{lm} \approx -\frac{1}{\mu} J^T r$: a short step in the steepest-descent direction. Steepest descent is globally convergent with no special requirements on the initial point, but consecutive search directions are mutually orthogonal, so convergence is not necessarily fast. In short: when the level sets of the objective are close to circles (spheres), it descends quickly; when they are elongated ellipsoids, it starts fast and then becomes very slow. This is good if the current iterate is far from the solution.

  c.   If μ is very small, $h_{lm}$ approaches the Gauss–Newton direction (suited to the final stage of the iteration, when we are very close to the optimum; this avoids the oscillation of steepest descent).


 Thus the damping factor influences both the direction and the size of the descent step.

 

2. Computing the damping factor [step-size computation]

  Our goal is to minimize F. At the start of the iteration we would like the damping factor μ to be set to a fairly small value when the initial point is believed to be good; a common rule of thumb (an assumption on my part, the original formula was lost) is to take μ₀ proportional to the largest diagonal element of $J^TJ$ at the starting point. How μ is then adjusted during the iteration is decided by a trust-region-style test, described next.
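  As a minimal sketch of that rule of thumb (the helper name initDamping and the default τ = 1e-3 are illustrative assumptions, not code from this post):

    #include <algorithm>
    #include <opencv2/core/core.hpp>

    // Hypothetical helper: mu0 = tau * max_i (J^T J)_ii
    double initDamping(const cv::Mat &Jf, double tau = 1e-3) {
        cv::Mat A = Jf.t() * Jf;  // Gauss-Newton approximation of the Hessian
        double maxDiag = 0;
        for (int i = 0; i < A.rows; i++)
            maxDiag = std::max(maxDiag, A.at<double>(i, i));
        return tau * maxDiag;
    }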


    Trust-region methods, like line-search techniques, are an important mechanism in optimization for guaranteeing global convergence. Both serve to compute the displacement at each iteration and thereby determine the new iterate. The difference: a line-search technique first produces a displacement direction (the search direction) and then determines its length (the step size), whereas a trust-region technique determines the displacement directly, producing the new iterate in one shot.

    The basic idea of a trust-region method: first fix a "trust-region radius" as an upper bound on the displacement length, and take the closed ball of that radius around the current iterate as the "trust region". Then obtain a "candidate displacement" by solving the "trust-region subproblem" (a quadratic model of the objective) inside that region. If the candidate displacement produces a sufficient decrease in the objective, accept it as the new displacement and keep or enlarge the trust-region radius, then continue iterating. Otherwise, the quadratic model approximates the objective poorly; shrink the trust-region radius and solve the subproblem again within the new region to get a new candidate. Repeat until the termination condition is met.
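    Written out concretely (a standard formulation, added here since the post states it only in prose): at the iterate $x_k$ the trust-region subproblem is

$$\min_h \; m_k(h) = F(x_k) + g_k^T h + \tfrac{1}{2} h^T B_k h \qquad \text{s.t. } \lVert h\rVert \le \Delta_k,$$

    where $g_k$ is the gradient of F, $B_k$ an approximate Hessian (for least squares, $J^TJ$), and $\Delta_k$ the trust-region radius. LM never manipulates $\Delta_k$ directly; the damping μ plays the role of an inverse radius, so increasing μ shrinks the implicit trust region.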

  Now apply the trust-region idea to the unconstrained nonlinear least-squares problem above. The quality of the quadratic model $L(h) = \frac{1}{2}\lVert r(x) + Jh\rVert^2$ is measured by the gain ratio

$$q = \frac{F(x) - F(x + h_{lm})}{L(0) - L(h_{lm})}, \qquad L(0) - L(h_{lm}) = \tfrac{1}{2}\, h_{lm}^T \left(\mu\, h_{lm} - J^T r\right),$$

  i.e., the actual decrease of F divided by the decrease predicted by the model.
  If q is large, L(h) is a good approximation of F(x + h), and we can decrease the damping factor μ so that the next iteration behaves more like Gauss–Newton. If q is small or negative, the model is a poor approximation; we then increase μ, shortening the step, so the algorithm behaves more like steepest descent. Concretely:

a. If q > 0, the step is accepted:

$$x \leftarrow x + h_{lm}, \qquad \mu \leftarrow \mu \cdot \max\{\tfrac{1}{3},\; 1 - (2q - 1)^3\}, \qquad \nu \leftarrow 2.$$
b. If q ≤ 0, the step is rejected (x is kept):

$$\mu \leftarrow \mu \cdot \nu, \qquad \nu \leftarrow 2\nu.$$
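The same update rule in code form (a minimal sketch; the function name updateDamping is an illustrative assumption, mirroring rules a/b above and the listings below):

    #include <algorithm>
    #include <cmath>

    // Nielsen-style damping update: returns the new mu and adjusts nu in place.
    // gain is the ratio q defined above.
    double updateDamping(double mu, double &nu, double gain) {
        if (gain > 0) {  // step accepted: shrink damping, reset nu
            mu *= std::max(1.0 / 3.0, 1.0 - std::pow(2.0 * gain - 1.0, 3.0));
            nu = 2.0;
        } else {         // step rejected: grow damping faster each time
            mu *= nu;
            nu *= 2.0;
        }
        return mu;
    }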
3. Complete algorithm flow and code examples

The LM flow is almost identical to Gauss–Newton; only the step computation changes, using the trust-region-style rule above:

(1) Choose an initial point x₀ and a tolerance ε > 0; set k = 0.

(2) Compute the step h_lm from the damped normal equations (updating μ via the gain ratio q as above) and set x_{k+1} = x_k + h_lm.

(3) If |F(x_{k+1}) − F(x_k)| < ε, stop; otherwise set k ← k + 1 and return to (2).



The two examples below are the same ones used before.

Example 1: from U.S. data for 1815–1885, estimate the parameters A and B of the population model. The years and total population are known, as tabulated below (t is the decade index and population is in millions, the values loaded in the code), together with the model equation y = A·exp(B·t); we solve for the parameters.

t (decade index)      | 1   | 2    | 3    | 4    | 5    | 6    | 7    | 8
population (millions) | 8.3 | 11.0 | 14.7 | 19.7 | 26.7 | 35.2 | 44.4 | 55.9
   
    // A simple demo of the Levenberg-Marquardt algorithm on a user-defined function
    #include <cstdio>
    #include <cmath>
    #include <opencv2/core/core.hpp>

    using namespace std;
    using namespace cv;

    const double DERIV_STEP = 1e-5;
    const int MAX_ITER = 100;

    void LM(double (*Func)(const Mat &input, const Mat &params), // function pointer
            const Mat &inputs, const Mat &outputs, Mat &params);
    double Deriv(double (*Func)(const Mat &input, const Mat &params), // function pointer
                 const Mat &input, const Mat &params, int n);
    // The user defines their function here
    double Func(const Mat &input, const Mat &params);

    int main()
    {
        // For this demo we're going to try and fit to the function
        //   F = A*exp(t*B)
        // There are 2 parameters: A, B
        int num_params = 2;

        int total_data = 8;
        Mat inputs(total_data, 1, CV_64F);
        Mat outputs(total_data, 1, CV_64F);

        // load observation data
        for (int i = 0; i < total_data; i++) {
            inputs.at<double>(i, 0) = i + 1; // load year (decade index)
        }
        // load America population (millions)
        outputs.at<double>(0, 0) = 8.3;
        outputs.at<double>(1, 0) = 11.0;
        outputs.at<double>(2, 0) = 14.7;
        outputs.at<double>(3, 0) = 19.7;
        outputs.at<double>(4, 0) = 26.7;
        outputs.at<double>(5, 0) = 35.2;
        outputs.at<double>(6, 0) = 44.4;
        outputs.at<double>(7, 0) = 55.9;

        // Guess the parameters; the guess should be close to the true value,
        // else the fit can fail for very sensitive functions!
        Mat params(num_params, 1, CV_64F);
        params.at<double>(0, 0) = 6;   // initial guess for A
        params.at<double>(1, 0) = 0.3; // initial guess for B

        LM(Func, inputs, outputs, params);

        printf("Parameters from LM: %lf %lf\n",
               params.at<double>(0, 0), params.at<double>(1, 0));
        return 0;
    }

    double Func(const Mat &input, const Mat &params)
    {
        // Assumes input is a single-row matrix and params is a column matrix
        double A = params.at<double>(0, 0);
        double B = params.at<double>(1, 0);
        double x = input.at<double>(0, 0);
        return A * exp(x * B);
    }

    // Central-difference partial derivative of Func with respect to the n-th
    // parameter (the params are our final target)
    double Deriv(double (*Func)(const Mat &input, const Mat &params),
                 const Mat &input, const Mat &params, int n)
    {
        Mat params1 = params.clone();
        Mat params2 = params.clone();
        params1.at<double>(n, 0) -= DERIV_STEP;
        params2.at<double>(n, 0) += DERIV_STEP;
        double p1 = Func(input, params1);
        double p2 = Func(input, params2);
        return (p2 - p1) / (2 * DERIV_STEP);
    }

    void LM(double (*Func)(const Mat &input, const Mat &params),
            const Mat &inputs, const Mat &outputs, Mat &params)
    {
        int m = inputs.rows;
        int n = inputs.cols;
        int num_params = params.rows;

        Mat r(m, 1, CV_64F);           // residual vector, r_j = y_j - Func(x_j)
        Mat r_tmp(m, 1, CV_64F);
        Mat Jf(m, num_params, CV_64F); // Jacobian of Func()
        Mat input(1, n, CV_64F);       // single-row input
        Mat params_tmp = params.clone();

        double last_mse = 0;
        double u = 1, v = 2; // damping factor and its growth rate
        Mat I = Mat::eye(num_params, num_params, CV_64F); // identity matrix

        for (int i = 0; i < MAX_ITER; i++) {
            double mse = 0, mse_temp = 0;

            // Residuals and Jacobian at the current parameters
            for (int j = 0; j < m; j++) {
                for (int k = 0; k < n; k++) // copy the independent-variable vector (the year)
                    input.at<double>(0, k) = inputs.at<double>(j, k);
                // difference between the current estimate and the observed population
                r.at<double>(j, 0) = outputs.at<double>(j, 0) - Func(input, params);
                mse += r.at<double>(j, 0) * r.at<double>(j, 0);
                for (int k = 0; k < num_params; k++)
                    Jf.at<double>(j, k) = Deriv(Func, input, params, k); // build the Jacobian
            }
            mse /= m;

            // Damped normal equations: h_lm = (J^T J + u*I)^{-1} J^T r
            Mat hlm = (Jf.t() * Jf + u * I).inv() * Jf.t() * r;
            params_tmp = params.clone();
            params_tmp += hlm;

            // Residuals at the trial parameters
            for (int j = 0; j < m; j++) {
                for (int k = 0; k < n; k++)
                    input.at<double>(0, k) = inputs.at<double>(j, k);
                r_tmp.at<double>(j, 0) = outputs.at<double>(j, 0) - Func(input, params_tmp);
                mse_temp += r_tmp.at<double>(j, 0) * r_tmp.at<double>(j, 0);
            }
            mse_temp /= m;

            // Gain ratio q = (F(x) - F(x+h)) / (L(0) - L(h)).
            // Here r = y - Func and Jf is the Jacobian of Func (i.e., minus the
            // Jacobian of the residual), so L(0) - L(h) = 0.5*h^T(u*h + Jf^T r).
            Mat denom = hlm.t() * (u * hlm + Jf.t() * r); // = 2*(L(0) - L(h))
            double q_value = (mse - mse_temp) * m / denom.at<double>(0, 0);

            if (q_value > 0) {
                // Step accepted: move to the trial point and shrink the damping
                // (closer to Gauss-Newton)
                params_tmp.copyTo(params);
                mse = mse_temp;
                double s = 1.0 / 3.0;
                double temp = 1 - pow(2 * q_value - 1, 3);
                u *= (s > temp) ? s : temp; // u *= max(1/3, 1 - (2q-1)^3)
                v = 2;
                // The decrease in mse is very small, so quit
                if (fabs(mse - last_mse) < 1e-8)
                    break;
                last_mse = mse;
            } else {
                // Step rejected: keep the current params and grow the damping
                // (closer to steepest descent)
                u *= v;
                v *= 2;
            }
            printf("%d %lf\n", i, mse);
        }
    }
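To compile and run the listing, something along these lines should work (a sketch: the file name lm_population.cpp is hypothetical, and the pkg-config package may be opencv4 or opencv depending on your install):

    g++ -O2 lm_population.cpp -o lm_population $(pkg-config --cflags --libs opencv4)
    ./lm_population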


Result: A = 7.0, B = 0.26 (initial guess A = 6, B = 0.3); out of 100 allowed iterations it converged by the 7th. Much the same as before, but LM is very sensitive to the initial guess: with A = 6, B = 6 the fit fails!


I also ran LM through MATLAB's interface, and the result is wrong in the same way; the plot (omitted here) shows the fit failing.

    clc;
    clear;
    a0=[6,6];
    options=optimset('Algorithm','Levenberg-Marquardt','Display','iter');
    data_1=[1 2 3 4 5 6 7 8];
    obs_1=[8.3 11.0 14.7 19.7 26.7 35.2 44.4 55.9];
    % pass the data through an anonymous function (passing extra arguments
    % positionally after options is a deprecated lsqnonlin calling style)
    a=lsqnonlin(@(p) myfun(p,data_1,obs_1),a0,[],[],options);
    plot(data_1,obs_1,'o');
    hold on
    plot(data_1,a(1)*exp(a(2)*data_1),'b');
    plot(data_1,7*exp(a(2)*data_1),'b');
    %hold off
    a(1)
    a(2)

    function E = myfun(a, x, y)
    % test residual function for lsqnonlin
    x=x(:);
    y=y(:);
    Y=a(1)*exp(a(2)*x);
    E=y-Y;
    end

The last data point is fitted poorly, so the recovered function is wrong.


   So although the Levenberg–Marquardt iteration adapts automatically between Gauss–Newton and steepest descent, guaranteeing that the iteration keeps descending when convergence is slow while converging rapidly in a neighborhood of the solution, LM is still fairly sensitive to the choice of initial point!


Example 2: I want to fit the following model,

$$y = A\sin(Bx) + C\cos(Dx).$$

  Lacking real observations, I stage my own: assume the four parameters A = 5, B = 1, C = 10, D = 2 are known, draw 100 random values of x as observations, and compute the corresponding function values. Then use these observations to recover the four parameters.


  
  
    // A simple demo of the Levenberg-Marquardt algorithm on a user-defined function
    #include <cstdio>
    #include <cstdlib>
    #include <cmath>
    #include <opencv2/core/core.hpp>

    using namespace std;
    using namespace cv;

    const double DERIV_STEP = 1e-5;
    const int MAX_ITER = 100;

    void LM(double (*Func)(const Mat &input, const Mat &params), // function pointer
            const Mat &inputs, const Mat &outputs, Mat &params);
    double Deriv(double (*Func)(const Mat &input, const Mat &params), // function pointer
                 const Mat &input, const Mat &params, int n);
    // The user defines their function here
    double Func(const Mat &input, const Mat &params);

    int main()
    {
        // For this demo we're going to try and fit to the function
        //   F = A*sin(B*x) + C*cos(D*x)
        // There are 4 parameters: A, B, C, D
        int num_params = 4;

        // Generate random data using these parameters
        int total_data = 100;
        double A = 5;
        double B = 1;
        double C = 10;
        double D = 2;
        Mat inputs(total_data, 1, CV_64F);
        Mat outputs(total_data, 1, CV_64F);

        for (int i = 0; i < total_data; i++) {
            double x = -10.0 + 20.0 * rand() / (1.0 + RAND_MAX); // random in [-10, 10]
            double y = A * sin(B * x) + C * cos(D * x);
            // Add some noise
            // y += -1.0 + 2.0 * rand() / (1.0 + RAND_MAX);
            inputs.at<double>(i, 0) = x;
            outputs.at<double>(i, 0) = y;
        }

        // Guess the parameters; the guess should be close to the true value,
        // else the fit can fail for very sensitive functions!
        Mat params(num_params, 1, CV_64F);
        params.at<double>(0, 0) = 1;
        params.at<double>(1, 0) = 1;
        params.at<double>(2, 0) = 8; // changing to 1 will cause it not to find the solution, too far away
        params.at<double>(3, 0) = 1;

        LM(Func, inputs, outputs, params);

        printf("True parameters: %f %f %f %f\n", A, B, C, D);
        printf("Parameters from LM: %f %f %f %f\n",
               params.at<double>(0, 0), params.at<double>(1, 0),
               params.at<double>(2, 0), params.at<double>(3, 0));
        return 0;
    }

    double Func(const Mat &input, const Mat &params)
    {
        // Assumes input is a single-row matrix and params is a column matrix
        double A = params.at<double>(0, 0);
        double B = params.at<double>(1, 0);
        double C = params.at<double>(2, 0);
        double D = params.at<double>(3, 0);
        double x = input.at<double>(0, 0);
        return A * sin(B * x) + C * cos(D * x);
    }

    // Central-difference partial derivative of Func with respect to the n-th parameter
    double Deriv(double (*Func)(const Mat &input, const Mat &params),
                 const Mat &input, const Mat &params, int n)
    {
        Mat params1 = params.clone();
        Mat params2 = params.clone();
        params1.at<double>(n, 0) -= DERIV_STEP;
        params2.at<double>(n, 0) += DERIV_STEP;
        double p1 = Func(input, params1);
        double p2 = Func(input, params2);
        return (p2 - p1) / (2 * DERIV_STEP);
    }

    void LM(double (*Func)(const Mat &input, const Mat &params),
            const Mat &inputs, const Mat &outputs, Mat &params)
    {
        // identical to the LM() routine in Example 1
        int m = inputs.rows;
        int n = inputs.cols;
        int num_params = params.rows;

        Mat r(m, 1, CV_64F);           // residual vector
        Mat r_tmp(m, 1, CV_64F);
        Mat Jf(m, num_params, CV_64F); // Jacobian of Func()
        Mat input(1, n, CV_64F);       // single-row input
        Mat params_tmp = params.clone();

        double last_mse = 0;
        double u = 1, v = 2; // damping factor and its growth rate
        Mat I = Mat::eye(num_params, num_params, CV_64F); // identity matrix

        for (int i = 0; i < MAX_ITER; i++) {
            double mse = 0, mse_temp = 0;

            // Residuals and Jacobian at the current parameters
            for (int j = 0; j < m; j++) {
                for (int k = 0; k < n; k++) // copy the independent-variable vector
                    input.at<double>(0, k) = inputs.at<double>(j, k);
                r.at<double>(j, 0) = outputs.at<double>(j, 0) - Func(input, params);
                mse += r.at<double>(j, 0) * r.at<double>(j, 0);
                for (int k = 0; k < num_params; k++)
                    Jf.at<double>(j, k) = Deriv(Func, input, params, k);
            }
            mse /= m;

            // Damped normal equations: h_lm = (J^T J + u*I)^{-1} J^T r
            Mat hlm = (Jf.t() * Jf + u * I).inv() * Jf.t() * r;
            params_tmp = params.clone();
            params_tmp += hlm;

            // Residuals at the trial parameters
            for (int j = 0; j < m; j++) {
                for (int k = 0; k < n; k++)
                    input.at<double>(0, k) = inputs.at<double>(j, k);
                r_tmp.at<double>(j, 0) = outputs.at<double>(j, 0) - Func(input, params_tmp);
                mse_temp += r_tmp.at<double>(j, 0) * r_tmp.at<double>(j, 0);
            }
            mse_temp /= m;

            // Gain ratio, as in Example 1
            Mat denom = hlm.t() * (u * hlm + Jf.t() * r); // = 2*(L(0) - L(h))
            double q_value = (mse - mse_temp) * m / denom.at<double>(0, 0);

            if (q_value > 0) {
                // Step accepted: shrink the damping
                params_tmp.copyTo(params);
                mse = mse_temp;
                double s = 1.0 / 3.0;
                double temp = 1 - pow(2 * q_value - 1, 3);
                u *= (s > temp) ? s : temp; // u *= max(1/3, 1 - (2q-1)^3)
                v = 2;
                // The decrease in mse is very small, so quit
                if (fabs(mse - last_mse) < 1e-8)
                    break;
                last_mse = mse;
            } else {
                // Step rejected: keep params, grow the damping
                u *= v;
                v *= 2;
            }
            printf("%d %lf\n", i, mse);
        }
    }


  We see that within the 100 allowed iterations the result is almost identical to what Gauss–Newton computed. Plotting the convergence of the residual for both LM and Gauss–Newton shows that LM decreases steadily overall, without much back-and-forth.

Gauss–Newton: (residual convergence plot omitted)

LM: (residual convergence plot omitted)

