INTRODUCTION TO NONLINEAR OPTIMIZATION, Exercise 5.2: Freudenstein and Roth Test Function

Reference: Amir Beck, Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB, SIAM, 2014, Exercise 5.2.

This post covers the MATLAB part of item (ii).
Item (i): https://blog.csdn.net/IYXUAN/article/details/121454946

Objectives

  • Understand gradient descent with backtracking, damped Newton's method, and the hybrid Newton's method;
  • Learn to use MATLAB to solve the optimization problem with each of the three methods;
  • Understand the relative strengths and weaknesses of the three methods and be able to explain the observed results.

Environment

MATLAB R2021a

Problem statement

Consider the Freudenstein and Roth test function

$f(x) = f_1(x)^2 + f_2(x)^2, \quad x \in \mathbb{R}^2,$

where

$f_1(x) = -13 + x_1 + \left(\left(5 - x_2\right)x_2 - 2\right)x_2\,,$

$f_2(x) = -29 + x_1 + \left(\left(x_2 + 1\right)x_2 - 14\right)x_2\,.$

Use MATLAB to employ the following three methods on the problem of minimizing $f$:

  1. the gradient method with backtracking and parameters $(s, \alpha, \beta) = (1, 0.5, 0.5)$.
  2. the hybrid Newton's method with parameters $(s, \alpha, \beta) = (1, 0.5, 0.5)$.
  3. the damped Gauss–Newton method with a backtracking line search strategy with parameters $(s, \alpha, \beta) = (1, 0.5, 0.5)$.

All the algorithms should use the stopping criterion $\|\nabla f(x)\| \le \epsilon$ with $\epsilon = 10^{-5}$. Each algorithm should be employed four times on the following four starting points: $(-50, 7)^T$, $(20, 7)^T$, $(20, -18)^T$, $(5, -10)^T$. For each of the four starting points, compare the number of iterations and the point to which each method converged. If a method did not converge, explain why.

Algorithm descriptions

gradient_method_backtracking

Description of gradient descent with backtracking:

Choose a tolerance $\epsilon > 0$.

Initialize $x_0 \in \mathbb{R}^n$.

FOR k = 0, 1, 2, …

  1. Pick the descent direction $d_k = -g(x_k)$, where $g = \nabla f$.
  2. Pick a stepsize $t_k$ such that $f(x_k + t_k d_k) < f(x_k)$.
  3. Set $x_{k+1} = x_k + t_k d_k$.
  4. IF $\Vert g(x_{k+1}) \Vert \le \epsilon$ THEN STOP and OUTPUT $x_{k+1}$.

In step 2 of the loop, a constant-stepsize gradient method simply uses $t_k = \bar{t}$ at every iteration. With a backtracking (inexact) line search, the stepsize is initialized to $t_k = s$ with $s > 0$ and repeatedly shrunk, $t_k \leftarrow \beta t_k$ with $\beta \in (0,1)$, until the sufficient-decrease condition $f(x_k) - f(x_k + t_k d_k) \ge -\alpha t_k \nabla f(x_k)^T d_k$, $\alpha \in (0,1)$, holds. One can show that this inequality is always satisfied for all sufficiently small $t_k$, so the backtracking loop terminates after finitely many reductions.
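In particular, for the gradient direction $d_k = -g(x_k)$ the sufficient-decrease condition becomes

$f(x_k) - f\left(x_k - t_k g(x_k)\right) \ge \alpha t_k \Vert g(x_k) \Vert^2,$

which is exactly the test used in the gradient_method_backtracking listing below (fun_val - f(x-t*grad) compared against alpha*t*norm(grad)^2).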

newton_backtracking

Description of Newton's method with backtracking:

Backtracking Newton is the same as the gradient scheme above, except that in step 1 the descent direction is replaced by the Newton direction

$d_k = -\left(\nabla^2 f(x_k)\right)^{-1} \nabla f(x_k).$
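In the newton_backtracking listing below, this direction is obtained by solving the linear system $\nabla^2 f(x_k)\, d = \nabla f(x_k)$ with the backslash operator rather than by forming the inverse; the relevant lines (comments added here) are:

gval=g(x);     % gradient at the current iterate
hval=h(x);     % Hessian at the current iterate
d=hval\gval;   % solve hval*d = gval rather than computing inv(hval)
% ... a backtracking line search then chooses the stepsize t ...
x=x-t*d;       % damped Newton update, i.e. x_{k+1} = x_k + t_k*d_k with d_k = -hval\gval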

newton_hybrid

Description of the hybrid Newton's method:

When choosing the descent direction $d_k$: if the Hessian $\nabla^2 f(x_k)$ is not positive definite, fall back to a gradient step, i.e. $d_k = -\nabla f(x_k)$; if $\nabla^2 f(x_k)$ is positive definite, take the Newton direction. Positive definiteness is tested via a Cholesky factorization, which succeeds if and only if the matrix is positive definite.
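Concretely, the direction-selection step in the newton_hybrid listing below reads (comments added here):

[L,p]=chol(hval,'lower');   % p == 0 exactly when hval is positive definite
if (p==0)
    d=L'\(L\gval);          % Newton direction, reusing the Cholesky factor L
else
    d=gval;                 % otherwise fall back to the gradient direction
end
% in both cases the update is x = x - t*d with a backtracked stepsize t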

Procedure

Computing $f(x)$, $g(x)$, and $h(x)$

$f(x) = 2x_1^2 + 12x_1 x_2^2 - 32x_1 x_2 - 84x_1 + 2x_2^6 - 8x_2^5 + 2x_2^4 - 80x_2^3 + 12x_2^2 + 864x_2 + 1010$

$g(x) = \left(12x_2^2 - 32x_2 + 4x_1 - 84,\; 24x_2 - 32x_1 + 24x_1 x_2 - 240x_2^2 + 8x_2^3 - 40x_2^4 + 12x_2^5 + 864\right)^T$

$h(x) = \begin{pmatrix} 4 & 24x_2 - 32 \\ 24x_2 - 32 & 60x_2^4 - 160x_2^3 + 24x_2^2 - 480x_2 + 24x_1 + 24 \end{pmatrix}$

clc;clear
syms f1 f2 x1 x2 f g h
f1=-13+x1+((5-x2)*x2-2)*x2;
f2=-29+x1+((x2+1)*x2-14)*x2;
f=expand(f1^2+f2^2)
% f = 2*x1^2 + 12*x1*x2^2 - 32*x1*x2 - 84*x1 + 2*x2^6 - 8*x2^5 + 2*x2^4 - 80*x2^3 + 12*x2^2 + 864*x2 + 1010
g=gradient(f)
% g = [12*x2^2 - 32*x2 + 4*x1 - 84; 24*x2 - 32*x1 + 24*x1*x2 - 240*x2^2 + 8*x2^3 - 40*x2^4 + 12*x2^5 + 864]
h=hessian(f)
% h = [4, 24*x2 - 32; 24*x2 - 32, 60*x2^4 - 160*x2^3 + 24*x2^2 - 480*x2 + 24*x1 + 24]
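
As a quick sanity check (an addition, not part of the exercise), the symbolic gradient can be compared against a central finite-difference approximation at a random point. This can be appended to the script above; the step size 1e-6 is an arbitrary choice.

ffun = matlabFunction(f,'Vars',{[x1;x2]});   % function handles taking a 2-vector
gfun = matlabFunction(g,'Vars',{[x1;x2]});
x0 = randn(2,1);
delta = 1e-6;
fd = zeros(2,1);
for i = 1:2
    e = zeros(2,1); e(i) = delta;
    fd(i) = (ffun(x0+e) - ffun(x0-e))/(2*delta);   % central difference
end
max(abs(fd - gfun(x0)))   % should be tiny, typically around 1e-6 or smaller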

Contour and surface plots of the function


Figure 1. Contour and surface plots of $f(x)$ around the global optimal solution $x^* = (5, 4)$.


Figure 2. Contour and surface plots of $f(x)$ around the local optimal solution $x^* = (11.4128, -0.8968)$.
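
As a quick numerical check (an addition, not in the original post), the two candidate minimizers can be plugged into the symbolic expressions derived above; f vanishes exactly at (5, 4), while at the rounded local minimizer the gradient is small but not exactly zero because its coordinates are quoted to only four decimals.

double(subs(f, [x1 x2], [5 4]))                      % 0 at the global minimizer
double(norm(subs(g, [x1 x2], [5 4])))                % 0
double(subs(f, [x1 x2], [11.4128 -0.8968]))          % approximately 48.9843 at the local minimizer
double(norm(subs(g, [x1 x2], [11.4128 -0.8968])))    % small but nonzero (rounded point)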

clc;clear
% local optimality
x1 = linspace(11.40,11.42,50);
x2 = linspace(-0.897,-0.8965,50);
% global optimality
% x1 = linspace(4.9,5.1,50);
% x2 = linspace(3.9,4.1,50);
[X1,X2] = meshgrid(x1,x2);
f = 2*X1.^2 + 12*X1.*X2.^2 - 32*X1.*X2 - 84*X1 + 2*X2.^6 - 8*X2.^5 ...
    + 2*X2.^4 - 80*X2.^3 + 12*X2.^2 + 864*X2 + 1010;
createfigure(X1,X2,f)
%% Generated by the surfc function:
function createfigure(xdata1, ydata1, zdata1)
%CREATEFIGURE(xdata1, ydata1, zdata1)
%  XDATA1:  surface xdata
%  YDATA1:  surface ydata
%  ZDATA1:  surface zdata

%  Auto-generated by MATLAB on 08-Dec-2021 15:44:51

% Create figure
figure1 = figure;

% Create axes
axes1 = axes('Parent',figure1);
hold(axes1,'on');

% Create surf
surf(xdata1,ydata1,zdata1,'Parent',axes1);

% Create contour
contour(xdata1,ydata1,zdata1,'ZLocation','zmin');

view(axes1,[-113 13]);
grid(axes1,'on');
axis(axes1,'tight');
hold(axes1,'off');
% Create colorbar
colorbar(axes1);

Driver script

clc;clear

%% init
s = 1;
alpha = .5;
beta = .5;
epsilon = 1e-5;
p1 = [-50;7];
p2 = [20;7];
p3 = [20;-18];
p4 = [5;-10];
f = @(x) 2*x(1)^2 + 12*x(1)*x(2)^2 - 32*x(1)*x(2) - 84*x(1) ...
    + 2*x(2)^6 - 8*x(2)^5 + 2*x(2)^4 - 80*x(2)^3 + 12*x(2)^2 ...
    + 864*x(2) + 1010;
g = @(x) [12*x(2)^2 - 32*x(2) + 4*x(1) - 84; 24*x(2) - 32*x(1)...
    + 24*x(1)*x(2) - 240*x(2)^2 + 8*x(2)^3 - 40*x(2)^4 + 12*x(2)^5 + 864];
h = @(x) [4, 24*x(2) - 32; 24*x(2) - 32, 60*x(2)^4 - 160*x(2)^3 ...
    + 24*x(2)^2 - 480*x(2) + 24*x(1) + 24];

%% call func
% gradient_method_backtracking
[x_gb1,fun_val_gb1] = gradient_method_backtracking(f,g,p1,s,alpha,...
beta,epsilon);
[x_gb2,fun_val_gb2] = gradient_method_backtracking(f,g,p2,s,alpha,...
beta,epsilon);
[x_gb3,fun_val_gb3] = gradient_method_backtracking(f,g,p3,s,alpha,...
beta,epsilon);
[x_gb4,fun_val_gb4] = gradient_method_backtracking(f,g,p4,s,alpha,...
beta,epsilon);

% newton_backtracking
x_nb1 = newton_backtracking(f,g,h,p1,alpha,beta,epsilon);
x_nb2 = newton_backtracking(f,g,h,p2,alpha,beta,epsilon);
x_nb3 = newton_backtracking(f,g,h,p3,alpha,beta,epsilon);
x_nb4 = newton_backtracking(f,g,h,p4,alpha,beta,epsilon);

% newton_hybrid
x_nh1 = newton_hybrid(f,g,h,p1,alpha,beta,epsilon);
x_nh2 = newton_hybrid(f,g,h,p2,alpha,beta,epsilon);
x_nh3 = newton_hybrid(f,g,h,p3,alpha,beta,epsilon);
x_nh4 = newton_hybrid(f,g,h,p4,alpha,beta,epsilon);
function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
beta,epsilon)

% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
    iter,norm(grad),fun_val);
end
function x=newton_backtracking(f,g,h,x0,alpha,beta,epsilon)

% Newton’s method with backtracking
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0......... initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
gval=g(x);
hval=h(x);
d=hval\gval;
iter=0;
while ((norm(gval)>epsilon)&&(iter<10000))
    iter=iter+1;
    t=1;
    while(f(x-t*d)>f(x)-alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
    gval=g(x);
    hval=h(x);
    d=hval\gval;
end
if (iter==10000)
    fprintf('did not converge\n')
end
function x=newton_hybrid(f,g,h,x0,alpha,beta,epsilon)

% Hybrid Newton’s method
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0......... initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
gval=g(x);
hval=h(x);
[L,p]=chol(hval,'lower');
if (p==0)
    d=L'\(L\gval);
else
    d=gval;
end
iter=0;
while ((norm(gval)>epsilon)&&(iter<10000))
    iter=iter+1;
    t=1;
    while(f(x-t*d)>f(x)-alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
    gval=g(x);
    hval=h(x);
    [L,p]=chol(hval,'lower');
    if (p==0)
        d=L'\(L\gval);
    else
        d=gval;
    end
end
if (iter==10000)
    fprintf('did not converge\n')
end

Results and analysis

$p_1 = (-50, 7)^T$, $p_2 = (20, 7)^T$, $p_3 = (20, -18)^T$, $p_4 = (5, -10)^T$

gradient_method_backtracking

| start | x* | fun_val | iter |
|---|---|---|---|
| p1 | (5.0, 4.0) | 0.000000 | - (did not terminate) |
| p2 | (5.0, 4.0) | 0.000000 | - (did not terminate) |
| p3 | (11.4128, -0.8968) | 48.984254 | - (did not terminate) |
| p4 | (5.0, 4.0) | 0.000000 | - (did not terminate) |
# p1
iter_number =   1 norm_grad = 16886.999242 fun_val = 11336.705024 
iter_number =   2 norm_grad = 5600.459983 fun_val = 5803.984203 
iter_number =   3 norm_grad = 1056.831987 fun_val = 4723.717473 
iter_number =   4 norm_grad = 434.668354 fun_val = 4676.341028 
iter_number =   5 norm_grad = 181.792854 fun_val = 4664.456052 
iter_number =   6 norm_grad = 481.845230 fun_val = 4645.697772 
iter_number =   7 norm_grad = 1450.431648 fun_val = 2784.316562 
iter_number =   8 norm_grad = 626.401853 fun_val = 2597.516663 
iter_number =   9 norm_grad = 220.402369 fun_val = 2565.305713 
iter_number =  10 norm_grad = 116.512243 fun_val = 2561.054079 
......
iter_number = 20849 norm_grad = 0.000092 fun_val = 0.000000 
iter_number = 20850 norm_grad = 0.000092 fun_val = 0.000000 
iter_number = 20851 norm_grad = 0.000092 fun_val = 0.000000 
iter_number = 20852 norm_grad = 0.000092 fun_val = 0.000000 
......

# p2
iter_number =   1 norm_grad = 20020.154130 fun_val = 11955.275024 
iter_number =   2 norm_grad = 8107.335439 fun_val = 3744.085975 
iter_number =   3 norm_grad = 2993.085982 fun_val = 1129.842099 
iter_number =   4 norm_grad = 949.370487 fun_val = 445.778197 
iter_number =   5 norm_grad = 192.946640 fun_val = 320.464548 
iter_number =   6 norm_grad = 86.882053 fun_val = 313.996463 
iter_number =   7 norm_grad = 41.236184 fun_val = 311.880377 
iter_number =   8 norm_grad = 223.087207 fun_val = 295.899225 
iter_number =   9 norm_grad = 91.972368 fun_val = 287.479235 
iter_number =  10 norm_grad = 40.293830 fun_val = 285.264643 
......
iter_number = 13961 norm_grad = 0.000151 fun_val = 0.000000 
iter_number = 13962 norm_grad = 0.000151 fun_val = 0.000000 
iter_number = 13963 norm_grad = 0.000151 fun_val = 0.000000 
iter_number = 13964 norm_grad = 0.000151 fun_val = 0.000000 
iter_number = 13965 norm_grad = 0.000151 fun_val = 0.000000 
......

# p3
iter_number =   1 norm_grad = 10459534.883496 fun_val = 26901796.557585 
iter_number =   2 norm_grad = 4328861.498929 fun_val = 9350549.111728 
iter_number =   3 norm_grad = 1815010.486957 fun_val = 3306985.630845 
iter_number =   4 norm_grad = 764085.159510 fun_val = 1178877.486706 
iter_number =   5 norm_grad = 322588.854922 fun_val = 423840.683140 
iter_number =   6 norm_grad = 136745.858345 fun_val = 154327.967236 
iter_number =   7 norm_grad = 58355.714510 fun_val = 57260.204151 
iter_number =   8 norm_grad = 25173.561268 fun_val = 21780.708212 
iter_number =   9 norm_grad = 11034.676282 fun_val = 8504.776429 
iter_number =  10 norm_grad = 4932.916085 fun_val = 3367.298679 
......
iter_number = 11759 norm_grad = 0.000034 fun_val = 48.984254 
iter_number = 11760 norm_grad = 0.000034 fun_val = 48.984254 
iter_number = 11761 norm_grad = 0.000034 fun_val = 48.984254 
iter_number = 11762 norm_grad = 0.000034 fun_val = 48.984254 
iter_number = 11763 norm_grad = 0.000034 fun_val = 48.984254 
iter_number = 11764 norm_grad = 0.000034 fun_val = 48.984254 
......

# p4
iter_number =   1 norm_grad = 740484.580250 fun_val = 1125740.309089 
iter_number =   2 norm_grad = 318800.867908 fun_val = 410873.876977 
iter_number =   3 norm_grad = 135283.737154 fun_val = 147433.167712 
iter_number =   4 norm_grad = 57286.724183 fun_val = 52647.763355 
iter_number =   5 norm_grad = 24273.902315 fun_val = 18647.021114 
iter_number =   6 norm_grad = 10264.843110 fun_val = 6442.533586 
iter_number =   7 norm_grad = 4279.439838 fun_val = 2094.562330 
iter_number =   8 norm_grad = 1694.354839 fun_val = 604.979471 
iter_number =   9 norm_grad = 568.486420 fun_val = 158.093768 
iter_number =  10 norm_grad = 98.759692 fun_val = 69.674565 
......
iter_number = 6072 norm_grad = 0.000107 fun_val = 0.000000 
iter_number = 6073 norm_grad = 0.000107 fun_val = 0.000000 
iter_number = 6074 norm_grad = 0.000107 fun_val = 0.000000 
iter_number = 6075 norm_grad = 0.000107 fun_val = 0.000000 
iter_number = 6076 norm_grad = 0.000107 fun_val = 0.000000 
......

newton_backtracking

| start | x* | fun_val | iter |
|---|---|---|---|
| p1 | (5.0, 4.0) | -0.000000 | 8 |
| p2 | (5.0, 4.0) | 0.000000 | 11 |
| p3 | (11.4128, -0.8968) | 48.984254 | 16 |
| p4 | (11.4128, -0.8968) | 48.984254 | 13 |
# p1
iter=  1 f(x)=17489.2000310639
iter=  2 f(x)=3536.1276294897
iter=  3 f(x)=556.9651095482
iter=  4 f(x)=50.4029326519
iter=  5 f(x)=1.2248344195
iter=  6 f(x)=0.0013284549
iter=  7 f(x)=0.0000000018
iter=  8 f(x)=-0.0000000000

# p2
iter=  1 f(x)=18114.8519548468
iter=  2 f(x)=3676.4191962662
iter=  3 f(x)=583.8090713510
iter=  4 f(x)=53.8298627069
iter=  5 f(x)=1.3684607619
iter=  6 f(x)=0.0016459572
iter=  7 f(x)=0.0000000027
iter=  8 f(x)=0.0000000007
iter=  9 f(x)=0.0000000002
iter= 10 f(x)=0.0000000000
iter= 11 f(x)=0.0000000000

# p3
iter=  1 f(x)=21356364.5665758252
iter=  2 f(x)=5561714.7070109081
iter=  3 f(x)=1444932.1368662491
iter=  4 f(x)=374422.3650773597
iter=  5 f(x)=96843.5106842914
iter=  6 f(x)=25078.5606496656
iter=  7 f(x)=6556.5446423615
iter=  8 f(x)=1764.6276022580
iter=  9 f(x)=509.7515417783
iter= 10 f(x)=171.5021447876
iter= 11 f(x)=76.9343325176
iter= 12 f(x)=52.3964690617
iter= 13 f(x)=49.0499328705
iter= 14 f(x)=48.9842770185
iter= 15 f(x)=48.9842536792
iter= 16 f(x)=48.9842536792

# p4
iter=  1 f(x)=695503.8833055196
iter=  2 f(x)=180005.8157903731
iter=  3 f(x)=46559.5630006172
iter=  4 f(x)=12100.5787363822
iter=  5 f(x)=3202.4785180644
iter=  6 f(x)=889.1211707798
iter=  7 f(x)=275.2063763915
iter=  8 f(x)=106.0983438392
iter=  9 f(x)=59.2021422471
iter= 10 f(x)=49.5512377643
iter= 11 f(x)=48.9860167506
iter= 12 f(x)=48.9842536959
iter= 13 f(x)=48.9842536792

newton_hybrid

| start | x* | fun_val | iter |
|---|---|---|---|
| p1 | (5.0, 4.0) | -0.000000 | 8 |
| p2 | (5.0, 4.0) | 0.000000 | 11 |
| p3 | (11.4128, -0.8968) | 48.984254 | 16 |
| p4 | (11.4128, -0.8968) | 48.984254 | 13 |
# p1
iter=  1 f(x)=17489.2000310639
iter=  2 f(x)=3536.1276294897
iter=  3 f(x)=556.9651095482
iter=  4 f(x)=50.4029326519
iter=  5 f(x)=1.2248344195
iter=  6 f(x)=0.0013284549
iter=  7 f(x)=0.0000000018
iter=  8 f(x)=-0.0000000000

# p2
iter=  1 f(x)=18114.8519548468
iter=  2 f(x)=3676.4191962662
iter=  3 f(x)=583.8090713510
iter=  4 f(x)=53.8298627069
iter=  5 f(x)=1.3684607619
iter=  6 f(x)=0.0016459572
iter=  7 f(x)=0.0000000027
iter=  8 f(x)=0.0000000007
iter=  9 f(x)=0.0000000002
iter= 10 f(x)=0.0000000000
iter= 11 f(x)=0.0000000000

# p3
iter=  1 f(x)=21356364.5665758252
iter=  2 f(x)=5561714.7070109081
iter=  3 f(x)=1444932.1368662491
iter=  4 f(x)=374422.3650773597
iter=  5 f(x)=96843.5106842914
iter=  6 f(x)=25078.5606496656
iter=  7 f(x)=6556.5446423615
iter=  8 f(x)=1764.6276022580
iter=  9 f(x)=509.7515417783
iter= 10 f(x)=171.5021447876
iter= 11 f(x)=76.9343325176
iter= 12 f(x)=52.3964690617
iter= 13 f(x)=49.0499328705
iter= 14 f(x)=48.9842770185
iter= 15 f(x)=48.9842536792
iter= 16 f(x)=48.9842536792

# p4
iter=  1 f(x)=695503.8833055196
iter=  2 f(x)=180005.8157903731
iter=  3 f(x)=46559.5630006172
iter=  4 f(x)=12100.5787363822
iter=  5 f(x)=3202.4785180644
iter=  6 f(x)=889.1211707798
iter=  7 f(x)=275.2063763915
iter=  8 f(x)=106.0983438392
iter=  9 f(x)=59.2021422471
iter= 10 f(x)=49.5512377643
iter= 11 f(x)=48.9860167506
iter= 12 f(x)=48.9842536959
iter= 13 f(x)=48.9842536792

Discussion

The gradient method with backtracking (gradient_method_backtracking) performs poorly on $f(x)$: from all four starting points (p1-p4) it never satisfies the stopping criterion $\|\nabla f(x)\| \le \epsilon$, $\epsilon = 10^{-5}$, so the loop does not terminate. Nevertheless, the iterates and the function values do converge: starting from p1, p2, and p4 the method approaches $\bar{x} \to (5.0, 4.0)$ with fun_val → 0.0, which is the global minimum of $f$, whereas from p3 it converges to the local minimum. The logs show that after a few thousand iterations the printed gradient norm stops decreasing and settles on a small but nonzero plateau (about 9.2e-5 for p1, 1.5e-4 for p2, 3.4e-5 for p3, and 1.1e-4 for p4), always above the tolerance $10^{-5}$; the run from p4 stalls somewhat farther from the stopping criterion than those from p1 and p2. In other words, the gradient method does not diverge but stagnates: near either minimizer the iterates essentially stop changing (the same gradient norm is printed for thousands of consecutive iterations), presumably because the backtracked steps become too small to produce any further measurable decrease in double precision, so $\|\nabla f(x_k)\|$ never drops below $10^{-5}$ and the method fails to terminate.

Backtracking Newton (newton_backtracking) and hybrid Newton (newton_hybrid) give essentially identical results, and both converge well from all four starting points. From p1 and p2 they reach the global minimizer $(5.0, 4.0)$ in 8 and 11 iterations, respectively; from p3 and p4 they reach the local minimizer $(11.4128, -0.8968)$ in 16 and 13 iterations. Comparing the logs of the two methods for all four starting points, the outputs are identical. This means the hybrid method selected the Newton direction at every iteration, i.e. the Cholesky factorization of $\nabla^2 f(x_k)$ succeeded (the Hessian was positive definite) at every iterate visited, so the gradient fallback branch was never executed and hybrid Newton coincides with backtracking Newton on these runs.
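
A quick supporting check (an addition for illustration, using the Hessian handle h from the driver script): the Hessian is positive definite at both minimizers, consistent with the Newton step always being accepted near the limit points (this by itself does not prove positive definiteness along the whole iteration path).

eig(h([5; 4]))                 % both eigenvalues positive at the global minimizer
eig(h([11.4128; -0.8968]))     % both eigenvalues positive at the local minimizer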

In summary, on $f(x)$ with the stopping criterion $\|\nabla f(x)\| \le \epsilon$, $\epsilon = 10^{-5}$, the Newton-type methods require far fewer iterations than the gradient method. For both approaches, the choice of starting point determines whether the iterates converge to the global minimizer or only to a local one.

©️ Sylvan Ding ❤️
