Introduction to Nonlinear Optimization, Exercise 5.2: Freudenstein and Roth Test Function

Amir Beck, Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB, 2014.

Exercise 5.2

5.2. Consider the Freudenstein and Roth test function

$$f(x)=f_1(x)^2+f_2(x)^2,\qquad x\in\mathbb{R}^2,$$

where

$$f_1(x)=-13+x_1+\bigl((5-x_2)x_2-2\bigr)x_2\,,$$

$$f_2(x)=-13+x_2+\bigl((x_2+1)x_2-14\bigr)x_2\,.$$

(The book has $-29$ here; I copied it down incorrectly, so I will simply continue with this modified function, and I have adjusted the problem statement accordingly.)
The original problem statement is available at https://blog.csdn.net/IYXUAN/article/details/121801303

(i) Show that the function f has five stationary points. Find them and prove that three of them are global minimizers and the other two are saddle points.

Proof.

Substituting $f_1(x)$ and $f_2(x)$ into $f(x)$ and expanding yields

$$f(x)=2x_2^6-8x_2^5+4x_2^4+(-2x_1-46)x_2^3+(10x_1+17)x_2^2+(390-4x_1)x_2+(x_1-13)^2+169\,.$$
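
The expansion can be double-checked symbolically in MATLAB (a minimal sketch, assuming the Symbolic Math Toolbox is available; the variable names are mine):

syms x1 x2
% residuals of the (modified) Freudenstein and Roth function
f1 = -13 + x1 + ((5 - x2)*x2 - 2)*x2;
f2 = -13 + x2 + ((x2 + 1)*x2 - 14)*x2;
% expand f = f1^2 + f2^2 and group by powers of x2; should match the polynomial above
collect(expand(f1^2 + f2^2), x2)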

By Definition 2.7 (stationary points), the stationary points are those satisfying $\nabla f(x)=0$, i.e.,

$$\begin{aligned}
2x_1-2x_2^3+10x_2^2-4x_2-26&=0,\\
12x_2^5-40x_2^4+16x_2^3-3(2x_1+46)x_2^2+2(10x_1+17)x_2-4x_1+390&=0.
\end{aligned}$$

Substituting $x_1=x_2^3-5x_2^2+2x_2+13$ (deduced from the first equation) into the second equation and dividing by 2 yields the simplified equation (1)

$$3x_2^5+5x_2^4-50x_2^3-78x_2^2+143x_2+169=0.$$

Factorizing equation (1), we obtain the factorized equation (2)

$$\left(x_2+1\right)\left(3x_2^2+2x_2-13\right)\left(x_2^2-13\right)=0.$$

Equation (2) holds when $x_2+1=0$, $3x_2^2+2x_2-13=0$, or $x_2^2-13=0$,

whose solutions are $x_2=-1$, $\frac{-1-2\sqrt{10}}{3}$, $\frac{-1+2\sqrt{10}}{3}$, $\sqrt{13}$, $-\sqrt{13}$. Substituting each $x_2$ into $x_1=x_2^3-5x_2^2+2x_2+13$ yields the corresponding values $x_1=5$, $-36.2420$, $6.3902$, $15\sqrt{13}-52$, $-15\sqrt{13}-52$, so $f$ has exactly five stationary points.
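
These roots and the corresponding $x_1$ values can be verified numerically (a small sketch; the variable names are mine):

% the five roots of equation (1): 3*x2^5 + 5*x2^4 - 50*x2^3 - 78*x2^2 + 143*x2 + 169 = 0
x2_roots = sort(real(roots([3 5 -50 -78 143 169])));
% back-substitute into the first-order condition for x1
x1_vals = x2_roots.^3 - 5*x2_roots.^2 + 2*x2_roots + 13;
disp([x1_vals x2_roots])   % the five stationary points (x1, x2)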

The Hessian of f is given by

$$\nabla^2 f(x)=\begin{pmatrix}2 & -6x_2^2+20x_2-4\\ -6x_2^2+20x_2-4 & 60x_2^4-160x_2^3+48x_2^2-6(2x_1+46)x_2+2(10x_1+17)\end{pmatrix}.$$

For the stationary point $(5,-1)$ we have

$$\nabla^2 f(5,-1)=\begin{pmatrix}2 & -30\\ -30 & 738\end{pmatrix},$$

which is positive definite, since its trace ($740$) and determinant ($576$) are both positive (Corollary 2.18).
Therefore, $(5,-1)$ is a strict local minimum point by Theorem 2.27 (sufficient second-order optimality condition).

For the stationary point $\left(15\sqrt{13}-52,\sqrt{13}\right)$ we have

$$\nabla^2 f\left(15\sqrt{13}-52,\sqrt{13}\right)=\begin{pmatrix}2 & 20\sqrt{13}-82\\ 20\sqrt{13}-82 & 7418-1432\sqrt{13}\end{pmatrix}\approx\begin{pmatrix}2 & -9.89\\ -9.89 & 2254.9\end{pmatrix},$$

which is positive definite, since its trace and determinant ($2912+416\sqrt{13}>0$) are positive. Therefore, $\left(15\sqrt{13}-52,\sqrt{13}\right)$ is also a strict local minimum point.

For the stationary point $\left(-15\sqrt{13}-52,-\sqrt{13}\right)$ we have

$$\nabla^2 f\left(-15\sqrt{13}-52,-\sqrt{13}\right)=\begin{pmatrix}2 & -20\sqrt{13}-82\\ -20\sqrt{13}-82 & 7418+1432\sqrt{13}\end{pmatrix}\approx\begin{pmatrix}2 & -154.11\\ -154.11 & 12581.2\end{pmatrix},$$

which is positive definite, since its trace and determinant ($2912-416\sqrt{13}>0$) are positive. Therefore, $\left(-15\sqrt{13}-52,-\sqrt{13}\right)$ is also a strict local minimum point.

Next we show that $f$ is coercive. If $\|x\|\rightarrow\infty$, then either $|x_2|\rightarrow\infty$, in which case $f_2(x)^2=(x_2+1)^2(x_2^2-13)^2\rightarrow\infty$, or $x_2$ stays bounded while $|x_1|\rightarrow\infty$, in which case $f_1(x)^2=\left(x_1-13-x_2^3+5x_2^2-2x_2\right)^2\rightarrow\infty$. In both cases $f(x)\rightarrow\infty$, so $f$ is coercive.

By Theorem 2.32 (attainment under coerciveness), $f$ attains a global minimum over $\mathbb{R}^2$.

Since we have already found three strict local minimum points, we now compute the value of $f$ at each of them:

$$f(5,-1)=f\left(15\sqrt{13}-52,\sqrt{13}\right)=f\left(-15\sqrt{13}-52,-\sqrt{13}\right)=0.$$

Since $f(x)=f_1(x)^2+f_2(x)^2\ge 0$ for every $x$ and these three points attain the value $0$, all of them are (non-strict) global minimum points.
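
These values can be confirmed directly in MATLAB (a quick check, reusing the anonymous function f defined in part (ii) below):

f = @(x) (-13+x(1)+((5-x(2))*x(2)-2)*x(2))^2 + (-13+x(2)+((x(2)+1)*x(2)-14)*x(2))^2;
f([5; -1])                          % exactly 0
f([15*sqrt(13)-52;  sqrt(13)])      % 0 up to rounding error
f([-15*sqrt(13)-52; -sqrt(13)])     % 0 up to rounding error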

For the stationary point $\left(-36.2420,\frac{-1-2\sqrt{10}}{3}\right)$ we have

$$\nabla^2 f(x)\approx\begin{pmatrix}2 & -88.60\\ -88.60 & 3668.0\end{pmatrix},$$

whose eigenvalues are approximately $-0.14$ and $3670.1$; one is negative and the other positive, so the matrix is indefinite by Theorem 2.17.e (eigenvalue characterization theorem). Since the Hessian is indefinite at $\left(-36.2420,\frac{-1-2\sqrt{10}}{3}\right)$, this point is a saddle point according to Theorem 2.29 (sufficient condition for saddle points).

For the stationary point $\left(6.3902,\frac{-1+2\sqrt{10}}{3}\right)$ we have

$$\nabla^2 f(x)\approx\begin{pmatrix}2 & 12.60\\ 12.60 & -612.1\end{pmatrix},$$

which is indefinite, since its diagonal contains both a positive and a negative entry. Therefore, the point $\left(6.3902,\frac{-1+2\sqrt{10}}{3}\right)$ is also a saddle point.
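
The classification of all five stationary points can be verified numerically by evaluating the Hessian at each of them (a sketch; the Hessian expression is the same as the anonymous function h used in part (ii)):

% classify every stationary point by the eigenvalues of the Hessian
x2s = [-1, sqrt(13), -sqrt(13), (-1-2*sqrt(10))/3, (-1+2*sqrt(10))/3];
for x2 = x2s
    x1 = x2^3 - 5*x2^2 + 2*x2 + 13;   % from the first-order condition
    H  = [2, -6*x2^2+20*x2-4; ...
          -6*x2^2+20*x2-4, 60*x2^4-160*x2^3+48*x2^2-6*x2*(2*x1+46)+2*(10*x1+17)];
    fprintf('x1 = %9.4f, x2 = %8.4f, eig = %10.4f %10.4f\n', x1, x2, sort(eig(H)));
end
% the first three points give two positive eigenvalues (strict local minima),
% the last two give one negative and one positive eigenvalue (saddle points)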

(ii) Use MATLAB to employ the following three methods on the problem of minimizing f:

  1. the gradient method with backtracking and parameters $(s,\alpha,\beta)=(1,0.5,0.5)$;
  2. the hybrid Newton's method with parameters $(s,\alpha,\beta)=(1,0.5,0.5)$;
  3. the damped Gauss–Newton method with a backtracking line search strategy and parameters $(s,\alpha,\beta)=(1,0.5,0.5)$.
    All the algorithms should use the stopping criterion $\|\nabla f(x)\|\le\varepsilon$ with $\varepsilon=10^{-5}$. Each algorithm should be employed four times on the following four starting points:
    $(-50,7)^T$, $(20,7)^T$, $(20,-18)^T$, $(5,-10)^T$. For each of the four starting points, compare the number of iterations and the point to which each method converged. If a method did not converge, explain why.

The contour and surface plots of the function are plotted in Figure 1.

[Figure 1: contour plot (left) and surface plot (right) of f]
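
Figure 1 can be reproduced along the following lines (a sketch; the plotting window and the logarithmic scaling of the contour levels are my own choices):

[X1g, X2g] = meshgrid(linspace(-60, 60, 400), linspace(-6, 6, 400));
F1 = -13 + X1g + ((5 - X2g).*X2g - 2).*X2g;
F2 = -13 + X2g + ((X2g + 1).*X2g - 14).*X2g;
Fv = F1.^2 + F2.^2;
figure; contour(X1g, X2g, log10(1 + Fv), 30); xlabel('x_1'); ylabel('x_2'); title('contour of f');
figure; surf(X1g, X2g, Fv, 'EdgeColor', 'none'); xlabel('x_1'); ylabel('x_2'); title('surface of f');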

Let $X_1=(-50,7)^T$, $X_2=(20,7)^T$, $X_3=(20,-18)^T$, $X_4=(5,-10)^T$.
We abbreviate the three methods as GB (gradient method with backtracking), HN (hybrid Newton's method), and NB (Newton's method with backtracking, used here in place of the damped Gauss–Newton method asked for in the exercise).

% objective function f, its gradient g, and its Hessian h as anonymous functions
f=@(x) (-13+x(1)+((5-x(2))*x(2)-2)*x(2))^2 + (-13+x(2)+((x(2)+1)*x(2)-14)*x(2))^2;
g=@(x) [-2*x(2)^3+10*x(2)^2-4*x(2)+2*x(1)-26;...
	16*x(2)^3-3*x(2)^2*(2*x(1)+46)-4*x(1)-40*x(2)^4+12*x(2)^5+2*x(2)*(10*x(1)+17)+390];
h=@(x) [2 -6*x(2)^2+20*x(2)-4;...
	-6*x(2)^2+20*x(2)-4 20*x(1)+48*x(2)^2-160*x(2)^3+60*x(2)^4-6*x(2)*(2*x(1)+46)+34];
Code:
function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
beta,epsilon)

% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
    iter,norm(grad),fun_val);
end
function x=newton_backtracking(f,g,h,x0,alpha,beta,epsilon)

% Newton’s method with backtracking
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0......... initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
gval=g(x);
hval=h(x);
d=hval\gval;
iter=0;
while ((norm(gval)>epsilon)&&(iter<10000))
    iter=iter+1;
    t=1;
    while(f(x-t*d)>f(x)-alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
    gval=g(x);
    hval=h(x);
    d=hval\gval;
end
if (iter==10000)
    fprintf('did not converge\n')
end
function x=newton_hybrid(f,g,h,x0,alpha,beta,epsilon)

% Hybrid Newton’s method
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0......... initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value

x=x0;
gval=g(x);
hval=h(x);
[L,p]=chol(hval,'lower');
if (p==0)
    d=L'\(L\gval);
else
    d=gval;
end
iter=0;
while ((norm(gval)>epsilon)&&(iter<10000))
    iter=iter+1;
    t=1;
    while(f(x-t*d)>f(x)-alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
    gval=g(x);
    hval=h(x);
    [L,p]=chol(hval,'lower');
    if (p==0)
        d=L'\(L\gval);
    else
        d=gval;
    end
end
if (iter==10000)
    fprintf('did not converge\n')
end
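
The runs reported below can be reproduced with a small driver script along these lines (a sketch; it assumes the anonymous functions f, g, h defined above are in the workspace and the three function files above are on the MATLAB path):

s = 1; alpha = 0.5; beta = 0.5; epsilon = 1e-5;
X0 = {[-50; 7], [20; 7], [20; -18], [5; -10]};
for k = 1:numel(X0)
    fprintf('===== starting point X_%d =====\n', k);
    gradient_method_backtracking(f, g, X0{k}, s, alpha, beta, epsilon);   % GB
    newton_hybrid(f, g, h, X0{k}, alpha, beta, epsilon);                  % HN
    newton_backtracking(f, g, h, X0{k}, alpha, beta, epsilon);            % NB
end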
1. gradient_method_backtracking

X_1
iter_number =   1 norm_grad = 20449.270740 fun_val = 15294.869588 
iter_number =   2 norm_grad = 8641.597498 fun_val = 6590.461309 
iter_number =   3 norm_grad = 3243.407821 fun_val = 3600.964616 
iter_number =   4 norm_grad = 998.460370 fun_val = 2802.322856 
iter_number =   5 norm_grad = 177.249171 fun_val = 2668.602665
......
iter_number = 573875 norm_grad = 0.000011 fun_val = 0.000000 
iter_number = 573876 norm_grad = 0.000015 fun_val = 0.000000 
iter_number = 573877 norm_grad = 0.000011 fun_val = 0.000000 
iter_number = 573878 norm_grad = 0.000016 fun_val = 0.000000 
iter_number = 573879 norm_grad = 0.000010 fun_val = 0.000000

X_2
iter_number =   1 norm_grad = 19908.598309 fun_val = 11825.670982 
iter_number =   2 norm_grad = 8081.633889 fun_val = 3696.171028 
iter_number =   3 norm_grad = 2966.115361 fun_val = 1102.231453 
iter_number =   4 norm_grad = 919.934003 fun_val = 433.994995 
iter_number =   5 norm_grad = 168.340107 fun_val = 318.268440
......
iter_number = 4876 norm_grad = 0.000011 fun_val = 0.000000 
iter_number = 4877 norm_grad = 0.000014 fun_val = 0.000000 
iter_number = 4878 norm_grad = 0.000010 fun_val = 0.000000 
iter_number = 4879 norm_grad = 0.000013 fun_val = 0.000000 
iter_number = 4880 norm_grad = 0.000010 fun_val = 0.000000

X_3
iter_number =   1 norm_grad = 10467474.560463 fun_val = 26943900.500393 
iter_number =   2 norm_grad = 4332079.923761 fun_val = 9366327.076717 
iter_number =   3 norm_grad = 1816678.627920 fun_val = 3313434.050683 
iter_number =   4 norm_grad = 764962.552652 fun_val = 1181244.835355 
iter_number =   5 norm_grad = 323019.474471 fun_val = 424420.069151
......
iter_number = 4914 norm_grad = 0.000011 fun_val = 0.000000 
iter_number = 4915 norm_grad = 0.000014 fun_val = 0.000000 
iter_number = 4916 norm_grad = 0.000010 fun_val = 0.000000 
iter_number = 4917 norm_grad = 0.000012 fun_val = 0.000000 
iter_number = 4918 norm_grad = 0.000010 fun_val = 0.000000

X_4
iter_number =   1 norm_grad = 740015.014385 fun_val = 1122781.246864 
iter_number =   2 norm_grad = 318266.908540 fun_val = 409081.024955 
iter_number =   3 norm_grad = 134899.424199 fun_val = 146625.951992 
iter_number =   4 norm_grad = 57053.000249 fun_val = 52419.670771 
iter_number =   5 norm_grad = 24163.571025 fun_val = 18704.114373
......
iter_number = 3292 norm_grad = 0.000011 fun_val = 0.000000 
iter_number = 3293 norm_grad = 0.000014 fun_val = 0.000000 
iter_number = 3294 norm_grad = 0.000010 fun_val = 0.000000 
iter_number = 3295 norm_grad = 0.000013 fun_val = 0.000000 
iter_number = 3296 norm_grad = 0.000010 fun_val = 0.000000

2. newton_hybrid
We deem a method to have diverged when the iteration count exceeds 10000.

X_1
iter=  1 f(x)=24168.6348753983
iter=  2 f(x)=5281.8092360431
iter=  3 f(x)=943.3456847885
iter=  4 f(x)=109.2070073817
iter=  5 f(x)=4.6739803803
iter=  6 f(x)=0.0186701442
iter=  7 f(x)=0.0000003816
iter=  8 f(x)=0.0000000000

X_2
iter=  1 f(x)=22276.1160786165
iter=  2 f(x)=4841.4691558324
iter=  3 f(x)=850.8966301276
iter=  4 f(x)=95.1123281787
iter=  5 f(x)=3.7461154078
iter=  6 f(x)=0.0123117639
iter=  7 f(x)=0.0000001665
iter=  8 f(x)=0.0000000000

X_3
iter=  1 f(x)=12708065.1999076791
iter=  2 f(x)=3598508.6695166193
iter=  3 f(x)=1048848.0285602217
iter=  4 f(x)=303170.7484528549
iter=  5 f(x)=86826.0162440179
......
iter= 11 f(x)=8.3458674974
iter= 12 f(x)=0.3309474447
iter= 13 f(x)=0.0010572765
iter= 14 f(x)=0.0000000133
iter= 15 f(x)=0.0000000000

X_4
iter=  1 f(x)=311084.5021149455
iter=  2 f(x)=85396.3604490121
iter=  3 f(x)=24069.1868676354
iter=  4 f(x)=6541.2522870205
iter=  5 f(x)=1677.8013670689
iter=  6 f(x)=384.6484542764
iter=  7 f(x)=70.6436337793
iter=  8 f(x)=8.0957145600
iter=  9 f(x)=0.3149413944
iter= 10 f(x)=0.0009624574
iter= 11 f(x)=0.0000000110
iter= 12 f(x)=0.0000000000

3. newton_backtracking
The results are similar to those of 2. newton_hybrid; you can try it yourself.
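
For reference, the exercise actually asks for the damped Gauss–Newton method rather than Newton's method with backtracking. A minimal sketch of what that would look like for this least-squares objective is given below (my own outline, not code from the book; F denotes the residual vector (f_1, f_2) and J its Jacobian):

function x = damped_gauss_newton(F, J, x0, s, alpha, beta, epsilon)
% Damped Gauss-Newton method with backtracking for min ||F(x)||^2
obj = @(y) norm(F(y))^2;              % the objective f(x) = ||F(x)||^2
x = x0; iter = 0;
grad = 2*J(x)'*F(x);                  % gradient of ||F||^2
while (norm(grad) > epsilon) && (iter < 10000)
    iter = iter + 1;
    d = (J(x)'*J(x)) \ (J(x)'*F(x));  % Gauss-Newton direction
    t = s;
    while obj(x - t*d) > obj(x) - alpha*t*grad'*d
        t = beta*t;
    end
    x = x - t*d;
    grad = 2*J(x)'*F(x);
    fprintf('iter= %3d f(x)=%10.10f\n', iter, obj(x));
end

For the modified function used in this post it would be called with

F = @(x) [-13+x(1)+((5-x(2))*x(2)-2)*x(2); -13+x(2)+((x(2)+1)*x(2)-14)*x(2)];
J = @(x) [1, -3*x(2)^2+10*x(2)-2; 0, 3*x(2)^2+2*x(2)-13];
damped_gauss_newton(F, J, [-50; 7], 1, 0.5, 0.5, 1e-5);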

Table 1. Iteration numbers of each method on four different starting points.

Method       X_1       X_2      X_3      X_4
GB        573879      4880     4918     3296
HN             8         8       15       12
NB         (output omitted; similar to HN)

According to Table 1, the GB algorithm performs poorly compared with the HN method. Starting from X_1 it needs more than 570,000 iterations to satisfy the stopping criterion, which is practically divergence, and from the remaining three starting points it also lags far behind HN. Examining the GB output shows that the objective value is already, numerically, at the global minimum value 0 long before the stopping criterion is met; it is the requirement that the gradient norm fall below epsilon, combined with the slow decrease of the gradient norm along the gradient iterates, that causes the apparent divergence or the significant delay in convergence.


Original article; please credit the source when reposting: ©️Sylvan Ding
