Optimization Fundamentals (2): Basic Principles of Classical Optimization Methods (FUNDAMENTALS OF CLASSICAL OPTIMIZATION METHODS)

This article takes a close look at classical optimization methods, including gradient descent and the dichotomous (bisection) search, explaining their basic principles, advantages and drawbacks. It describes how the iterate is updated when searching for the minimum of a function and discusses optimization strategies in the multivariable case. It also covers the Lagrange formulation and the Kuhn-Tucker conditions for non-linear constrained optimization problems, providing a foundation for understanding and solving various optimization problems.

CHAPTER 2: FUNDAMENTALS APPROACH OF CLASSICAL OPTIMIZATION METHODS

1. Single variable problem exercise
    1.1 Newton-Raphson iterative method

This is a method to get an approximate solution of a function $f(x)$. It is a possible way to obtain a solution $x_{best}$ of $f'(x) = 0$. The equation below shows how $x$ is updated at each iteration:
$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$
Advantage: the convergence speed of this method is good.
Disadvantage: we may find a different result if we start from a different initial point.

    1.2 Secant iterative method

This method is an improvement of the Newton-Raphson iterative method. It is used to find the minimum of the function. Its exploratory domain is the interval $[x_A, x_B]$.
In this method, the second derivative is not needed; it is replaced by the finite-difference slope $\Delta$:
$x_{k+1} = x_k - \frac{f'(x_k)}{\Delta} \quad \text{with} \quad \Delta = \frac{f'(x_B) - f'(x_A)}{x_B - x_A}$

function [y_start,y_end,x_reality,n_reality] = arccut(x_start,x_end,tolerance,n_limit)
%%
% solve the root of f' = 0 using the secant method
% inputs:
%   x_start   : left end A of the search interval
%   x_end     : right end B of the search interval
%   tolerance : stop when |x_end - x_start| <= tolerance
%   n_limit   : maximum number of iterations
%%
% outputs:
%   y_start   : f'(A) at the last iteration
%   y_end     : f'(B) at the last iteration
%   x_reality : approximate root of f' = 0
%   n_reality : number of iterations performed
%%
format long; 
n_reality = 0;
%%
while 1
    x_reality = x_end;
    if(abs(x_end - x_start) <= tolerance)
        fprintf('tolerance %.14f, root of diff1 = %.14f, iterations: %d\n',...
            tolerance,x_reality,n_reality);
        break;
    elseif(n_reality > n_limit) 
        disp('sorry,we can not find the value');
        break;
    else
        n_reality = n_reality + 1;
        y_start = diff1(x_start);
        y_end = diff1(x_end);
        fprintf('n_reality=%3.0f, x_end=%12.14f,y_end=%12.14f\n',n_reality,x_end,y_end);
    % secant update: x_{k+1} = x_k - f'(x_k) * (x_k - x_{k-1}) / (f'(x_k) - f'(x_{k-1}))
        x_end = x_end - y_end / (y_end - y_start) * (x_end - x_start);
        x_start = x_reality;   % the previous x_end becomes the new x_start
    end
end
end

2. General interpretation of the monodirectional problem.
    2.1 Single variable function

When the function is a single-variable function $f(x)$, it is easy to obtain the following relation:
$x_{k+1} = x_k + \alpha \, d_k$
$d_k$ is generally derived from the derivative, $d_k = -\mathrm{grad}(f)$, and $\alpha$ can be obtained from the second derivative. What happens if our single variable turns into multiple variables?

    2.2 Multivariable function

We use a straight-line expression (a fixed point and a direction vector) to change our multiple variables $X = [x_1, x_2]$ into a single variable $\alpha$.
Thus all the points are given by our straight-line expression and we obtain a new single-variable function $F(\alpha)$:
$F(\alpha) = f(x_k + \alpha \, d_k)$

Remark: the formula given in the PPT misses a "-" sign.
Remark: $d_k^T d_{k+1} = 0$, i.e. successive search directions are orthogonal when the step is chosen optimally.
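
To make this concrete, here is a minimal MATLAB sketch of the monodirectional idea on a hypothetical two-variable objective; the function f, the point xk and the direction dk are assumptions chosen only for illustration, and fminbnd performs the one-dimensional minimization of F(α).

% Minimal sketch: turn a two-variable problem into a 1-D search along d_k.
f  = @(X) (X(1)-1)^2 + 2*(X(2)+0.5)^2;   % hypothetical objective f(x1,x2)
xk = [0; 0];                              % current point x_k
dk = [1; -0.5];                           % a chosen search direction d_k
F  = @(alpha) f(xk + alpha*dk);           % single-variable function F(alpha)
alpha_best = fminbnd(F, 0, 10);           % "optimal" step along d_k
x_next = xk + alpha_best*dk;              % x_{k+1} = x_k + alpha * d_k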

3. Iterative approach without derivative

The main idea of this method is like a guessing game: guess a number between 0 and 100.
Remark: be careful with the selection of the bounding interval.
If $f(x_1) < f(x_2)$, the minimum is in $[x_{min}, x_2]$.
If $f(x_1) > f(x_2)$, the minimum is in $[x_1, x_{max}]$.

Here is our first (incorrect) attempt; it shows that at first we did not fully understand this method, because we tried to use diff1 (the derivative) in the code, which the dichotomous search does not need.

function [x,f] = dichotomous(x_start,x_end)
% dichotomous (interval-halving) search for the minimum of objective(x)
% x_start, x_end : ends of the initial search interval
format long;
n_reality = 0;       % iteration counter
eps = 10e-6;         % offset used to place two test points around the midpoint
tolerance = 10e-5;   % stop when the interval length is below this value
n_limit = 1000;      % maximum number of iterations
%%
while 1
    x_reality = x_end;
    if(abs(x_end - x_start) <= tolerance)
        fprintf('tolerance %.14f, approximate minimizer = %.14f, iterations: %d\n',...
            tolerance,x_reality,n_reality);
        break;
    elseif(n_reality > n_limit) 
        disp('sorry,we can not find the value');
        break;
    %elseif(diff1(x_start)*diff1(x_end)>0)
    %    disp('please choose another x_start and x_end')
    %    break;
    else
        n_reality = n_reality + 1;
        % dichotomous method
        mid = x_start +(x_end-x_start)/2;
        x1 = mid-eps/2;
        x2 = mid+eps/2;
        if objective(x1)<objective(x2)
            x_end = x2;
            L0 = x_end-x_start;
        elseif objective(x1)>objective(x2)
            x_start = x1;
            L0 = x_end-x_start;
        else
            x_start = x1;
            x_end = x2;
        end
        x = (x_start+x_end)/2;
        f = objective(x);
        fprintf('iteration: %d, x = %.14f, f = %.14f\n',n_reality,x,f);
    end
end
end
4. Unconstrained multivariable optimization problem

In this part, we generally need to add a stopping criterion to end the iterative process; it can be an absolute test, $|f(x_{k+1}) - f(x_k)| \le \varepsilon$, or a relative test, $\frac{|f(x_{k+1}) - f(x_k)|}{|f(x_k)|} \le \varepsilon$.

5. Optimality conditions

In this part we obtain necessary conditions and sufficient conditions for local and global optimality.
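
For the unconstrained case, the first-order necessary condition is $\nabla f(x^*) = 0$, and a second-order sufficient condition for a local minimum is that the Hessian at $x^*$ is positive definite. Below is a minimal MATLAB check of both conditions on a hypothetical quadratic objective; the gradient and Hessian expressions are assumptions for illustration only.

% Check first- and second-order optimality conditions at a candidate point.
gradf = @(x) [2*(x(1)-1); 4*(x(2)+0.5)];   % gradient of (x1-1)^2 + 2*(x2+0.5)^2
hessf = @(x) [2 0; 0 4];                    % Hessian (constant for this quadratic)
x_candidate = [1; -0.5];
first_order_ok  = norm(gradf(x_candidate)) < 1e-8;    % necessary: grad f(x*) = 0
second_order_ok = all(eig(hessf(x_candidate)) > 0);   % sufficient: Hessian > 0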

6. Gradient properties
    6.1 Orthogonality to the contour lines

At a point $x_0$ on a contour line, there is the fundamental property:
$d^T \, \mathrm{grad}(f(x_0)) = 0$

It means that the gradient of the function $f(x)$ is orthogonal to the contour line passing through $x_0$ (here $d$ is a direction tangent to that contour line).

    6.2 Descent direction

The product $d^T \, \mathrm{grad}(f(x))$ represents the projection of the gradient onto the direction $d$; if this product is negative, then $d$ is a descent direction.
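
A quick MATLAB illustration of this test, reusing the same hypothetical objective as in the earlier sketches (the vectors are assumptions for illustration):

% d is a descent direction at x if d' * grad f(x) < 0.
gradf = @(X) [2*(X(1)-1); 4*(X(2)+0.5)];   % hypothetical gradient
x = [0; 0];
d = [1; -0.5];
is_descent = (d' * gradf(x)) < 0;          % true here: the projection is negative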

7. Descent methods

The most important steps of a descent method are:

  • find a direction $d_k$
  • find an "optimal" step $\alpha_k$
  • update $x_{k+1} = x_k + \alpha_k d_k$
function [xbest,fbest]=newtonRaphson(x0)
% Newton-Raphson iterations x_{k+1} = x_k - f'(x_k)/f''(x_k)
% x0 : initial point; relies on objective, diff1 and diff2 being on the MATLAB path
x = x0;
h = 100;        % relative change of the objective (stopping test)
eps = 1.0e-6;   % stopping tolerance
i = 0;          % iteration counter
while h>eps
    xnew = x - diff1(x)/diff2(x);                              % Newton-Raphson update
    h = abs(objective(xnew)-objective(x))/abs(objective(x));   % relative stopping test
    fprintf('n_reality=%3.0f, x=%12.14f,y=%12.14f\n',i,xnew,objective(xnew));
    x = xnew;
    i = i+1;
end
xbest = xnew;
fbest = objective(xbest);
end

   7.1 Steepest descent method

This approach is based on the fact that $-\mathrm{grad}(f(x_k))$ represents the steepest descent direction at $x_k$.
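
Below is a minimal MATLAB sketch of the steepest descent iterations; the quadratic objective, its gradient, the iteration limit and the tolerance are assumptions chosen for illustration, and the step α_k is obtained by a 1-D search with fminbnd as in section 2.2.

% Steepest descent: d_k = -grad f(x_k), alpha_k from a 1-D line search.
f     = @(X) (X(1)-1)^2 + 2*(X(2)+0.5)^2;    % hypothetical objective
gradf = @(X) [2*(X(1)-1); 4*(X(2)+0.5)];     % its gradient
x = [5; 5];                                  % initial point
for k = 1:200
    d = -gradf(x);                           % steepest descent direction
    if norm(d) < 1e-8, break; end            % stop when the gradient vanishes
    F = @(alpha) f(x + alpha*d);             % monodirectional function
    alpha = fminbnd(F, 0, 1);                % "optimal" step
    x = x + alpha*d;                         % x_{k+1} = x_k + alpha_k d_k
end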

    7.2 Conjugate gradient method

We can only use this method on quadratic functions of the form
$f = \frac{1}{2} X^T A X + B^T X + C$
where $A$ is an $(n \times n)$ symmetric positive-definite matrix, $B$ is a vector with $n$ components, and $C$ is a constant.
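
As a sketch, here are the conjugate gradient iterations for this quadratic form in MATLAB; the matrix A, the vector B and the starting point are assumptions for illustration, and the β coefficient follows the Fletcher-Reeves formula.

% Conjugate gradient on f = 0.5*X'*A*X + B'*X + C (A symmetric positive definite).
A = [4 1; 1 3];          % hypothetical A
B = [-1; -2];            % hypothetical B
x = [0; 0];              % starting point
g = A*x + B;             % gradient of the quadratic
d = -g;                  % first direction: steepest descent
for k = 1:length(B)      % at most n iterations are needed for a quadratic
    alpha = -(g'*d)/(d'*A*d);        % exact optimal step for a quadratic
    x = x + alpha*d;
    g_new = A*x + B;
    beta  = (g_new'*g_new)/(g'*g);   % Fletcher-Reeves coefficient
    d = -g_new + beta*d;             % next direction, A-conjugate to the previous one
    g = g_new;
end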

8. Heuristic methods

These are methods that avoid the explicit calculation of the derivatives of the objective function. We will come back to them in the next chapter.

9. Analysis of the effect of equality constraints
    9.1 Non-linear equality constraint

The non-linear equality constraint is $h(x) = 0$; when $\nabla f(x^*)$ is parallel to $\nabla h(x^*)$, it is no longer possible to improve the criterion while staying on the constraint, so $x^*$ is a local minimum.

10. ANALYSIS OF THE CONSTRAINTS EFFECT

After we draw the contour lines of our objective function and add the constraints, we get a direct understanding of the effect of the constraints:

  • no effect of the constraints on the minimum value
  • linear constraints
  • non-linear constraints
  • local minimum introduced by non-linear constraints

This part also analyses the admissible directions from points under different constraints.

    10.1 Admissible direction

A direction is said to be admissible if all small movements from the current point in this direction lead to points inside the admissible domain.

11. Lagrange formulation

This formulation transforms the initial problem with n variables and m constraints into a system of (n+m) equations with (n+m) variables, introducing m new variables (the multipliers).
We still feel somewhat uncertain about this Lagrange formulation.
$L(\mathbf{x}, \lambda, \mu) = f(\mathbf{x}) + \sum_{i=1}^{m} \mu_i \, h_i(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i \, g_i(\mathbf{x})$
$\nabla L(x^{*}, \lambda^{*}, \mu^{*}) = 0$
Expanding it,
$\nabla f(x^{*}) + \sum_{i=1}^{m} \mu_i \nabla h_i(x^{*}) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^{*}) = 0$
Different multipliers set to zero express different types of constraints.
This means that, to find an optimal solution, the directions of $\nabla f(x)$ and $\nabla g(x)$ should be the same or opposite.

12. Kuhn-Tucker condition

This is an important criterion for judging whether a non-linear optimization problem has an optimal solution or not. Generally, if the optimization problem satisfies the Kuhn-Tucker conditions, we can use the Lagrange formulation to transform the problem. The figure below is taken from a website.
[Figure: illustration of the Kuhn-Tucker conditions, taken from an external website]
Look at our example with inequality constraints:
$\min\{\, f(x) = (x_1-2)^2 + (x_2-2)^2 \;:\; g_1(x) = -x_1 \le 0,\; g_2(x) = -x_2 \le 0,\; g_3(x) = x_1 + x_2 - 1 \le 0 \,\}$
$\nabla L_{x_1} = 2(x_1-2) + \lambda_1(-1) + \lambda_2(0) + \lambda_3 = 0$
$\nabla L_{x_2} = 2(x_2-2) + \lambda_1(0) + \lambda_2(-1) + \lambda_3 = 0$
And what should we do next? (We did not fully understand the last page of the PPT.)
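
A possible continuation (our own reasoning, not taken from the PPT): the remaining Kuhn-Tucker conditions are complementary slackness $\lambda_i g_i(x) = 0$ with $\lambda_i \ge 0$ and feasibility $g_i(x) \le 0$, so we test combinations of active constraints. The unconstrained minimum $(2, 2)$ violates $g_3$, so assume only $g_3$ is active ($\lambda_1 = \lambda_2 = 0$ and $x_1 + x_2 = 1$); the two stationarity equations then give $x_1 = x_2 = 1/2$ and $\lambda_3 = 3 \ge 0$, all constraints hold, so $x^* = (1/2, 1/2)$ with $f(x^*) = 4.5$. The short MATLAB check below, using the linear inequalities written above, confirms this numerically with fmincon (Optimization Toolbox).

% Numerical cross-check of the Kuhn-Tucker solution with fmincon.
fun   = @(x) (x(1)-2)^2 + (x(2)-2)^2;
Aineq = [-1 0; 0 -1; 1 1];        % -x1 <= 0, -x2 <= 0, x1 + x2 <= 1
bineq = [0; 0; 1];
x0    = [0; 0];
xstar = fmincon(fun, x0, Aineq, bineq);   % expected result: [0.5; 0.5]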

13. Summary

This chapter introduced many methods for solving single-variable and multi-variable optimization problems, especially ways to characterize and find local and global minima. The Lagrange formulation and the Kuhn-Tucker conditions are simple and useful tools for checking and computing the optimal solution.

disp('------newtonRaphson method-----');
[x1,f1]=newtonRaphson(0);
disp(['initial value is x0 = 0 xbest =',num2str(x1)]);
[x2,f2]=newtonRaphson(5);
disp(['initial value is x0 = 5 xbest =',num2str(x2)]);

disp('------secant method------------');
% A = 2, B = 10, tolerance = 10e-5, maximum iterations = 1000
 [y_start,y_end,x_reality,n_reality] = arccut(2,10,10e-5,1000);
 
disp('------dichotomous method-------');
% x_start = 1, x_end = 10
 [x4,f4] = dichotomous(1,10);
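
The three test calls above rely on the helper functions objective, diff1 and diff2, which are not shown in these notes. Here is a minimal sketch of what they could look like for a hypothetical single-variable objective (each function would live in its own .m file); the actual function used in the course exercise is not given here, so this choice is only an assumption to make the scripts runnable.

function y = objective(x)
% hypothetical single-variable objective, minimum at x = 3
y = (x - 3)^2 + 2;
end

function dy = diff1(x)
% first derivative of the hypothetical objective
dy = 2*(x - 3);
end

function d2y = diff2(x)
% second derivative of the hypothetical objective (constant, independent of x)
d2y = 2;
end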