CHAPTER 2: FUNDAMENTAL APPROACHES OF CLASSICAL OPTIMIZATION METHODS
1. Single variable problem exercise
1.1 Newton-Raphson iterative method
This is a method to obtain the approximate solution of a function f(x): it finds a stationary point x_{best} of f'(x) = 0. The equation below shows how x is updated in each iteration.
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}
Advantage: the method converges quickly (quadratic convergence near the solution).
Disadvantage: we may find a different result if we start at a different initial point.
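The update rule and its sensitivity to the starting point can be sketched in a few lines. This is a Python sketch (the chapter's own scripts are MATLAB); the test function f(x) = x^4 - 3x^2 + x is a hypothetical example with several stationary points, chosen only to illustrate the disadvantage above.

```python
def newton_raphson(df, d2f, x0, tol=1e-10, max_iter=100):
    """Find a root of df (a stationary point of f) starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Hypothetical test function f(x) = x^4 - 3x^2 + x, so
# f'(x) = 4x^3 - 6x + 1 and f''(x) = 12x^2 - 6.
df  = lambda x: 4*x**3 - 6*x + 1
d2f = lambda x: 12*x**2 - 6

# Different initial points land on different stationary points.
left  = newton_raphson(df, d2f, -2.0)
right = newton_raphson(df, d2f,  2.0)
```

Starting from -2 the iteration converges to the negative stationary point, while starting from 2 it converges to a different, positive one.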
1.2 Secant iterative method
This method is an improvement of the Newton-Raphson iterative method. It is used to find the minimum of the function. Its exploratory domain is the interval [x_A, x_B].
In this method the second derivative is not needed; it is replaced by the finite-difference slope \Delta.
x_{k+1} = x_k - \frac{f'(x_k)}{\Delta} \quad \text{with} \quad \Delta = \frac{f'(x_B) - f'(x_A)}{x_B - x_A}
function [y_start,y_end,x_reality,n_reality] = arccut(x_start,x_end,tolerance,n_limit)
%%
% solve the root of f' = 0 using secant method
% x_start A
% x_end B
% tolerance
% n_limit maximum number of iterations
%%
% y_start f'(A)
% y_end f'(B)
% x_reality
% n_reality final iteration
%%
format long;
n_reality = 0;
%%
while 1
x_reality = x_end;
if(abs(x_end - x_start) <= tolerance)
fprintf('tolerance %.14f, the root of diff1 %.14f, iteration: %d\n',...
tolerance,x_reality,n_reality);
break;
elseif(n_reality > n_limit)
disp('sorry,we can not find the value');
break;
else
n_reality = n_reality + 1;
y_start = diff1(x_start);
y_end = diff1(x_end);
fprintf('n_reality=%3.0f, x_end=%12.14f,y_end=%12.14f\n',n_reality,x_end,y_end);
% secant method
x_end = x_end - y_end / (y_end - y_start) * (x_end - x_start);
x_start = x_reality;
end
end
end
2. General interpretation of the monodirectional problem.
2.1 Single variable function
When the function is a single-variable function f(x), it is easy to obtain the following relation:
x_{k+1} = x_k + \alpha d_k
d_k is generally derived from the derivative, d_k = -grad(f); \alpha can be obtained from the second derivative. What happens if our single variable turns into multiple variables?
2.2 Multivariable function
We use a straight-line expression (a fixed point and a direction vector) to reduce our multiple variables X = [x_1, x_2] to a single variable \alpha. All the points on the line are given by this expression, so we obtain a new single-variable function F(\alpha).
F(\alpha) = f(x_k + \alpha d_k)
Remark: the formula given in the slides misses a “-”.
Remark: d_k^T d_{k+1} = 0 (with an exact line search, successive directions are orthogonal).
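The reduction of a multivariable problem to the single-variable function F(\alpha) can be sketched as follows. This is a Python sketch (the chapter's code is MATLAB); the quadratic objective and the crude grid scan over \alpha are assumptions standing in for a proper one-dimensional minimizer.

```python
import numpy as np

def F(alpha, f, x_k, d_k):
    """Single-variable function along the line x_k + alpha * d_k."""
    return f(x_k + alpha * d_k)

# Hypothetical objective and search direction.
f = lambda x: x[0]**2 + 4*x[1]**2
x_k = np.array([2.0, 1.0])
grad = np.array([2*x_k[0], 8*x_k[1]])    # gradient of f at x_k
d_k = -grad                               # steepest descent direction

# Crude line search: evaluate F on a grid of alpha values.
alphas = np.linspace(0.0, 1.0, 1001)
values = [F(a, f, x_k, d_k) for a in alphas]
alpha_best = alphas[int(np.argmin(values))]
x_next = x_k + alpha_best * d_k
```

Any one-dimensional method from this chapter (Newton-Raphson, secant, dichotomous) could replace the grid scan to minimize F(\alpha).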
3. Iterative approach without derivative
The main idea of this method is similar to a guessing game: guess a number from 0 to 100, narrowing the interval at each step.
Remark: be careful with the selection of the boundary interval.
if f(x_1) < f(x_2), the minimum is in [x_{min}, x_2]
if f(x_1) > f(x_2), the minimum is in [x_1, x_{max}]
Here is our first, incorrect attempt; the commented-out test shows that we initially misunderstood the method and checked the derivative diff1, even though this approach needs no derivative.
function [x,f] = dichotomous(x_start,x_end)
format long;
n_reality = 0;
eps = 10e-6;
tolerance = 10e-5;
n_limit = 1000;
%%
while 1
x_reality = x_end;
if(abs(x_end - x_start) <= tolerance)
fprintf('tolerance %.14f, the minimum point %.14f, iteration: %d\n',...
tolerance,x_reality,n_reality);
break;
elseif(n_reality > n_limit)
disp('sorry,we can not find the value');
break;
%elseif(diff1(x_start)*diff1(x_end)>0)
% disp('please choose another x_start and x_end')
% break;
else
n_reality = n_reality + 1;
% dichotomous method
mid = x_start +(x_end-x_start)/2;
x1 = mid-eps/2;
x2 = mid+eps/2;
if objective(x1)<objective(x2)
x_end = x2;
L0 = x_end-x_start;
elseif objective(x1)>objective(x2)
x_start = x1;
L0 = x_end-x_start;
else
x_start = x1;
x_end = x2;
end
x = (x_start+x_end)/2;
f = objective(x);
fprintf('iteration:%d, x_end= %.14f,y_end %.14f\n',n_reality,x,f);
end
end
end
4. Unconstrained multivariable optimization problem
In this part, we generally need to add a stopping criterion to terminate the process; it can be an absolute test or a relative test.
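The two kinds of stopping test can be sketched as follows (Python sketch; the chapter's code is MATLAB, and the threshold values here are arbitrary assumptions):

```python
def absolute_test(f_old, f_new, tol_abs=1e-8):
    """Stop when the objective change is small in absolute value."""
    return abs(f_new - f_old) <= tol_abs

def relative_test(f_old, f_new, tol_rel=1e-6):
    """Stop when the change is small relative to the current value.
    The max() guards against division by zero when f_old is near 0."""
    return abs(f_new - f_old) <= tol_rel * max(abs(f_old), 1e-30)
```

The relative test is scale-invariant: the same tolerance works whether the objective values are around 1 or around 10^6, which is why the newtonRaphson code later in this chapter uses it.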
5. Optimality conditions
In this part we obtain necessary conditions and sufficient conditions for local and global optimality.
6. Gradient properties
6.1 Orthogonality to the contour lines
At a point x_0 on a contour line, there is the fundamental property:
d^T grad(f(x)) = 0
It means that the gradient of the function f(x) is orthogonal to the contour line passing through x_0.
6.2 Descent direction
The product d^T grad(f(x)) represents the projection of the gradient on the direction d; a direction with a negative projection is a descent direction.
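Both gradient properties can be checked numerically. This Python sketch (the chapter's code is MATLAB) uses the hypothetical function f(x) = x_1^2 + 4 x_2^2; the contour tangent at a point is obtained by rotating the gradient by 90 degrees.

```python
import numpy as np

grad_f = lambda x: np.array([2*x[0], 8*x[1]])  # gradient of f = x1^2 + 4*x2^2

x0 = np.array([2.0, 1.0])
g = grad_f(x0)

# A tangent direction to the contour line at x0: rotate the gradient 90 deg.
d_tangent = np.array([-g[1], g[0]])
# Projection of the gradient on the tangent direction: zero (orthogonality).
proj_tangent = d_tangent @ g

# A descent direction has a negative projection on the gradient.
d_descent = -g
proj_descent = d_descent @ g
```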
7. Descent methods
The most important steps for descent methods are:
- find a direction d_k
- find an “optimal” step \alpha
- update x_{k+1} = x_k + \alpha_k d_k
function [xbest,fbest]=newtonRaphson(x0)
% Newton-Raphson iteration for f'(x) = 0, with a relative stopping test
x = x0;
h = 100;        % relative variation of the objective
eps = 1.0e-6;
i = 0;
while h>eps
xnew = x -diff1(x)/diff2(x);
h = abs(objective(xnew)-objective(x))/abs(objective(x));
fprintf('n_reality=%3.0f, x=%12.14f,y=%12.14f\n',i,xnew,objective(xnew));
x = xnew;
i = i+1;
end
xbest = xnew;
fbest = objective(xbest);
end
7.1 Steepest descent method
This approach is based on the fact that -grad(f(x_k)) represents the steepest descent direction at x_k.
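A minimal steepest-descent loop can be sketched as follows (Python sketch; the chapter's code is MATLAB). The fixed step size and the quadratic test function are assumptions; a line search on F(\alpha), as in section 2, would normally choose the step.

```python
import numpy as np

def steepest_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Minimize f by moving along -grad(f(x_k)) with a fixed step alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop when the gradient vanishes
            break
        x = x - alpha * g
    return x

# Hypothetical quadratic f(x) = x1^2 + 4*x2^2, minimum at the origin.
grad = lambda x: np.array([2*x[0], 8*x[1]])
x_best = steepest_descent(grad, [2.0, 1.0])
```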
7.2 Conjugate gradient method
In its basic form, this method applies only to quadratic functions.
f = \frac{1}{2} X^T A X + B^T X + C
where A is an (n, n) symmetric positive definite matrix, B is a vector with n components, and C is a constant.
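For this quadratic form, the gradient is AX + B, so minimizing f amounts to solving A x = -B. The linear conjugate gradient iteration can be sketched as follows (Python sketch; the chapter's code is MATLAB, and the 2x2 system here is a hypothetical example):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve A x = b for symmetric positive definite A (minimizes
    0.5 x^T A x - b^T x); exact in at most n steps in exact arithmetic."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x          # residual = minus the gradient
    d = r.copy()           # first direction: steepest descent
    for _ in range(n):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)      # exact step along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d            # next direction, A-conjugate
        r = r_new
    return x

# Hypothetical SPD system: minimize 0.5 x^T A x + B^T x  =>  A x = -B.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([-1.0, -2.0])
x_best = conjugate_gradient(A, -B)
```

With n = 2, the loop terminates after at most two iterations, illustrating why the basic method is tied to quadratic functions.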
8. Heuristic methods
These are methods that avoid the explicit calculation of the derivative of the objective function. We will discuss them in the next chapter.
9. Analysis of the effect of equality constraints
9.1 Non-linear equality constraint
With a non-linear equality constraint h(x) = 0, when grad(f(x)) becomes parallel to grad(h(x)) at a point x*, it is no longer possible to improve the criterion while staying on the constraint, so x* is a local minimum.
10. ANALYSIS OF THE CONSTRAINTS EFFECT
After we draw the contours of our objective function and add the constraints, we get a direct understanding of the constraints' effect.
- no effect of the constraints on the minimum value
- linear constraints
- non-linear constraints
- local minimum introduced by non-linear constraints
This part also analyses the admissible directions at points under different constraints.
10.1 Admissible direction
A direction is said to be admissible if every small movement from the current point in this direction leads to a point in the admissible domain.
11. Lagrange formulation
This formulation transforms the initial problem with n variables and m constraints into a system of (n+m) equations in (n+m) unknowns, introducing m new variables (the multipliers).
We still feel uncertain about this Lagrange formulation.
L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \mu_i h_i(x) + \sum_{i=1}^{m} \lambda_i g_i(x)
\nabla L(x^*, \lambda^*, \mu^*) = 0
Expanding it:
\nabla f(x^*) + \sum_{i=1}^{m} \mu_i \nabla h_i(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) = 0
Setting different multipliers to zero expresses different types of constraints (a multiplier is zero when its constraint is inactive). This means that, at an optimal solution, the directions of \nabla f(x) and \nabla g(x) should be the same or opposite.
12. Kuhn-Tucker condition
These are important conditions for judging whether a non-linear optimization problem has an optimal solution. Generally, if the optimization problem satisfies the Kuhn-Tucker conditions, we can use the Lagrange formulation to transform it. The figure below is from a website.
Look at our example with inequality constraints:
\min \{ f(x) = (x_1 - 2)^2 + (x_2 - 2)^2 : g_1(x) = -x_1 \le 0,\; g_2(x) = -x_2 \le 0,\; g_3(x) = x_1 + x_2 - 1 \le 0 \}
\nabla L_{x_1} = 2(x_1 - 2) + \lambda_1(-1) + \lambda_2(0) + \lambda_3 = 0
\nabla L_{x_2} = 2(x_2 - 2) + \lambda_1(0) + \lambda_2(-1) + \lambda_3 = 0
And what should we do next? (We did not fully understand the last page of the PPT.)
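A possible next step, under the standard KKT procedure (our guess at what the last slide intends): combine the two stationarity equations with complementary slackness \lambda_i g_i(x) = 0 and \lambda_i \ge 0, and try each possible set of active constraints. This Python sketch (the chapter's code is MATLAB) brute-forces the active sets of the example above; the candidate with a feasible point and non-negative multipliers is the KKT point.

```python
import itertools
import numpy as np

c = np.array([2.0, 2.0])                     # f(x) = ||x - c||^2
# Constraint data: g_i(x) = A[i] . x + b[i] <= 0 for the three constraints.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, -1.0])

kkt_points = []
for active in itertools.chain.from_iterable(
        itertools.combinations(range(3), k) for k in range(4)):
    m = len(active)
    # Unknowns: x (2 components) and one lambda per active constraint.
    # Stationarity: 2(x - c) + sum_i lambda_i A[i] = 0
    # Active constraints: A[i] . x + b[i] = 0
    M = np.zeros((2 + m, 2 + m))
    rhs = np.zeros(2 + m)
    M[:2, :2] = 2.0 * np.eye(2)
    rhs[:2] = 2.0 * c
    for j, i in enumerate(active):
        M[:2, 2 + j] = A[i]
        M[2 + j, :2] = A[i]
        rhs[2 + j] = -b[i]
    try:
        sol = np.linalg.solve(M, rhs)
    except np.linalg.LinAlgError:
        continue                              # inconsistent active set
    x, lam = sol[:2], sol[2:]
    if np.all(A @ x + b <= 1e-9) and np.all(lam >= -1e-9):
        kkt_points.append(x)
```

Only the active set {g_3} survives, giving x* = (0.5, 0.5) with \lambda_3 = 3; the other candidates are either infeasible (e.g. the unconstrained minimum (2, 2) violates g_3) or have a negative multiplier.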
13. Summary
This chapter introduced many methods for solving single-variable and multi-variable optimization problems, in particular ways to identify and find local and global minima. The Lagrange formulation and the Kuhn-Tucker conditions are simple and useful tools for characterizing and computing the optimal solution.
disp('------newtonRaphson method-----');
[x1,f1]=newtonRaphson(0);
disp(['initial value is x0 = 0 xbest =',num2str(x1)]);
[x2,f2]=newtonRaphson(5);
disp(['initial value is x0 = 5 xbest =',num2str(x2)]);
disp('------secant method------------');
% A = 2, B = 10, tolerance 10e-5, max iterations 1000
[y_start,y_end,x_reality,n_reality] = arccut(2,10,10e-5,1000);
disp('------dichotomous method-------');
% x_start = 1, x_end = 10
[x4,f4] = dichotomous(1,10);