CHAPTER 2: FUNDAMENTAL APPROACHES OF CLASSICAL OPTIMIZATION METHODS
1. Single variable problem exercise
1.1 Newton-Raphson iterative method
This is a method to obtain the approximate solution of a function f(x): it finds a stationary point x_{best} of f'(x) = 0. The equation below shows how x is updated in each iteration.
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}
Advantage: the method converges quickly (quadratic convergence near the solution).
Disadvantage: we may find a different result if we start at a different initial point.
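The update rule and its sensitivity to the starting point can be sketched in a few lines. This is a Python sketch (the chapter's own scripts are MATLAB); the test function f(x) = x^4 - 3x^2 + x is a hypothetical example with several stationary points, chosen only to illustrate the disadvantage above.

```python
def newton_raphson(df, d2f, x0, tol=1e-10, max_iter=100):
    """Find a root of df (a stationary point of f) starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Hypothetical test function f(x) = x^4 - 3x^2 + x, so
# f'(x) = 4x^3 - 6x + 1 and f''(x) = 12x^2 - 6.
df  = lambda x: 4*x**3 - 6*x + 1
d2f = lambda x: 12*x**2 - 6

# Different initial points land on different stationary points.
left  = newton_raphson(df, d2f, -2.0)
right = newton_raphson(df, d2f,  2.0)
```

Starting from -2 the iteration converges to the negative stationary point, while starting from 2 it converges to a different, positive one.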
1.2 Secant iterative method
This method is an improvement of the Newton-Raphson iterative method. It is used to find the minimum of the function. Its exploratory domain is the interval [x_A, x_B].
In this method the second derivative is not needed; it is replaced by the finite-difference slope \Delta.
x_{k+1} = x_k - \frac{f'(x_k)}{\Delta} \quad \text{with} \quad \Delta = \frac{f'(x_B) - f'(x_A)}{x_B - x_A}
function [y_start,y_end,x_reality,n_reality] = arccut(x_start,x_end,tolerance,n_limit)
%%
% solve the root of f' = 0 using secant method
% x_start A
% x_end B
% tolerance
% n_limit maximum number of iterations
%%
% y_start f'(A)
% y_end f'(B)
% x_reality
% n_reality final iteration
%%
format long;
n_reality = 0;
%%
while 1
x_reality = x_end;
if(abs(x_end - x_start) <= tolerance)
fprintf('tolerance %.14f, the root of diff1 %.14f, iteration: %d\n',...
tolerance,x_reality,n_reality);
break;
elseif(n_reality > n_limit)
disp('sorry,we can not find the value');
break;
else
n_reality = n_reality + 1;
y_start = diff1(x_start);
y_end = diff1(x_end);
fprintf('n_reality=%3.0f, x_end=%12.14f,y_end=%12.14f\n',n_reality,x_end,y_end);
% secant method
x_end = x_end - y_end / (y_end - y_start) * (x_end - x_start);
x_start = x_reality;
end
end
end
2. General interpretation of the monodirectional problem.
2.1 Single variable function
When the function is a single-variable function f(x), it is easy to obtain the following relation:
x_{k+1} = x_k + \alpha d_k
d_k is generally derived from the derivative, d_k = -grad(f); \alpha can be obtained from the second derivative. What happens if our single variable turns into multiple variables?
2.2 Multivariable function
We use a straight-line expression (a fixed point and a direction vector) to reduce our multiple variables X = [x_1, x_2] to a single variable \alpha. All the points on the line are given by this expression, so we obtain a new single-variable function F(\alpha).
F(\alpha) = f(x_k + \alpha d_k)
Remark: the formula given in the slides misses a “-”.
Remark: d_k^T d_{k+1} = 0 (with an exact line search, successive directions are orthogonal).
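The reduction of a multivariable problem to the single-variable function F(\alpha) can be sketched as follows. This is a Python sketch (the chapter's code is MATLAB); the quadratic objective and the crude grid scan over \alpha are assumptions standing in for a proper one-dimensional minimizer.

```python
import numpy as np

def F(alpha, f, x_k, d_k):
    """Single-variable function along the line x_k + alpha * d_k."""
    return f(x_k + alpha * d_k)

# Hypothetical objective and search direction.
f = lambda x: x[0]**2 + 4*x[1]**2
x_k = np.array([2.0, 1.0])
grad = np.array([2*x_k[0], 8*x_k[1]])    # gradient of f at x_k
d_k = -grad                               # steepest descent direction

# Crude line search: evaluate F on a grid of alpha values.
alphas = np.linspace(0.0, 1.0, 1001)
values = [F(a, f, x_k, d_k) for a in alphas]
alpha_best = alphas[int(np.argmin(values))]
x_next = x_k + alpha_best * d_k
```

Any one-dimensional method from this chapter (Newton-Raphson, secant, dichotomous) could replace the grid scan to minimize F(\alpha).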
3. Iterative approach without derivative
The main idea of this method is similar to a guessing game: guess a number from 0 to 100, narrowing the interval at each step.
Remark: be careful with the selection of the boundary interval.
if f(x_1) < f(x_2), the minimum is in [x_{min}, x_2]
if f(x_1) > f(x_2), the minimum is in [x_1, x_{max}]
Here is our first, incorrect attempt; the commented-out test shows that we initially misunderstood the method and checked the derivative diff1, even though this approach needs no derivative.
function [x,f] = dichotomous(x_start,x_end)
format long;
n_reality = 0;
eps = 10e-6;
tolerance = 10e-5;
n_limit = 1000;
%%
while 1
x_reality = x_end;
if(abs(x_end - x_start) <= tolerance)
fprintf('tolerance %.14f, the minimum point %.14f, iteration: %d\n',...
tolerance,x_reality,n_reality);
break;
elseif(n_reality > n_limit)
disp('sorry,we can not find the value');
break;
%elseif(diff1(x_start)*diff1(x_end)>0)
% disp('please choose another x_start and x_end')
% break;
else
n_reality = n_reality + 1;
% dichotomous method
mid = x_start +(x_end-x_start)/2;
x1 = mid-eps/2;
x2 = mid+eps/2;
if objective(x1)<objective(x2)
x_end = x2;
L0 = x_end-x_start;
elseif objective(x1)>objective(x2)
x_start = x1;
L0 = x_end-x_start;
else
x_start = x1;
x_end = x2;
end
x = (x_start+x_end)/2;
f = objective(x);
fprintf('iteration:%d, x_end= %.14f,y_end %.14f\n',n_reality,x,f);
end
end
end
4. Unconstrained multivariable optimization problem
In this part, we generally need to add a stopping criterion to terminate the process; it can be an absolute test or a relative test.
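The two kinds of stopping test can be sketched as follows (Python sketch; the chapter's code is MATLAB, and the threshold values here are arbitrary assumptions):

```python
def absolute_test(f_old, f_new, tol_abs=1e-8):
    """Stop when the objective change is small in absolute value."""
    return abs(f_new - f_old) <= tol_abs

def relative_test(f_old, f_new, tol_rel=1e-6):
    """Stop when the change is small relative to the current value.
    The max() guards against division by zero when f_old is near 0."""
    return abs(f_new - f_old) <= tol_rel * max(abs(f_old), 1e-30)
```

The relative test is scale-invariant: the same tolerance works whether the objective values are around 1 or around 10^6, which is why the newtonRaphson code later in this chapter uses it.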
5. Optimality conditions
In this part we obtain necessary conditions and sufficient conditions for local and global optimality.
6. Gradient properties
6.1 Orthogonality to the contour lines
At a point x_0 on a contour line, there is the fundamental property:
d^T grad(f(x)) = 0
It means that the gradient of the function f(x) is orthogonal to the contour line passing through x_0.
6.2 Descent direction
The product d^T grad(f(x)) represents the projection of the gradient on the direction d; a direction with a negative projection is a descent direction.
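Both gradient properties can be checked numerically. This Python sketch (the chapter's code is MATLAB) uses the hypothetical function f(x) = x_1^2 + 4 x_2^2; the contour tangent at a point is obtained by rotating the gradient by 90 degrees.

```python
import numpy as np

grad_f = lambda x: np.array([2*x[0], 8*x[1]])  # gradient of f = x1^2 + 4*x2^2

x0 = np.array([2.0, 1.0])
g = grad_f(x0)

# A tangent direction to the contour line at x0: rotate the gradient 90 deg.
d_tangent = np.array([-g[1], g[0]])
# Projection of the gradient on the tangent direction: zero (orthogonality).
proj_tangent = d_tangent @ g

# A descent direction has a negative projection on the gradient.
d_descent = -g
proj_descent = d_descent @ g
```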
7. Descent methods
The most important steps for descent methods are:
- find a direction d_k
- find an “optimal” step \alpha
- update x_{k+1} = x_k + \alpha_k d_k
function [xbest,fbest]=newtonRaphson(x0)
% Newton-Raphson iteration for f'(x) = 0, with a relative stopping test
x = x0;
h = 100;        % relative variation of the objective
eps = 1.0e-6;
i = 0;
while h>eps
xnew = x -diff1(x)/diff2(x);
h = abs(objective(xnew)-objective(x))/abs(objective(x));
fprintf('n_reality=%3.0f, x=%12.14f,y=%12.14f\n',i,xnew,objective(xnew));
x = xnew;
i = i+1;
end
xbest = xnew;
fbest = objective(xbest);
end
7.1 Steepest descent method
This approach is based on the fact that -grad(f(x_k)) represents the steepest descent direction at x_k.
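A minimal steepest-descent loop can be sketched as follows (Python sketch; the chapter's code is MATLAB). The fixed step size and the quadratic test function are assumptions; a line search on F(\alpha), as in section 2, would normally choose the step.

```python
import numpy as np

def steepest_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Minimize f by moving along -grad(f(x_k)) with a fixed step alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop when the gradient vanishes
            break
        x = x - alpha * g
    return x

# Hypothetical quadratic f(x) = x1^2 + 4*x2^2, minimum at the origin.
grad = lambda x: np.array([2*x[0], 8*x[1]])
x_best = steepest_descent(grad, [2.0, 1.0])
```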
7.2 Conjugate gradient method
In its basic form, this method applies only to quadratic functions.
f = \frac{1}{2} X^T A X + B^T X + C
where A is an (n, n) symmetric positive definite matrix, B is a vector with n components, and C is a constant.
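For this quadratic form, the gradient is AX + B, so minimizing f amounts to solving A x = -B. The linear conjugate gradient iteration can be sketched as follows (Python sketch; the chapter's code is MATLAB, and the 2x2 system here is a hypothetical example):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve A x = b for symmetric positive definite A (minimizes
    0.5 x^T A x - b^T x); exact in at most n steps in exact arithmetic."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x          # residual = minus the gradient
    d = r.copy()           # first direction: steepest descent
    for _ in range(n):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)      # exact step along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d            # next direction, A-conjugate
        r = r_new
    return x

# Hypothetical SPD system: minimize 0.5 x^T A x + B^T x  =>  A x = -B.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([-1.0, -2.0])
x_best = conjugate_gradient(A, -B)
```

With n = 2, the loop terminates after at most two iterations, illustrating why the basic method is tied to quadratic functions.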
8. Heuristic methods
These are methods that avoid the explicit calculation of the derivative of the objective function. We will discuss them in the next chapter.
9. Analysis of the effect of equality constraints
9.1 Non-linear equality constraint
With a non-linear equality constraint h(x) = 0, when grad(f(x)) becomes parallel to grad(h(x)) at a point x*, it is no longer possible to improve the criterion while staying on the constraint, so x* is a local minimum.
10. ANALYSIS OF THE CONSTRAINTS EFFECT
After we draw the contours of our objective function and add the constraints, we get a direct understanding of the constraints' effect.
- no effect of the constraints on the minimum value
- linear constraints
- non-linear constraints
- local minimum introduced by non-linear constraints
This part also analyses the admissible directions at points under different constraints.
10.1 Admissible direction
A direction is said to be admissible if every small movement from the current point in this direction leads to a point in the admissible domain.
11. Lagrange formulation
This formulation transforms the initial problem with n variables and m constraints into a system of (n+m) equations in (n+m) unknowns, introducing m new variables (the multipliers).
We still feel uncertain about this Lagrange formulation.
L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \mu_i h_i(x) + \sum_{i=1}^{m} \lambda_i g_i(x)
\nabla L(x^*, \lambda^*, \mu^*) = 0
Expanding it:
\nabla f(x^*) + \sum_{i=1}^{m} \mu_i \nabla h_i(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) = 0
Setting different multipliers to zero expresses different types of constraints (a multiplier is zero when its constraint is inactive). This means that, at an optimal solution, the directions of \nabla f(x) and \nabla g(x) should be the same or opposite.
12. Kuhn-Tucker condition
These are important conditions for judging whether a non-linear optimization problem has an optimal solution. Generally, if the optimization problem satisfies the Kuhn-Tucker conditions, we can use the Lagrange formulation to transform it. The figure below is from a website.
Look at our example with inequality constraints:
\min \{ f(x) = (x_1 - 2)^2 + (x_2 - 2)^2 : g_1(x) = -x_1 \le 0,\; g_2(x) = -x_2 \le 0,\; g_3(x) = x_1 + x_2 - 1 \le 0 \}
\nabla L_{x_1} = 2(x_1 - 2) + \lambda_1(-1) + \lambda_2(0) + \lambda_3 = 0
\nabla L_{x_2} = 2(x_2 - 2) + \lambda_1(0) + \lambda_2(-1) + \lambda_3 = 0
And what should we do next? (We did not fully understand the last page of the PPT.)
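A possible next step, under the standard KKT procedure (our guess at what the last slide intends): combine the two stationarity equations with complementary slackness \lambda_i g_i(x) = 0 and \lambda_i \ge 0, and try each possible set of active constraints. This Python sketch (the chapter's code is MATLAB) brute-forces the active sets of the example above; the candidate with a feasible point and non-negative multipliers is the KKT point.

```python
import itertools
import numpy as np

c = np.array([2.0, 2.0])                     # f(x) = ||x - c||^2
# Constraint data: g_i(x) = A[i] . x + b[i] <= 0 for the three constraints.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, -1.0])

kkt_points = []
for active in itertools.chain.from_iterable(
        itertools.combinations(range(3), k) for k in range(4)):
    m = len(active)
    # Unknowns: x (2 components) and one lambda per active constraint.
    # Stationarity: 2(x - c) + sum_i lambda_i A[i] = 0
    # Active constraints: A[i] . x + b[i] = 0
    M = np.zeros((2 + m, 2 + m))
    rhs = np.zeros(2 + m)
    M[:2, :2] = 2.0 * np.eye(2)
    rhs[:2] = 2.0 * c
    for j, i in enumerate(active):
        M[:2, 2 + j] = A[i]
        M[2 + j, :2] = A[i]
        rhs[2 + j] = -b[i]
    try:
        sol = np.linalg.solve(M, rhs)
    except np.linalg.LinAlgError:
        continue                              # inconsistent active set
    x, lam = sol[:2], sol[2:]
    if np.all(A @ x + b <= 1e-9) and np.all(lam >= -1e-9):
        kkt_points.append(x)
```

Only the active set {g_3} survives, giving x* = (0.5, 0.5) with \lambda_3 = 3; the other candidates are either infeasible (e.g. the unconstrained minimum (2, 2) violates g_3) or have a negative multiplier.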
13. Summary
This chapter introduced many methods for solving single-variable and multi-variable optimization problems, in particular ways to identify and find local and global minima. The Lagrange formulation and the Kuhn-Tucker conditions are simple and useful tools for characterizing and computing the optimal solution.
disp('------newtonRaphson method-----');
[x1,f1]=newtonRaphson(0);
disp(['initial value is x0 = 0 xbest =',num2str(x1)]);
[x2,f2]=newtonRaphson(5);
disp(['initial value is x0 = 5 xbest =',num2str(x2)]);
disp('------secant method------------');
% A = 2, B = 10, tolerance 10e-5, max iterations 1000
[y_start,y_end,x_reality,n_reality] = arccut(2,10,10e-5,1000);
disp('------dichotomous method-------');
% x_start = 1, x_end = 10
[x4,f4] = dichotomous(1,10);