For a complete introduction to trust-region methods, see the companion blog post on trust-region methods.
Overview of the optimization problem
Here we briefly review the theory. Consider the unconstrained optimization problem

\min\limits_{x\in \mathbb{R}^n} f(x),

where f(x):\mathbb{R}^n\to \mathbb{R} is a real-valued function of n variables.
By Taylor's theorem,

f(x_k+p)=f_k+g_k^Tp+\frac{1}{2}p^T\nabla^2 f(x_k+tp)\,p,

where f_k=f(x_k),\ g_k=\nabla f(x_k),\ t\in(0,1).
Approximating the second-order term above by a matrix B_k, we obtain the model function

m_k(p)=f_k+g_k^Tp+\frac{1}{2}p^TB_kp.
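As a quick numerical sanity check (a Python/NumPy sketch, not part of the original post), the model with B_k equal to the exact Hessian agrees with f up to a third-order remainder near x_k; the test point and step are my own choices, using the example function from later in the post:

```python
import numpy as np

def f(x):
    # Rosenbrock-type objective used later in this post
    return 100.0*(x[1] - x[0]**2)**2 + (x[0] - 1.0)**2

def grad(x):
    return np.array([2.0*(x[0] - 1.0) + 400.0*x[0]*(x[0]**2 - x[1]),
                     200.0*(x[1] - x[0]**2)])

def hess(x):
    return np.array([[1200.0*x[0]**2 - 400.0*x[1] + 2.0, -400.0*x[0]],
                     [-400.0*x[0], 200.0]])

xk = np.array([2.0, 1.0])
p = np.array([1e-3, -1e-3])
mk = f(xk) + grad(xk) @ p + 0.5 * p @ hess(xk) @ p  # model value m_k(p)
print(abs(f(xk + p) - mk))  # third-order remainder, tiny for small ||p||
```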
At each iterate, the trust-region method solves the subproblem

\min\limits_{p\in \mathbb{R}^n}\ m_k(p)=f_k+g_k^Tp+\frac{1}{2}p^TB_kp \quad \text{s.t.}\ \Vert p\Vert \leq \Delta.
Here we take B_k to be the exact Hessian \nabla^2 f(x_k) and solve this subproblem with the Dogleg method.
The Dogleg method
For the unconstrained minimization of the model function (dropping the subscript k), the global minimizer is

p_B=-B^{-1}g,

and the minimizer along the steepest-descent direction -\nabla f(x) is

p_U=-\frac{g^Tg}{g^TBg}g.
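For completeness, the coefficient in p_U follows from a one-line minimization of the model along -g:

```latex
m(-\alpha g)=f-\alpha\, g^Tg+\tfrac{1}{2}\alpha^2\, g^TBg
\quad\Longrightarrow\quad
\frac{d}{d\alpha}m(-\alpha g)=-g^Tg+\alpha\, g^TBg=0
\quad\Longrightarrow\quad
\alpha^*=\frac{g^Tg}{g^TBg},\qquad p_U=-\alpha^* g.
```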
If \Vert p_B\Vert\leq \Delta, then p=p_B solves the subproblem. If instead \Vert p_U\Vert \geq \Delta, take p=\frac{\Delta}{\Vert p_U\Vert}p_U. Otherwise, solve the scalar quadratic equation in \tau,

\Vert p_U+(\tau-1)(p_B-p_U)\Vert=\Delta,

for its root \tau\in[1,2]; then p=p_U+(\tau-1)(p_B-p_U) solves the subproblem.
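The three cases above can be sketched in Python/NumPy (a hypothetical `dogleg` helper that mirrors the formulas, not the original MATLAB; the matrix B and gradient g below are my own test data):

```python
import numpy as np

def dogleg(g, B, delta):
    """Approximate solution of min g'p + 0.5 p'Bp, s.t. ||p|| <= delta."""
    p_B = -np.linalg.solve(B, g)           # unconstrained minimizer p_B
    p_U = -(g @ g) / (g @ B @ g) * g       # minimizer p_U along -g
    if np.linalg.norm(p_B) <= delta:
        return p_B                          # case 1: the Newton step fits
    if np.linalg.norm(p_U) >= delta:
        return delta / np.linalg.norm(p_U) * p_U   # case 2: scaled steepest descent
    # case 3: solve ||p_U + s (p_B - p_U)|| = delta with s = tau - 1 in [0, 1]
    d = p_B - p_U
    a, b, c = d @ d, p_U @ d, p_U @ p_U - delta**2
    s = (-b + np.sqrt(b*b - a*c)) / a       # positive root of the quadratic
    return p_U + s * d

B = np.array([[2.0, 0.0], [0.0, 10.0]])
g = np.array([1.0, 1.0])
p = dogleg(g, B, 0.3)
print(np.linalg.norm(p))   # the step lands exactly on the trust-region boundary
```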
Algorithm framework
We write

s_k=\arg\min\limits_{p\in \mathbb{R}^n}\ m_k(p)=f_k+g_k^Tp+\frac{1}{2}p^TB_kp \quad \text{s.t.}\ \Vert p\Vert \leq \Delta,

and define the reduction ratio

\rho_k=\frac{f(x_k)-f(x_k+s_k)}{m_k(0)-m_k(s_k)},
which compares the actual decrease in f with the decrease predicted by the model. The trust-region framework then iterates: solve the subproblem for s_k, compute \rho_k, accept the step when \rho_k exceeds a threshold, and shrink or enlarge \Delta according to \rho_k.
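This loop can be sketched in Python/NumPy (my own sketch, using the same thresholds η1 = 0.25, η2 = 0.75, η = 0 and radius-update rules as the MATLAB code below; the quadratic test problem is my own choice so that convergence is easy to verify):

```python
import numpy as np

# Quadratic test problem (my own choice): f(x) = 0.5 x'Ax - b'x, minimizer A^{-1}b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):    return 0.5 * x @ A @ x - b @ x
def grad(x): return A @ x - b
def hess(x): return A

def dogleg(g, B, delta):
    # Dogleg solution of min g'p + 0.5 p'Bp  s.t. ||p|| <= delta
    p_B = -np.linalg.solve(B, g)
    p_U = -(g @ g) / (g @ B @ g) * g
    if np.linalg.norm(p_B) <= delta:
        return p_B
    if np.linalg.norm(p_U) >= delta:
        return delta / np.linalg.norm(p_U) * p_U
    d = p_B - p_U
    a, bq, c = d @ d, p_U @ d, p_U @ p_U - delta**2
    return p_U + (-bq + np.sqrt(bq*bq - a*c)) / a * d

def trust_region(x, delta=1.0, delta_max=100.0, tol=1e-8, maxiter=1000):
    eta1, eta2, eta = 0.25, 0.75, 0.0
    for _ in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        B = hess(x)
        s = dogleg(g, B, delta)
        pred = -(g @ s + 0.5 * s @ B @ s)          # m(0) - m(s), positive here
        rho = (f(x) - f(x + s)) / pred             # reduction ratio rho_k
        if rho < eta1:
            delta *= 0.25                          # poor model: shrink the region
        elif rho > eta2 and abs(np.linalg.norm(s) - delta) < 1e-8:
            delta = min(2.0 * delta, delta_max)    # good model on the boundary: expand
        if rho > eta:
            x = x + s                              # accept the step
    return x

x_star = trust_region(np.array([10.0, 10.0]))
print(x_star)  # approximately [0.2 0.4] = A^{-1} b
```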
A simple example
Consider first the example

f(x) = 100(x_2-x_1^2)^2+(x_1-1)^2.

Its minimizer is easily computed to be (1,1). The MATLAB code is
%% Trust Region Method for a simple example
% Using Dogleg method and exact Hessian matrix
function y = TrustRegion1()
% initialization
eps = 1e-8;          % tolerance on the gradient norm
x0 = [500;500];
maxiter = 1000;
eta1 = 0.25;         % shrink threshold
eta2 = 0.75;         % expand threshold
delta = 50;          % initial trust-region radius
delta_max = 100;     % maximum trust-region radius
eta = 0;             % step-acceptance threshold
x = x0;
i = 0;
while i < maxiter
    g = gradient(x);
    if norm(g) <= eps
        disp(i);
        break;
    end
    h = hessian(x);
    p = subproblem(g,h,delta);
    % rho = (f(x)-f(x+p))/(m(0)-m(p)); note m(0)-m(p) = -(g'*p+0.5*p'*h*p)
    rho = -(f(x)-f(x+p))/(g'*p+0.5*p'*h*p);
    if rho < eta1
        delta = eta1*delta;                  % poor agreement: shrink the region
    elseif rho > eta2 && abs(norm(p)-delta) < 1e-8
        delta = min(2*delta,delta_max);      % good agreement on the boundary: expand
    end
    if rho > eta
        x = x+p;                             % accept the step
    end
    i = i+1;
end
y = x;
end
% Solve the subproblem m_k(p)=f_k+g_k^Tp+0.5*p^TB_kp, s.t. ||p|| <= Delta.
%% Dogleg Method
function p = subproblem(g,h,delta)
p_B = -(h\g);                    % unconstrained (Newton) minimizer
p_U = -(g'*g)/(g'*h*g)*g;        % minimizer along the steepest-descent direction
if norm(p_B) <= delta
    p = p_B;                     % the full Newton step fits inside the region
elseif norm(p_U) >= delta
    p = delta/norm(p_U)*p_U;     % steepest-descent step scaled to the boundary
else
    % Solve ||p_U + s*(p_B - p_U)|| = delta for s = tau-1; the positive
    % root of the quadratic lies in [0,1].
    p_BU = p_B - p_U;
    a = p_BU'*p_BU;
    b = p_U'*p_BU;
    c = p_U'*p_U - delta^2;
    s = (-b + sqrt(b^2 - a*c))/a;
    p = p_U + s*p_BU;
end
end
%objective function
function f = f(x)
f = 100*(x(1)^2 - x(2))^2 + (x(1) - 1)^2;
end
% The gradient of f(x)
function g = gradient(x)
dg_dx1 = 2*(x(1)-1)+400*x(1)*(x(1)^2-x(2));
dg_dx2 = 200*(x(2)-x(1)^2);
g = [dg_dx1;dg_dx2];
end
% The Hessian of f(x)
function h = hessian(x)
h_11 = 1200*x(1)^2-400*x(2)+2;
h_12 = -400*x(1);
h_21 = h_12;
h_22 = 200;
h = [h_11,h_12;h_21,h_22];
end
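Hand-derived gradients and Hessians are a common source of bugs, so it is worth cross-checking them against finite differences (a Python/NumPy sketch, not part of the original MATLAB; the test point and step size are my own choices):

```python
import numpy as np

def f(x):
    return 100.0*(x[0]**2 - x[1])**2 + (x[0] - 1.0)**2

def grad(x):
    # hand-derived gradient, same formulas as the MATLAB gradient(x)
    return np.array([2.0*(x[0] - 1.0) + 400.0*x[0]*(x[0]**2 - x[1]),
                     200.0*(x[1] - x[0]**2)])

x = np.array([1.7, -0.4])
h = 1e-6
# central differences in each coordinate direction
fd = np.array([(f(x + h*np.eye(2)[i]) - f(x - h*np.eye(2)[i])) / (2*h)
               for i in range(2)])
print(np.max(np.abs(fd - grad(x))))  # small: truncation plus roundoff error
```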
A generalized MATLAB program
With slight modifications to the program above, we obtain a general-purpose solver for unconstrained optimization problems:
function y = TrustRegion3(f,x0,delta,delta_max,maxiter,eps)
% f: objective function handle f(x1,x2);
% x0: initial point;
% delta: initial trust-region radius;
% delta_max: maximum trust-region radius;
% maxiter: maximum number of iterations;
% eps: accuracy (tolerance on the gradient norm of f);
if nargin < 1
    % default test problem when called with no arguments
    eps = 1e-5;
    maxiter = 1000;
    delta_max = 100;
    delta = 5;
    x0 = [50;50];
    f = @(x1,x2)100*(x2-x1^2)^2 + (1-x1)^2;
end
eta1 = 0.25;
eta2 = 0.75;
eta = 0;
x = x0;
grad = mygradient(f);           % numeric gradient function, built symbolically
syms x1 x2;
f_s = f(x1,x2);
hess = hessian(f_s,[x1,x2]);    % symbolic Hessian (Symbolic Math Toolbox)
i = 0;
while i < maxiter
    g = grad(x(1),x(2));
    if norm(g) <= eps
        disp(i);
        break;
    end
    h = double(subs(hess,{x1,x2},{x(1),x(2)}));   % evaluate the Hessian at x
    p = subproblem(g,h,delta);
    rho = -(f(x(1),x(2))-f(x(1)+p(1),x(2)+p(2)))/(g'*p+0.5*p'*h*p);
    if rho < eta1
        delta = eta1*delta;                       % shrink the region
    elseif rho > eta2 && abs(norm(p)-delta) < 1e-8
        delta = min(2*delta,delta_max);           % expand the region
    end
    if rho > eta
        x = x+p;                                  % accept the step
    end
    i = i+1;
end
y = x;
end
%% Dogleg Method
function p = subproblem(g,h,delta)
p_B = -(h\g);                    % unconstrained (Newton) minimizer
p_U = -(g'*g)/(g'*h*g)*g;        % minimizer along the steepest-descent direction
if norm(p_B) <= delta
    p = p_B;
elseif norm(p_U) >= delta
    p = delta/norm(p_U)*p_U;
else
    % Solve ||p_U + s*(p_B - p_U)|| = delta for s = tau-1; the positive
    % root of the quadratic lies in [0,1].
    p_BU = p_B - p_U;
    a = p_BU'*p_BU;
    b = p_U'*p_BU;
    c = p_U'*p_U - delta^2;
    s = (-b + sqrt(b^2 - a*c))/a;
    p = p_U + s*p_BU;
end
end
function df_dx = mygradient(f)
% Build a numeric gradient function from the handle f(x1,x2)
syms x1 x2;
f_s = f(x1,x2);                  % symbolic expression; diff cannot act on a handle
dfdx = [diff(f_s,x1); diff(f_s,x2)];
df_dx = matlabFunction(dfdx);
end