问题描述
采用Hebden,More-Sorensen(MS)和二维子空间算法求解Wood function, Extended Powell singular function,Trigonometric function。我们需要比较不同信赖域算法的作用以及信赖域算法和线搜索型方法的比较。我们的数值结果需要包括函数的最优值,如果条件允许的话还需要提供最优点,函数调用次数和迭代次数以及程序的运行结果,如果结果不满足条件,那么需要分析出现这种情况的可能原因。终止条件是
∥
f
(
X
k
+
1
)
−
f
(
X
k
)
∥
<
1
0
−
8
\parallel f(X_{k+1}) - f(X_{k}) \parallel < 10^{-8}
∥f(Xk+1)−f(Xk)∥<10−8。
Wood function
{
f
1
(
x
)
=
10
(
x
2
−
x
1
2
)
,
f
2
(
x
)
=
1
−
x
1
,
f
3
(
x
)
=
9
0
1
2
(
x
4
−
x
3
2
)
,
f
4
(
x
)
=
1
−
x
3
,
f
5
(
x
)
=
1
0
1
2
(
x
2
+
x
4
−
2
)
,
f
6
(
x
)
=
1
0
−
1
2
(
x
2
−
x
4
)
.
x
i
n
i
t
=
(
−
3
,
−
1
,
−
3
,
−
1
)
,
f
=
0
,
a
t
(
1
,
1
,
1
,
1
)
\left \{ \begin{array}{r l} f_{1}(x) = 10(x_2 - x_{1}^{2}),\\ f_{2}(x) = 1 - x_{1},\\ f_{3}(x) = 90^{\frac{1}{2}}(x_4 - x_{3}^{2}),\\ f_{4}(x) = 1 - x_3 ,\\ f_{5}(x) = 10^{\frac{1}{2}}(x_2 + x_4 - 2),\\ f_{6}(x) = 10^{-\frac{1}{2}}(x_2 - x_4).\\ \\ x_{init} = (-3,-1,-3,-1),\\ f = 0,\quad at \quad(1,1,1,1) \end{array} \right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧f1(x)=10(x2−x12),f2(x)=1−x1,f3(x)=9021(x4−x32),f4(x)=1−x3,f5(x)=1021(x2+x4−2),f6(x)=10−21(x2−x4).xinit=(−3,−1,−3,−1),f=0,at(1,1,1,1)
Extended Powell singular function
{
n
v
a
r
i
a
b
l
e
,
n
i
s
a
m
u
l
t
i
p
l
e
o
f
4
,
f
4
l
−
3
(
x
)
=
x
4
l
−
3
+
10
x
l
−
2
,
f
4
l
−
2
(
x
)
=
5
1
2
(
x
4
l
−
1
−
x
4
l
)
,
f
4
l
−
1
(
x
)
=
(
x
4
l
−
2
−
2
x
4
l
−
1
)
2
,
f
4
l
(
x
)
=
1
0
1
2
(
x
4
l
−
3
−
x
4
l
)
2
.
x
i
n
i
t
=
(
ξ
J
)
,
ξ
4
J
−
3
=
3
,
ξ
4
J
−
2
=
−
1
,
ξ
4
J
−
1
=
0
,
ξ
4
J
=
1
,
f
=
0
a
t
x
i
n
i
t
.
\left \{ \begin{array}{r l} n \quad variable,n \quad is \quad a \quad multiple \quad of \quad 4,\\ f_{4l-3}(x) = x_{4l-3} + 10x_{l-2},\\ f_{4l-2}(x) = 5^{\frac{1}{2}}(x_{4l-1} - x_{4l}),\\ f_{4l-1}(x) = (x_{4l-2} - 2x_{4l-1})^{2},\\ f_{4l}(x) = 10^{\frac{1}{2}}(x_{4l-3} - x_{4l})^2 .\\ \\ x_{init} = (\xi_{J}),\\ \xi_{4J-3} = 3,\xi_{4J-2} = -1,\xi_{4J-1} = 0,\xi_{4J} = 1,\\ f = 0\quad at \quad x_{init}. \end{array} \right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧nvariable,nisamultipleof4,f4l−3(x)=x4l−3+10xl−2,f4l−2(x)=521(x4l−1−x4l),f4l−1(x)=(x4l−2−2x4l−1)2,f4l(x)=1021(x4l−3−x4l)2.xinit=(ξJ),ξ4J−3=3,ξ4J−2=−1,ξ4J−1=0,ξ4J=1,f=0atxinit.
Trigonometric function
{
n
v
a
r
i
a
b
l
e
,
f
i
(
x
)
=
n
+
i
(
1
−
c
o
s
(
x
i
)
)
−
s
i
n
(
x
i
)
−
∑
j
n
c
o
s
(
x
j
)
,
x
i
n
i
t
=
(
1
n
,
1
n
,
…
,
1
n
)
,
f
=
0.
\left \{ \begin{array}{r l} n \quad variable,\\ f_{i}(x) = n + i(1 - cos(x_i)) - sin(x_i) - \sum_{j}^{n} cos(x_j),\\ x_{init} = (\frac{1}{n},\frac{1}{n},\ldots,\frac{1}{n}),\\ f = 0. \end{array} \right.
⎩⎪⎪⎨⎪⎪⎧nvariable,fi(x)=n+i(1−cos(xi))−sin(xi)−∑jncos(xj),xinit=(n1,n1,…,n1),f=0.
信赖域方法介绍
{ q k ( d ) = f k + g k T d + 1 2 d T G k d , γ k = f ( x k + d ) − f ( x k ) q ( d ) − q ( 0 ) . \left \{ \begin{array}{r l} q_{k}(d) = f_{k} + g_{k}^{T}d + \frac{1}{2}d^{T}G_{k}d,\\ \gamma_{k} = \frac{f(x_k + d) - f(x_k)}{q(d)-q(0)}. \end{array} \right. {qk(d)=fk+gkTd+21dTGkd,γk=q(d)−q(0)f(xk+d)−f(xk).
1:给定初始点
X
0
,
δ
0
>
0
X_0,\delta_0 > 0
X0,δ0>0,k = 0;
2:是否满足条件梯度为0(当k=0时),是否有
∥
f
(
X
k
+
1
)
−
f
(
X
k
)
∥
<
1
0
−
8
\parallel f(X_{k+1}) - f(X_{k}) \parallel < 10^{-8}
∥f(Xk+1)−f(Xk)∥<10−8。(当
k
>
0
k > 0
k>0);
3:求解子问题,给出下一个迭代点的方向
d
k
d_k
dk;
4:计算
γ
k
\gamma_k
γk,若
γ
k
≤
0.25
,
δ
k
+
1
=
δ
k
4
\gamma_k \leq 0.25,\delta_{k+1} = \frac{\delta_k}{4}
γk≤0.25,δk+1=4δk;若
γ
k
≥
0.25
\gamma_k \geq 0.25
γk≥0.25并且
∥
d
k
∥
=
δ
k
\parallel d_k \parallel = \delta_k
∥dk∥=δk,则
δ
k
+
1
=
2
δ
k
\delta_{k+1} = 2\delta_k
δk+1=2δk;否则
δ
k
+
1
=
δ
k
\delta_{k+1} = \delta_k
δk+1=δk;
5:若
γ
k
≤
0
\gamma_k \leq 0
γk≤0,则
x
k
+
1
=
x
k
x_{k+1}=x_k
xk+1=xk,否则
x
k
+
1
=
x
k
+
d
k
x_{k+1}=x_k + d_k
xk+1=xk+dk,
k
=
k
+
1
k = k+1
k=k+1,转步2。
以上步骤中的核心就是下降方向的寻找,基于此我们引入Hebden,More-Sorensen和二维子空间算法。
## Hebden
考虑方程
(
G
k
+
ν
I
)
d
=
g
k
(G_k + \nu I)d = g_{k}
(Gk+νI)d=gk,我们知道
d
=
d
(
ν
)
d=d(\nu)
d=d(ν),下面简要描述Hebden算法的过程:\par
首先,如果
∥
d
(
0
)
∥
≤
δ
k
\parallel d(0) \parallel \leq \delta_{k}
∥d(0)∥≤δk,则
d
k
=
d
(
0
)
d_k = d(0)
dk=d(0),否则的话,我们需要寻找一个
ν
\nu
ν,使得
∥
d
(
ν
)
∥
=
δ
k
\parallel d(\nu) \parallel = \delta_k
∥d(ν)∥=δk,根据这个思想我们定义一个函数:
φ
(
ν
)
=
∥
d
(
ν
)
∥
−
δ
k
\varphi (\nu) = \parallel d(\nu) \parallel - \delta_k
φ(ν)=∥d(ν)∥−δk,也就是说我们需要求出
φ
(
ν
)
\varphi(\nu)
φ(ν)的零点。为了节约篇幅我们直接给出零点的迭代公式,第j步迭代的公式为:
ν
j
+
1
=
ν
j
−
(
φ
(
ν
j
)
+
δ
k
)
∗
φ
(
ν
j
)
φ
′
(
ν
j
)
δ
k
\nu_{j+1}=\nu_j - \frac{(\varphi(\nu_j) + \delta_k)*\varphi(\nu_j)}{\varphi^{'}(\nu_j)\delta_k}
νj+1=νj−φ′(νj)δk(φ(νj)+δk)∗φ(νj)
求出
ν
\nu
ν以后,代入方程\把对应的方向求解出来即可。
More-Sorensen
这里修改函数
φ
\varphi
φ的定义为
φ
(
ν
)
=
1
δ
k
−
1
∥
d
(
ν
)
∥
\varphi(\nu)=\frac{1}{\delta_k} - \frac{1}{\parallel d(\nu) \parallel}
φ(ν)=δk1−∥d(ν)∥1,,第j步迭代的公式为:
ν
j
+
1
=
ν
j
−
φ
(
ν
)
φ
′
(
ν
)
\nu_{j+1}=\nu_j - \frac{\varphi(\nu)}{\varphi^{'}(\nu)}
νj+1=νj−φ′(ν)φ(ν)
二维子空间极小化方法
当
G
k
G_k
Gk正定的时候,求解
min
d
q
(
d
)
s.t.
∥
d
∥
≤
δ
,
p
∈
span
[
g
,
G
−
1
g
]
\min _{d} q(d) \quad \text { s.t. }\|d\| \leq \delta, p \in \operatorname{span}\left[g, G^{-1} g\right]
mindq(d) s.t. ∥d∥≤δ,p∈span[g,G−1g]
当
G
k
G_k
Gk不正定的时候,
min
d
q
(
d
)
s.t.
∥
d
∥
≤
δ
,
p
∈
span
[
g
,
(
G
+
ν
I
)
−
1
g
]
\min _{d} q(d) \quad \text { s.t. }\|d\| \leq \delta, p \in \operatorname{span}\left[g, (G + \nu I)^{-1} g\right]
mindq(d) s.t. ∥d∥≤δ,p∈span[g,(G+νI)−1g]
其中
ν
∈
(
−
λ
n
,
−
2
λ
n
)
\nu \in (-\lambda_n,-2\lambda_n)
ν∈(−λn,−2λn),
λ
n
\lambda_n
λn是
G
k
G_{k}
Gk最小的特征值。
数值实验
Extended Powell singular function
Trigonometric function
Wood function
关于这个函数,本人发现如果选取初始点
X
0
=
(
−
3
,
−
1
,
−
3
,
−
1
)
T
X_0 = (-3,-1,-3,-1)^T
X0=(−3,−1,−3,−1)T,上面使用的信赖域方法全都不可用,针对这个问题,本人尝试换一个初始点,采用np.random.randn函数生成一个随机点来开始迭代,这个时候会得到正确的解,下面这个表格展示的就是针对随机初始点的结果。另外就是关于线搜索方法求解,即使使用随机初始点,线搜索仍然没有收敛到精确解,反而在几步之内就停止迭代,经过本人调试发现线搜索的迭代过程中没有找到正确的步长,导致后面迭代基本不动,这里说明线搜索过程还需要进一步完善。如果修改代码使得线搜索能够找到合适的步长就变成下一个需要解决的问题。
四种方法得到最后的收敛解分别是:
(
0.999997150.999994281.000002851.00000572
)
T
(0.99999715 0.99999428 1.00000285 1.00000572)^T
(0.999997150.999994281.000002851.00000572)T,
(
0.999997150.999994281.000002851.00000572
)
T
(0.99999715 0.99999428 1.00000285 1.00000572)^T
(0.999997150.999994281.000002851.00000572)T,
(
0.999997150.999994281.000002851.00000572
)
T
(0.99999715 0.99999428 1.00000285 1.00000572)^T
(0.999997150.999994281.000002851.00000572)T,
(
0.466528390.039486521.394356131.64631839
)
T
(0.46652839 0.03948652 1.39435613 1.64631839)^T
(0.466528390.039486521.394356131.64631839)T。
代码
只放EPS函数的代码
import numpy as np
import time
def FF(X):
n = X.shape[1]
x0 = X[:,0:n:4];x1 = X[:,1:n:4]
x2 = X[:,2:n:4];x3 = X[:,3:n:4]
return ((x0 + 10*x1)**2 + 5*(x2 - x3)**2 + (x1 - 2*x2)**4 + 10*(x0 - x3)**4).sum()
def GG(X):
n = X.shape[1]
x0 = X[:,0:n:4];x1 = X[:,1:n:4]
x2 = X[:,2:n:4];x3 = X[:,3:n:4]
vec = np.zeros_like(X)
vec[:,0:n:4] = 2*(x0 + 10*x1) + 40*(x0 - x3)**3
vec[:,1:n:4] = 20*(x0 + 10*x1) + 4*(x1 - 2*x2)**3
vec[:,2:n:4] = 10*(x2 - x3) - 8*(x1 - 2*x2)**3
vec[:,3:n:4] = -10*(x2 - x3) - 40*(x0 - x3)**3
return vec
def HH(X):
n = X.shape[1]
x0 = X[:,0:n:4];x1 = X[:,1:n:4]
x2 = X[:,2:n:4];x3 = X[:,3:n:4]
diag = np.zeros(n)
diag[0:n:4] = 2 + 120*(x0 - x3)**2;diag[1:n:4] = 200 + 12*(x1 - 2*x2)**2
diag[2:n:4] = 10 + 48*(x1 - 2*x2)**2;diag[3:n:4] = 10 + 120*(x0 - x3)**2
xie1 = np.zeros(n - 1);xie3 = np.zeros(n - 3)
xie1[0:n - 1:4] = 20;xie1[1:n - 1:4] = -24*(x1 - 2*x2)**2;xie1[2:n - 1:4] = - 10
xie3[0:n - 3:4] = -120*(x0 - x3)**2
return np.diag(diag,0) + np.diag(xie1,1) + np.diag(xie1,-1) + np.diag(xie3,3) + np.diag(xie3,-3)
def gold(FF,GG,d,X):
rho = 0.3
am = 0;aM = 1;al = 0.5*(am + aM)
for i in range(100):
rig1 = FF(X) + rho*al*GG(X)@d.T
rig2 = FF(X) + (1 - rho)*al*GG(X)@d.T
lef = FF(X + al*d)
if lef <= rig1:
if lef >= rig2:
break
else:
am = al
al = am + 0.618*(aM - am)
else:
aM = al
al = am + 0.618*(aM - am)
al_k = al
#print(i)
return al_k,2*(i + 1)
def QQ(X,d):#d:[n,1]
return GG(X)@d + 0.5*d.T@HH(X)@d
def CG(A,b):#只能求解对称正定矩阵
x = np.zeros_like(b)
eps = 1e-5
k = 0
r = b - A@x;rho = r.T@r
p = np.zeros_like(r)
while (np.sqrt(rho) > eps*np.linalg.norm(b)) and (k < 20*b.shape[0]):
k = k + 1
if k == 1:
p = r
else:
beta = rho/rho_h;p = r + beta*p
w = A@p;alpha = rho/(p.T@w);x = x + alpha*p
r = r - alpha*w;rho_h = rho;rho = r.T@r
#print(k)
return x
def nubest(delta,HH,GG,X,N):#针对Hebden方法求解nu
n = X.shape[1]
d = CG(HH(X),-GG(X).T)
k = 0
nu = 0
while (np.linalg.norm(d) > delta) and (k < N):
k = k + 1
d = CG(HH(X) + nu*np.eye(n,n),-GG(X).T)#[n,1]
d_ = CG(HH(X) + nu*np.eye(n,n),- d)#[n,1]
nu = nu - (np.linalg.norm(d) - delta)*np.linalg.norm(d)**2/(delta*(d*d_).sum())
return nu
def nuMS(delta,HH,GG,X,N):#针对MS方法求解nu
n = X.shape[1]
d = CG(HH(X),-GG(X).T)
k = 0
nu = 0
while (np.linalg.norm(d) > delta) and (k < N):
k = k + 1
d = CG(HH(X) + nu*np.eye(n,n),-GG(X).T)#[n,1]
d_ = CG(HH(X) + nu*np.eye(n,n),- d)#[n,1]
nu = nu + (delta - np.linalg.norm(d))*np.linalg.norm(d)**2/(delta*(d*d_).sum())
return nu
def solution(FF,GG,HH,X,N,mode):
tic = time.time()
X_new = X.copy()
n = X.shape[1]
delta = 10
fun_iter = 0
if mode == 'Hebden':
for k in range(N):
nu = nubest(delta,HH,GG,X,2*n)
d = CG(HH(X) + nu*np.eye(n,n),-GG(X).T)
gama = (FF(X + d.T) - FF(X)).sum()/(QQ(X,d).sum())
fun_iter += 2
#print(QQ(X,d))
if gama <= 0.25:
delta = 0.25*delta
else:
if gama >= 0.75 and abs(np.linalg.norm(d) - delta) < 1e-5:
delta = 2*delta
else:
delta = 1.0*delta
if gama > 0:
X_new = X + d.T
else:
X_new = X
#print('gama = %.4f,delta=%.4e,d=%.4e,fenzi = %.4f'%(gama,delta,np.linalg.norm(d),FF(X) - FF(X + d.T)))
if gama > 0 and abs(FF(X_new) - FF(X)) < 1e-8:
break
else:
X = X_new
if k%20 == 0:
print('the iteration:%d,gama = %.4f,fenzi = %.4f'%(k,gama,FF(X) - FF(X + d.T)))
ela = time.time() - tic
print('the end iteration:%d,the grad_2 = %.5e,the time:%.3f,number:%d'
%(k + 1,np.linalg.norm(GG(X_new)),ela,fun_iter))
if mode == 'erwei':
for k in range(N):
if min(np.linalg.eig(HH(X))[0]) > 0:
nu = 0
else:
nu = nubest(delta,HH,GG,X,2*n)
if np.linalg.norm(CG(HH(X) + nu*np.eye(n,n),-GG(X).T)) <= delta:
d = CG(HH(X) + nu*np.eye(n,n),-GG(X).T)
else:
d_old = -GG(X).T*GG(X)@GG(X).T/(GG(X)@(HH(X) + nu*np.eye(n,n))@GG(X).T)
x = d_old;y = CG(HH(X) + nu*np.eye(n,n),-GG(X).T) - d_old
if np.linalg.norm(d_old) >= delta:
d = d_old*delta/np.linalg.norm(d_old)
else:
beta = (- x.T@y + np.sqrt(delta*y.T@y))/(y.T@y)
d = x + beta*y
gama = (FF(X + d.T) - FF(X)).sum()/(QQ(X,d).sum())
fun_iter += 2
#print(QQ(X,d))
if gama <= 0.25:
delta = 0.25*delta
else:
if gama >= 0.75 and abs(np.linalg.norm(d) - delta) < 1e-5:
delta = 2*delta
else:
delta = 1.0*delta
if gama > 0:
X_new = X + d.T
else:
X_new = X
#print('gama = %.4f,delta=%.4e,d=%.4e,fenzi = %.4f'%(gama,delta,np.linalg.norm(d),FF(X) - FF(X + d.T)))
if gama > 0 and abs(FF(X_new) - FF(X)) < 1e-8:
break
else:
X = X_new
if k%20 == 0:
print('the iteration:%d,gama = %.4f,fenzi = %.4f'%(k,gama,FF(X) - FF(X + d.T)))
ela = time.time() - tic
print('the end iteration:%d,the grad_2 = %.5e,the time:%.3f,number:%d'
%(k + 1,np.linalg.norm(GG(X_new)),ela,fun_iter))
if mode == 'MS':
for k in range(N):
nu = nuMS(delta,HH,GG,X,2*n)
d = CG(HH(X) + nu*np.eye(n,n),-GG(X).T)
gama = (FF(X + d.T) - FF(X)).sum()/(QQ(X,d).sum())
fun_iter += 2
#print(QQ(X,d))
if gama <= 0.25:
delta = 0.25*delta
else:
if gama >= 0.75 and abs(np.linalg.norm(d) - delta) < 1e-5:
delta = 2*delta
else:
delta = 1.0*delta
if gama > 0:
X_new = X + d.T
else:
X_new = X
#print('gama = %.4f,delta=%.4e,d=%.4e,fenzi = %.4f'%(gama,delta,np.linalg.norm(d),FF(X) - FF(X + d.T)))
if gama > 0 and abs(FF(X_new) - FF(X)) < 1e-8:
break
else:
X = X_new
if k%20 == 0:
print('the iteration:%d,gama = %.4f,fenzi = %.4f'%(k,gama,FF(X) - FF(X + d.T)))
ela = time.time() - tic
print('the end iteration:%d,the grad_2 = %.5e,the time:%.3f,number:%d'
%(k + 1,np.linalg.norm(GG(X_new)),ela,fun_iter))
if mode == 'newton':
for k in range(N):
d = CG(HH(X),-GG(X).T).T
al,ite = gold(FF,GG,d,X)
X_new = X + al*d
fun_iter += 2
fun_iter += ite
if abs(FF(X_new) - FF(X)) < 1e-8:
break
else:
X = X_new
if k%20 == 0:
print('the iteration:%d'%(k))
ela = time.time() - tic
print('the end iteration:%d,the grad_2 = %.5e,the time:%.3f,number:%d'
%(k + 1,np.linalg.norm(GG(X_new)),ela,fun_iter))
return X_new
n = 100
X = np.zeros([1,n])
for i in range(int(n/4)):
X[:,4*i] = 3.0
X[:,4*i + 1] = -1
X[:,4*i + 3] = 1
N = 300
MODE = ['Hebden','erwei','MS','newton']
for mode in MODE:
X_new = solution(FF,GG,HH,X,N,mode)
print('the mode=%s,the value = %.5e'%(mode,FF(X_new)))
print('--------------------')