(Transcribed the problem set; only the exercises previously worked in class are solved here. The remaining ones will be filled in when time permits.)
4.1. Let $f\in C_{L}^{1,1}\left(\mathbb{R}^n\right)$ and let $\left\{\mathbf{x}_{k}\right\}_{k\ge 0}$ be the sequence generated by the gradient method with a constant stepsize $t_k=\frac{1}{L}$. Assume that $\mathbf{x}_{k}\to \mathbf{x}^*$. Show that if $\nabla f\left(\mathbf{x}_{k}\right)\neq 0$ for all $k\ge 0$, then $\mathbf{x}^*$ is not a local maximum point.
4.2. [9, Exercise 1.3.3] Consider the minimization problem

$$\min \left\{\mathbf{x}^T\mathbf{Qx}:\mathbf{x}\in\mathbb{R}^2\right\}$$
where $\mathbf{Q}$ is a positive definite $2\times 2$ matrix. Suppose we use the diagonal scaling matrix

$$\mathbf{D}=\begin{pmatrix} Q_{11}^{-1}& 0\\ 0& Q_{22}^{-1} \end{pmatrix}$$

Show that the above scaling matrix improves the condition number of $\mathbf{Q}$ in the sense that

$$\chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)\le \chi\left(\mathbf{Q}\right)$$
Solution:

Write

$$\mathbf{Q}=\begin{pmatrix} Q_{11}&Q_{12}\\ Q_{12}&Q_{22} \end{pmatrix},\qquad \mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}=\begin{pmatrix} 1&\frac{Q_{12}}{\sqrt{Q_{11}Q_{22}}}\\ \frac{Q_{12}}{\sqrt{Q_{11}Q_{22}}}&1 \end{pmatrix}$$
Since $\mathbf{Q}\succ 0$,

$$Q_{11},Q_{22}>0,\qquad Q_{11}Q_{22}>Q_{12}^2$$

If $Q_{12}=0$, then $\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}=\mathbf{I}$ and the claim is trivial. So assume $Q_{12}\neq 0$; replacing $\mathbf{Q}$ by $\operatorname{diag}(1,-1)\,\mathbf{Q}\operatorname{diag}(1,-1)$ if necessary (an orthogonal conjugation, which changes neither the eigenvalues nor the diagonal entries but flips the sign of $Q_{12}$), we may assume $Q_{12}>0$.
Note that $k\mathbf{Q}\succ0$ and

$$\chi\left(k\mathbf{Q}\right)=\chi\left(\mathbf{Q}\right)$$

for every $k>0$.
Consider matrices of the form

$$\mathbf{A}=\begin{pmatrix} \alpha&1\\ 1&\beta \end{pmatrix},\qquad \alpha,\beta>0,\ \alpha\beta>1$$
The eigenvalues of $\mathbf{A}$ can be computed by hand: they are $\frac{1}{2}\left(\alpha+\beta\pm\sqrt{\left(\alpha-\beta\right)^2+4}\right)$, so

$$\chi\left(\mathbf{A}\right)=\frac{\alpha+\beta+\sqrt{\left(\alpha-\beta\right)^2+4}}{\alpha+\beta-\sqrt{\left(\alpha-\beta\right)^2+4}}=\frac{1+\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}{1-\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}$$
Let

$$\mathbf{C}=\frac{1}{Q_{12}}\mathbf{Q}=\begin{pmatrix} \alpha&1\\ 1&\beta \end{pmatrix},\qquad \alpha=\frac{Q_{11}}{Q_{12}},\ \beta=\frac{Q_{22}}{Q_{12}}$$

Then $\alpha,\beta>0$ and $\alpha\beta=\frac{Q_{11}Q_{22}}{Q_{12}^2}>1$, so $\mathbf{C}$ is of the above form.
Similarly, with $\sqrt{\alpha\beta}=\frac{\sqrt{Q_{11}Q_{22}}}{Q_{12}}$,

$$\mathbf{P}=\sqrt{\alpha\beta}\,\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}= \begin{pmatrix} \sqrt{\alpha\beta} & 1\\ 1& \sqrt{\alpha\beta} \end{pmatrix}$$

which is again of the above form, since $\sqrt{\alpha\beta}>0$ and $\sqrt{\alpha\beta}\cdot\sqrt{\alpha\beta}=\alpha\beta>1$.
Therefore

$$\chi\left(\mathbf{Q}\right)=\chi\left(\mathbf{C}\right)=\frac{1+\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}{1-\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}},\qquad \chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)=\chi\left(\mathbf{P}\right)=\frac{1+\sqrt{\frac{1}{\alpha\beta}}}{1-\sqrt{\frac{1}{\alpha\beta}}}$$
The function $f(t)=\frac{1+\sqrt{t}}{1-\sqrt{t}}$ is increasing on $\left(0,1\right)$.
Since $\left(\alpha+\beta\right)^2=\left(\alpha-\beta\right)^2+4\alpha\beta$ and $\alpha\beta>1$,

$$\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}-\frac{1}{\alpha\beta}=\frac{\left(\alpha\beta-1\right)\left(\alpha-\beta\right)^2}{\alpha\beta\left(\alpha+\beta\right)^2}\ge 0$$
Both quantities lie in $\left(0,1\right)$ (again because $\alpha\beta>1$), so the monotonicity of $f$ gives $\chi\left(\mathbf{P}\right)\le\chi\left(\mathbf{C}\right)$, i.e.,

$$\chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)\le \chi\left(\mathbf{Q}\right)$$
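As a quick numerical sanity check (a sketch; the matrix below is an arbitrary positive definite example, and MATLAB's cond returns exactly the eigenvalue ratio $\chi$ for symmetric positive definite matrices):

Q=[4 1;1 2];                 % an arbitrary positive definite matrix
D=diag(1./diag(Q));          % diagonal scaling D_ii = 1/Q_ii
S=sqrtm(D)*Q*sqrtm(D);       % scaled matrix D^{1/2} Q D^{1/2}
[cond(S) cond(Q)]            % cond(S) <= cond(Q)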
4.3. Consider the quadratic minimization problem

$$\min\left\{\mathbf{x}^T\mathbf{Ax}:\mathbf{x}\in\mathbb{R}^5\right\}$$
where $\mathbf{A}$ is the $5\times 5$ Hilbert matrix defined by

$$\mathbf{A}_{i,j}=\frac{1}{i+j-1},\quad i,j=1,2,3,4,5$$
The matrix can be constructed via the MATLAB command
A=hilb(5)
Run the following methods and compare the number of iterations required by each of the methods when the initial vector is $\mathbf{x}_0=\left(1,2,3,4,5\right)^T$ to obtain a solution $\mathbf{x}$ with $\|\nabla f\left(\mathbf{x}\right)\|\le 10^{-4}$:
- gradient method with backtracking stepsize rule and parameters $\alpha=0.5,\beta=0.5,s=1$;
- gradient method with backtracking stepsize rule and parameters $\alpha=0.1,\beta=0.5,s=1$;
- gradient method with exact line search;
- diagonally scaled gradient method with diagonal elements $D_{ii}=\frac{1}{\mathbf{A}_{ii}},\ i=1,2,3,4,5$ and exact line search;
- diagonally scaled gradient method with diagonal elements $D_{ii}=\frac{1}{\mathbf{A}_{ii}},\ i=1,2,3,4,5$ and backtracking line search with parameters $\alpha=0.1,\beta=0.5,s=1$.
Solution:
function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
    beta,epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0 ........ initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant by which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
%             of min f(x)
% fun_val ... optimal function value
x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    % shrink t until the sufficient decrease condition holds
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
function [x,fun_val]=gradient_method_quadratic(A,b,x0,epsilon)
% Gradient method with exact line search for quadratic functions
% INPUT
% ======================
% A ....... the positive definite matrix associated with the
%           objective function
% b ....... a column vector associated with the linear part of the
%           objective function
% x0 ...... starting point of the method
% epsilon . tolerance parameter
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance) of
%           min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance
x=x0;
iter=0;
grad=2*(A*x+b);
while (norm(grad)>epsilon)
    iter=iter+1;
    t=norm(grad)^2/(2*grad'*A*grad);   % exact line search stepsize
    x=x-t*grad;
    grad=2*(A*x+b);
    fun_val=x'*A*x+2*b'*x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f\n',...
        iter,norm(grad),fun_val);
end
function [x,fun_val]=gradient_scaled_quadratic(A,b,D,x0,epsilon)
% Scaled gradient method with exact line search for quadratic functions
% INPUT
% ======================
% A ....... the positive definite matrix associated
%           with the objective function
% b ....... a column vector associated with the linear part
%           of the objective function
% D ....... scaling matrix
% x0 ...... starting point of the method
% epsilon . tolerance parameter
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance)
%           of min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance
x=x0;
iter=0;
grad=2*(A*x+b);
while (norm(grad)>epsilon)
    iter=iter+1;
    t=grad'*D*grad/(2*(grad'*D')*A*(D*grad));   % exact stepsize along -D*grad
    x=x-t*D*grad;
    grad=2*(A*x+b);
    fun_val=x'*A*x+2*b'*x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
function [x,fun_val]=gradient_scaled_quadratic_backtracking(A,b,D,x0,s,...
    alpha,beta,epsilon)
% Scaled gradient method with backtracking line search for quadratic functions
% INPUT
% ======================
% A ....... the positive definite matrix associated
%           with the objective function
% b ....... a column vector associated with the linear part
%           of the objective function
% D ....... scaling matrix
% x0 ...... initial point
% s ....... initial choice of stepsize
% alpha ... tolerance parameter for the stepsize selection
% beta .... the constant by which the stepsize is multiplied
%           at each backtracking step (0<beta<1)
% epsilon . tolerance parameter for stopping rule
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance)
%           of min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance
x=x0;
iter=0;
grad=2*(A*x+b);
fun_val=x'*A*x+2*b'*x;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    % shrink t until the sufficient decrease condition holds along -D*grad
    while (fun_val-((x-t*D*grad)'*A*(x-t*D*grad)+2*b'*(x-t*D*grad))<alpha*t*grad'*D*grad)
        t=beta*t;
    end
    x=x-t*D*grad;
    grad=2*(A*x+b);
    fun_val=x'*A*x+2*b'*x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
A=hilb(5);
b=zeros(size(A,2),1);
D=diag(1./diag(A));
f=@(x)x'*A*x;
g=@(x)2*A*x;
epsilon=1e-4;
x0=[1,2,3,4,5]';
% backtracking, alpha=0.5
s=1;
alpha=0.5;
beta=0.5;
gradient_method_backtracking(f,g,x0,s,alpha,beta,epsilon);
% backtracking, alpha=0.1
alpha=0.1;
gradient_method_backtracking(f,g,x0,s,alpha,beta,epsilon);
% exact line search
gradient_method_quadratic(A,b,x0,epsilon);
% diagonal scaling + exact line search
gradient_scaled_quadratic(A,b,D,x0,epsilon);
% diagonal scaling + backtracking, alpha=0.1
gradient_scaled_quadratic_backtracking(A,b,D,x0,s,alpha,beta,epsilon);
Results:

- Backtracking with $\alpha=0.5,\beta=0.5,s=1$: 3301 iterations.
- Backtracking with $\alpha=0.1,\beta=0.5,s=1$: 3732 iterations.
- Exact line search: 1271 iterations.
- Diagonal scaling + exact line search: 235 iterations.
- Diagonal scaling + backtracking: 104 iterations.
4.4. Consider the Fermat-Weber problem

$$\min_{\mathbf{x}\in\mathbb{R}^n}\left\{f\left(\mathbf{x}\right)=\sum_{i=1}^{m}\omega_i\|\mathbf{x}-\mathbf{a}_i\|\right\}$$
where $\omega_1,\cdots,\omega_m>0$ and $\mathbf{a}_1,\cdots,\mathbf{a}_m\in\mathbb{R}^n$ are $m$ different points. Let

$$p\in \operatorname{argmin}_{i=1,2,\cdots,m} f\left(\mathbf{a}_i\right)$$
Suppose that

$$\left\|\sum_{i\neq p}\omega_i\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|}\right\|>\omega_{p}$$
(i) Show that there exists a direction $\mathbf{d}\in\mathbb{R}^n$ such that $f'\left(\mathbf{a}_p;\mathbf{d}\right)<0$.
(ii) Show that there exists $\mathbf{x}_0\in\mathbb{R}^n$ satisfying $f\left(\mathbf{x}_0\right)<\min\left\{f\left(\mathbf{a}_1\right),\cdots,f\left(\mathbf{a}_m\right)\right\}$. Explain how to compute such a vector.
Solution:

(i) Let $\mathbf{g}=\sum_{i\neq p}\omega_i\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|}$. For $i\neq p$ the function $\mathbf{x}\mapsto\|\mathbf{x}-\mathbf{a}_i\|$ is differentiable at $\mathbf{a}_p$ with gradient $\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|}$, while the directional derivative of $\mathbf{x}\mapsto\|\mathbf{x}-\mathbf{a}_p\|$ at $\mathbf{a}_p$ in the direction $\mathbf{d}$ is $\lim_{t\to 0^+}\frac{\|t\mathbf{d}\|}{t}=\|\mathbf{d}\|$. Hence

$$f'\left(\mathbf{a}_p;\mathbf{d}\right)=\lim\limits_{t\to 0^+}\frac{f\left(\mathbf{a}_p+t\mathbf{d}\right)-f\left(\mathbf{a}_p\right)}{t}=\left\langle\mathbf{g},\mathbf{d}\right\rangle+\omega_p\|\mathbf{d}\|$$

Taking $\mathbf{d}=-\mathbf{g}$ and using the assumption $\|\mathbf{g}\|>\omega_p$ gives

$$f'\left(\mathbf{a}_p;-\mathbf{g}\right)=-\|\mathbf{g}\|^2+\omega_p\|\mathbf{g}\|=\|\mathbf{g}\|\left(\omega_p-\|\mathbf{g}\|\right)<0$$

(ii) By (i), $f\left(\mathbf{a}_p+t\mathbf{d}\right)<f\left(\mathbf{a}_p\right)=\min\left\{f\left(\mathbf{a}_1\right),\cdots,f\left(\mathbf{a}_m\right)\right\}$ for all sufficiently small $t>0$, so $\mathbf{x}_0=\mathbf{a}_p+t\mathbf{d}$ works. Such a $t$ can be computed by backtracking: start from some $t=s>0$ and halve $t$ until $f\left(\mathbf{a}_p+t\mathbf{d}\right)<f\left(\mathbf{a}_p\right)$.
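A minimal MATLAB sketch of this computation (the variable layout is an assumption: anchors as the columns of an n x m matrix A, an m x 1 weight vector w, and p the index attaining the minimum over the anchors):

% Hypothetical helper: compute x0 for Exercise 4.4(ii).
m=size(A,2);
fw=@(x)w'*sqrt(sum((A-x*ones(1,m)).^2))';   % Fermat-Weber objective
g=zeros(size(A,1),1);
for i=[1:p-1, p+1:m]
    g=g+w(i)*(A(:,p)-A(:,i))/norm(A(:,p)-A(:,i));
end
d=-g;                                       % descent direction from part (i)
t=1;
while fw(A(:,p)+t*d)>=fw(A(:,p))            % backtrack until strict decrease
    t=t/2;
end
x0=A(:,p)+t*d;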
4.5. In the "source localization problem" we are given $m$ locations of sensors $\mathbf{a}_1,\cdots,\mathbf{a}_m\in\mathbb{R}^n$ and approximate distances between the sensors and an unknown "source" located at $\mathbf{x}\in \mathbb{R}^n$:

$$d_i\approx\|\mathbf{x}-\mathbf{a}_i\|$$
The problem is to find an estimate of $\mathbf{x}$ given the locations $\mathbf{a}_1,\cdots,\mathbf{a}_m$ and the approximate distances $d_1,\cdots,d_m$. A natural formulation as an optimization problem is to consider the nonlinear least squares problem
$$\text{(SL)}\quad \min \left\{f(\mathbf{x}) \equiv \sum_{i=1}^{m}\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|-d_{i}\right)^{2}\right\}$$
We will denote the set of sensors by $\mathscr{A}\equiv \left\{\mathbf{a}_1,\cdots,\mathbf{a}_m\right\}$.
(i) Show that the optimality condition $\nabla f\left(\mathbf{x}\right)=0$ $\left(\mathbf{x}\notin\mathscr{A}\right)$ is the same as

$$\mathbf{x}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right\}$$
(ii) Show that the corresponding fixed point method

$$\mathbf{x}_{k+1}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}_{k}-\mathbf{a}_{i}}{\left\|\mathbf{x}_{k}-\mathbf{a}_{i}\right\|}\right\}$$

is a gradient method, assuming that $\mathbf{x}_k\notin \mathscr{A}$ for all $k\ge 0$. What is the stepsize?
Solution:

(i)

$$\nabla f\left(\mathbf{x}\right)=2\sum_{i=1}^{m}\frac{\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|-d_{i}\right)\left(\mathbf{x}-\mathbf{a}_i\right)}{\|\mathbf{x}-\mathbf{a}_i\|}=2\left(m\mathbf{x}-\sum_{i=1}^{m}\mathbf{a}_{i}-\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right)$$

so $\nabla f\left(\mathbf{x}\right)=0$ is equivalent to

$$\mathbf{x}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right\}$$
(ii) By the expression for $\nabla f$ above,

$$\mathbf{x}_{k+1}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}_k-\mathbf{a}_{i}}{\left\|\mathbf{x}_k-\mathbf{a}_{i}\right\|}\right\}=\mathbf{x}_k-\frac{1}{2m}\nabla f\left(\mathbf{x}_k\right)$$

so the fixed point method is a gradient method with constant stepsize $\frac{1}{2m}$.
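A minimal MATLAB sketch of this fixed-point iteration (the anchor matrix A, distance vector d, starting point x, and iteration count are assumptions):

% Hypothetical sketch of the fixed-point method of Exercise 4.5(ii).
% A is n x m with anchors as columns, d is m x 1, x is the current iterate.
m=size(A,2);
for k=1:100
    R=x*ones(1,m)-A;                        % columns x - a_i
    x=(sum(A,2)+R*(d./sqrt(sum(R.^2))'))/m; % one fixed-point update
end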
4.6. Another formulation of the source localization problem consists of minimizing the following objective function:

$$\text{(SL2)}\quad \min _{\mathbf{x} \in \mathbb{R}^{n}}\left\{f(\mathbf{x}) \equiv \sum_{i=1}^{m}\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|^{2}-d_{i}^{2}\right)^{2}\right\}$$
This is of course a nonlinear least squares problem, and thus the Gauss-Newton method can be employed in order to solve it. We will assume that $n=2$.
(i) Show that as long as all the points $\mathbf{a}_1,\cdots,\mathbf{a}_m$ do not reside on the same line in the plane, the method is well-defined, meaning that the linear least squares problem solved at each iteration has a unique solution.
(ii) Write a MATLAB function that implements the damped Gauss-Newton method employed on problem (SL2) with a backtracking line search strategy with parameters $s=1,\alpha=\beta=0.5,\epsilon =10^{-4}$. Run the function on the two-dimensional problem $\left(n=2\right)$ with 5 anchors $\left(m=5\right)$ and data generated by the MATLAB commands
randn('seed',317);
A=randn(2,5);
x=randn(2,1);
d=sqrt(sum((A-x*ones(1,5)).^2))+0.05*randn(1,5);
d=d';
The columns of the $2\times 5$ matrix $\mathbf{A}$ are the locations of the five sensors, $\mathbf{x}$ is the "true" location of the source, and $\mathbf{d}$ is the vector of noisy measurements between the source and the sensors. Compare your results (e.g., number of iterations) to the gradient method with backtracking and parameters $s=1,\alpha=\beta=0.5,\epsilon=10^{-4}$. Start both methods with the initial vector $\left(1000,-500\right)^{T}$.
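No solution was worked out here; the following is a minimal sketch of the requested damped Gauss-Newton function (assuming the anchors are the columns of A and d is a column vector; the function name is illustrative):

function [x,fun_val]=damped_gauss_newton_sl2(A,d,x0,s,alpha,beta,epsilon)
% Hypothetical sketch: damped Gauss-Newton for (SL2) with backtracking.
% A ... n x m matrix whose columns are the anchors a_i
% d ... m x 1 vector of approximate distances
m=size(A,2);
f=@(y)norm(sum((y*ones(1,m)-A).^2)'-d.^2)^2;   % f(x)=sum_i (||x-a_i||^2-d_i^2)^2
x=x0;
r=sum((x*ones(1,m)-A).^2)'-d.^2;               % residuals r_i=||x-a_i||^2-d_i^2
J=2*(x*ones(1,m)-A)';                          % Jacobian of the residual vector
grad=2*J'*r;                                   % gradient of f=||r||^2
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    dGN=-(J'*J)\(J'*r);                        % Gauss-Newton direction
    t=s;
    % shrink t until the sufficient decrease condition holds along dGN
    while (fun_val-f(x+t*dGN)<-alpha*t*grad'*dGN)
        t=beta*t;
    end
    x=x+t*dGN;
    r=sum((x*ones(1,m)-A).^2)'-d.^2;
    J=2*(x*ones(1,m)-A)';
    grad=2*J'*r;
    fun_val=f(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end

On the generated data it can be called as damped_gauss_newton_sl2(A,d,[1000;-500],1,0.5,0.5,1e-4), and gradient_method_backtracking from 4.3 can be run on the same objective for comparison.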
4.7. Let $f\left(\mathbf{x}\right)=\mathbf{x}^T\mathbf{Ax}+2\mathbf{b}^T\mathbf{x}+c$, where $\mathbf{A}$ is a symmetric $n\times n$ matrix, $\mathbf{b}\in\mathbb{R}^n$, and $c\in \mathbb{R}$. Show that the smallest Lipschitz constant of $\nabla f$ is $2\|\mathbf{A}\|$.
Solution:

$$\nabla f\left(\mathbf{x}\right)=2\mathbf{Ax}+2\mathbf{b}$$

$$\|\nabla f\left(\mathbf{x}\right)-\nabla f\left(\mathbf{y}\right)\|=\|2\mathbf{A}\left(\mathbf{x}-\mathbf{y}\right)\|\le 2\|\mathbf{A}\|\|\mathbf{x}-\mathbf{y}\|$$

so $L\le 2\|\mathbf{A}\|$.
Conversely, since $\mathbf{A}$ is symmetric, it has an eigenvalue $\lambda_1$ with $|\lambda_1|=\|\mathbf{A}\|$. For a corresponding eigenvector $\mathbf{x}$ (so $\mathbf{Ax}=\lambda_1\mathbf{x}$),

$$\|\nabla f\left(\mathbf{x}\right)-\nabla f\left(\mathbf{0}\right)\|=2\|\mathbf{Ax}\|=2|\lambda_1|\|\mathbf{x}\|=2\|\mathbf{A}\|\|\mathbf{x}-\mathbf{0}\|$$

so no smaller constant works, and the smallest Lipschitz constant is $L=2\|\mathbf{A}\|$.
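A quick numerical illustration (the matrix below is an arbitrary symmetric example): the ratio $\|\nabla f\left(\mathbf{x}\right)-\nabla f\left(\mathbf{0}\right)\|/\|\mathbf{x}\|$ attains $2\|\mathbf{A}\|$ along an eigenvector of maximal absolute eigenvalue:

% Hypothetical check that the bound 2*norm(A) is attained (Exercise 4.7).
A=[2 1;1 3]; b=[1;0];
g=@(x)2*A*x+2*b;
[V,E]=eig(A);
[~,i]=max(abs(diag(E)));
x=V(:,i);                                     % eigenvector of max |eigenvalue|
[norm(g(x)-g(zeros(2,1)))/norm(x), 2*norm(A)] % the two values coincide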
4.8. Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $f\left(\mathbf{x}\right)=\sqrt{1+\|\mathbf{x}\|^2}$. Show that $f\in C_{1}^{1,1}$.
Solution:

$$\nabla f\left(\mathbf{x}\right)=\frac{\mathbf{x}}{\sqrt{1+\|\mathbf{x}\|^2}},\qquad \nabla^2 f\left(\mathbf{x}\right)=\frac{\left(1+\mathbf{x}^T\mathbf{x}\right)\mathbf{I}-\mathbf{x}\mathbf{x}^T}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}$$
Note that $\mathbf{x}\mathbf{x}^T$ has eigenvalue $0$ with multiplicity $n-1$ and eigenvalue $\mathbf{x}^T\mathbf{x}$ with multiplicity $1$, so

$$\|\mathbf{x}^T\mathbf{x}\,\mathbf{I}-\mathbf{x}\mathbf{x}^T\|=\mathbf{x}^T\mathbf{x}$$
Hence

$$\|\nabla^2 f\left(\mathbf{x}\right)\|\le\frac{\|\mathbf{I}\|+\|\mathbf{x}^T\mathbf{x}\,\mathbf{I}-\mathbf{x}\mathbf{x}^T\|}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}= \frac{1+\mathbf{x}^T\mathbf{x}}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}=\frac{1}{\sqrt{1+\|\mathbf{x}\|^2}}\le 1$$

so $\nabla f$ is Lipschitz continuous with constant $L=1$, i.e., $f\in C_{1}^{1,1}$.
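A quick numerical check (the dimension and sample point are arbitrary):

% Hypothetical check that ||Hess f(x)|| <= 1 for f(x)=sqrt(1+||x||^2).
x=randn(5,1);
H=((1+x'*x)*eye(5)-x*x')/(1+x'*x)^(3/2);
norm(H)                                      % never exceeds 1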
4.9. Let $f\in C_{L}^{1,1}\left(\mathbb{R}^m\right)$, and let $\mathbf{A}\in\mathbb{R}^{m\times n},\mathbf{b}\in\mathbb{R}^m$. Show that the function $g:\mathbb{R}^{n}\to \mathbb{R}$ defined by $g\left(\mathbf{x}\right)=f\left(\mathbf{Ax}+\mathbf{b}\right)$ satisfies $g\in C_{\tilde{L}}^{1,1}\left(\mathbb{R}^n\right)$, where $\tilde{L}=\|\mathbf{A}\|^2L$.
Solution:

$$\nabla g\left(\mathbf{x}\right)=\mathbf{A}^T\nabla f\left(\mathbf{Ax}+\mathbf{b}\right)$$

$$\|\nabla g\left(\mathbf{x}\right)-\nabla g\left(\mathbf{y}\right)\|=\|\mathbf{A}^T\left(\nabla f\left(\mathbf{Ax}+\mathbf{b}\right)-\nabla f\left(\mathbf{Ay}+\mathbf{b}\right)\right)\|\le\|\mathbf{A}^T\|\cdot L\|\mathbf{A}\left(\mathbf{x}-\mathbf{y}\right)\|\le \|\mathbf{A}\|^2L\|\mathbf{x}-\mathbf{y}\|$$

using $\|\mathbf{A}^T\|=\|\mathbf{A}\|$. Therefore $\tilde{L}=\|\mathbf{A}\|^2L$ is a Lipschitz constant of $\nabla g$.
4.10. Give an example of a function $f\in C_{L}^{1,1}\left(\mathbb{R}\right)$ and a starting point $\mathbf{x}_0\in\mathbb{R}$ such that the problem $\min f\left(\mathbf{x}\right)$ has an optimal solution and the gradient method with constant stepsize $t=\frac{2}{L}$ diverges.
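No solution was worked out here; one possible construction is the quadratic

$$f(x)=\frac{L}{2}x^2,\qquad x_0\neq 0$$

Then $f\in C_{L}^{1,1}\left(\mathbb{R}\right)$ with optimal solution $x^*=0$, while the gradient step with $t=\frac{2}{L}$ gives

$$x_{k+1}=x_k-\frac{2}{L}\cdot Lx_k=-x_k$$

so the iterates oscillate between $x_0$ and $-x_0$ and never converge.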
4.11. Suppose that $f\in C_{L}^{1,1}\left(\mathbb{R}^{n}\right)$ and assume that $\nabla^2 f\left(\mathbf{x}\right)\succeq 0$ for any $\mathbf{x}\in\mathbb{R}^n$. Suppose that the optimal value of the problem $\min_{\mathbf{x}\in\mathbb{R}^n} f\left(\mathbf{x}\right)$ is $f^*$. Let $\left\{\mathbf{x}_k\right\}_{k\ge 0}$ be the sequence generated by the gradient method with constant stepsize $\frac{1}{L}$. Show that if $\left\{\mathbf{x}_k\right\}_{k\ge 0}$ is bounded, then $f\left(\mathbf{x}_k\right)\to f^*$ as $k\to \infty$.
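No solution was worked out here; a possible proof sketch: by the descent lemma with stepsize $\frac{1}{L}$,

$$f\left(\mathbf{x}_k\right)-f\left(\mathbf{x}_{k+1}\right)\ge\frac{1}{2L}\|\nabla f\left(\mathbf{x}_k\right)\|^2$$

so $\left\{f\left(\mathbf{x}_k\right)\right\}$ is nonincreasing and bounded below by $f^*$, hence convergent, and summing the inequality gives $\nabla f\left(\mathbf{x}_k\right)\to 0$. Since $\left\{\mathbf{x}_k\right\}$ is bounded, it has a subsequence $\mathbf{x}_{k_j}\to\bar{\mathbf{x}}$; by continuity of $\nabla f$, $\nabla f\left(\bar{\mathbf{x}}\right)=0$. The assumption $\nabla^2 f\succeq 0$ everywhere means $f$ is convex, so the stationary point $\bar{\mathbf{x}}$ is a global minimizer, i.e., $f\left(\bar{\mathbf{x}}\right)=f^*$. Then $f\left(\mathbf{x}_{k_j}\right)\to f^*$, and since the whole sequence $\left\{f\left(\mathbf{x}_k\right)\right\}$ converges, $f\left(\mathbf{x}_k\right)\to f^*$.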