Introduction to Nonlinear Optimization: Chapter 4 Exercises

(I copied down the exercises, but have only written up the ones I did back in class; I'll get to the rest when I have time.)
4.1. Let $f\in C_{L}^{1,1}\left(\mathbb{R}^n\right)$ and let $\left\{\mathbf{x}_{k}\right\}_{k\ge 0}$ be the sequence generated by the gradient method with a constant stepsize $t_k=\frac{1}{L}$. Assume that $\mathbf{x}_{k}\to \mathbf{x}^*$. Show that if $\nabla f\left(\mathbf{x}_{k}\right)\neq 0$ for all $k\ge 0$, then $\mathbf{x}^*$ is not a local maximum point.

4.2. [9, Exercise 1.3.3] Consider the minimization problem
$$\min \left\{\mathbf{x}^T\mathbf{Q}\mathbf{x}:\mathbf{x}\in\mathbb{R}^2\right\}$$
where $\mathbf{Q}$ is a positive definite $2\times 2$ matrix. Suppose we use the diagonal scaling matrix
$$\mathbf{D}=\begin{pmatrix} Q_{11}^{-1}& 0\\ 0& Q_{22}^{-1} \end{pmatrix}$$
Show that the above scaling matrix improves the condition number of $\mathbf{Q}$ in the sense that
$$\chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)\le \chi\left(\mathbf{Q}\right)$$

Solution:

$$\mathbf{Q}=\begin{pmatrix} Q_{11}&Q_{12}\\ Q_{12}&Q_{22} \end{pmatrix},\qquad \mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}=\begin{pmatrix} 1&\frac{Q_{12}}{\sqrt{Q_{11}Q_{22}}}\\ \frac{Q_{12}}{\sqrt{Q_{11}Q_{22}}}&1 \end{pmatrix}$$
Since $\mathbf{Q}\succ 0$,
$$Q_{11},Q_{22}>0,\qquad Q_{11}Q_{22}>Q_{12}^2$$
If $Q_{12}=0$, then $\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}=\mathbf{I}$ and the claim holds trivially; moreover, conjugating by $\operatorname{diag}(1,-1)$ flips the sign of $Q_{12}$ without changing any eigenvalues, so we may assume $Q_{12}>0$.
Since $k\mathbf{Q}\succ 0$ and
$$\chi\left(k\mathbf{Q}\right)=\chi\left(\mathbf{Q}\right)\quad\text{for all } k>0,$$
consider a matrix of the form
$$\mathbf{A}=\begin{pmatrix} \alpha&1\\ 1&\beta \end{pmatrix},\qquad \alpha,\beta>0,\ \alpha\beta>1$$
Its eigenvalues can be computed by hand: $\lambda_{\pm}=\frac{\alpha+\beta\pm\sqrt{\left(\alpha-\beta\right)^2+4}}{2}$, both positive, so
$$\chi\left(\mathbf{A}\right)=\frac{\alpha+\beta+\sqrt{\left(\alpha-\beta\right)^2+4}}{\alpha+\beta-\sqrt{\left(\alpha-\beta\right)^2+4}}=\frac{1+\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}{1-\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}$$

Both matrices of interest can be brought to this form. First,
$$\mathbf{C}=\frac{1}{Q_{12}}\mathbf{Q}=\begin{pmatrix} \alpha&1\\ 1&\beta \end{pmatrix},\qquad \alpha=\frac{Q_{11}}{Q_{12}},\ \beta=\frac{Q_{22}}{Q_{12}},$$
where $\alpha,\beta>0$ and $\alpha\beta=\frac{Q_{11}Q_{22}}{Q_{12}^2}>1$. Second, since $\frac{Q_{12}}{\sqrt{Q_{11}Q_{22}}}=\frac{1}{\sqrt{\alpha\beta}}$,
$$\mathbf{P}=\sqrt{\alpha\beta}\,\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}=\begin{pmatrix} \sqrt{\alpha\beta}&1\\ 1&\sqrt{\alpha\beta} \end{pmatrix},$$
which is again of the above form with both diagonal parameters equal to $\sqrt{\alpha\beta}$.
Applying the formula for $\chi\left(\mathbf{A}\right)$ to both matrices,
$$\chi\left(\mathbf{C}\right)=\chi\left(\mathbf{Q}\right)=\frac{1+\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}{1-\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}},\qquad \chi\left(\mathbf{P}\right)=\chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)=\frac{1+\sqrt{\frac{1}{\alpha\beta}}}{1-\sqrt{\frac{1}{\alpha\beta}}}$$
The function $f(t)=\frac{1+\sqrt{t}}{1-\sqrt{t}}$ is increasing on $\left(0,1\right)$, and both arguments above lie in $\left(0,1\right)$ because $\alpha\beta>1$. Since
$$\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}-\frac{1}{\alpha\beta}=\frac{\left(\alpha-\beta\right)^2\left(\alpha\beta-1\right)}{\alpha\beta\left(\alpha+\beta\right)^2}\ge 0,$$
it follows that $\chi\left(\mathbf{C}\right)\ge\chi\left(\mathbf{P}\right)$, that is,
$$\chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)\le \chi\left(\mathbf{Q}\right)$$
with equality if and only if $Q_{11}=Q_{22}$.
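As a quick numerical sanity check (my own sketch, not part of the exercise), one can draw random positive definite matrices and compare the two condition numbers; the inequality should hold in every trial:

% Numerical check of chi(D^(1/2)*Q*D^(1/2)) <= chi(Q) for random
% positive definite 2x2 matrices (illustration only)
for trial=1:5
    B=randn(2); Q=B'*B+0.1*eye(2);  % random positive definite Q
    D=diag(1./diag(Q));             % diagonal scaling matrix
    S=sqrt(D)*Q*sqrt(D);            % sqrt acts entrywise; D is diagonal
    fprintf('chi(Q) = %9.4f  chi(scaled) = %9.4f\n',cond(Q),cond(S));
end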

4.3. Consider the quadratic minimization problem
$$\min\left\{\mathbf{x}^T\mathbf{A}\mathbf{x}:\mathbf{x}\in\mathbb{R}^5\right\}$$
where $\mathbf{A}$ is the $5\times 5$ Hilbert matrix defined by
$$A_{i,j}=\frac{1}{i+j-1},\quad i,j=1,2,3,4,5$$
The matrix can be constructed via the MATLAB command

A=hilb(5)

Run the following methods and compare the number of iterations required by each of the methods when the initial vector is $\mathbf{x}_0=\left(1,2,3,4,5\right)^T$ to obtain a solution $\mathbf{x}$ with $\|\nabla f\left(\mathbf{x}\right)\|\le 10^{-4}$:

  • gradient method with backtracking stepsize rule and parameters $\alpha=0.5,\beta=0.5,s=1$;
  • gradient method with backtracking stepsize rule and parameters $\alpha=0.1,\beta=0.5,s=1$;
  • gradient method with exact line search;
  • diagonally scaled gradient method with diagonal elements $D_{ii}=\frac{1}{A_{ii}},i=1,2,3,4,5$ and exact line search;
  • diagonally scaled gradient method with diagonal elements $D_{ii}=\frac{1}{A_{ii}},i=1,2,3,4,5$ and backtracking line search with parameters $\alpha=0.1,\beta=0.5,s=1$.

Solution:

function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
    beta,epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
%             of min f(x)
% fun_val ... optimal function value
x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
function [x,fun_val]=gradient_method_quadratic(A,b,x0,epsilon)
% INPUT
% ======================
% A ....... the positive definite matrix associated with the
%           objective function
% b ....... a column vector associated with the linear part of the
%           objective function
% x0 ...... starting point of the method
% epsilon . tolerance parameter
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance) of
%           min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance

x=x0;
iter=0;
grad=2*(A*x+b);
while (norm(grad)>epsilon)
     iter=iter+1;
     t=norm(grad)^2/(2*grad'*A*grad);
     x=x-t*grad;
     grad=2*(A*x+b);
     fun_val=x'*A*x+2*b'*x;
     fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f\n',...
         iter,norm(grad),fun_val);
end
function [x,fun_val]=gradient_scaled_quadratic(A,b,D,x0,epsilon)
% INPUT
% ======================
% A ....... the positive definite matrix associated 
%           with the objective function
% b ....... a column vector associated with the linear part
%           of the objective function
% D ....... scaling matrix
% x0 ...... starting point of the method
% epsilon . tolerance parameter
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance) ...
%           of min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance

x=x0;
iter=0;
grad=2*(A*x+b);
while (norm(grad)>epsilon)
    iter=iter+1;
    t=grad'*D*grad/(2*(grad'*D')*A*(D*grad));
    x=x-t*D*grad;
    grad=2*(A*x+b);
    fun_val=x'*A*x+2*b'*x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end

function [x,fun_val]=gradient_scaled_quadratic_backtracking(A,b,D,x0,s,...
    alpha,beta,epsilon)
% INPUT
% ======================
% A ....... the positive definite matrix associated 
%           with the objective function
% b ....... a column vector associated with the linear part
%           of the objective function
% D ....... scaling matrix
% x0....... initial point
% s ....... initial choice of stepsize
% alpha ... tolerance parameter for the stepsize selection
% beta .... the constant in which the stepsize is multiplied
%           at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance) ...
%           of min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance

x=x0;
iter=0;
grad=2*(A*x+b);
fun_val=x'*A*x+2*b'*x;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    while (fun_val-((x-t*D*grad)'*A*(x-t*D*grad)+2*b'*(x-t*D*grad))<alpha*t*grad'*D*grad)
        t=beta*t;
    end
    x=x-t*D*grad;
    grad=2*(A*x+b);
    fun_val=x'*A*x+2*b'*x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
A=hilb(5);
b=zeros(size(A,2),1);
D=diag(1./diag(A));
f=@(x)x'*A*x;
g=@(x)2*A*x;
epsilon=1e-4;
x0=[1,2,3,4,5]';

% gradient method with backtracking, alpha=0.5
s=1;
alpha=0.5;
beta=0.5;
gradient_method_backtracking(f,g,x0,s,alpha,beta,epsilon);

% gradient method with backtracking, alpha=0.1
s=1;
alpha=0.1;
beta=0.5;
gradient_method_backtracking(f,g,x0,s,alpha,beta,epsilon);

% gradient method with exact line search
gradient_method_quadratic(A,b,x0,epsilon);

% diagonally scaled gradient method with exact line search
gradient_scaled_quadratic(A,b,D,x0,epsilon);

% diagonally scaled gradient method with backtracking
s=1;
alpha=0.1;
beta=0.5;
gradient_scaled_quadratic_backtracking(A,b,D,x0,s,alpha,beta,epsilon);

Backtracking with $\alpha=0.5,\beta=0.5,s=1$: 3301 iterations.
Backtracking with $\alpha=0.1,\beta=0.5,s=1$: 3732 iterations.
Exact line search: 1271 iterations.
Diagonally scaled + exact line search: 235 iterations.
Diagonally scaled + backtracking: 104 iterations.

4.4. Consider the Fermat-Weber problem
$$\min_{\mathbf{x}\in\mathbb{R}^n}\left\{f\left(\mathbf{x}\right)=\sum_{i=1}^{m}\omega_i\|\mathbf{x}-\mathbf{a}_i\|\right\}$$
where $\omega_1,\cdots,\omega_m>0$ and $\mathbf{a}_1,\cdots,\mathbf{a}_m\in\mathbb{R}^n$ are $m$ different points. Let
$$p\in \operatorname{argmin}_{i=1,2,\cdots,m} f\left(\mathbf{a}_i\right)$$
Suppose that
$$\left\|\sum_{i\neq p}\omega_i\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|}\right\|>\omega_{p}$$
(i) Show that there exists a direction $\mathbf{d}\in\mathbb{R}^n$ such that $f'\left(\mathbf{a}_p;\mathbf{d}\right)<0$.
(ii) Show that there exists $\mathbf{x}_0\in\mathbb{R}^n$ satisfying $f\left(\mathbf{x}_0\right)<\min\left\{f\left(\mathbf{a}_1\right),\cdots,f\left(\mathbf{a}_m\right)\right\}$. Explain how to compute such a vector.

Solution:
(i) By definition,
$$f'\left(\mathbf{a}_p;\mathbf{d}\right)=\lim\limits_{t\to 0^+}\frac{f\left(\mathbf{a}_p+t\mathbf{d}\right)-f\left(\mathbf{a}_p\right)}{t}=\lim\limits_{t\to 0^+}\frac{\sum_{i\neq p}\omega_i\left(\|\mathbf{a}_p+t\mathbf{d}-\mathbf{a}_i\|-\|\mathbf{a}_p-\mathbf{a}_i\|\right)+\omega_p t\|\mathbf{d}\|}{t}$$
For $i\neq p$ the map $t\mapsto\|\mathbf{a}_p+t\mathbf{d}-\mathbf{a}_i\|$ is differentiable at $t=0$ with derivative $\left\langle\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|},\mathbf{d}\right\rangle$, so, writing $\mathbf{v}=\sum_{i\neq p}\omega_i\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|}$,
$$f'\left(\mathbf{a}_p;\mathbf{d}\right)=\left\langle\mathbf{v},\mathbf{d}\right\rangle+\omega_p\|\mathbf{d}\|$$
Choosing $\mathbf{d}=-\mathbf{v}$ gives
$$f'\left(\mathbf{a}_p;-\mathbf{v}\right)=-\|\mathbf{v}\|^2+\omega_p\|\mathbf{v}\|=\|\mathbf{v}\|\left(\omega_p-\|\mathbf{v}\|\right)<0$$
since $\|\mathbf{v}\|>\omega_p$ by assumption.
(ii)
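A sketch of the argument, building directly on (i) with the same $\mathbf{v}$: since $f'\left(\mathbf{a}_p;-\mathbf{v}\right)<0$, there exists $\bar{t}>0$ such that
$$f\left(\mathbf{a}_p-t\mathbf{v}\right)<f\left(\mathbf{a}_p\right)=\min\left\{f\left(\mathbf{a}_1\right),\cdots,f\left(\mathbf{a}_m\right)\right\}\quad\text{for all } t\in\left(0,\bar{t}\right],$$
so $\mathbf{x}_0=\mathbf{a}_p-t\mathbf{v}$ works for any such $t$. To compute one in practice, run a backtracking search: start with $t=1$ and keep halving $t$ until $f\left(\mathbf{a}_p-t\mathbf{v}\right)<f\left(\mathbf{a}_p\right)$; the loop terminates after finitely many halvings precisely because the directional derivative is negative.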

4.5. In the "source localization problem" we are given $m$ locations of sensors $\mathbf{a}_1,\cdots,\mathbf{a}_m\in\mathbb{R}^n$ and approximate distances between the sensors and an unknown "source" located at $\mathbf{x}\in\mathbb{R}^n$:
$$d_i\approx\|\mathbf{x}-\mathbf{a}_i\|$$
The problem is to find an estimate of $\mathbf{x}$ given the locations $\mathbf{a}_1,\cdots,\mathbf{a}_m$ and the approximate distances $d_1,\cdots,d_m$. A natural formulation as an optimization problem is to consider the nonlinear least squares problem
$$\text{(SL)}\quad \min \left\{f(\mathbf{x}) \equiv \sum_{i=1}^{m}\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|-d_{i}\right)^{2}\right\}$$
We will denote the set of sensors by $\mathscr{A}\equiv \left\{\mathbf{a}_1,\cdots,\mathbf{a}_m\right\}$.
(i) Show that the optimality condition $\nabla f\left(\mathbf{x}\right)=0$ $\left(\mathbf{x}\notin\mathscr{A}\right)$ is the same as
$$\mathbf{x}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right\}$$
(ii) Show that the corresponding fixed point method
$$\mathbf{x}_{k+1}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}_{k}-\mathbf{a}_{i}}{\left\|\mathbf{x}_{k}-\mathbf{a}_{i}\right\|}\right\}$$
is a gradient method, assuming that $\mathbf{x}_k\notin \mathscr{A}$ for all $k\ge 0$. What is the stepsize?

Solution:
(i)
$$\nabla f\left(\mathbf{x}\right)=2\sum_{i=1}^{m}\frac{\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|-d_{i}\right)\left(\mathbf{x}-\mathbf{a}_{i}\right)}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}=2\left(m\mathbf{x}-\sum_{i=1}^{m}\mathbf{a}_{i}-\sum_{i=1}^{m}d_{i}\frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right)$$
so for $\mathbf{x}\notin\mathscr{A}$, the condition $\nabla f\left(\mathbf{x}\right)=0$ is equivalent to
$$\mathbf{x}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right\}$$
(ii) By the expression for $\nabla f$ above, the fixed point method can be rewritten as
$$\mathbf{x}_{k+1}=\mathbf{x}_k-\frac{1}{2m}\nabla f\left(\mathbf{x}_k\right)$$
i.e., it is a gradient method with constant stepsize $\frac{1}{2m}$.
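A minimal MATLAB sketch of this fixed point iteration (the function name and interface are my own choice for illustration), assuming no iterate lands exactly on a sensor location:

function x=fixed_point_SL(A,d,x0,epsilon)
% Fixed point method for the source localization problem (SL)
% A ....... n x m matrix whose columns are the sensor locations a_i
% d ....... m x 1 vector of approximate distances
% x0 ...... initial point, assumed to avoid the sensor set
% epsilon . stopping tolerance on the step length
m=size(A,2);
x=x0;
while true
    diffs=x*ones(1,m)-A;          % columns x-a_i
    nrms=sqrt(sum(diffs.^2,1))';  % ||x-a_i|| as a column vector
    x_new=(sum(A,2)+diffs*(d./nrms))/m;
    if norm(x_new-x)<=epsilon
        break
    end
    x=x_new;
end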

4.6. Another formulation of the source localization problem consists of minimizing the following objective function:
$$\text{(SL2)}\quad \min _{\mathbf{x} \in \mathbb{R}^{n}}\left\{f(\mathbf{x}) \equiv \sum_{i=1}^{m}\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|^{2}-d_{i}^{2}\right)^{2}\right\}$$
This is of course a nonlinear least squares problem, and thus the Gauss-Newton method can be employed in order to solve it. We will assume that $n=2$.
(i) Show that as long as the points $\mathbf{a}_1,\cdots,\mathbf{a}_m$ do not all reside on the same line in the plane, the method is well-defined, meaning that the linear least squares problem solved at each iteration has a unique solution.
(ii) Write a MATLAB function that implements the damped Gauss-Newton method employed on problem (SL2) with a backtracking line search strategy with parameters $s=1,\alpha=\beta=0.5,\epsilon=10^{-4}$. Run the function on the two-dimensional problem $\left(n=2\right)$ with 5 anchors $\left(m=5\right)$ and data generated by the MATLAB commands

randn('seed',317);
A=randn(2,5);
x=randn(2,1);
d=sqrt(sum((A-x*ones(1,5)).^2))+0.05*randn(1,5);
d=d';

The columns of the $2\times 5$ matrix $\mathbf{A}$ are the locations of the five sensors, $\mathbf{x}$ is the "true" location of the source, and $\mathbf{d}$ is the vector of noisy measurements between the source and the sensors. Compare your results (e.g., number of iterations) to the gradient method with backtracking and parameters $s=1,\alpha=\beta=0.5,\epsilon=10^{-4}$. Start both methods with the initial vector $\left(1000,-500\right)^{T}$.
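For reference, a minimal sketch of the damped Gauss-Newton method applied to (SL2) (the function names damped_gauss_newton_SL2 and res_jac_SL2 and this interface are my own, not the book's):

function [x,fun_val]=damped_gauss_newton_SL2(A,d,x0,s,alpha,beta,epsilon)
% Damped Gauss-Newton method for (SL2) with backtracking line search
% A ....... n x m matrix whose columns are the sensor locations a_i
% d ....... m x 1 vector of noisy distance measurements
% x0 ...... initial point
% s,alpha,beta ... backtracking parameters
% epsilon . tolerance parameter for the stopping rule
x=x0;
[r,J]=res_jac_SL2(A,d,x);
fun_val=r'*r;
grad=2*J'*r;
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    dx=-(J'*J)\(J'*r);          % Gauss-Newton direction
    t=s;
    while (fun_val_SL2(A,d,x+t*dx)>fun_val+alpha*t*grad'*dx)
        t=beta*t;               % backtrack until sufficient decrease
    end
    x=x+t*dx;
    [r,J]=res_jac_SL2(A,d,x);
    fun_val=r'*r;
    grad=2*J'*r;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f\n',...
        iter,norm(grad),fun_val);
end

function [r,J]=res_jac_SL2(A,d,x)
% residuals r_i=||x-a_i||^2-d_i^2 and Jacobian rows 2*(x-a_i)'
m=size(A,2);
diffs=x*ones(1,m)-A;
r=sum(diffs.^2,1)'-d.^2;
J=2*diffs';

function fv=fun_val_SL2(A,d,x)
% objective value of (SL2) at x
[r,~]=res_jac_SL2(A,d,x);
fv=r'*r;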
4.7. Let $f\left(\mathbf{x}\right)=\mathbf{x}^T\mathbf{A}\mathbf{x}+2\mathbf{b}^T\mathbf{x}+c$, where $\mathbf{A}$ is a symmetric $n\times n$ matrix, $\mathbf{b}\in\mathbb{R}^n$, and $c\in \mathbb{R}$. Show that the smallest Lipschitz constant of $\nabla f$ is $2\|\mathbf{A}\|$.
Solution:
$$\nabla f\left(\mathbf{x}\right)=2\mathbf{A}\mathbf{x}+2\mathbf{b}$$
$$\|\nabla f\left(\mathbf{x}\right)-\nabla f\left(\mathbf{y}\right)\|=\|2\mathbf{A}\left(\mathbf{x}-\mathbf{y}\right)\|\le 2\|\mathbf{A}\|\|\mathbf{x}-\mathbf{y}\|$$
so $L\le 2\|\mathbf{A}\|$.

Conversely, since $\mathbf{A}$ is symmetric it has an eigenvalue $\lambda$ with $|\lambda|=\|\mathbf{A}\|$; let $\mathbf{x}$ be a corresponding eigenvector. Then
$$\|\nabla f\left(\mathbf{x}\right)-\nabla f\left(\mathbf{0}\right)\|=2\|\mathbf{A}\mathbf{x}\|=2|\lambda|\|\mathbf{x}\|=2\|\mathbf{A}\|\|\mathbf{x}-\mathbf{0}\|$$
so no smaller constant works, and the smallest Lipschitz constant is $L=2\|\mathbf{A}\|$.
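A quick numerical illustration (my own, not from the book): along a unit eigenvector for the eigenvalue of largest absolute value, the bound $2\|\mathbf{A}\|$ is attained exactly:

% Spot check that the Lipschitz constant 2*||A|| is tight (illustration only)
n=4;
B=randn(n); A=(B+B')/2;   % random symmetric A
g=@(x)2*A*x;              % b cancels in gradient differences
[V,E]=eig(A);
[~,k]=max(abs(diag(E)));
v=V(:,k);                 % unit eigenvector with |lambda|=||A||
fprintf('ratio = %f  2*||A|| = %f\n',norm(g(v)-g(zeros(n,1)))/norm(v),2*norm(A));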

4.8. Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $f\left(\mathbf{x}\right)=\sqrt{1+\|\mathbf{x}\|^2}$. Show that $f\in C_{1}^{1,1}$.

Solution:
$$\nabla f\left(\mathbf{x}\right)=\frac{\mathbf{x}}{\sqrt{1+\|\mathbf{x}\|^2}},\qquad \nabla^2 f\left(\mathbf{x}\right)=\frac{\left(1+\mathbf{x}^T\mathbf{x}\right)\mathbf{I}-\mathbf{x}\mathbf{x}^T}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}$$
Note that the eigenvalues of $\mathbf{x}\mathbf{x}^T$ are $0$ with multiplicity $n-1$ and $\mathbf{x}^T\mathbf{x}$ with multiplicity $1$, so
$$\|\mathbf{x}^T\mathbf{x}\,\mathbf{I}-\mathbf{x}\mathbf{x}^T\|=\mathbf{x}^T\mathbf{x}$$
Hence
$$\|\nabla^2 f\left(\mathbf{x}\right)\|\le\frac{\|\mathbf{I}\|+\|\mathbf{x}^T\mathbf{x}\,\mathbf{I}-\mathbf{x}\mathbf{x}^T\|}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}=\frac{1+\mathbf{x}^T\mathbf{x}}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}=\frac{1}{\sqrt{1+\|\mathbf{x}\|^2}}\le 1$$

so $\nabla f$ is Lipschitz continuous with constant $L=1$, i.e., $f\in C_{1}^{1,1}$.
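A numerical spot check (my own illustration): the norm of the Hessian above equals $\left(1+\|\mathbf{x}\|^2\right)^{-\frac{1}{2}}$, which never exceeds $1$:

% Spot check of the Hessian norm bound for f(x)=sqrt(1+||x||^2)
n=5; x=randn(n,1);
H=((1+x'*x)*eye(n)-x*x')/(1+x'*x)^(3/2);
fprintf('||H|| = %f  (1+||x||^2)^(-1/2) = %f\n',norm(H),1/sqrt(1+x'*x));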

4.9. Let $f\in C_{L}^{1,1}\left(\mathbb{R}^m\right)$, and let $\mathbf{A}\in\mathbb{R}^{m\times n},\mathbf{b}\in\mathbb{R}^m$. Show that the function $g:\mathbb{R}^{n}\to \mathbb{R}$ defined by $g\left(\mathbf{x}\right)=f\left(\mathbf{A}\mathbf{x}+\mathbf{b}\right)$ satisfies $g\in C_{\tilde{L}}^{1,1}\left(\mathbb{R}^n\right)$, where $\tilde{L}=\|\mathbf{A}\|^2L$.
Solution:
$$\nabla g\left(\mathbf{x}\right)=\mathbf{A}^T\nabla f\left(\mathbf{A}\mathbf{x}+\mathbf{b}\right)$$
$$\|\nabla g\left(\mathbf{x}\right)-\nabla g\left(\mathbf{y}\right)\|=\left\|\mathbf{A}^T\left(\nabla f\left(\mathbf{A}\mathbf{x}+\mathbf{b}\right)-\nabla f\left(\mathbf{A}\mathbf{y}+\mathbf{b}\right)\right)\right\|\le\|\mathbf{A}^T\|\,L\,\|\mathbf{A}\left(\mathbf{x}-\mathbf{y}\right)\|\le \|\mathbf{A}\|^2L\|\mathbf{x}-\mathbf{y}\|$$
using $\|\mathbf{A}^T\|=\|\mathbf{A}\|$; hence $\tilde{L}=\|\mathbf{A}\|^2L$.

4.10. Give an example of a function $f\in C_{L}^{1,1}\left(\mathbb{R}\right)$ and a starting point $\mathbf{x}_0\in\mathbb{R}$ such that the problem $\min f\left(\mathbf{x}\right)$ has an optimal solution and the gradient method with constant stepsize $t=\frac{2}{L}$ diverges.
4.11. Suppose that $f\in C_{L}^{1,1}\left(\mathbb{R}^{n}\right)$ and assume that $\nabla^2 f\left(\mathbf{x}\right)\succeq 0$ for any $\mathbf{x}\in\mathbb{R}^n$. Suppose that the optimal value of the problem $\min_{\mathbf{x}\in\mathbb{R}^n} f\left(\mathbf{x}\right)$ is $f^*$. Let $\left\{\mathbf{x}_k\right\}_{k\ge 0}$ be the sequence generated by the gradient method with constant stepsize $\frac{1}{L}$. Show that if $\left\{\mathbf{x}_k\right\}_{k\ge 0}$ is bounded, then $f\left(\mathbf{x}_k\right)\to f^*$ as $k\to \infty$.
