(Transcribed the problem set; only the exercises previously worked in class are solved here. The remaining ones will be filled in when time permits.)
4.1. Let $f\in C_{L}^{1,1}\left(\mathbb{R}^n\right)$ and let $\left\{\mathbf{x}_{k}\right\}_{k\ge 0}$ be the sequence generated by the gradient method with a constant stepsize $t_k=\frac{1}{L}$. Assume that $\mathbf{x}_{k}\to \mathbf{x}^*$. Show that if $\nabla f\left(\mathbf{x}_{k}\right)\neq 0$ for all $k\ge 0$, then $\mathbf{x}^*$ is not a local maximum point.
4.2. [9, Exercise 1.3.3] Consider the minimization problem

$$\min \left\{\mathbf{x}^T\mathbf{Qx}:\mathbf{x}\in\mathbb{R}^2\right\}$$
where $\mathbf{Q}$ is a positive definite $2\times 2$ matrix. Suppose we use the diagonal scaling matrix

$$\mathbf{D}=\begin{pmatrix} Q_{11}^{-1}& 0\\ 0& Q_{22}^{-1} \end{pmatrix}$$

Show that the above scaling matrix improves the condition number of $\mathbf{Q}$ in the sense that

$$\chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)\le \chi\left(\mathbf{Q}\right)$$
Solution:

Write

$$\mathbf{Q}=\begin{pmatrix} Q_{11}&Q_{12}\\ Q_{12}&Q_{22} \end{pmatrix},\qquad \mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}=\begin{pmatrix} 1&\frac{Q_{12}}{\sqrt{Q_{11}Q_{22}}}\\ \frac{Q_{12}}{\sqrt{Q_{11}Q_{22}}}&1 \end{pmatrix}$$
Since $\mathbf{Q}\succ 0$,

$$Q_{11},Q_{22}>0,\qquad Q_{11}Q_{22}>Q_{12}^2$$

If $Q_{12}=0$, then $\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}=\mathbf{I}$ and the claim is trivial. So assume $Q_{12}\neq 0$; replacing $\mathbf{Q}$ by $\operatorname{diag}(1,-1)\,\mathbf{Q}\operatorname{diag}(1,-1)$ if necessary (an orthogonal conjugation, which changes neither the eigenvalues nor the diagonal entries but flips the sign of $Q_{12}$), we may assume $Q_{12}>0$.
Note that $k\mathbf{Q}\succ0$ and

$$\chi\left(k\mathbf{Q}\right)=\chi\left(\mathbf{Q}\right)$$

for every $k>0$.
Consider matrices of the form

$$\mathbf{A}=\begin{pmatrix} \alpha&1\\ 1&\beta \end{pmatrix},\qquad \alpha,\beta>0,\ \alpha\beta>1$$
The eigenvalues of $\mathbf{A}$ can be computed by hand: they are $\frac{1}{2}\left(\alpha+\beta\pm\sqrt{\left(\alpha-\beta\right)^2+4}\right)$, so

$$\chi\left(\mathbf{A}\right)=\frac{\alpha+\beta+\sqrt{\left(\alpha-\beta\right)^2+4}}{\alpha+\beta-\sqrt{\left(\alpha-\beta\right)^2+4}}=\frac{1+\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}{1-\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}$$
Let

$$\mathbf{C}=\frac{1}{Q_{12}}\mathbf{Q}=\begin{pmatrix} \alpha&1\\ 1&\beta \end{pmatrix},\qquad \alpha=\frac{Q_{11}}{Q_{12}},\ \beta=\frac{Q_{22}}{Q_{12}}$$

Then $\alpha,\beta>0$ and $\alpha\beta=\frac{Q_{11}Q_{22}}{Q_{12}^2}>1$, so $\mathbf{C}$ is of the above form.
Similarly, with $\sqrt{\alpha\beta}=\frac{\sqrt{Q_{11}Q_{22}}}{Q_{12}}$,

$$\mathbf{P}=\sqrt{\alpha\beta}\,\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}= \begin{pmatrix} \sqrt{\alpha\beta} & 1\\ 1& \sqrt{\alpha\beta} \end{pmatrix}$$

which is again of the above form, since $\sqrt{\alpha\beta}>0$ and $\sqrt{\alpha\beta}\cdot\sqrt{\alpha\beta}=\alpha\beta>1$.
Therefore

$$\chi\left(\mathbf{Q}\right)=\chi\left(\mathbf{C}\right)=\frac{1+\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}}{1-\sqrt{\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}}},\qquad \chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)=\chi\left(\mathbf{P}\right)=\frac{1+\sqrt{\frac{1}{\alpha\beta}}}{1-\sqrt{\frac{1}{\alpha\beta}}}$$
The function $f(t)=\frac{1+\sqrt{t}}{1-\sqrt{t}}$ is increasing on $\left(0,1\right)$.
Since $\left(\alpha+\beta\right)^2=\left(\alpha-\beta\right)^2+4\alpha\beta$ and $\alpha\beta>1$,

$$\frac{\left(\alpha-\beta\right)^2+4}{\left(\alpha+\beta\right)^2}-\frac{1}{\alpha\beta}=\frac{\left(\alpha\beta-1\right)\left(\alpha-\beta\right)^2}{\alpha\beta\left(\alpha+\beta\right)^2}\ge 0$$
Both quantities lie in $\left(0,1\right)$ (again because $\alpha\beta>1$), so the monotonicity of $f$ gives $\chi\left(\mathbf{P}\right)\le\chi\left(\mathbf{C}\right)$, i.e.,

$$\chi\left(\mathbf{D}^{\frac{1}{2}}\mathbf{Q}\mathbf{D}^{\frac{1}{2}}\right)\le \chi\left(\mathbf{Q}\right)$$
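As a quick numerical sanity check (a sketch; the matrix below is an arbitrary positive definite example, and MATLAB's cond returns exactly the eigenvalue ratio $\chi$ for symmetric positive definite matrices):

Q=[4 1;1 2];                 % an arbitrary positive definite matrix
D=diag(1./diag(Q));          % diagonal scaling D_ii = 1/Q_ii
S=sqrtm(D)*Q*sqrtm(D);       % scaled matrix D^{1/2} Q D^{1/2}
[cond(S) cond(Q)]            % cond(S) <= cond(Q)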
4.3. Consider the quadratic minimization problem

$$\min\left\{\mathbf{x}^T\mathbf{Ax}:\mathbf{x}\in\mathbb{R}^5\right\}$$
where $\mathbf{A}$ is the $5\times 5$ Hilbert matrix defined by

$$\mathbf{A}_{i,j}=\frac{1}{i+j-1},\quad i,j=1,2,3,4,5$$
The matrix can be constructed via the MATLAB command
A=hilb(5)
Run the following methods and compare the number of iterations required by each of the methods when the initial vector is $\mathbf{x}_0=\left(1,2,3,4,5\right)^T$ to obtain a solution $\mathbf{x}$ with $\|\nabla f\left(\mathbf{x}\right)\|\le 10^{-4}$:
- gradient method with backtracking stepsize rule and parameters $\alpha=0.5,\beta=0.5,s=1$;
- gradient method with backtracking stepsize rule and parameters $\alpha=0.1,\beta=0.5,s=1$;
- gradient method with exact line search;
- diagonally scaled gradient method with diagonal elements $D_{ii}=\frac{1}{\mathbf{A}_{ii}},\ i=1,2,3,4,5$ and exact line search;
- diagonally scaled gradient method with diagonal elements $D_{ii}=\frac{1}{\mathbf{A}_{ii}},\ i=1,2,3,4,5$ and backtracking line search with parameters $\alpha=0.1,\beta=0.5,s=1$.
Solution:
function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
    beta,epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0 ........ initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant by which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
%             of min f(x)
% fun_val ... optimal function value
x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    % shrink t until the sufficient decrease condition holds
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
function [x,fun_val]=gradient_method_quadratic(A,b,x0,epsilon)
% Gradient method with exact line search for quadratic functions
% INPUT
% ======================
% A ....... the positive definite matrix associated with the
%           objective function
% b ....... a column vector associated with the linear part of the
%           objective function
% x0 ...... starting point of the method
% epsilon . tolerance parameter
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance) of
%           min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance
x=x0;
iter=0;
grad=2*(A*x+b);
while (norm(grad)>epsilon)
    iter=iter+1;
    t=norm(grad)^2/(2*grad'*A*grad);   % exact line search stepsize
    x=x-t*grad;
    grad=2*(A*x+b);
    fun_val=x'*A*x+2*b'*x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f\n',...
        iter,norm(grad),fun_val);
end
function [x,fun_val]=gradient_scaled_quadratic(A,b,D,x0,epsilon)
% Scaled gradient method with exact line search for quadratic functions
% INPUT
% ======================
% A ....... the positive definite matrix associated
%           with the objective function
% b ....... a column vector associated with the linear part
%           of the objective function
% D ....... scaling matrix
% x0 ...... starting point of the method
% epsilon . tolerance parameter
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance)
%           of min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance
x=x0;
iter=0;
grad=2*(A*x+b);
while (norm(grad)>epsilon)
    iter=iter+1;
    t=grad'*D*grad/(2*(grad'*D')*A*(D*grad));   % exact stepsize along -D*grad
    x=x-t*D*grad;
    grad=2*(A*x+b);
    fun_val=x'*A*x+2*b'*x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
function [x,fun_val]=gradient_scaled_quadratic_backtracking(A,b,D,x0,s,...
    alpha,beta,epsilon)
% Scaled gradient method with backtracking line search for quadratic functions
% INPUT
% ======================
% A ....... the positive definite matrix associated
%           with the objective function
% b ....... a column vector associated with the linear part
%           of the objective function
% D ....... scaling matrix
% x0 ...... initial point
% s ....... initial choice of stepsize
% alpha ... tolerance parameter for the stepsize selection
% beta .... the constant by which the stepsize is multiplied
%           at each backtracking step (0<beta<1)
% epsilon . tolerance parameter for stopping rule
% OUTPUT
% =======================
% x ....... an optimal solution (up to a tolerance)
%           of min(x^T A x+2 b^T x)
% fun_val . the optimal function value up to a tolerance
x=x0;
iter=0;
grad=2*(A*x+b);
fun_val=x'*A*x+2*b'*x;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    % shrink t until the sufficient decrease condition holds along -D*grad
    while (fun_val-((x-t*D*grad)'*A*(x-t*D*grad)+2*b'*(x-t*D*grad))<alpha*t*grad'*D*grad)
        t=beta*t;
    end
    x=x-t*D*grad;
    grad=2*(A*x+b);
    fun_val=x'*A*x+2*b'*x;
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
A=hilb(5);
b=zeros(size(A,2),1);
D=diag(1./diag(A));
f=@(x)x'*A*x;
g=@(x)2*A*x;
epsilon=1e-4;
x0=[1,2,3,4,5]';
% backtracking, alpha=0.5
s=1;
alpha=0.5;
beta=0.5;
gradient_method_backtracking(f,g,x0,s,alpha,beta,epsilon);
% backtracking, alpha=0.1
alpha=0.1;
gradient_method_backtracking(f,g,x0,s,alpha,beta,epsilon);
% exact line search
gradient_method_quadratic(A,b,x0,epsilon);
% diagonal scaling + exact line search
gradient_scaled_quadratic(A,b,D,x0,epsilon);
% diagonal scaling + backtracking, alpha=0.1
gradient_scaled_quadratic_backtracking(A,b,D,x0,s,alpha,beta,epsilon);
Results:

- Backtracking with $\alpha=0.5,\beta=0.5,s=1$: 3301 iterations.
- Backtracking with $\alpha=0.1,\beta=0.5,s=1$: 3732 iterations.
- Exact line search: 1271 iterations.
- Diagonal scaling + exact line search: 235 iterations.
- Diagonal scaling + backtracking: 104 iterations.
4.4. Consider the Fermat-Weber problem

$$\min_{\mathbf{x}\in\mathbb{R}^n}\left\{f\left(\mathbf{x}\right)=\sum_{i=1}^{m}\omega_i\|\mathbf{x}-\mathbf{a}_i\|\right\}$$
where $\omega_1,\cdots,\omega_m>0$ and $\mathbf{a}_1,\cdots,\mathbf{a}_m\in\mathbb{R}^n$ are $m$ different points. Let

$$p\in \operatorname{argmin}_{i=1,2,\cdots,m} f\left(\mathbf{a}_i\right)$$
Suppose that

$$\left\|\sum_{i\neq p}\omega_i\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|}\right\|>\omega_{p}$$
(i) Show that there exists a direction $\mathbf{d}\in\mathbb{R}^n$ such that $f'\left(\mathbf{a}_p;\mathbf{d}\right)<0$.
(ii) Show that there exists $\mathbf{x}_0\in\mathbb{R}^n$ satisfying $f\left(\mathbf{x}_0\right)<\min\left\{f\left(\mathbf{a}_1\right),\cdots,f\left(\mathbf{a}_m\right)\right\}$. Explain how to compute such a vector.
Solution:

(i) Let $\mathbf{g}=\sum_{i\neq p}\omega_i\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|}$. For $i\neq p$ the function $\mathbf{x}\mapsto\|\mathbf{x}-\mathbf{a}_i\|$ is differentiable at $\mathbf{a}_p$ with gradient $\frac{\mathbf{a}_p-\mathbf{a}_i}{\|\mathbf{a}_p-\mathbf{a}_i\|}$, while the directional derivative of $\mathbf{x}\mapsto\|\mathbf{x}-\mathbf{a}_p\|$ at $\mathbf{a}_p$ in the direction $\mathbf{d}$ is $\lim_{t\to 0^+}\frac{\|t\mathbf{d}\|}{t}=\|\mathbf{d}\|$. Hence

$$f'\left(\mathbf{a}_p;\mathbf{d}\right)=\lim\limits_{t\to 0^+}\frac{f\left(\mathbf{a}_p+t\mathbf{d}\right)-f\left(\mathbf{a}_p\right)}{t}=\left\langle\mathbf{g},\mathbf{d}\right\rangle+\omega_p\|\mathbf{d}\|$$

Taking $\mathbf{d}=-\mathbf{g}$ and using the assumption $\|\mathbf{g}\|>\omega_p$ gives

$$f'\left(\mathbf{a}_p;-\mathbf{g}\right)=-\|\mathbf{g}\|^2+\omega_p\|\mathbf{g}\|=\|\mathbf{g}\|\left(\omega_p-\|\mathbf{g}\|\right)<0$$

(ii) By (i), $f\left(\mathbf{a}_p+t\mathbf{d}\right)<f\left(\mathbf{a}_p\right)=\min\left\{f\left(\mathbf{a}_1\right),\cdots,f\left(\mathbf{a}_m\right)\right\}$ for all sufficiently small $t>0$, so $\mathbf{x}_0=\mathbf{a}_p+t\mathbf{d}$ works. Such a $t$ can be computed by backtracking: start from some $t=s>0$ and halve $t$ until $f\left(\mathbf{a}_p+t\mathbf{d}\right)<f\left(\mathbf{a}_p\right)$.
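A minimal MATLAB sketch of this computation (the variable layout is an assumption: anchors as the columns of an n x m matrix A, an m x 1 weight vector w, and p the index attaining the minimum over the anchors):

% Hypothetical helper: compute x0 for Exercise 4.4(ii).
m=size(A,2);
fw=@(x)w'*sqrt(sum((A-x*ones(1,m)).^2))';   % Fermat-Weber objective
g=zeros(size(A,1),1);
for i=[1:p-1, p+1:m]
    g=g+w(i)*(A(:,p)-A(:,i))/norm(A(:,p)-A(:,i));
end
d=-g;                                       % descent direction from part (i)
t=1;
while fw(A(:,p)+t*d)>=fw(A(:,p))            % backtrack until strict decrease
    t=t/2;
end
x0=A(:,p)+t*d;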
4.5. In the "source localization problem" we are given $m$ locations of sensors $\mathbf{a}_1,\cdots,\mathbf{a}_m\in\mathbb{R}^n$ and approximate distances between the sensors and an unknown "source" located at $\mathbf{x}\in \mathbb{R}^n$:

$$d_i\approx\|\mathbf{x}-\mathbf{a}_i\|$$
The problem is to find an estimate of $\mathbf{x}$ given the locations $\mathbf{a}_1,\cdots,\mathbf{a}_m$ and the approximate distances $d_1,\cdots,d_m$. A natural formulation as an optimization problem is to consider the nonlinear least squares problem
$$\text{(SL)}\quad \min \left\{f(\mathbf{x}) \equiv \sum_{i=1}^{m}\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|-d_{i}\right)^{2}\right\}$$
We will denote the set of sensors by $\mathscr{A}\equiv \left\{\mathbf{a}_1,\cdots,\mathbf{a}_m\right\}$.
(i) Show that the optimality condition $\nabla f\left(\mathbf{x}\right)=0$ $\left(\mathbf{x}\notin\mathscr{A}\right)$ is the same as

$$\mathbf{x}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right\}$$
(ii) Show that the corresponding fixed point method

$$\mathbf{x}_{k+1}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}_{k}-\mathbf{a}_{i}}{\left\|\mathbf{x}_{k}-\mathbf{a}_{i}\right\|}\right\}$$

is a gradient method, assuming that $\mathbf{x}_k\notin \mathscr{A}$ for all $k\ge 0$. What is the stepsize?
Solution:

(i)

$$\nabla f\left(\mathbf{x}\right)=2\sum_{i=1}^{m}\frac{\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|-d_{i}\right)\left(\mathbf{x}-\mathbf{a}_i\right)}{\|\mathbf{x}-\mathbf{a}_i\|}=2\left(m\mathbf{x}-\sum_{i=1}^{m}\mathbf{a}_{i}-\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right)$$

so $\nabla f\left(\mathbf{x}\right)=0$ is equivalent to

$$\mathbf{x}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}-\mathbf{a}_{i}}{\left\|\mathbf{x}-\mathbf{a}_{i}\right\|}\right\}$$
(ii) By the expression for $\nabla f$ above,

$$\mathbf{x}_{k+1}=\frac{1}{m}\left\{\sum_{i=1}^{m} \mathbf{a}_{i}+\sum_{i=1}^{m} d_{i} \frac{\mathbf{x}_k-\mathbf{a}_{i}}{\left\|\mathbf{x}_k-\mathbf{a}_{i}\right\|}\right\}=\mathbf{x}_k-\frac{1}{2m}\nabla f\left(\mathbf{x}_k\right)$$

so the fixed point method is a gradient method with constant stepsize $\frac{1}{2m}$.
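A minimal MATLAB sketch of this fixed-point iteration (the anchor matrix A, distance vector d, starting point x, and iteration count are assumptions):

% Hypothetical sketch of the fixed-point method of Exercise 4.5(ii).
% A is n x m with anchors as columns, d is m x 1, x is the current iterate.
m=size(A,2);
for k=1:100
    R=x*ones(1,m)-A;                        % columns x - a_i
    x=(sum(A,2)+R*(d./sqrt(sum(R.^2))'))/m; % one fixed-point update
end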
4.6. Another formulation of the source localization problem consists of minimizing the following objective function:

$$\text{(SL2)}\quad \min _{\mathbf{x} \in \mathbb{R}^{n}}\left\{f(\mathbf{x}) \equiv \sum_{i=1}^{m}\left(\left\|\mathbf{x}-\mathbf{a}_{i}\right\|^{2}-d_{i}^{2}\right)^{2}\right\}$$
This is of course a nonlinear least squares problem, and thus the Gauss-Newton method can be employed in order to solve it. We will assume that $n=2$.
(i) Show that as long as all the points $\mathbf{a}_1,\cdots,\mathbf{a}_m$ do not reside on the same line in the plane, the method is well-defined, meaning that the linear least squares problem solved at each iteration has a unique solution.
(ii) Write a MATLAB function that implements the damped Gauss-Newton method employed on problem (SL2) with a backtracking line search strategy with parameters $s=1,\alpha=\beta=0.5,\epsilon =10^{-4}$. Run the function on the two-dimensional problem $\left(n=2\right)$ with 5 anchors $\left(m=5\right)$ and data generated by the MATLAB commands
randn('seed',317);
A=randn(2,5);
x=randn(2,1);
d=sqrt(sum((A-x*ones(1,5)).^2))+0.05*randn(1,5);
d=d';
The columns of the $2\times 5$ matrix $\mathbf{A}$ are the locations of the five sensors, $\mathbf{x}$ is the "true" location of the source, and $\mathbf{d}$ is the vector of noisy measurements between the source and the sensors. Compare your results (e.g., number of iterations) to the gradient method with backtracking and parameters $s=1,\alpha=\beta=0.5,\epsilon=10^{-4}$. Start both methods with the initial vector $\left(1000,-500\right)^{T}$.
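No solution was worked out here; the following is a minimal sketch of the requested damped Gauss-Newton function (assuming the anchors are the columns of A and d is a column vector; the function name is illustrative):

function [x,fun_val]=damped_gauss_newton_sl2(A,d,x0,s,alpha,beta,epsilon)
% Hypothetical sketch: damped Gauss-Newton for (SL2) with backtracking.
% A ... n x m matrix whose columns are the anchors a_i
% d ... m x 1 vector of approximate distances
m=size(A,2);
f=@(y)norm(sum((y*ones(1,m)-A).^2)'-d.^2)^2;   % f(x)=sum_i (||x-a_i||^2-d_i^2)^2
x=x0;
r=sum((x*ones(1,m)-A).^2)'-d.^2;               % residuals r_i=||x-a_i||^2-d_i^2
J=2*(x*ones(1,m)-A)';                          % Jacobian of the residual vector
grad=2*J'*r;                                   % gradient of f=||r||^2
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    dGN=-(J'*J)\(J'*r);                        % Gauss-Newton direction
    t=s;
    % shrink t until the sufficient decrease condition holds along dGN
    while (fun_val-f(x+t*dGN)<-alpha*t*grad'*dGN)
        t=beta*t;
    end
    x=x+t*dGN;
    r=sum((x*ones(1,m)-A).^2)'-d.^2;
    J=2*(x*ones(1,m)-A)';
    grad=2*J'*r;
    fun_val=f(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end

On the generated data it can be called as damped_gauss_newton_sl2(A,d,[1000;-500],1,0.5,0.5,1e-4), and gradient_method_backtracking from 4.3 can be run on the same objective for comparison.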
4.7. Let $f\left(\mathbf{x}\right)=\mathbf{x}^T\mathbf{Ax}+2\mathbf{b}^T\mathbf{x}+c$, where $\mathbf{A}$ is a symmetric $n\times n$ matrix, $\mathbf{b}\in\mathbb{R}^n$, and $c\in \mathbb{R}$. Show that the smallest Lipschitz constant of $\nabla f$ is $2\|\mathbf{A}\|$.
Solution:

$$\nabla f\left(\mathbf{x}\right)=2\mathbf{Ax}+2\mathbf{b}$$

$$\|\nabla f\left(\mathbf{x}\right)-\nabla f\left(\mathbf{y}\right)\|=\|2\mathbf{A}\left(\mathbf{x}-\mathbf{y}\right)\|\le 2\|\mathbf{A}\|\|\mathbf{x}-\mathbf{y}\|$$

so $L\le 2\|\mathbf{A}\|$.
Conversely, since $\mathbf{A}$ is symmetric, it has an eigenvalue $\lambda_1$ with $|\lambda_1|=\|\mathbf{A}\|$. For a corresponding eigenvector $\mathbf{x}$ (so $\mathbf{Ax}=\lambda_1\mathbf{x}$),

$$\|\nabla f\left(\mathbf{x}\right)-\nabla f\left(\mathbf{0}\right)\|=2\|\mathbf{Ax}\|=2|\lambda_1|\|\mathbf{x}\|=2\|\mathbf{A}\|\|\mathbf{x}-\mathbf{0}\|$$

so no smaller constant works, and the smallest Lipschitz constant is $L=2\|\mathbf{A}\|$.
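A quick numerical illustration (the matrix below is an arbitrary symmetric example): the ratio $\|\nabla f\left(\mathbf{x}\right)-\nabla f\left(\mathbf{0}\right)\|/\|\mathbf{x}\|$ attains $2\|\mathbf{A}\|$ along an eigenvector of maximal absolute eigenvalue:

% Hypothetical check that the bound 2*norm(A) is attained (Exercise 4.7).
A=[2 1;1 3]; b=[1;0];
g=@(x)2*A*x+2*b;
[V,E]=eig(A);
[~,i]=max(abs(diag(E)));
x=V(:,i);                                     % eigenvector of max |eigenvalue|
[norm(g(x)-g(zeros(2,1)))/norm(x), 2*norm(A)] % the two values coincide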
4.8. Let $f:\mathbb{R}^n\to\mathbb{R}$ be given by $f\left(\mathbf{x}\right)=\sqrt{1+\|\mathbf{x}\|^2}$. Show that $f\in C_{1}^{1,1}$.
Solution:

$$\nabla f\left(\mathbf{x}\right)=\frac{\mathbf{x}}{\sqrt{1+\|\mathbf{x}\|^2}},\qquad \nabla^2 f\left(\mathbf{x}\right)=\frac{\left(1+\mathbf{x}^T\mathbf{x}\right)\mathbf{I}-\mathbf{x}\mathbf{x}^T}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}$$
Note that $\mathbf{x}\mathbf{x}^T$ has eigenvalue $0$ with multiplicity $n-1$ and eigenvalue $\mathbf{x}^T\mathbf{x}$ with multiplicity $1$, so

$$\|\mathbf{x}^T\mathbf{x}\,\mathbf{I}-\mathbf{x}\mathbf{x}^T\|=\mathbf{x}^T\mathbf{x}$$
Hence

$$\|\nabla^2 f\left(\mathbf{x}\right)\|\le\frac{\|\mathbf{I}\|+\|\mathbf{x}^T\mathbf{x}\,\mathbf{I}-\mathbf{x}\mathbf{x}^T\|}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}= \frac{1+\mathbf{x}^T\mathbf{x}}{\left(1+\|\mathbf{x}\|^2\right)^{\frac{3}{2}}}=\frac{1}{\sqrt{1+\|\mathbf{x}\|^2}}\le 1$$

so $\nabla f$ is Lipschitz continuous with constant $L=1$, i.e., $f\in C_{1}^{1,1}$.
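A quick numerical check (the dimension and sample point are arbitrary):

% Hypothetical check that ||Hess f(x)|| <= 1 for f(x)=sqrt(1+||x||^2).
x=randn(5,1);
H=((1+x'*x)*eye(5)-x*x')/(1+x'*x)^(3/2);
norm(H)                                      % never exceeds 1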
4.9. Let $f\in C_{L}^{1,1}\left(\mathbb{R}^m\right)$, and let $\mathbf{A}\in\mathbb{R}^{m\times n},\mathbf{b}\in\mathbb{R}^m$. Show that the function $g:\mathbb{R}^{n}\to \mathbb{R}$ defined by $g\left(\mathbf{x}\right)=f\left(\mathbf{Ax}+\mathbf{b}\right)$ satisfies $g\in C_{\tilde{L}}^{1,1}\left(\mathbb{R}^n\right)$, where $\tilde{L}=\|\mathbf{A}\|^2L$.
Solution:

$$\nabla g\left(\mathbf{x}\right)=\mathbf{A}^T\nabla f\left(\mathbf{Ax}+\mathbf{b}\right)$$

$$\|\nabla g\left(\mathbf{x}\right)-\nabla g\left(\mathbf{y}\right)\|=\|\mathbf{A}^T\left(\nabla f\left(\mathbf{Ax}+\mathbf{b}\right)-\nabla f\left(\mathbf{Ay}+\mathbf{b}\right)\right)\|\le\|\mathbf{A}^T\|\cdot L\|\mathbf{A}\left(\mathbf{x}-\mathbf{y}\right)\|\le \|\mathbf{A}\|^2L\|\mathbf{x}-\mathbf{y}\|$$

using $\|\mathbf{A}^T\|=\|\mathbf{A}\|$. Therefore $\tilde{L}=\|\mathbf{A}\|^2L$ is a Lipschitz constant of $\nabla g$.
4.10. Give an example of a function $f\in C_{L}^{1,1}\left(\mathbb{R}\right)$ and a starting point $\mathbf{x}_0\in\mathbb{R}$ such that the problem $\min f\left(\mathbf{x}\right)$ has an optimal solution and the gradient method with constant stepsize $t=\frac{2}{L}$ diverges.
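No solution was worked out here; one possible construction is the quadratic

$$f(x)=\frac{L}{2}x^2,\qquad x_0\neq 0$$

Then $f\in C_{L}^{1,1}\left(\mathbb{R}\right)$ with optimal solution $x^*=0$, while the gradient step with $t=\frac{2}{L}$ gives

$$x_{k+1}=x_k-\frac{2}{L}\cdot Lx_k=-x_k$$

so the iterates oscillate between $x_0$ and $-x_0$ and never converge.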
4.11. Suppose that $f\in C_{L}^{1,1}\left(\mathbb{R}^{n}\right)$ and assume that $\nabla^2 f\left(\mathbf{x}\right)\succeq 0$ for any $\mathbf{x}\in\mathbb{R}^n$. Suppose that the optimal value of the problem $\min_{\mathbf{x}\in\mathbb{R}^n} f\left(\mathbf{x}\right)$ is $f^*$. Let $\left\{\mathbf{x}_k\right\}_{k\ge 0}$ be the sequence generated by the gradient method with constant stepsize $\frac{1}{L}$. Show that if $\left\{\mathbf{x}_k\right\}_{k\ge 0}$ is bounded, then $f\left(\mathbf{x}_k\right)\to f^*$ as $k\to \infty$.
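No solution was worked out here; a possible proof sketch: by the descent lemma with stepsize $\frac{1}{L}$,

$$f\left(\mathbf{x}_k\right)-f\left(\mathbf{x}_{k+1}\right)\ge\frac{1}{2L}\|\nabla f\left(\mathbf{x}_k\right)\|^2$$

so $\left\{f\left(\mathbf{x}_k\right)\right\}$ is nonincreasing and bounded below by $f^*$, hence convergent, and summing the inequality gives $\nabla f\left(\mathbf{x}_k\right)\to 0$. Since $\left\{\mathbf{x}_k\right\}$ is bounded, it has a subsequence $\mathbf{x}_{k_j}\to\bar{\mathbf{x}}$; by continuity of $\nabla f$, $\nabla f\left(\bar{\mathbf{x}}\right)=0$. The assumption $\nabla^2 f\succeq 0$ everywhere means $f$ is convex, so the stationary point $\bar{\mathbf{x}}$ is a global minimizer, i.e., $f\left(\bar{\mathbf{x}}\right)=f^*$. Then $f\left(\mathbf{x}_{k_j}\right)\to f^*$, and since the whole sequence $\left\{f\left(\mathbf{x}_k\right)\right\}$ converges, $f\left(\mathbf{x}_k\right)\to f^*$.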