[笔记] Convex Optimization 2015.09.23

最新推荐文章于 2021-09-06 11:05:00 发布

AlgoComp

最新推荐文章于 2021-09-06 11:05:00 发布

阅读量794

点赞数

分类专栏：笔记文章标签： Convex optimization

本文链接：https://blog.csdn.net/LiJiancheng0614/article/details/48746481

版权

笔记专栏收录该内容

12 篇文章 0 订阅

订阅专栏

Positive (semi)definite metrics

Notations: $A \in S^n$ means $A \in \mathbb{R}^{n \times n}$ , $A$ symmetric
$A \ge 0$ or $A \in S_+^n$ means $A$ positive semidefinite ( $x^TAx \ge 0, \forall x \in \mathbb{R}^n$ )
$A \gt 0$ or $A \in S_{++}^n$ means $A$ positive definite ( $x^TAx \gt 0, \forall x \in \mathbb{R}^n \backslash \{0\}$ )
Reminder: A matrix $A$ is symmetric iff it is orthogonally diagonalizable: $A = P^TDP, P^P = I$
Facts: $A \ge 0 \iff D \ge 0 \iff$ all eigenvalues of A are nonnegative
$A \gt 0 \iff D \gt 0 \iff$ all eigenvalues of A are positive
Proof: If $D \ge 0$ , then $\exists D' \ge 0, s.t. D = D'^2$
Then $x^TAx = x^TP^TD'D'Px = (D'Px)^TD'Px = \lVert D'Px \rVert _2^2 \ge 0$
Notations: Outer product (of $x, y$ ): $xy^T = \begin{bmatrix} \\ x\\ \\ \end{bmatrix} \begin{bmatrix} & y & \\ \end{bmatrix} = \begin{bmatrix} x_1y_1 & \cdots & x_1y_n \\ \vdots & & \vdots \\ x_ny_1 & \cdots & x_ny_n\\ \end{bmatrix}$
Let $A = [a_1^T \cdots a_n^T]^T$ , $B = [b_1 ... b_n]$
Then $BA = \begin{bmatrix} b_1 & \cdots & b_n\\ \end{bmatrix} \begin{bmatrix} a_1^T\\ \vdots \\ a_n^T\\ \end{bmatrix} = b_1a_1^T + … + b_na_n^T$
Lemma: If $A \ge 0$ and $A_{ii} = 0$ , then the ith row and column of $A$ are zero.
Picture:
$\begin{bmatrix} & & 0 & \\ & & 0 & \\ 0 & 0 & 0 & 0 \\ & & 0 & \\ \end{bmatrix}$
Proof: WLOG $i = 1$ , If $A_{1j} \neq 0$ , then let $x = \vec{e_1} + \lambda \vec{e_j}$ .
Then $x^TAx = \lambda ^2 A_{jj} + 2 \lambda A_{1j}$
$\frac{d}{d \lambda} x^TAx \underset{\lambda \neq 0}{=} (2 \lambda A_{jj} + 2A_{1j}) |_{\lambda = 0} = 2A_{1j} \neq 0, (x^TAx)|_{\lambda = 0} = 0$
Lemma: Any matrix of the form $B^TB$ is positive semidefinite, $B \in \mathbb{R}^{m \times n}$
Proof: Trivial ( $x^TB^TBx = \lVert Bx \rVert _2^2$ )
Fact: For every $A \in S_+^n$ there exists $B \in \mathbb{R}^{n \times n}$ , $B$ upper triangle, such that $A = B^TB$ (Cholesky decomposition)
$B^TB = \begin{bmatrix} | & & & \\ | & | & & \\ | & | & | & \\ | & | & | & |\\ \end{bmatrix} \begin{bmatrix} - & - & - & -\\ & - & - & - \\ & & - & - \\ & & & - \\ \end{bmatrix} = A =$
$\begin{bmatrix} | \\ | \\ | \\ | \\ \end{bmatrix} \begin{bmatrix} - & - & - & - \\ \end{bmatrix} + \begin{bmatrix} \\ | \\ | \\ | \\ \end{bmatrix} \begin{bmatrix} & - & - & - \\ \end{bmatrix} + \cdots =$
$\begin{bmatrix} \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast \\ \end{bmatrix} + \begin{bmatrix} & & & \\ & \ast & \ast & \ast \\ & \ast & \ast & \ast \\ & \ast & \ast & \ast \\ \end{bmatrix} + ... + \begin{bmatrix} & & & \\ & & & \\ & & & \\ & & & \ast \\ \end{bmatrix}$
Proof: By lemma, can assume $A_{11} \neq 0$ , in fact, can assume that $A_{11} = 1$ , So
$A = \begin{bmatrix} 1 & a_{12}^T \\ a_{12} & A_{22} \\ \end{bmatrix}, B = \begin{bmatrix} 1 & & a_{12}^T & \\ & - & - & - \\ & & - & - \\ & & & - \\ \end{bmatrix}$
$\begin{bmatrix} 1 \\ a_{12} \\ \end{bmatrix} \begin{bmatrix} 1 & a_{12}^T \\ \end{bmatrix} = \begin{bmatrix} 1 & a_{12}^T \\ a_{12} & a_{12}a_{12}^T \\ \end{bmatrix}$
Have $A - \begin{bmatrix} 1 \\ a_{12} \\ \end{bmatrix} \begin{bmatrix} 1 & a_{12}^T \\ \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & A_{22} - a_{12}a_{12}^T \\ \end{bmatrix}$
So need to show that:
$S = A_{22} - a_{12}a_{12}^T \in S_+^{n - 1}$ is positive semidefinite
Let $x \in \mathbb{R}^{n - 1}$ ,
Then $x^TSx = x^TAx - x^Ta_{12}a_{12}^Tx = x^TA - (a_{12}^Tx)^2$
Let $y = \begin{bmatrix} -a_{12}^Tx \\ x \\ \\ \end{bmatrix},$
$y^TAy = (a_{12}^Tx)^2 + x^TA_{22}x - (a_{12}^Tx)^2 - (a_{12}^Tx)^2 = x^TSx \ge 0$
Basic Fact: $x \rightarrow \sqrt{x^TAx}$ is a norm if $A \gt 0$
Proof: $\lVert x + y \rVert \le \lVert x \rVert + \lVert y \rVert$
$\sqrt{(x + y)^TA(x + y)} \le \sqrt{x^TAx} + \sqrt{y^TAy}$
$x^TAx + y^TAy + 2x^TAy \le x^TAx + y^TAy + 2 \sqrt{x^TAx} \sqrt{y^TAy}$
$(yA^Tx)(yA^Tx) \le (x^TAx)(y^TAy)$
Let $z = Bx, w = By$ ,
Then $(w^Tz)(w^Tz) \le (z^Tz)(w^Tw) = \lVert z \rVert _2^2 \lVert w \rVert _2^2$
Basic Fact: The function $x \rightarrow x^TAx$ is convex if $A \ge 0$
Proof: Same as for the function $x^2$ (At some point use that $(x + y)^TA(x + y) \ge 0$ )
Definition: An ellipsod in $\mathbb{R}^n$ is a set of the form $\mathcal{E} = \{x : (x - x_0)^TA^{-1}(x - x_0) \le 1\}$
for some $A \gt 0$
Have $A = P^TDP$ , put $x - x_0 = P^Tu$
$A^{-1} = P^TD^{-1}P, (u^TP)A^{-1}(P^Tu) = u^TD^{-1}u = \sum_{i = 1}^n \frac{1}{\lambda_i} u_i^2$
Lemma: A set $\mathcal{E}$ is an ellipsoid iff and only if there exists an invertible matrix $Z \in \mathbb{R}^{n \times n}, s.t. \mathcal{E} = \{x_0 + Zu : \lVert u \rVert _2 \le 1\}$
Proof: $\{x_0 + Zu : \lVert u \rVert _2 \le 1\} = \{ y : \lVert Z^{-1}(y - x_0) \rVert _2 \le 1\} = \{y : (y - x_0)^TZ^{-1T}Z^{-1}(y - x_0) \le 1\}$

Hyperplane Seperation Theorem:
- Theorem: If $C, D \in \mathbb{R}^n$ are convex sets, disjoint, then there exists $a \in \mathbb{R}^n \backslash {0}, b \in \mathbb{R}, s.t. a^Tx \le b, \forall x \in C; a^Tx \ge b, \forall x \in D$
- Proof: when $C, D$ are closed and $D$ is bonded,
Let $d(C, D) = \underset{x \in C, y \in D} {inf} \lVert x - y \rVert _2$ ,
Then $\exists (x_n)_{n = 1}^{\infty} \subseteq C, (y_n)_{n = 1}^{\infty} \subseteq D, s.t. \lim_{n \to \infty} \lVert x_n - y_n \rVert _2 = d(C, D)$
Even, can assume $(y_n)_{n = 1}^{\infty}$ is convergent.
Then $\lim_{n \to \infty} y_n \in D$ because $D$ is closed
can also assume that $x_n$ convergent
Now, let $c = \lim_{n \to \infty} x_n \in C, d = \lim_{n \to \infty} y_n \in D$
Then $\lVert c - d \rVert _2 = \lim_{n \to \infty} \lVert x_n - y_n \rVert _2 = d(C, D) \gt 0$
Choose $a = d - c, b = (d - c)^T(\frac{d + c}{2})$
Assume by contradiction that $\exists x_0 \in D, s.t. a^Tx_0 \lt b$
Hope is that some point of form $d + \lambda(x_0 - d)$ is even closer to $c$ than $d$ is,
for $\lambda \in [0, 1], (d + \lambda (x_0 - d) - c)^T (d + \lambda (x_0 - d) - c)$
$= d^Td - 2d^Tc + c^Tc + 2 \lambda (x_0 - d)^T (d - c) + {\lambda}^2(x_0 - d)^T(x_0 - d)$
$= \lVert d - c \rVert _2^2 + 2 \lambda (x_0 - d)^T (d - c) + {\lambda}^2(x_0 - d)^T(x_0 - d)$
And $(d - c)^T(\frac{d - c}{2}) = \frac{\lVert d - c \rVert _2^2}{2} \ge 0, a^Tx_0 \lt b$
$\implies (x_0 - d)^T - (d - c)^T(\frac{d + c}{2}) \lt 0$
So $(x_0 - d)^T - (d - c)^T(\frac{d + c}{2}) - (d - c)^T(\frac{d - c}{2}) \lt 0$
$\implies \lambda = 0, \frac{d}{d \lambda} \cdots = (x_0 - d)^T(d - c) + 2 \lambda (x_0 - d)^T(x_0 - d) = (x_0 - d)^T(d - c) \lt 0$

$\min \lVert Ax - b \rVert _2^2$ where $A \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m$
Have $\lVert Ax - b \rVert _2^2 = (Ax - b)^T(Ax - b) = x^TA^TAx - 2b^TAx + b^Tb$
- Definition: $\nabla f(x_1, \cdots, x_n) = [\frac{\partial}{\partial x_1} f, ..., \frac{\partial}{\partial x_n} f]^T$
$\nabla b^Tb = 0, \nabla - 2b^TAx = -2A^Tb, \nabla x^TA^TAx = 2A^TAx (A^TAx - A^Tb = 0)$