[Notes] Convex Optimization 2015.11.04

Schur Complement: Consider \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix} with A invertible, C symmetric.
Have Ax + By = u \implies x = A^{-1} (u - By)
also B^Tx + Cy = v \implies \begin{align*} v = & B^T A^{-1} (u - By) + Cy \\ = & B^T A^{-1} u + (C - B^T A^{-1}B)y \\ = & B^T A^{-1} u + Sy \end{align*}
where S := C - B^T A^{-1}B is the Schur Complement of A in the matrix X := \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}

  • Corollary: If X is invertible, then S is invertible (since \det X = \det A \cdot \det S).
    And y = S^{-1}(v - B^T A^{-1} u) = S^{-1}v - S^{-1} B^T A^{-1} u
    also x = A^{-1} (u - B S^{-1} v + B S^{-1} B^T A^{-1} u) = (A^{-1} + A^{-1} B S^{-1} B^T A^{-1}) u - A^{-1} B S^{-1} v
    Since \begin{bmatrix} x \\ y \end{bmatrix} = X^{-1} \begin{bmatrix} u \\ v \end{bmatrix} this implies X^{-1} = \begin{bmatrix} A^{-1} + A^{-1} B S^{-1} B^T A^{-1} & -A^{-1} B S^{-1} \\ -S^{-1} B^T A^{-1} & S^{-1} \end{bmatrix}
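
  • Check: a quick numerical sanity check of the block-inverse formula above (a minimal numpy sketch; the sizes, seed, and random construction are arbitrary choices, not from the notes):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 4, 3
    A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)  # symmetric, invertible
    B = rng.standard_normal((n, m))
    C = rng.standard_normal((m, m)); C = C + C.T                  # symmetric

    X = np.block([[A, B], [B.T, C]])
    Ai = np.linalg.inv(A)
    S = C - B.T @ Ai @ B                                          # Schur complement of A in X
    Si = np.linalg.inv(S)
    Xi = np.block([[Ai + Ai @ B @ Si @ B.T @ Ai, -Ai @ B @ Si],
                   [-Si @ B.T @ Ai,              Si]])
    assert np.allclose(Xi, np.linalg.inv(X))                      # matches the formula above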

Consider \inf _u \begin{bmatrix} u^T & v^T \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \inf _u \left( u^T A u + 2 u^T B v + v^T C v \right)
for fixed v, with A, C symmetric.
The gradient in u is 2Au + 2Bv, so the optimum is achieved when Au = -Bv.
Assuming A^{-1} exists, the optimum is achieved at u = -A^{-1} B v.
Optimum value of the \inf: v^T B^T A^{-1} A A^{-1} B v - 2v^T B^T A^{-1} B v + v^T C v = -v^T B^T A^{-1} B v + v^T C v = v^T(C - B^T A^{-1} B)v = v^T S v.
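
  • Check: a minimal numpy sketch verifying that the infimum over u equals v^T S v when A \succ 0 (sizes and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 4, 3
    A = rng.standard_normal((n, n)); A = A @ A.T + np.eye(n)      # A ≻ 0, so the inf is attained
    B = rng.standard_normal((n, m))
    C = rng.standard_normal((m, m)); C = C + C.T
    S = C - B.T @ np.linalg.inv(A) @ B

    def q(u, v):                                                  # [u; v]^T X [u; v]
        return u @ A @ u + 2 * u @ B @ v + v @ C @ v

    v = rng.standard_normal(m)
    u_star = -np.linalg.solve(A, B @ v)                           # minimizer u* = -A^{-1} B v
    assert np.isclose(q(u_star, v), v @ S @ v)                    # optimal value is v^T S v
    assert all(q(rng.standard_normal(n), v) >= q(u_star, v) - 1e-9 for _ in range(100))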

  • Lessons (assuming A \succ 0): If S \not\succeq 0, then X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \not\succeq 0: pick v with v^T S v \lt 0 and u = -A^{-1} B v, so the quadratic form is negative.
    Conversely, S \succeq 0 \implies X \succeq 0, since then \inf _u \begin{bmatrix} u^T & v^T \end{bmatrix} X \begin{bmatrix} u \\ v \end{bmatrix} = v^T S v \ge 0 for every v.
    Conclusion: A \succ 0 \implies (X \succeq 0 \iff S \succeq 0)
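  • Check: a minimal numpy sketch of the conclusion, comparing the smallest eigenvalues of X and S over random instances with A \succ 0 (sizes, seed, and tolerance are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    n, m = 3, 2
    for _ in range(200):
        A = rng.standard_normal((n, n)); A = A @ A.T + 0.1 * np.eye(n)  # A ≻ 0
        B = rng.standard_normal((n, m))
        C = rng.standard_normal((m, m)); C = C + C.T
        X = np.block([[A, B], [B.T, C]])
        S = C - B.T @ np.linalg.inv(A) @ B
        # X ⪰ 0 exactly when S ⪰ 0 (up to numerical tolerance)
        assert (np.linalg.eigvalsh(X).min() >= -1e-8) == (np.linalg.eigvalsh(S).min() >= -1e-8)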

  • Example: Consider f(x, Y) = x^T Y^{-1} x \;\; dom \, f = \mathbb{R}^n \times S_{++}^n
    Claim: f is convex (meaning: \forall (x_1, Y_1), (x_2, Y_2) \in dom \, f and \forall \theta \in [0, 1], f(\theta x_1 + (1 - \theta) x_2, \theta Y_1 + (1 - \theta) Y_2) \le \theta f(x_1, Y_1) + (1 - \theta) f(x_2, Y_2))

  • Proof: Show that epi(f) is a convex set.
    \begin{align*} epi(f) = & \{ (x, Y, t) \in \mathbb{R}^n \times S_{++}^n \times \mathbb{R} : f(x, Y) \le t \} \\ = & \{ (x, Y, t) : x^T Y^{-1} x \le t \} \\ = & \{ (x, Y, t) : t - x^T Y^{-1} x \ge 0 \} \\ = & \{ (x, Y, t) : \begin{bmatrix} Y & x \\ x^T & t \end{bmatrix} \succeq 0, \; Y \succ 0 \} & \text{by Schur Complement, with } S = t - x^T Y^{-1} x \end{align*}
    This is the inverse image of S_+^{n + 1} under an affine map of (x, Y, t) (intersected with Y \succ 0), hence convex.
    (The matrix generalization of the quadratic-over-linear function x^2 / y.)
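
  • Check: a minimal numpy sketch of the convexity inequality for f(x, Y) = x^T Y^{-1} x along random segments of dom f (sizes and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 4

    def f(x, Y):
        return x @ np.linalg.solve(Y, x)                          # x^T Y^{-1} x

    def rand_pd():                                                # a random point of S++^n
        M = rng.standard_normal((n, n))
        return M @ M.T + 0.1 * np.eye(n)

    for _ in range(1000):
        x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
        Y1, Y2 = rand_pd(), rand_pd()
        th = rng.uniform()
        lhs = f(th * x1 + (1 - th) * x2, th * Y1 + (1 - th) * Y2)
        assert lhs <= th * f(x1, Y1) + (1 - th) * f(x2, Y2) + 1e-9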

Jensen’s Inequality:
1. If f is a convex function then f(\theta _1 x_1 + \cdots + \theta _n x_n) \le \theta _1 f(x_1) + \cdots + \theta _n f(x_n) for all x_1, \cdots, x_n \in dom \, f, \theta _1 + \cdots + \theta _n = 1, \theta _1, \cdots, \theta _n \ge 0
2. f(E[x]) \le E[f(x)] for any random vector x s.t. x \in dom \, f with probability 1.

Statement 1 is statement 2 in disguise. If C is convex and x_1, x_2, \cdots \in C, then \sum_{i = 1}^n \theta _i x_i \in C for \theta _i \ge 0, \theta _1 + \theta _2 + \cdots = 1.
Say x is a random vector s.t. x = x_i with probability \theta _i.
Let y_i = f(x_i), and recall epi(f) = \{(x, t) : f(x) \le t\}.
Each (x_i, y_i) \in epi(f), and epi(f) is convex, so (\sum \theta _i x_i, \sum \theta_i y_i) \in epi(f), i.e.
f(\sum \theta _i x_i) \le \sum \theta_i f(x_i)
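
  • Check: a minimal numpy sketch of Jensen's inequality for the convex function f = exp with a random finite distribution (seed and sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(4)
    f = np.exp                                                    # a convex function
    for _ in range(1000):
        x = rng.standard_normal(5)                                # support points x_1, ..., x_5
        th = rng.dirichlet(np.ones(5))                            # probabilities θ_i ≥ 0, Σ θ_i = 1
        assert f(th @ x) <= th @ f(x) + 1e-12                     # f(E[x]) ≤ E[f(x)]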

  • Example: Let a, b \ge 0, then \sqrt{ab} \le \frac{a + b}{2}.
  • Proof: f(x) = x^2 is convex. Take x = \sqrt{a} or \sqrt{b}, each with probability \frac{1}{2}:
    \begin{align*} f(E[x]) = & f\left(\frac{\sqrt{a} + \sqrt{b}}{2}\right) = \frac{a + b + 2\sqrt{ab}}{4} \\ \le & E[f(x)] = \frac{f(\sqrt{a}) + f(\sqrt{b})}{2} = \frac{a + b}{2} \end{align*}
    which rearranges to \sqrt{ab} \le \frac{a + b}{2}.
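  • Check: a quick numerical check of the AM–GM inequality (numpy; ranges and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(5)
    a, b = rng.uniform(0, 10, size=(2, 1000))                     # random nonnegative pairs
    assert np.all(np.sqrt(a * b) <= (a + b) / 2 + 1e-12)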

  • Example (weighted AM–GM): Let x_1, \cdots, x_n \gt 0, and \theta _1, \cdots, \theta _n \ge 0 with \theta _1 + \cdots + \theta _n = 1,
    then x_1^{\theta _1} x_2^{\theta _2} \cdots x_n^{\theta _n} \le \theta _1 x_1 + \theta _2 x_2 + \cdots + \theta _n x_n
    (Apply Jensen's inequality to the convex function -\log: \; -\log(\sum \theta _i x_i) \le \sum \theta _i (-\log x_i).)
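  • Check: a minimal numpy sketch of the weighted AM–GM inequality (ranges and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(6)
    for _ in range(1000):
        x = rng.uniform(0.1, 10, size=6)
        th = rng.dirichlet(np.ones(6))                            # θ_i ≥ 0, Σ θ_i = 1
        assert np.prod(x ** th) <= th @ x + 1e-12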

  • Example (Young's inequality; the case n = 2 with \theta _1 = \frac{1}{p}, \theta _2 = \frac{1}{q}): Let a, b \gt 0, p, q \gt 1, \frac{1}{p} + \frac{1}{q} = 1, then a^{\frac{1}{p}} b^{\frac{1}{q}} \le \frac{1}{p} a + \frac{1}{q} b

Hölder’s inequality: x^T y \le \lVert x \rVert _p \cdot \lVert y \rVert _q for \frac{1}{p} + \frac{1}{q} = 1, p, q \gt 1.
1. By homogeneity we may normalize \lVert x \rVert _p = 1, \lVert y \rVert _q = 1, and assume x, y \ge 0 componentwise (replace entries by absolute values).
2. Applying Young's inequality termwise: x^T y = \sum_{i = 1}^n x_i y_i = \sum_{i = 1}^n (x_i^p)^{\frac{1}{p}} (y_i^q)^{\frac{1}{q}} \le \sum_{i = 1}^n \left( \frac{1}{p} x_i^p + \frac{1}{q} y_i^q \right) = \frac{1}{p} + \frac{1}{q} = 1 = \lVert x \rVert _p \cdot \lVert y \rVert _q
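
  • Check: a minimal numpy sketch of Hölder's inequality (and implicitly of the Young step used above); the exponent p, sizes, and seed are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(7)
    p = 3.0; q = p / (p - 1)                                      # conjugate exponents: 1/p + 1/q = 1
    for _ in range(1000):
        x, y = rng.standard_normal((2, 8))
        rhs = np.sum(np.abs(x) ** p) ** (1/p) * np.sum(np.abs(y) ** q) ** (1/q)
        assert x @ y <= rhs + 1e-12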

Another view of Young's inequality a^\theta b^{1 - \theta} \le \theta a + (1 - \theta) b: substituting x = a^\theta, y = b^{1 - \theta},
\begin{align*} xy \le & \theta x^{\frac{1}{\theta}} + (1 - \theta) y^{\frac{1}{1 - \theta}} \\ = & \int _0^x t^{\frac{1}{\theta} - 1} dt + \int _0^y t^{\frac{1}{1 - \theta} - 1} dt \\ = & \int _0^x t^{\frac{1 - \theta}{\theta}} dt + \int _0^y t^{\frac{\theta}{1 - \theta}} dt \end{align*}
The two integrands s = t^{\frac{1 - \theta}{\theta}} and t = s^{\frac{\theta}{1 - \theta}} are inverse functions, so the two areas together cover the rectangle [0, x] \times [0, y]; this is the geometric proof of the first inequality.
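
  • Check: a minimal numpy sketch of the integral identity \int _0^x t^{\frac{1}{\theta} - 1} dt = \theta x^{\frac{1}{\theta}} (via a trapezoidal rule) and of the resulting inequality; grids, ranges, and tolerances are arbitrary:

    import numpy as np

    rng = np.random.default_rng(8)
    for _ in range(200):
        th = rng.uniform(0.2, 0.8)
        x, y = rng.uniform(0.5, 3, size=2)
        t = np.linspace(0, x, 20001)
        ft = t ** (1/th - 1)
        area = np.sum(ft[1:] + ft[:-1]) / 2 * (t[1] - t[0])       # trapezoidal rule on [0, x]
        assert np.isclose(area, th * x ** (1/th), rtol=1e-4)      # antiderivative is θ t^{1/θ}
        assert x * y <= th * x ** (1/th) + (1 - th) * y ** (1/(1 - th)) + 1e-12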

Let \mathcal{F} be some collection of convex functions.
Let g(x) = \sup _{f \in \mathcal{F}} f(x).
Then g is convex: g(\theta x + (1 - \theta) y) = \sup _{f \in \mathcal{F}} f(\theta x + (1 - \theta) y) \le \sup _{f \in \mathcal{F}} \left[ \theta f(x) + (1 - \theta) f(y) \right] \le \theta \sup _{f \in \mathcal{F}} f(x) + (1 - \theta) \sup _{f \in \mathcal{F}} f(y) = \theta g(x) + (1 - \theta) g(y)
(Similarly, if g is convex and nondecreasing and f is convex, then g \circ f is convex: g(f(\theta x + (1 - \theta) y)) \le g(\theta f(x) + (1 - \theta) f(y)) \le \theta g(f(x)) + (1 - \theta) g(f(y)).)
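
  • Check: a minimal numpy sketch that a pointwise max of convex functions is convex, using the family f_i(x) = |x - c_i| (the family, seed, and sizes are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(9)
    cs = rng.standard_normal(5)                                   # family f_i(x) = |x - c_i|
    g = lambda x: np.max(np.abs(x - cs))                          # g = pointwise max over the family
    for _ in range(1000):
        x, y = rng.standard_normal(2)
        th = rng.uniform()
        assert g(th * x + (1 - th) * y) <= th * g(x) + (1 - th) * g(y) + 1e-12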
