Convex Optimization 5: Convex Functions

Anhongzhan

Last modified 2022-01-22 12:42:53

Tags: machine learning, linear algebra

First published 2022-01-22 12:39:51

Article link: https://blog.csdn.net/ahzahz/article/details/122635731

1. Definition 1 of a convex function

Convex function: a function $f:R^n \rightarrow R$ is convex if it satisfies two conditions:

$dom \space f$ is a convex set
$\forall \space x,y \in dom \space f,0 \leq \theta \leq 1$, we have $f(\theta x+(1-\theta)y) \leq \theta f(x)+(1-\theta)f(y)$
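The defining inequality can be probed numerically. The sketch below is only an illustration and not part of the original notes: the helper names `jensen_gap` and `looks_convex`, the sampling box, and the two test functions are assumptions chosen for the example. Random sampling can disprove convexity by finding a violating pair, but it can never prove it.

```python
import math
import random

def jensen_gap(f, x, y, theta):
    """theta*f(x) + (1-theta)*f(y) - f(theta*x + (1-theta)*y); >= 0 for convex f."""
    z = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]
    return theta * f(x) + (1 - theta) * f(y) - f(z)

def looks_convex(f, dim, trials=1000, lo=-5.0, hi=5.0, tol=1e-9, seed=0):
    """Test the inequality at random points of the box [lo, hi]^dim.

    False means a violating pair was found; True is only numerical evidence.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        y = [rng.uniform(lo, hi) for _ in range(dim)]
        theta = rng.random()
        if jensen_gap(f, x, y, theta) < -tol:
            return False
    return True

def sum_of_squares(x):   # convex on R^n
    return sum(xi * xi for xi in x)

def sine_of_first(x):    # not convex on R
    return math.sin(x[0])
```

Under these assumptions, `looks_convex(sum_of_squares, 3)` should find no violation, while `looks_convex(sine_of_first, 1)` should find one, because the sine is strictly concave on $(0,\pi)$.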

Extension of a convex function: if $f:R^n \rightarrow R$ is convex with $dom \space f=C \subseteq R^n$, then its extended-value extension

$\widetilde{f}(x)=\left\{ \begin{matrix} f(x),& x\in dom \space f\\ +\infty ,& x \notin dom \space f \end{matrix} \right.$
is still a convex function.
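As a rough sketch of the extended-value extension, the code below wraps a function and a domain test into $\widetilde{f}$. The names `extend`, `neg_log` and `jensen_holds` are hypothetical, and $-\log x$ on $(0,+\infty)$ is just an example chosen here, not one taken from the notes.

```python
import math

def extend(f, in_domain):
    """Return the extended-value extension: f(x) on dom f, +inf elsewhere."""
    def f_tilde(x):
        return f(x) if in_domain(x) else math.inf
    return f_tilde

# Illustrative choice: f(x) = -log(x) with dom f = (0, +inf).
neg_log = extend(lambda x: -math.log(x), lambda x: x > 0)

def jensen_holds(f_tilde, x, y, theta):
    """Check the convexity inequality for 0 < theta < 1.

    theta is kept strictly inside (0, 1) because 0 * inf is nan in floating point.
    """
    z = theta * x + (1 - theta) * y
    return f_tilde(z) <= theta * f_tilde(x) + (1 - theta) * f_tilde(y)
```

When one endpoint lies outside the domain, the right-hand side is $+\infty$ and the inequality holds automatically, which is why the extension preserves convexity.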

Ex1: the indicator function is convex

Let $C \subseteq R^n$ be a convex set.
The indicator function $f_C(x)=\left\{ \begin{matrix} \text{undefined},& x \notin C \\ 0,& x \in C \end{matrix} \right.$ is convex.
Extension of the indicator function:
$I_C(x)=\left\{ \begin{matrix} +\infty,& x \notin C \\ 0,& x \in C \end{matrix} \right.$ is convex.

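2. Definition 2 of a convex function: from high dimension to low dimension

A function is convex if and only if it is convex along every line that intersects its domain:
$f \text{ is convex} \Leftrightarrow \forall \space x \in dom \space f, \forall \space v, \space g(t)=f(x+tv) \text{ is convex, where } dom \space g=\{t \mid x+tv \in dom \space f\}$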
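The restriction to a line is easy to experiment with. The following sketch (the helper names and the grid-based midpoint test are assumptions made for illustration) builds $g(t)=f(x+tv)$ and checks midpoint convexity of $g$ on a finite grid, which gives numerical evidence rather than a proof.

```python
def restrict_to_line(f, x, v):
    """Return g(t) = f(x + t*v), the restriction of f to the line through x along v."""
    def g(t):
        return f([xi + t * vi for xi, vi in zip(x, v)])
    return g

def midpoint_convex_on_grid(g, t_min, t_max, steps=200, tol=1e-9):
    """Check g((s+t)/2) <= (g(s)+g(t))/2 for all pairs on a uniform grid."""
    ts = [t_min + (t_max - t_min) * k / steps for k in range(steps + 1)]
    for s in ts:
        for t in ts:
            if g((s + t) / 2) > (g(s) + g(t)) / 2 + tol:
                return False
    return True

def squared_norm(x):      # convex
    return x[0] ** 2 + x[1] ** 2

def product(x):           # saddle x1*x2, not convex
    return x[0] * x[1]
```

For the saddle `product`, the restriction along $v=(1,1)$ through the origin is $t^2$ (convex), but along $v=(1,-1)$ it is $-t^2$ (not convex). A single bad line is enough to rule out convexity, which is why the definition quantifies over all lines.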

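3. Definition 3 of a convex function: first-order condition (when the first derivative exists)

Let $f:R^n \rightarrow R$ be differentiable, i.e. the gradient $\triangledown f$ exists at every point of $dom \space f$. Then $f$ is convex if and only if:

$dom \space f$ is convex
$f(y) \geq f(x) + \triangledown f^T(x)(y-x),\forall \space x,y \in dom \space f$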
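A small numerical sketch of the first-order inequality follows. The function $f(x)=x_1^2+3x_2^2$ and all helper names are assumptions picked for illustration. For this $f$ the gap $f(y)-f(x)-\triangledown f^T(x)(y-x)$ equals $(y_1-x_1)^2+3(y_2-x_2)^2$, so it should never be negative.

```python
import random

def first_order_gap(f, grad, x, y):
    """Return f(y) - [f(x) + grad(x)^T (y - x)]; nonnegative when f is convex."""
    g = grad(x)
    linear = f(x) + sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))
    return f(y) - linear

def quad(x):              # illustrative convex function x1^2 + 3*x2^2
    return x[0] ** 2 + 3.0 * x[1] ** 2

def grad_quad(x):         # its gradient
    return [2.0 * x[0], 6.0 * x[1]]

def min_gap_on_samples(f, grad, dim, trials=1000, lo=-5.0, hi=5.0, seed=0):
    """Smallest first-order gap over random pairs (x, y) in the box [lo, hi]^dim."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        y = [rng.uniform(lo, hi) for _ in range(dim)]
        gaps.append(first_order_gap(f, grad, x, y))
    return min(gaps)
```

For instance, with $x=(1,1)$ and $y=(2,0)$ the linearization at $x$ evaluates to $0$ at $y$ while $f(y)=4$, so the gap is $4$.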

A simple way to read this equivalent definition: the tangent (first-order approximation) at any point lies on or below the graph of $f$.

[Figure 5-1.png: illustration of the first-order condition]

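Proof of the first-order condition:

First consider the one-dimensional case: $f:R \rightarrow R$ is convex $\Leftrightarrow dom \space f$ is convex and $f(y) \geq f(x)+f'(x)(y-x),\forall \space x,y \in dom \space f$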

Proof that convexity implies the inequality ($\Rightarrow$):
$f \text{ is convex} \Rightarrow dom \space f \text{ is convex, so } \forall \space x,y \in dom \space f,\\ \forall \space t,0 < t \leq 1,x+t(y-x)\in dom \space f\\ f(x+t(y-x)) \leq (1-t)f(x)+tf(y)\\ tf(y) \geq tf(x)+f(x+t(y-x))-f(x)\\ f(y) \geq f(x)+\frac{f(x+t(y-x))-f(x)}{t} \text{ for all } 0<t\leq 1\\ \text{Letting } t \rightarrow 0^+ \text{ gives } f(y) \geq f(x)+f'(x)(y-x) \text{, as claimed.}$
Proof that the inequality implies convexity ($\Leftarrow$):
$\text{Take any } x \neq y,\space x,y\in dom \space f,\space 0 \leq \theta \leq 1\\ \text{and let } z=\theta x+(1-\theta)y \in dom \space f\\ \left\{ \begin{matrix} f(x) \geq f(z)+f'(z)(x-z)\\ f(y) \geq f(z)+f'(z)(y-z)\\ \end{matrix} \right.\\ \Rightarrow \theta f(x)+(1-\theta)f(y)\geq f(z)+[\theta(x-z)+(1-\theta)(y-z)]f'(z)\\ = f(z)+(\theta x+(1-\theta)y-z)f'(z)\\ \Rightarrow\theta f(x)+(1-\theta)f(y)\geq f(z)\\ \text{When } x=y,\space z=\theta x+(1-\theta)y=x\\ \text{and } \theta f(x)+(1-\theta)f(y)=f(x)=f(z) \text{, so the inequality holds trivially, which completes the proof.}$

Now extend to the higher-dimensional case:

That is, $f$ is convex $\Leftrightarrow dom \space f$ is convex and $f(y) \geq f(x) + \triangledown f^T(x)(y-x),\forall \space x,y \in dom \space f$

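Proof that convexity implies the inequality ($\Rightarrow$):
$\text{Goal: } dom \space f \text{ is convex and } f(y) \geq f(x) + \triangledown f^T(x)(y-x)\\ f \text{ is convex, so by Definition 1 } dom \space f \text{ must be convex}\\ \text{Fix } x,y \in dom \space f \text{ and define } g(t)=f(ty+(1-t)x) \text{, where } ty+(1-t)x \text{ is an affine combination}\\ g(t)=f(x+t(y-x))\\ g'(t)=\triangledown f^T(ty+(1-t)x)(y-x)\\ \text{By Definition 2, } g \text{ is convex, so the one-dimensional case gives:}\\ g(t_1)\geq g(t_2)+g'(t_2)(t_1-t_2)\\ \text{for all } t_1,t_2 \in dom \space g \text{; take } t_1=1,t_2=0\\ g(1)\geq g(0)+g'(0)\\ \Rightarrow f(y) \geq f(x)+ \triangledown f^T(x)(y-x) \text{, as claimed.}$
Proof that the inequality implies convexity ($\Leftarrow$):
$\text{Assume } dom \space f \text{ is convex and } f(y) \geq f(x) + \triangledown f^T(x)(y-x) \text{; we show that } f \text{ is convex} \\ \forall \space x,y \in dom \space f \text{ and all } t,\widetilde{t} \text{ with } ty+(1-t)x \in dom \space f,\\ \widetilde{t}y+(1-\widetilde{t})x \in dom \space f\\ \text{Substituting these two points into the assumption:}\\ f(ty+(1-t)x)\geq f(\widetilde{t}y+(1-\widetilde{t})x)+\triangledown f^T(\widetilde{t}y+(1-\widetilde{t})x)(ty+(1-t)x-\widetilde{t}y-(1-\widetilde{t})x)\\ \Rightarrow f(ty+(1-t)x)\geq f(\widetilde{t}y+(1-\widetilde{t})x)+\triangledown f^T(\widetilde{t}y+(1-\widetilde{t})x)(y-x)(t-\widetilde{t})\\ \text{Define } g(t)=f(ty+(1-t)x),\space g(\widetilde{t})=f(\widetilde{t}y+(1-\widetilde{t})x)\\ g'(\widetilde{t})=\triangledown f^T(\widetilde{t} y+(1-\widetilde{t})x)(y-x)\\ \Rightarrow g(t) \geq g(\widetilde{t})+g'(\widetilde{t})(t-\widetilde{t})\\ \text{By the one-dimensional case } g \text{ is convex, and since } x,y \text{ are arbitrary, Definition 2 shows that } f \text{ is convex.}$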

4. Definition 4 of a convex function: second-order condition

If $f:R^n \rightarrow R$ is twice differentiable, then $f$ is convex $\Leftrightarrow dom \space f$ is convex and $\triangledown^2 f(x) \succeq 0,\forall \space x \in dom \space f$

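Note how the second-order condition relates to strict convexity:
$\triangledown^2 f(x) \succ 0 \text{ for all } x \in dom \space f \Rightarrow \text{strictly convex}\\ \text{strictly convex} \nRightarrow \triangledown^2 f(x) \succ 0$
For example, $f(x)=x^4$ is strictly convex on $R$, yet $f''(0)=0$.
Ex1: quadratic function
$f:R^n \rightarrow R,dom \space f=R^n\\ f(x)=\frac{1}{2}x^TPx+q^Tx+r,P\in S^n,q\in R^n,r\in R\\ \triangledown^2f(x)=P$
For a quadratic function, $\triangledown^2f(x)=P \succ 0 \Leftrightarrow$ strictly convex, and $P \succeq 0 \Leftrightarrow$ convex.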
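For a quadratic, the second-order condition reduces to an eigenvalue test on $P$. The sketch below handles only the symmetric $2 \times 2$ case with a closed-form eigenvalue formula. The function names and the tolerance are assumptions made for illustration, and a general $n \times n$ test would need a proper eigenvalue or Cholesky routine instead.

```python
import math

def eig_sym_2x2(p):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = p[0][0], p[0][1], p[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def classify_quadratic(p, tol=1e-12):
    """Classify f(x) = 0.5 x^T P x + q^T x + r by the smallest eigenvalue of P."""
    lam_min, _ = eig_sym_2x2(p)
    if lam_min > tol:
        return "strictly convex"
    if lam_min >= -tol:
        return "convex"
    return "not convex"
```

Note that $q$ and $r$ play no role: the Hessian of the quadratic is $P$ regardless of the linear and constant terms.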

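Ex2:
$f(x)=\frac{1}{x^2},x \neq 0,x \in R\\ f''(x)=6x^{-4}>0\\ \text{Although } f''(x) \text{ is positive everywhere on } dom \space f, \space f \text{ is not convex, because } dom \space f \text{ is not a convex set}$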
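A concrete counterexample makes the role of the domain visible. In this sketch, the extended-value convention $f(0)=+\infty$ and the variable names are assumptions introduced here. The points $x=-1$ and $y=1$ both lie in $dom \space f$, but their midpoint does not.

```python
import math

def f_ext(x):
    """Extended-value version of f(x) = 1/x^2: +inf at x = 0, which lies outside dom f."""
    return 1.0 / (x * x) if x != 0 else math.inf

x, y, theta = -1.0, 1.0, 0.5
z = theta * x + (1 - theta) * y                    # midpoint 0.0, not in dom f
lhs = f_ext(z)                                     # value at the midpoint
rhs = theta * f_ext(x) + (1 - theta) * f_ext(y)    # average of the endpoint values
violates_convexity = not (lhs <= rhs)
```

Restricted to $R_{++}$ alone (a convex set), the same formula does define a convex function.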

Ex3: affine function

$f(x)=Ax+b,{\triangledown}^2f(x)=0 \Rightarrow$ convex function

Ex4: exponential function

$f(x)=e^{ax},x\in R, \triangledown^2f(x)=a^2e^{ax} \geq 0\Rightarrow$ convex function

Ex5: power function
$f(x)=x^a,x\in R_{++},f''(x)=a(a-1)x^{a-2}\\ f''(x)\left\{ \begin{matrix} \geq 0,& a \geq 1 \space or \space a \leq 0 & (\text{convex})\\ \leq 0,& 0 \leq a \leq 1 & (\text{concave}) \end{matrix} \right.$
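The sign pattern of $f''(x)=a(a-1)x^{a-2}$ can be tabulated quickly. This is a minimal sketch with hypothetical names and an arbitrary set of sample points. Since $x^{a-2}>0$ on $R_{++}$, the sign is decided by $a(a-1)$ alone.

```python
def power_second_derivative(a, x):
    """f''(x) for f(x) = x**a on x > 0."""
    return a * (a - 1) * x ** (a - 2)

def curvature_sign(a, xs=(0.1, 0.5, 1.0, 2.0, 10.0)):
    """Classify x**a on R_++ from the sign of f'' at a few sample points."""
    values = [power_second_derivative(a, x) for x in xs]
    if all(v >= 0 for v in values):
        return "convex"      # includes the affine cases a = 0 and a = 1
    if all(v <= 0 for v in values):
        return "concave"
    return "neither"         # not expected for power functions on R_++
```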

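Ex6: power of the absolute value
$f(x)=|x|^P,x \in R\\ \text{For suitable } P \text{ (where the derivatives exist): } f'(x)=\left\{ \begin{matrix} Px^{P-1},& x \geq 0\\ -P(-x)^{P-1},& x<0 \end{matrix} \right.\\ f''(x)=\left\{ \begin{matrix} P(P-1)x^{P-2},& x > 0\\ P(P-1)(-x)^{P-2},& x<0 \end{matrix} \right.\\ P>1: \text{the function is convex}\\ P=1: |x| \text{ is not differentiable at } 0 \text{, but it is still convex}$
Hence for $P\geq 1$ the power of the absolute value is convex.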

Ex7: logarithm
$f(x)=\log(x),x\in R_{++}\\ f'(x)=\frac{1}{x},f''(x)=-\frac{1}{x^2}<0$
Strictly concave function.

Ex8: negative entropy
$f(x)=x\log(x),x\in R_{++}\\ f'(x)=1+\log(x),f''(x)=\frac{1}{x}>0$
Strictly convex function.

Ex9: norms

A norm $P(x),x\in R^n$ on the space $R^n$ is defined by $\left\{ \begin{matrix} P(x) \geq 0 \space and \space P(x)=0 \Leftrightarrow x=0\\ P(ax)=|a|P(x)\\P(x+y) \leq P(x)+P(y)\end{matrix} \right.$

$\forall \space x,y \in R^n,\forall \space 0 \leq \theta \leq 1\\ P(\theta x+(1-\theta)y) \leq P(\theta x)+P((1-\theta)y)=\theta P(x)+(1-\theta)P(y)$
so every norm is a convex function.
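The argument uses only the triangle inequality and homogeneity, so it applies to every norm. As a sanity check, the sketch below counts violations of the convexity inequality for the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms on random samples. The helper names, dimension and sampling range are assumptions chosen for illustration.

```python
import math
import random

def norm_l1(x):
    return sum(abs(xi) for xi in x)

def norm_l2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def norm_linf(x):
    return max(abs(xi) for xi in x)

def convexity_violations(p, dim=4, trials=2000, tol=1e-9, seed=1):
    """Count samples where p(theta*x + (1-theta)*y) > theta*p(x) + (1-theta)*p(y)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(dim)]
        y = [rng.uniform(-10.0, 10.0) for _ in range(dim)]
        theta = rng.random()
        z = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]
        if p(z) > theta * p(x) + (1 - theta) * p(y) + tol:
            bad += 1
    return bad
```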

Ex10: the zero "norm" (not a true norm)

$||x||_0=$ the number of nonzero entries of $x$; it is not a convex function
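A two-dimensional counterexample shows why. In this sketch (the names are illustrative), mixing two 1-sparse vectors gives a 2-sparse vector, so the value at the midpoint exceeds the average of the endpoint values. The same function also fails homogeneity, since scaling a vector does not change its number of nonzeros.

```python
def l0(x):
    """Number of nonzero entries of x."""
    return sum(1 for xi in x if xi != 0)

x = [1.0, 0.0]
y = [0.0, 1.0]
theta = 0.5
z = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]   # [0.5, 0.5]

lhs = l0(z)                                    # value at the midpoint
rhs = theta * l0(x) + (1 - theta) * l0(y)      # average of the endpoint values
scaled = l0([2.0 * xi for xi in x])            # equals l0(x), not 2 * l0(x)
```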

Ex11: maximum function
$f(x)=max\{x_1,\cdots,x_n\},x \in R^n\\ \forall \space x,y \in R^n,0 \leq \theta \leq 1\\ f(\theta x+(1-\theta)y)=max\{\theta x_i+(1-\theta)y_i\},i=1,\cdots,n\\ \leq \theta max\{x_i\}+(1-\theta)max\{y_i\},i=1,\cdots,n\\ =\theta f(x)+(1-\theta)f(y)$
so it is a convex function.

Ex12: the log-sum-exp function, an analytic approximation of the maximum function
$f(x)=\log(e^{x_1}+\cdots+e^{x_n}),x \in R^n\\ max\{x_1,\cdots,x_n\} \leq f(x) \leq max\{x_1,\cdots,x_n\}+\log n\\ \frac{\partial{f}}{\partial{x_i}}=\frac{e^{x_i}}{e^{x_1}+\cdots+e^{x_n}}\\ i \neq j, \frac{\partial^2{f}}{\partial{x_i}\partial{x_j}}=\frac{-e^{x_i}e^{x_j}}{(e^{x_1}+\cdots+e^{x_n})^2}\\ i=j,\frac{\partial^2{f}}{\partial{x_i}\partial{x_j}}=\frac{-e^{x_i}e^{x_i}+e^{x_i}(e^{x_1}+\cdots+e^{x_n})}{(e^{x_1}+\cdots+e^{x_n})^2}\\ Let \space\space z=[e^{x_1},\cdots,e^{x_n}]^T\\ H=\frac{1}{(1^Tz)^2}\{\left[ \begin{matrix} e^{x_1}(e^{x_1}+\cdots+e^{x_n}) & \cdots & 0 \\ \vdots & \ddots & \vdots\\0 & \cdots & e^{x_n}(e^{x_1}+\cdots+e^{x_n}) \end{matrix} \right]-\left[ \begin{matrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{matrix} \right]{\left[ \begin{matrix} e^{x_1} \cdots e^{x_n} \end{matrix} \right]}\}\\ =\frac{1}{(1^Tz)^2}((1^Tz)diag\{z\}-zz^T)\\ =\frac{1}{(1^Tz)^2} k\\ \text{It remains to show that } \forall \space v \in R^n,v^Tkv \geq 0:\\ v^Tkv=(1^Tz)v^Tdiag\{z\}v-v^Tzz^Tv\\ =(\sum_i z_i)(\sum_i v^2_iz_i)-(\sum_i v_iz_i)^2\\ \text{With } a_i=v_i \sqrt{z_i},b_i=\sqrt{z_i}:\\ =(b^Tb)(a^Ta)-(a^Tb)^2 \geq 0 \text{ (Cauchy-Schwarz inequality), so } H \succeq 0 \text{ and } f \text{ is convex.}$
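The bounds and the Hessian argument above can be checked on numbers. The sketch below is an illustration under stated assumptions. The max-shift used for numerical stability is a standard trick that the notes do not mention, and the function names are hypothetical. With $p=z/(1^Tz)$, the Hessian is $H=diag\{p\}-pp^T$, so $v^THv=\sum_i p_iv_i^2-(\sum_i p_iv_i)^2$.

```python
import math

def log_sum_exp(x):
    """log(sum(exp(x_i))), shifted by max(x) so that exp never overflows."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def softmax(x):
    """Gradient of log-sum-exp: e^{x_i} / sum_j e^{x_j}."""
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

def hessian_quadratic_form(x, v):
    """v^T H v with H = diag(p) - p p^T and p = softmax(x)."""
    p = softmax(x)
    second_moment = sum(pi * vi * vi for pi, vi in zip(p, v))
    first_moment = sum(pi * vi for pi, vi in zip(p, v))
    return second_moment - first_moment ** 2
```

Read this way, $v^THv$ is the variance of the values $v_i$ under the probability weights $p_i$, which gives another view of why it cannot be negative. It vanishes when $v$ is a multiple of the all-ones vector, so $H$ is positive semidefinite but not positive definite.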

Ex13: geometric mean

$f(x)=(x_1 \cdots x_n)^{\frac{1}{n}},x \in R^n_{++}$ is a concave function
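The notes state the concavity of the geometric mean without proof. The following sketch only gathers numerical evidence for it by testing the reversed inequality $f(\theta x+(1-\theta)y) \geq \theta f(x)+(1-\theta)f(y)$ on random positive vectors. The helper names and the sampling range are assumptions.

```python
import math
import random

def geometric_mean(x):
    """(x_1 * ... * x_n)^(1/n) for positive entries, computed through logarithms."""
    return math.exp(sum(math.log(xi) for xi in x) / len(x))

def concavity_violations(f, dim=3, trials=2000, tol=1e-9, seed=2):
    """Count samples where f(theta*x + (1-theta)*y) < theta*f(x) + (1-theta)*f(y)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = [rng.uniform(0.1, 10.0) for _ in range(dim)]
        y = [rng.uniform(0.1, 10.0) for _ in range(dim)]
        theta = rng.random()
        z = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]
        if f(z) < theta * f(x) + (1 - theta) * f(y) - tol:
            bad += 1
    return bad
```

A hand-checkable instance: for $x=(1,4)$ and $y=(4,1)$ both geometric means equal $2$, while the midpoint $(2.5,2.5)$ has geometric mean $2.5$.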

Ex14: log-determinant
$f(x)=\log det(x),dom \space f=S^n_{++}\\ n=1: \text{concave function}\\ n>1: \forall \space z \in S^n_{++},v \in S^n \text{ and } t \in R \text{ such that}\\ z+tv \in S^n_{++}\\ g(t)=f(z+tv)=\log det(z+tv)\\ =\log det\{z^\frac{1}{2}(I+tz^{-\frac{1}{2}}vz^{-\frac{1}{2}})z^\frac{1}{2}\}\\ =\log det(z)+\sum^n_{i=1} \log(1+t\lambda_i)$
where $\lambda_i$ is the $i$-th eigenvalue of $z^{-\frac{1}{2}}vz^{-\frac{1}{2}}$. Since $z^{-\frac{1}{2}}vz^{-\frac{1}{2}}$ is a symmetric matrix, it can be decomposed as $Q\varLambda Q^T$, where $QQ^T=I$ and $detQ \cdot detQ^T=1$, so
$det(I+tz^{-\frac{1}{2}}vz^{-\frac{1}{2}})=det(QQ^T+Qt\varLambda Q^T)\\ =detQ \cdot det(I+t \varLambda) \cdot detQ^T\\ =det(I+t\varLambda)=\Pi_{i=1}^n(1+t\lambda_i)$
Each $\log(1+t\lambda_i)$ is concave in $t$, so $g$ is concave, and by Definition 2 $f$ is concave.
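The identity $\log det(z+tv)=\log det(z)+\sum_i \log(1+t\lambda_i)$ can be verified by hand-sized computation in the $2 \times 2$ case. This is a sketch under stated assumptions. The matrices `z` and `v` are arbitrary examples, the function names are hypothetical, and the eigenvalues of $z^{-\frac{1}{2}}vz^{-\frac{1}{2}}$ are obtained as the roots of $det(v-\lambda z)=0$, which has the same solutions.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def log_det_on_line(z, v, t):
    """g(t) = log det(z + t*v) for symmetric 2x2 z, v; assumes z + t*v is positive definite."""
    m = [[z[i][j] + t * v[i][j] for j in range(2)] for i in range(2)]
    return math.log(det2(m))

def whitened_eigs(z, v):
    """Eigenvalues of z^{-1/2} v z^{-1/2}, as roots of det(z)*lam^2 - b*lam + det(v) = 0."""
    a = det2(z)
    b = z[0][0] * v[1][1] + z[1][1] * v[0][0] - 2.0 * z[0][1] * v[0][1]
    c = det2(v)
    root = math.sqrt(max(b * b - 4.0 * a * c, 0.0))  # clamp tiny negative round-off
    return (b - root) / (2.0 * a), (b + root) / (2.0 * a)

def log_det_via_eigs(z, v, t):
    """Right-hand side of the identity: log det(z) + sum_i log(1 + t*lam_i)."""
    lam1, lam2 = whitened_eigs(z, v)
    return math.log(det2(z)) + math.log(1.0 + t * lam1) + math.log(1.0 + t * lam2)

z = [[2.0, 1.0], [1.0, 2.0]]    # positive definite, det = 3
v = [[1.0, 0.0], [0.0, -1.0]]   # symmetric direction
```

For these matrices $\lambda_{1,2}=\mp 1/\sqrt{3}$, so $g(t)=\log(3-t^2)$ on $|t|<\sqrt{3}$, which is visibly concave: $g(0)=\log 3$ exceeds the average of $g(-1)=g(1)=\log 2$.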