Closed function
Let $f:\mathbb{E}\to[-\infty,\infty]$. If the epigraph of $f$ is closed, then $f$ is called a closed function.
Proper function
Let $f:\mathbb{E}\to[-\infty,\infty]$. If $f(\mathbf{x})>-\infty$ for all $\mathbf{x}\in\mathbb{E}$, and $f(\mathbf{x})<+\infty$ for at least one $\mathbf{x}\in\mathbb{E}$, then $f$ is called a proper function.
Conjugate function
Let $f:\mathbb{E}\to[-\infty,\infty]$. The conjugate function of $f$, $f^*:\mathbb{E}^*\to[-\infty,\infty]$, is defined as
$$f^*(\mathbf{y})=\sup_{\mathbf{x}\in\mathbb{E}}\left\{\langle\mathbf{y},\mathbf{x}\rangle-f(\mathbf{x})\right\},\quad \mathbf{y}\in\mathbb{E}^*$$
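The following minimal sketch illustrates the definition numerically. It assumes NumPy and $\mathbb{E}=\mathbb{R}$, and it approximates the supremum by a maximum over a finite grid of sample points. The helper name `conjugate_on_grid` and the grid bounds are illustrative choices. The grid must be wide enough to contain the maximizer for every $y$ of interest.

```python
import numpy as np

def conjugate_on_grid(f_vals, xs, ys):
    """Approximate f*(y) = sup_x {y*x - f(x)} by a max over the sample points xs.

    f_vals[j] = f(xs[j]); returns one value per entry of ys.
    """
    # rows index y, columns index x: entry (i, j) is ys[i]*xs[j] - f(xs[j])
    return np.max(np.outer(ys, xs) - f_vals[None, :], axis=1)

xs = np.linspace(-10.0, 10.0, 4001)   # sample points standing in for E = R
ys = np.linspace(-3.0, 3.0, 61)
f_vals = 0.5 * xs**2                  # f(x) = x^2/2 is its own conjugate: f*(y) = y^2/2
f_star = conjugate_on_grid(f_vals, xs, ys)
print(np.max(np.abs(f_star - 0.5 * ys**2)))  # only discretization error remains
```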
If $f:\mathbb{E}\to(-\infty,\infty]$, then its conjugate $f^*$ is closed and convex, because it is a pointwise supremum of affine functions of $\mathbf{y}$.
Let $g(\mathbf{x})=\alpha f(\mathbf{x})$ with $\alpha>0$. Then
$$g^*(\mathbf{y})=\alpha f^*\left(\frac{\mathbf{y}}{\alpha}\right)$$
Proof:
$$\begin{aligned} g^*(\mathbf{y})&=\sup_{\mathbf{x}}\left\{\langle \mathbf{y},\mathbf{x}\rangle-g(\mathbf{x})\right\}\\ &=\sup_{\mathbf{x}}\left\{\langle \mathbf{y},\mathbf{x}\rangle-\alpha f(\mathbf{x})\right\}\\ &=\alpha\sup_{\mathbf{x}}\left\{\left\langle \frac{\mathbf{y}}{\alpha},\mathbf{x}\right\rangle-f(\mathbf{x})\right\}\\ &=\alpha f^*\left(\frac{\mathbf{y}}{\alpha}\right) \end{aligned}$$
The third equality uses $\alpha>0$: factoring a positive constant out of a supremum does not change it.
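Here is a numerical spot check of the scaling rule. It is a sketch under assumptions: NumPy, the same grid approximation of the supremum, and an arbitrarily chosen convex $f(x)=\frac{1}{4}x^4+x$ with $\alpha=2$. On a grid the identity holds exactly, because pulling $\alpha>0$ out of a finite maximum is exact. The two sides therefore agree up to floating-point rounding.

```python
import numpy as np

def conjugate_on_grid(f, xs, ys):
    # sup over x replaced by a max over the grid xs
    return np.max(np.outer(ys, xs) - f(xs)[None, :], axis=1)

f = lambda x: 0.25 * x**4 + x        # a sample convex f (illustrative choice)
alpha = 2.0
g = lambda x: alpha * f(x)           # g(x) = alpha * f(x)

xs = np.linspace(-5.0, 5.0, 10001)
ys = np.linspace(-3.0, 3.0, 13)
lhs = conjugate_on_grid(g, xs, ys)                    # g*(y)
rhs = alpha * conjugate_on_grid(f, xs, ys / alpha)    # alpha * f*(y / alpha)
print(np.max(np.abs(lhs - rhs)))
```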
Fenchel inequality
Let $f$ be a proper function. Then for all $\mathbf{x}\in\mathbb{E}$, $\mathbf{y}\in\mathbb{E}^*$,
$$f(\mathbf{x})+f^*(\mathbf{y})\ge\langle\mathbf{y},\mathbf{x}\rangle$$
This follows directly from the definition, since $f^*(\mathbf{y})\ge\langle\mathbf{y},\mathbf{x}\rangle-f(\mathbf{x})$ for every $\mathbf{x}$.
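The check below tests the Fenchel inequality on random pairs. It assumes NumPy and uses the conjugate pair $f(x)=e^x$, $f^*(y)=y\log y-y$ for $y>0$, which is obtained by maximizing $yx-e^x$ at $x=\log y$. The sketch also shows that equality holds when $y=f'(x)=e^x$. This anticipates the conjugate subgradient theorem.

```python
import numpy as np

def f(x):
    return np.exp(x)

def f_conj(y):
    # conjugate of exp, valid for y > 0 only: f*(y) = y log y - y
    return y * np.log(y) - y

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=1000)
y = rng.uniform(0.01, 10.0, size=1000)
gap = f(x) + f_conj(y) - x * y          # Fenchel: gap >= 0 for every pair
print(gap.min())

# equality case: y = exp(x) is the gradient of f at x
eq_gap = f(x) + f_conj(np.exp(x)) - x * np.exp(x)
print(np.max(np.abs(eq_gap)))           # zero up to rounding
```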
Biconjugate function
Let $f:\mathbb{E}\to[-\infty,\infty]$. The biconjugate of $f$ is $f^{**}$:
$$f^{**}(\mathbf{x})=\sup_{\mathbf{y}\in\mathbb{E}^*}\left\{\langle\mathbf{y},\mathbf{x}\rangle-f^*(\mathbf{y})\right\},\quad\mathbf{x}\in\mathbb{E}$$
For proper $f$, the Fenchel inequality gives
$$f(\mathbf{x})\ge\langle\mathbf{y},\mathbf{x}\rangle-f^*(\mathbf{y})\quad\forall\mathbf{y}\in\mathbb{E}^*\ \Rightarrow\ f(\mathbf{x})\ge\sup_{\mathbf{y}\in\mathbb{E}^*}\left\{\langle\mathbf{y},\mathbf{x}\rangle-f^*(\mathbf{y})\right\}=f^{**}(\mathbf{x})$$
Therefore $f^{**}\le f$.
Theorem 1
Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper, closed, convex function. Then $f^{**}=f$.
Proof: omitted for now.
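This grid-based sketch illustrates $f^{**}\le f$ and Theorem 1. It assumes NumPy, reuses the discretized conjugate from above, and uses illustrative names. For the nonconvex double-well $f(x)=(x^2-1)^2$, the biconjugate is its convex envelope, so it drops below $f$ near $x=0$. For the convex $f(x)=\frac{1}{2}x^2$, it reproduces $f$ up to discretization error.

```python
import numpy as np

def conjugate_on_grid(vals, pts, duals):
    # sup_p {d*p - vals(p)} approximated by a max over the grid pts
    return np.max(np.outer(duals, pts) - vals[None, :], axis=1)

xs = np.linspace(-2.0, 2.0, 801)
ys = np.linspace(-30.0, 30.0, 1201)    # covers the slopes of both test functions on [-2, 2]

f_nonconvex = (xs**2 - 1.0) ** 2      # double well: not convex
f_convex = 0.5 * xs**2                # proper, closed, convex

def biconjugate(f_vals):
    f_star = conjugate_on_grid(f_vals, xs, ys)   # f* sampled on ys
    return conjugate_on_grid(f_star, ys, xs)     # f** sampled back on xs

fss_nc = biconjugate(f_nonconvex)
fss_c = biconjugate(f_convex)
i0 = np.argmin(np.abs(xs))            # index of x = 0
print(f_nonconvex[i0], fss_nc[i0])    # f(0) = 1, while f**(0) is about 0
print(np.max(np.abs(fss_c - f_convex)))
```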
Conjugate subgradient theorem
Let $f:\mathbb{E}\to(-\infty,\infty]$ be proper and convex. Then for any $\mathbf{x}\in\mathbb{E}$, $\mathbf{y}\in\mathbb{E}^*$, the following two statements are equivalent:
(i) $\langle\mathbf{x},\mathbf{y}\rangle=f(\mathbf{x})+f^*(\mathbf{y})$
(ii) $\mathbf{y}\in\partial f(\mathbf{x})$
If $f$ is also closed, then (i) and (ii) are equivalent to
(iii) $\mathbf{x}\in\partial f^*(\mathbf{y})$
Proof:
(ii)⇒(i). By the definition of the subdifferential, $\mathbf{y}\in\partial f(\mathbf{x})$ means
$$f(\mathbf{z})\ge f(\mathbf{x})+\langle\mathbf{y},\mathbf{z}-\mathbf{x}\rangle\quad\forall\mathbf{z}\in\mathbb{E}$$
that is,
$$\langle\mathbf{y},\mathbf{x}\rangle-f(\mathbf{x})\ge\langle\mathbf{y},\mathbf{z}\rangle-f(\mathbf{z})\quad\forall\mathbf{z}\in\mathbb{E}$$
Taking the supremum of the right-hand side over $\mathbf{z}$ gives
$$\langle\mathbf{x},\mathbf{y}\rangle-f(\mathbf{x})\ge f^*(\mathbf{y})$$
Combined with the Fenchel inequality $f(\mathbf{x})+f^*(\mathbf{y})\ge\langle\mathbf{x},\mathbf{y}\rangle$, this yields
$$\langle\mathbf{x},\mathbf{y}\rangle=f(\mathbf{x})+f^*(\mathbf{y})$$
(i)⇒(ii). The steps reverse. If (i) holds, then $\langle\mathbf{y},\mathbf{x}\rangle-f(\mathbf{x})=f^*(\mathbf{y})\ge\langle\mathbf{y},\mathbf{z}\rangle-f(\mathbf{z})$ for every $\mathbf{z}$, which is exactly the subgradient inequality above.
(iii). If $f$ is also closed, Theorem 1 gives $f^{**}=f$. Let $g=f^*$, which is proper and convex. Then $g^*=f^{**}=f$, so (i) can be rewritten as
$$\langle\mathbf{x},\mathbf{y}\rangle=g(\mathbf{y})+g^*(\mathbf{x})$$
Applying the equivalence (i)⇔(ii) to $g$ gives
$$\mathbf{x}\in\partial g(\mathbf{y})=\partial f^*(\mathbf{y})$$
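For $f=|\cdot|$ on $\mathbb{R}$, all three conditions can be written in closed form. Here $f^*$ is the indicator of $[-1,1]$, and $\partial f^*(y)$ is the normal cone of $[-1,1]$ at $y$. The sketch below (plain Python, with hypothetical predicate names) checks that (i), (ii) and (iii) agree on a small grid of exactly representable pairs.

```python
from itertools import product

def f_star_finite(y):
    # f = |.| has conjugate f* = indicator of [-1, 1]: finite (= 0) iff |y| <= 1
    return abs(y) <= 1

def cond_i(x, y):
    # <x, y> = f(x) + f*(y); impossible when f*(y) = +inf
    return f_star_finite(y) and x * y == abs(x)

def cond_ii(x, y):
    # y in the subdifferential of |.| at x
    if x > 0:
        return y == 1
    if x < 0:
        return y == -1
    return abs(y) <= 1

def cond_iii(x, y):
    # x in the subdifferential of f* at y (normal cone of [-1, 1] at y)
    if abs(y) > 1:
        return False
    if y == 1:
        return x >= 0
    if y == -1:
        return x <= 0
    return x == 0

xs = [-2.0, -1.0, 0.0, 0.5, 3.0]
ys = [-1.5, -1.0, -0.3, 0.0, 1.0, 2.0]
all_agree = all(cond_i(x, y) == cond_ii(x, y) == cond_iii(x, y)
                for x, y in product(xs, ys))
print(all_agree)
```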
Proximal operator
Assumptions:
1. $f:\mathbb{R}^n\to\mathbb{R}\cup\{\infty\}$ is a closed, proper, convex function.
2. $\operatorname{dom} f=\{\mathbf{x}: f(\mathbf{x})<+\infty\}$ is the set of points where $f$ is finite.
Definition: $\operatorname{prox}_f:\mathbb{R}^n\to\mathbb{R}^n$ is defined as
$$\operatorname{prox}_f(\mathbf{v})=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left(f(\mathbf{x})+\frac{1}{2}\|\mathbf{x}-\mathbf{v}\|^2\right)$$
and, for a parameter $\lambda>0$,
$$\operatorname{prox}_{\lambda f}(\mathbf{v})=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left(f(\mathbf{x})+\frac{1}{2\lambda}\|\mathbf{x}-\mathbf{v}\|^2\right)$$
The minimizer exists and is unique because the objective is closed and strongly convex.
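As a sanity check of the definition, the sketch below brute-forces $\operatorname{prox}_{\lambda f}$ in one dimension over a fine grid. It compares the result with the closed form $v/(1+\lambda)$ for $f(x)=\frac{1}{2}x^2$, which comes from solving $x+(x-v)/\lambda=0$. It assumes NumPy. `prox_on_grid` is a hypothetical helper, accurate only to about the grid spacing.

```python
import numpy as np

def prox_on_grid(f, v, lam, grid):
    """prox_{lam f}(v) = argmin_x f(x) + (x - v)^2 / (2 lam), brute-forced over a 1-D grid."""
    obj = f(grid) + (grid - v) ** 2 / (2.0 * lam)
    return grid[np.argmin(obj)]

grid = np.linspace(-10.0, 10.0, 200001)   # spacing 1e-4
lam = 0.5
for v in [-3.0, 0.2, 4.0]:
    p = prox_on_grid(lambda x: 0.5 * x**2, v, lam, grid)
    # closed form for f(x) = x^2/2: x = v / (1 + lam)
    print(v, p, v / (1.0 + lam))
```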
Second prox theorem
Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper, closed, convex function. Then the following three statements are equivalent:
(i) $\mathbf{u}=\operatorname{Prox}_f(\mathbf{x})$
(ii) $\mathbf{x}-\mathbf{u}\in\partial f(\mathbf{u})$
(iii) $\langle\mathbf{x}-\mathbf{u},\mathbf{y}-\mathbf{u}\rangle\le f(\mathbf{y})-f(\mathbf{u})\quad\forall\mathbf{y}\in\mathbb{E}$
Proof:
$$\operatorname{prox}_f(\mathbf{x})=\arg\min_{\mathbf{v}\in\mathbb{R}^n}\left(f(\mathbf{v})+\frac{1}{2}\|\mathbf{v}-\mathbf{x}\|^2\right)$$
The objective is convex, so $\mathbf{u}$ is its minimizer if and only if $\mathbf{0}$ is a subgradient of the objective at $\mathbf{u}$:
$$\mathbf{0}\in\partial f(\mathbf{u})+\mathbf{u}-\mathbf{x}$$
that is,
$$\mathbf{x}-\mathbf{u}\in\partial f(\mathbf{u})$$
This proves (i)⇔(ii). By the definition of the subgradient, (ii) says $f(\mathbf{y})\ge f(\mathbf{u})+\langle\mathbf{x}-\mathbf{u},\mathbf{y}-\mathbf{u}\rangle$ for all $\mathbf{y}$, which is exactly (iii).
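The next sketch spot-checks the theorem numerically for $f=|\cdot|$ on $\mathbb{R}$. It assumes NumPy and takes $\operatorname{prox}_f$ to be soft thresholding at $1$, the formula derived in the 1-norm example. For several inputs $x$, it checks (ii) exactly and checks (iii) against 2000 random test points $y$. This is a sampled check, not a proof.

```python
import numpy as np

def soft(x, lam=1.0):
    # prox of lam*|.| (soft thresholding, from the 1-norm example)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(1)
ys = rng.uniform(-5.0, 5.0, size=2000)       # test points y for condition (iii)
ok = True
for x in [-2.5, -0.4, 0.0, 0.9, 3.0]:
    u = soft(x)                              # (i): u = prox_f(x) with f = |.|
    s = x - u
    # (ii): s is a subgradient of |.| at u (sign(u) if u != 0, anything in [-1, 1] if u == 0)
    in_subdiff = abs(s - np.sign(u)) < 1e-12 if u != 0 else abs(s) <= 1.0
    # (iii): <x - u, y - u> <= f(y) - f(u) for every sampled y
    var_ineq = np.all(s * (ys - u) <= np.abs(ys) - abs(u) + 1e-12)
    ok = ok and bool(in_subdiff) and bool(var_ineq)
print(ok)
```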
Lemma
Let $g:\mathbb{E}\to(-\infty,\infty]$ be a proper function, $\lambda\neq0$, and
$$f(\mathbf{x})=\lambda g\left(\frac{\mathbf{x}}{\lambda}\right)$$
Then
$$\operatorname{prox}_f(\mathbf{x})=\lambda\operatorname{prox}_{\frac{g}{\lambda}}\left(\frac{\mathbf{x}}{\lambda}\right)$$
Proof: substitute $\mathbf{u}=\lambda\mathbf{z}$, then divide the objective by $\lambda^2>0$, which does not change the minimizer:
$$\begin{aligned} \operatorname{prox}_f(\mathbf{x})&=\arg\min_{\mathbf{u}}\left\{f(\mathbf{u})+\frac{1}{2}\|\mathbf{u}-\mathbf{x}\|^2\right\}\\ &=\arg\min_{\mathbf{u}}\left\{\lambda g\left(\frac{\mathbf{u}}{\lambda}\right)+\frac{1}{2}\|\mathbf{u}-\mathbf{x}\|^2\right\}\\ &=\lambda\arg\min_{\mathbf{z}}\left\{\lambda g(\mathbf{z})+\frac{1}{2}\|\lambda\mathbf{z}-\mathbf{x}\|^2\right\}\\ &=\lambda\arg\min_{\mathbf{z}}\left\{\frac{g(\mathbf{z})}{\lambda}+\frac{1}{2}\left\|\mathbf{z}-\frac{\mathbf{x}}{\lambda}\right\|^2\right\}\\ &=\lambda\operatorname{prox}_{\frac{g}{\lambda}}\left(\frac{\mathbf{x}}{\lambda}\right) \end{aligned}$$
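The following sketch compares both sides of the lemma by 1-D grid brute force. It assumes NumPy, an arbitrarily chosen convex $g(z)=|z|+\frac{1}{4}z^4$, and $\lambda=2$. Because each prox objective is convex in one variable, the grid minimizer lies next to the true minimizer. The two sides should therefore agree to within a few grid spacings.

```python
import numpy as np

def prox_on_grid(phi, x, grid):
    # prox_phi(x) = argmin_u phi(u) + 0.5 * (u - x)^2 over a 1-D grid
    return grid[np.argmin(phi(grid) + 0.5 * (grid - x) ** 2)]

g = lambda z: np.abs(z) + 0.25 * z**4     # sample proper convex g (illustrative)
lam = 2.0
f = lambda u: lam * g(u / lam)            # f(x) = lam * g(x / lam)
grid = np.linspace(-10.0, 10.0, 400001)

for x in [-4.0, 0.5, 3.0]:
    lhs = prox_on_grid(f, x, grid)                                # prox_f(x)
    rhs = lam * prox_on_grid(lambda z: g(z) / lam, x / lam, grid) # lam * prox_{g/lam}(x/lam)
    print(x, lhs, rhs)
```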
Moreau decomposition
Let $f:\mathbb{E}\to(-\infty,\infty]$ be a proper, closed, convex function. Then
$$\mathbf{x}=\operatorname{Prox}_{f}(\mathbf{x})+\operatorname{Prox}_{f^{*}}(\mathbf{x})$$
where $f^*$ is the conjugate of $f$.
Proof: let $\mathbf{u}=\operatorname{Prox}_{f}(\mathbf{x})$. Then
$$\begin{aligned} \mathbf{u}=\operatorname{Prox}_{f}(\mathbf{x})&\Longleftrightarrow \mathbf{x}-\mathbf{u} \in \partial f(\mathbf{u}) \\ &\Longleftrightarrow \mathbf{u} \in \partial f^{*}(\mathbf{x}-\mathbf{u}) \\ &\Longleftrightarrow \mathbf{x}-(\mathbf{x}-\mathbf{u}) \in \partial f^{*}(\mathbf{x}-\mathbf{u}) \\ &\Longleftrightarrow \mathbf{x}-\mathbf{u}=\operatorname{Prox}_{f^{*}}(\mathbf{x}) \\ &\Longleftrightarrow \mathbf{x}=\mathbf{u}+\operatorname{Prox}_{f^{*}}(\mathbf{x})=\operatorname{Prox}_{f}(\mathbf{x})+\operatorname{Prox}_{f^{*}}(\mathbf{x}) \end{aligned}$$
The first and fourth equivalences are the second prox theorem, applied to $f$ and to $f^*$ (which is also proper, closed, and convex). The second is the conjugate subgradient theorem, which applies because $f$ is closed.
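For $f=\|\cdot\|_1$, both pieces of the decomposition have closed forms. $\operatorname{Prox}_f$ is soft thresholding at $1$. $f^*$ is the indicator of the unit $\infty$-norm ball (see the $p$-norm example), so $\operatorname{Prox}_{f^*}$ is clipping to $[-1,1]$. The minimal NumPy sketch below checks that the two pieces add back up to $\mathbf{x}$.

```python
import numpy as np

def prox_l1(x, lam=1.0):
    # prox of lam*||.||_1: soft thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_l1_conj(x):
    # f = ||.||_1 has f* = indicator of {||y||_inf <= 1}, so prox_{f*} is projection (clipping)
    return np.clip(x, -1.0, 1.0)

x = np.array([2.5, -0.3, 0.0, -4.0, 1.0])
parts = prox_l1(x) + prox_l1_conj(x)
print(np.allclose(parts, x))
```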
Extended version
$$\mathbf{x}=\operatorname{Prox}_{\lambda f}(\mathbf{x})+\lambda\operatorname{Prox}_{\frac{f^{*}}{\lambda}}(\mathbf{x}/\lambda),\quad\lambda>0$$
Proof: apply the Moreau decomposition to $\lambda f$. By the scaling rule for conjugates, $(\lambda f)^*(\mathbf{y})=\lambda f^*(\mathbf{y}/\lambda)$. The lemma with $g=f^*$ then gives the last equality:
$$\begin{aligned} \mathbf{x}&=\operatorname{Prox}_{\lambda f}(\mathbf{x})+\operatorname{Prox}_{(\lambda f)^{*}}(\mathbf{x})\\ &=\operatorname{Prox}_{\lambda f}(\mathbf{x})+\operatorname{Prox}_{\lambda f^*\left(\frac{\cdot}{\lambda}\right)}(\mathbf{x})\\ &=\operatorname{Prox}_{\lambda f}(\mathbf{x})+\lambda \operatorname{Prox}_{\frac{f^{*}}{\lambda}}(\mathbf{x} / \lambda) \end{aligned}$$
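The sketch below checks the extended decomposition for two functions, assuming NumPy and $\lambda=0.7$. The first is $f=\|\cdot\|_1$, where $f^*/\lambda=f^*$ is still the $\infty$-ball indicator. The second is the self-conjugate $f=\frac{1}{2}\|\cdot\|^2$, using the fact that the prox of $\frac{c}{2}\|\cdot\|^2$ at $\mathbf{w}$ is $\mathbf{w}/(1+c)$.

```python
import numpy as np

lam = 0.7
x = np.array([2.5, -0.3, 0.0, -4.0, 0.65])

# f = ||.||_1: prox_{lam f} is soft thresholding at lam; f*/lam = f* (indicator of the inf-norm unit ball)
soft = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
dual_part = lam * np.clip(x / lam, -1.0, 1.0)
print(np.allclose(soft + dual_part, x))

# f = 0.5*||.||^2 is self-conjugate; prox of (c/2)||.||^2 at w is w / (1 + c)
quad_part = x / (1.0 + lam)                        # prox_{lam f}(x), c = lam
quad_dual = lam * (x / lam) / (1.0 + 1.0 / lam)    # lam * prox_{f*/lam}(x / lam), c = 1/lam
print(np.allclose(quad_part + quad_dual, x))
```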
Examples
Projection
Consider a nonempty closed convex set $C\neq\emptyset$, and let $I_C$ be its indicator function: $I_C(\mathbf{x})=0$ if $\mathbf{x}\in C$, and $I_C(\mathbf{x})=\infty$ otherwise. Then
$$\operatorname{prox}_{I_C}(\mathbf{x})=\arg\min_{\mathbf{y}}\left(I_C(\mathbf{y})+\frac{1}{2}\|\mathbf{y}-\mathbf{x}\|^2\right)=\arg\min_{\mathbf{y}\in C}\frac{1}{2}\|\mathbf{y}-\mathbf{x}\|^2=P_C(\mathbf{x})$$
That is, the prox of an indicator function is the Euclidean projection onto $C$.
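Two common projections have simple formulas: clipping for a box, and rescaling for a Euclidean ball. The minimal NumPy sketch below implements both. It checks each one against condition (iii) of the second prox theorem with $f=I_C$, which reduces to $\langle\mathbf{x}-\mathbf{u},\mathbf{y}-\mathbf{u}\rangle\le0$ for all $\mathbf{y}\in C$, on random points of $C$. The function names and test sets are illustrative.

```python
import numpy as np

def proj_box(x, lo, hi):
    # projection onto the box C = {lo <= x <= hi}: componentwise clipping
    return np.clip(x, lo, hi)

def proj_ball(x, r):
    # projection onto the Euclidean ball C = {||x|| <= r}
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

rng = np.random.default_rng(0)

def sample_box():
    return rng.uniform(-1.0, 1.0, 3)

def sample_ball():
    w = rng.standard_normal(3)
    return w / max(1.0, np.linalg.norm(w))    # scale into the unit ball if needed

x = np.array([3.0, -0.5, 2.0])
results = []
for proj, sample in [(lambda z: proj_box(z, -1.0, 1.0), sample_box),
                     (lambda z: proj_ball(z, 1.0), sample_ball)]:
    u = proj(x)
    # variational inequality: <x - u, y - u> <= 0 for every sampled y in C
    worst = max(np.dot(x - u, sample() - u) for _ in range(1000))
    results.append(worst <= 1e-12)
print(results)
```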
1-norm
Let $f(\mathbf{x})=\|\mathbf{x}\|_1$. Then
$$\begin{aligned} \operatorname{prox}_{\lambda f}(\mathbf{v})&=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left(\|\mathbf{x}\|_1+\frac{1}{2\lambda}\|\mathbf{x}-\mathbf{v}\|^2\right)\\ &=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\left(\lambda\|\mathbf{x}\|_1+\frac{1}{2}\|\mathbf{x}-\mathbf{v}\|^2\right)\\ &=\arg\min_{\mathbf{x}\in\mathbb{R}^n}\sum_{i=1}^{n}\left(\lambda|x_i|+\frac{1}{2}(x_i-v_i)^2\right) \end{aligned}$$
Each term of the sum depends on a single coordinate, so the problem separates. The $i$-th component of the solution is
$$\left[\operatorname{prox}_{\lambda f}(\mathbf{v})\right]_i=\arg\min_{x_i\in\mathbb{R}}\left(\lambda|x_i|+\frac{1}{2}(x_i-v_i)^2\right)$$
It therefore suffices to solve the scalar problem. Let
$$g(x)=\lambda|x|+\frac{1}{2}(x-v)^2\quad(\lambda>0,\ x\in\mathbb{R})$$
$$g(x)=\begin{cases}\lambda x+\frac{1}{2}(x-v)^2,&x>0\\-\lambda x+\frac{1}{2}(x-v)^2,&x\le0\end{cases}$$
$$g'(x)=\begin{cases}\lambda+x-v,&x>0\\-\lambda+x-v,&x<0\end{cases}$$
Setting $\lambda+x-v=0$ gives $x=v-\lambda$.
When $v>\lambda$:
$$g'(x)<0\quad\forall x\in(-\infty,v-\lambda)\setminus\{0\},\qquad g'(x)>0\quad\forall x\in(v-\lambda,+\infty)$$
Since $g$ is continuous, it decreases up to $v-\lambda$ and increases afterwards, so its minimizer is
$$\operatorname{prox}_{\lambda|\cdot|}(v)=v-\lambda$$
Similarly, when $v<-\lambda$,
$$\operatorname{prox}_{\lambda|\cdot|}(v)=v+\lambda$$
When $|v|\le\lambda$:
$$g'(x)>0\quad\forall x\in(0,+\infty),\qquad g'(x)<0\quad\forall x\in(-\infty,0)$$
so
$$\operatorname{prox}_{\lambda|\cdot|}(v)=0$$
Combining the three cases,
$$\operatorname{prox}_{\lambda|\cdot|}(v)=\begin{cases}v-\lambda,&v>\lambda\\0,&|v|\le\lambda\\v+\lambda,&v<-\lambda\end{cases}=\operatorname{sign}(v)\left[|v|-\lambda\right]_+=v-P_{[-\lambda,\lambda]}(v)$$
where $P_{[-\lambda,\lambda]}$ denotes projection onto the interval $[-\lambda,\lambda]$ and $[a]_+=\max\{a,0\}$.
Therefore, applying this to each component,
$$\operatorname{prox}_{\lambda f}(\mathbf{v})=\operatorname{sign}(\mathbf{v})\left[|\mathbf{v}|-\lambda\right]_+=\mathbf{v}-P_{[-\lambda,\lambda]^n}(\mathbf{v})$$
where $\operatorname{sign}$, $|\cdot|$ and $[\cdot]_+$ act elementwise. This operator is known as soft thresholding.
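Both forms of the soft-thresholding formula translate directly into vectorized NumPy. The sketch below (function names are illustrative) implements each one and confirms that they agree.

```python
import numpy as np

def soft_threshold(v, lam):
    # prox of lam*||.||_1: sign(v) * [|v| - lam]_+, elementwise
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def soft_threshold_via_projection(v, lam):
    # the same operator written as v - P_{[-lam, lam]^n}(v)
    return v - np.clip(v, -lam, lam)

v = np.array([3.0, -0.5, -2.0, 1.0])
lam = 1.0
print(soft_threshold(v, lam))
print(soft_threshold_via_projection(v, lam))
```

Components with $|v_i|\le\lambda$ are set exactly to zero, and the others are shrunk toward zero by $\lambda$. This is why the 1-norm prox produces sparse vectors.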
Example 3
$$f(x)=\begin{cases}\mu x,&x\ge0\\\infty,&x<0\end{cases}$$
$$\operatorname{prox}_{\lambda f}(v)=[v-\lambda\mu]_+$$
For $x\ge0$, the objective $\mu x+\frac{1}{2\lambda}(x-v)^2$ is minimized at $x=v-\lambda\mu$. That point is then clipped at $0$.
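The sketch below compares the closed form $[v-\lambda\mu]_+$ with a brute-force minimization over grid points $x\ge0$, since $f=+\infty$ for $x<0$. It assumes NumPy and the illustrative values $\lambda=0.5$, $\mu=2$.

```python
import numpy as np

def prox_example3(v, lam, mu):
    # closed form from the text: [v - lam*mu]_+
    return max(v - lam * mu, 0.0)

def prox_example3_grid(v, lam, mu, grid):
    # brute force: minimize mu*x + (x - v)^2 / (2 lam) over grid points x >= 0
    xs = grid[grid >= 0.0]
    return xs[np.argmin(mu * xs + (xs - v) ** 2 / (2.0 * lam))]

grid = np.linspace(-5.0, 5.0, 100001)
lam, mu = 0.5, 2.0
for v in [-1.0, 0.5, 1.0, 3.0]:
    print(v, prox_example3(v, lam, mu), prox_example3_grid(v, lam, mu, grid))
```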
p-norm
Let the dual norm of $\|\cdot\|_p$ be $\|\cdot\|_*$. The dual of the $p$-norm is the $q$-norm with $\frac{1}{p}+\frac{1}{q}=1$; if $p=1$, then $q=\infty$.
The conjugate of a norm $f=\|\cdot\|$ is the indicator function of the unit ball of its dual norm:
$$f^*(\mathbf{y})=\begin{cases}0,&\|\mathbf{y}\|_*\le1\\+\infty,&\|\mathbf{y}\|_*>1\end{cases}$$
By the extended Moreau decomposition,
$$\mathbf{x}=\operatorname{Prox}_{\lambda f}(\mathbf{x})+\lambda\operatorname{Prox}_{\frac{f^*}{\lambda}}(\mathbf{x}/\lambda)$$
Since $\lambda>0$ and $f^*$ only takes the values $0$ and $+\infty$,
$$\frac{1}{\lambda}f^*(\mathbf{y})=f^*(\mathbf{y})$$
so $\operatorname{Prox}_{\frac{f^*}{\lambda}}$ is the projection onto the dual-norm unit ball $\{\mathbf{z}:\|\mathbf{z}\|_*\le1\}$. For any $\lambda>0$, projecting onto a scaled ball satisfies $\lambda P_{\{\|\mathbf{z}\|_*\le1\}}(\mathbf{x}/\lambda)=P_{\{\|\mathbf{z}\|_*\le\lambda\}}(\mathbf{x})$. Hence
$$\lambda\operatorname{Prox}_{\frac{f^*}{\lambda}}(\mathbf{x}/\lambda)=\lambda P_{\{\|\mathbf{z}\|_*\le1\}}\left(\frac{\mathbf{x}}{\lambda}\right)=P_{\{\|\mathbf{z}\|_*\le\lambda\}}(\mathbf{x})$$
Therefore
$$\operatorname{Prox}_{\lambda f}(\mathbf{x})=\mathbf{x}-P_{\{\|\mathbf{z}\|_*\le\lambda\}}(\mathbf{x})$$
For $f=\|\cdot\|_1$, the dual ball is $[-\lambda,\lambda]^n$, and this recovers the soft-thresholding formula.
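The sketch below applies the formula to $f=\|\cdot\|_2$, whose dual norm is again the 2-norm. It assumes NumPy. It also compares the result with the equivalent "block soft thresholding" form $(1-\lambda/\|\mathbf{x}\|)_+\mathbf{x}$. Finally, it checks that the result beats random nearby points on the prox objective. This last check is a sanity check, not a proof.

```python
import numpy as np

def proj_l2_ball(x, r):
    # projection onto {z : ||z||_2 <= r}
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

def prox_l2_norm(x, lam):
    # Prox_{lam ||.||_2}(x) = x - P_{||z||_2 <= lam}(x); the 2-norm is its own dual
    return x - proj_l2_ball(x, lam)

lam = 1.5
x = np.array([3.0, -4.0])        # ||x|| = 5
u = prox_l2_norm(x, lam)
# equivalent block soft-thresholding form: (1 - lam/||x||)_+ * x
print(u, max(1.0 - lam / np.linalg.norm(x), 0.0) * x)

# u should beat random nearby points on the prox objective
obj = lambda z: lam * np.linalg.norm(z) + 0.5 * np.sum((z - x) ** 2)
rng = np.random.default_rng(0)
print(all(obj(u) <= obj(u + 0.1 * rng.standard_normal(2)) for _ in range(1000)))
```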
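Properties
1. Minimizers = fixed points
Let $\lambda>0$. Then $\mathbf{x}^*$ is a minimizer of $f$ if and only if
$$\operatorname{prox}_{\lambda f}(\mathbf{x}^*)=\mathbf{x}^*$$
This follows from the second prox theorem: $\operatorname{prox}_{\lambda f}(\mathbf{x}^*)=\mathbf{x}^*\iff\mathbf{0}\in\lambda\partial f(\mathbf{x}^*)\iff\mathbf{0}\in\partial f(\mathbf{x}^*)$.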
PPA (proximal-point algorithm)
$$\mathbf{x}^{k+1}=\operatorname{prox}_{\lambda f}(\mathbf{x}^k)$$
Convergence:
For a strongly convex function, $\operatorname{prox}_{\lambda f}$ is a contraction, i.e. there exists $C_0<1$ such that
$$\|\operatorname{prox}_{\lambda f}(\mathbf{x})-\operatorname{prox}_{\lambda f}(\mathbf{y})\|\le C_0\|\mathbf{x}-\mathbf{y}\|$$
If $f$ is $\sigma$-strongly convex, one can take $C_0=\frac{1}{1+\lambda\sigma}$. Since the minimizer $\mathbf{x}^*$ is a fixed point of $\operatorname{prox}_{\lambda f}$,
$$\|\mathbf{x}^{k+1}-\mathbf{x}^*\|=\|\operatorname{prox}_{\lambda f}(\mathbf{x}^{k})-\operatorname{prox}_{\lambda f}(\mathbf{x}^*)\|\le C_0\|\mathbf{x}^k-\mathbf{x}^*\|\le C_0^{k+1}\|\mathbf{x}^0-\mathbf{x}^*\|$$
so the iterates converge linearly.
For a general convex function, $\operatorname{prox}_{\lambda f}$ is firmly nonexpansive, and hence nonexpansive:
$$\|\operatorname{prox}_{\lambda f}(\mathbf{x})-\operatorname{prox}_{\lambda f}(\mathbf{y})\|^2\le\|\mathbf{x}-\mathbf{y}\|^2-\|(\mathbf{x}-\operatorname{prox}_{\lambda f}(\mathbf{x}))-(\mathbf{y}-\operatorname{prox}_{\lambda f}(\mathbf{y}))\|^2$$
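Here is a small PPA sketch under stated assumptions: NumPy, and two hand-picked functions with closed-form proxes. The first is the $\sigma$-strongly convex $f(\mathbf{x})=\frac{\sigma}{2}\|\mathbf{x}-\mathbf{c}\|^2$, whose prox is $(\mathbf{v}+\lambda\sigma\mathbf{c})/(1+\lambda\sigma)$. For it, the error shrinks by exactly $C_0=\frac{1}{1+\lambda\sigma}$ per step. The second is the merely convex $f(\mathbf{x})=\|\mathbf{x}-\mathbf{c}\|_1$. Its prox $\mathbf{c}+\text{soft}(\mathbf{v}-\mathbf{c},\lambda)$ uses the standard translation rule, which this note does not derive. The iterates stop at the minimizer $\mathbf{c}$, which is a fixed point.

```python
import numpy as np

def prox_quad(v, lam, sigma, c):
    # prox of lam*f with f(x) = sigma/2 * ||x - c||^2 (sigma-strongly convex)
    return (v + lam * sigma * c) / (1 + lam * sigma)

def prox_l1_shift(v, lam, c):
    # prox of lam*||x - c||_1 (convex, not strongly convex): c + soft(v - c, lam)
    d = v - c
    return c + np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)

def ppa(prox, x0, iters):
    # proximal-point algorithm: x^{k+1} = prox(x^k)
    xs = [x0]
    for _ in range(iters):
        xs.append(prox(xs[-1]))
    return xs

c = np.array([1.0, -2.0, 0.5])
x0 = np.array([10.0, 10.0, 10.0])
lam, sigma = 0.5, 2.0

it_quad = ppa(lambda v: prox_quad(v, lam, sigma, c), x0, 30)
errs = [np.linalg.norm(x - c) for x in it_quad]
C0 = 1 / (1 + lam * sigma)        # contraction constant for this f
print(errs[1] / errs[0], C0)

it_l1 = ppa(lambda v: prox_l1_shift(v, lam, c), x0, 30)
print(it_l1[-1])                  # stops at c, the minimizer (a fixed point)
```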
Proximal gradient method
Let
$$f(\mathbf{x})=g(\mathbf{x})+h(\mathbf{x})$$
where $g$ is convex and differentiable with $\operatorname{dom}(g)=\mathbb{R}^n$, and $h$ is convex but not necessarily differentiable.
If $f$ were differentiable, we could use gradient descent. Gradient descent can be derived from the second-order Taylor expansion of $f$ around $\mathbf{x}$: drop the higher-order terms and replace the Hessian with $\frac{1}{t}\mathbf{I}$:
$$\mathbf{x}^+=\arg\min_{\mathbf{z}}\left(f(\mathbf{x})+\nabla f(\mathbf{x})^T(\mathbf{z}-\mathbf{x})+\frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|^2\right)$$
whose solution is $\mathbf{x}^+=\mathbf{x}-t\nabla f(\mathbf{x})$.
For the original problem, we apply this approximation to $g$ only and keep $h$ unchanged:
$$\begin{aligned} \mathbf{x}^+&=\arg\min_{\mathbf{z}}\left(g(\mathbf{x})+\nabla g(\mathbf{x})^T(\mathbf{z}-\mathbf{x})+\frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|^2+h(\mathbf{z})\right)\\ &=\arg\min_{\mathbf{z}}\left(\nabla g(\mathbf{x})^T(\mathbf{z}-\mathbf{x})+\frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|^2+h(\mathbf{z})\right)\\ &=\arg\min_{\mathbf{z}}\left(\frac{t}{2}\|\nabla g(\mathbf{x})\|^2+\nabla g(\mathbf{x})^T(\mathbf{z}-\mathbf{x})+\frac{1}{2t}\|\mathbf{z}-\mathbf{x}\|^2+h(\mathbf{z})\right)\\ &=\arg\min_{\mathbf{z}}\left(\frac{1}{2t}\|\mathbf{z}-(\mathbf{x}-t\nabla g(\mathbf{x}))\|^2+h(\mathbf{z})\right)\\ &=\operatorname{prox}_{th}\left(\mathbf{x}-t\nabla g(\mathbf{x})\right) \end{aligned}$$
The constant terms $g(\mathbf{x})$ and $\frac{t}{2}\|\nabla g(\mathbf{x})\|^2$ do not change the minimizer, and completing the square gives the fourth line. In effect, each iteration takes one gradient step on $g$ and then applies the proximal operator of $h$.
Steps:
Choose an initial point $\mathbf{x}^{(0)}$, then repeat
$$\mathbf{x}^{(k+1)}=\operatorname{prox}_{th}\left(\mathbf{x}^{(k)}-t\nabla g(\mathbf{x}^{(k)})\right)$$
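A minimal sketch of these steps on a lasso problem follows. The example is an assumption chosen for illustration and is not taken from the text: $g(\mathbf{x})=\frac{1}{2}\|A\mathbf{x}-\mathbf{b}\|^2$ with $\nabla g(\mathbf{x})=A^T(A\mathbf{x}-\mathbf{b})$, and $h=\mu\|\cdot\|_1$, whose prox is soft thresholding at $t\mu$. It uses NumPy, random data, and the fixed step $t=1/L$ with $L=\|A\|_2^2$. At the end it checks the fixed-point residual and that the objective never increases.

```python
import numpy as np

def soft(v, tau):
    # prox of tau*||.||_1 (soft thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, mu, iters):
    """Proximal gradient for min 0.5*||Ax - b||^2 + mu*||x||_1 with fixed step t = 1/L."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad g (largest eigenvalue of A^T A)
    t = 1.0 / L
    x = np.zeros(A.shape[1])               # x^(0)
    objs = []
    for _ in range(iters):
        grad = A.T @ (A @ x - b)           # gradient step on the smooth part g
        x = soft(x - t * grad, t * mu)     # prox step on h = mu*||.||_1
        objs.append(0.5 * np.sum((A @ x - b) ** 2) + mu * np.sum(np.abs(x)))
    return x, np.array(objs), t

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 20))
x_true = np.zeros(20)
x_true[:4] = [3.0, -2.0, 1.5, 1.0]
b = A @ x_true + 0.1 * rng.standard_normal(60)
mu = 1.0

x_hat, objs, t = ista(A, b, mu, 2000)
# a minimizer is a fixed point of the proximal gradient map
residual = np.linalg.norm(x_hat - soft(x_hat - t * A.T @ (A @ x_hat - b), t * mu))
print(residual, objs[-1])
```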
Special cases:
$h=0$: gradient descent
$h=I_C$: projected gradient descent
$g=0$: the proximal-point algorithm (PPA)
Convergence
Assume $g$ is convex and its gradient is Lipschitz continuous with constant $L>0$, and $h$ is convex. Here $f^*$ denotes the optimal value of $f$, not the conjugate.
Fixed step size $t\le\frac{1}{L}$:
$$f(\mathbf{x}^{(k)})-f^*\le\frac{\|\mathbf{x}^{(0)}-\mathbf{x}^*\|^2}{2tk}$$
Backtracking line search (each search starts from $t=1$ and shrinks $t$ by a factor $\beta\in(0,1)$):
$$f(\mathbf{x}^{(k)})-f^*\le\frac{\|\mathbf{x}^{(0)}-\mathbf{x}^*\|^2}{2t_{\min}k},\qquad t_{\min}=\min\left\{1,\frac{\beta}{L}\right\}$$
Both give an $O\left(\frac{1}{k}\right)$ convergence rate.
Acceleration
$$\mathbf{v}=\mathbf{x}^{(k-1)}+\frac{k-2}{k+1}\left(\mathbf{x}^{(k-1)}-\mathbf{x}^{(k-2)}\right)$$
$$\mathbf{x}^{(k)}=\operatorname{prox}_{th}\left(\mathbf{v}-t\nabla g(\mathbf{v})\right)$$
where $\mathbf{x}^{(-1)}=\mathbf{x}^{(0)}$. The extrapolation step resembles the momentum method. With acceleration, the convergence rate improves to $O\left(\frac{1}{k^2}\right)$.
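The sketch below adds the extrapolation step to the proximal gradient iteration. It reuses the same hypothetical lasso setup: NumPy, random data, $g=\frac{1}{2}\|A\mathbf{x}-\mathbf{b}\|^2$, $h=\mu\|\cdot\|_1$, $t=1/L$, and the convention $\mathbf{x}^{(-1)}=\mathbf{x}^{(0)}$. It prints the objective of both variants after a few iteration counts. The accelerated method is not monotone, so it is not guaranteed to be ahead at every $k$. Both runs should approach the same optimal value.

```python
import numpy as np

def soft(v, tau):
    # prox of tau*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_obj(A, b, mu, x):
    return 0.5 * np.sum((A @ x - b) ** 2) + mu * np.sum(np.abs(x))

def prox_grad(A, b, mu, iters, accelerate):
    """Proximal gradient (accelerate=False) or its accelerated variant (accelerate=True)
    for g(x) = 0.5*||Ax - b||^2 and h(x) = mu*||x||_1, with fixed step t = 1/L."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2
    x_prev = x = np.zeros(A.shape[1])            # x^(-1) = x^(0)
    for k in range(1, iters + 1):
        # v = x^(k-1) + (k-2)/(k+1) * (x^(k-1) - x^(k-2)) when accelerating
        v = x + (k - 2) / (k + 1) * (x - x_prev) if accelerate else x
        x_prev, x = x, soft(v - t * (A.T @ (A @ v - b)), t * mu)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 20))
x_true = np.zeros(20)
x_true[:4] = [3.0, -2.0, 1.5, 1.0]
b = A @ x_true + 0.1 * rng.standard_normal(60)
mu = 1.0

for iters in [10, 50, 3000]:
    print(iters,
          lasso_obj(A, b, mu, prox_grad(A, b, mu, iters, accelerate=False)),
          lasso_obj(A, b, mu, prox_grad(A, b, mu, iters, accelerate=True)))
```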