求解变分不等式
介绍
设
θ
:
R
n
→
(
−
∞
,
∞
]
\theta:R^n → (-\infty,\infty]
θ:Rn→(−∞,∞] 为闭正常凸函数,
F
:
R
n
→
R
n
F:R^n → R^n
F:Rn→Rn为向量
值和连续映射。广义变分不等式
(
G
V
I
)
(GVI)
(GVI)[9,19]形式如下
x
∗
∈
R
n
,
θ
(
x
)
−
θ
(
x
∗
)
+
(
x
−
x
∗
)
T
F
(
x
∗
)
≥
0
,
∀
x
∈
R
n
.
x^*\in R^n,\theta(x)-\theta(x^*)+(x-x^*)^TF(x^*)\ge 0,\forall x\in R^n.
x∗∈Rn,θ(x)−θ(x∗)+(x−x∗)TF(x∗)≥0,∀x∈Rn.
保留
∞
−
∞
=
∞
\infty -\infty=\infty
∞−∞=∞的算术运算。在本文中,我们关注
F
F
F是单调的情况,即
(
F
(
x
)
−
F
(
y
)
)
T
(
x
−
y
)
≥
0
(F(x) -F(y))^T(x -y)\ge 0
(F(x)−F(y))T(x−y)≥0,对于所有
x
,
y
∈
R
n
x, y \in R^n
x,y∈Rn。在本文中,我们对
M
G
V
I
(
θ
,
F
)
MGVI(\theta, F)
MGVI(θ,F)做了以下额外的假设:
(
a
)
(a)
(a)单调算子
F
F
F是
L
i
p
s
c
h
i
t
z
Lipschitz
Lipschitz连续的,其中
L
i
p
s
c
h
i
t
z
Lipschitz
Lipschitz 常数
L
>
0
L > 0
L>0。
(
b
)
(b)
(b)
M
G
V
I
(
θ
,
F
)
MGVI(\theta ,F)
MGVI(θ,F)的解集,表示为
Ω
∗
\Omega^*
Ω∗,非空。
各种凸优化问题可以表述为
M
G
V
I
s
MGVIs
MGVIs,例如最小绝对收缩和选择算子(LASSO,least absolute shrinkage and selection operator/
l
1
l_1
l1正则化最小二乘)问题、基追问题[7]、基追去噪问题[5],和
D
a
n
t
z
i
g
Dantzig
Dantzig 选择器 [6],等等。此外,单调广义变分不等式(
M
G
V
I
MGVI
MGVI,monotone generalized variational inequality)包含经典单调变分不等式 (
M
V
I
MVI
MVI,monotone variational inequality)。通过设置
θ
=
δ
C
\theta = \delta _C
θ=δC(
δ
C
\delta _C
δC是
C
C
C的指标函数,如果是
x
∈
C
x \in C
x∈C,则等于
0
0
0,否则等于
∞
\infty
∞。),其中
C
⊆
R
n
C \subseteq R^n
C⊆Rn是一个非空闭凸集,则
M
G
V
I
MGVI
MGVI 就以下列形式简化为
M
V
I
MVI
MVI
x
∗
∈
C
,
(
x
−
x
∗
)
T
F
(
x
∗
)
≥
0
,
∀
x
∈
C
.
x^*\in C,(x-x^*)^TF(x^*)\ge 0,\forall x\in C.
x∗∈C,(x−x∗)TF(x∗)≥0,∀x∈C.
设
f
:
R
n
→
(
−
∞
,
∞
]
f: R^n → (−∞, ∞]
f:Rn→(−∞,∞]为闭正常凸函数,
f
f
f的邻近算子定义如下
P
r
o
x
f
:
x
↦
a
r
g
m
i
n
{
f
(
y
)
+
1
2
∥
x
−
y
∥
2
:
y
∈
R
n
}
Prox_f:x \mapsto argmin\{f(y)+\frac{1}{2}\|x-y\|^2 : y \in R^n\}
Proxf:x↦argmin{f(y)+21∥x−y∥2:y∈Rn}
设
f
:
R
n
→
(
−
∞
,
∞
]
f: R^n \rightarrow (-\infty, \infty]
f:Rn→(−∞,∞]为闭正常凸函数。那么接下来的三个条件是等价的:
1.
p
=
P
r
o
x
f
(
x
)
.
1. p = Prox_f (x).
1.p=Proxf(x).
2.
x
−
p
∈
∂
f
(
p
)
.
2. x - p \in \partial f(p).
2.x−p∈∂f(p).
3.
3.
3. 对于所有
w
∈
R
n
w\in R^n
w∈Rn,都有
(
x
−
p
)
T
(
p
−
w
)
≥
f
(
p
)
−
f
(
w
)
.
(x-p)^T(p-w) \ge f(p) - f(w).
(x−p)T(p−w)≥f(p)−f(w).
l 1 l_1 l1正则化最小二乘问题
对线性逆问题:
b
=
A
x
+
w
b=Ax+w
b=Ax+w,其中
b
∈
R
m
,
A
∈
R
m
×
n
b\in R^m,A\in R^{m\times n}
b∈Rm,A∈Rm×n已知,
w
w
w为噪音,求解
x
x
x。
让噪声最小,最小二乘法求解:
x
~
=
arg
min
x
∥
A
x
−
b
∥
2
\widetilde{x}=\mathop{\arg\min}\limits_{x}\|Ax-b\|^2
x
=xargmin∥Ax−b∥2
若
m
=
n
m=n
m=n且
A
A
A非奇异,
x
=
A
−
1
y
x=A^{-1}y
x=A−1y。但是很多情况下,
A
A
A病态,用最小二乘法求解时,系统微小的扰动都会导致结果差别很大。为了求解病态线性系统的逆问题,可以使用
l
2
l_2
l2正则化最小二乘(也叫Tikhonov regularization,Ridge regression)
x
~
=
arg
min
x
{
1
2
∥
A
x
−
b
∥
2
+
λ
∥
x
∥
2
}
\widetilde{x}=\mathop{\arg\min}\limits_{x}\{\frac{1}{2}\|Ax-b\|^2+\lambda\|x\|_2\}
x
=xargmin{21∥Ax−b∥2+λ∥x∥2},
即
m
i
n
x
∈
R
n
f
(
x
)
:
=
{
1
2
∥
A
x
−
b
∥
2
+
λ
∥
x
∥
2
}
\mathop{min}\limits_{x\in R^n}f(x):=\{\frac{1}{2}\|Ax-b\|^2+\lambda\|x\|_2\}
x∈Rnminf(x):={21∥Ax−b∥2+λ∥x∥2}
而
l
1
l_1
l1正则化最小二乘(Lasso,least absolute shrinkage and selection operator,最小绝对值收敛和选择算子)问题。
m
i
n
x
∈
R
n
f
(
x
)
:
=
{
1
2
∥
A
x
−
b
∥
2
+
λ
∥
x
∥
1
}
\mathop{min}\limits_{x\in R^n}f(x):=\{\frac{1}{2}\|Ax-b\|^2+\lambda\|x\|_1\}
x∈Rnminf(x):={21∥Ax−b∥2+λ∥x∥1}
对无约束问题
m
i
n
f
(
x
)
,
x
∈
R
n
minf(x),x\in R^n
minf(x),x∈Rn
梯度下降法
x
k
=
x
k
−
1
−
t
k
−
1
∇
f
(
x
k
−
1
)
x_k=x_{k-1}-t_{k-1}\nabla f(x_{k-1})
xk=xk−1−tk−1∇f(xk−1)
梯度下降法对可微函数可直接求微分,对不可微函数引入了次梯度(subgradient),次微分(subdifferential)概念
投影算子(projection,Pc)
P
c
=
P
r
o
x
t
k
δ
C
Pc=Prox_{t_k\delta _C}
Pc=ProxtkδC
x
k
=
P
c
(
x
k
−
1
−
t
k
−
1
∇
f
(
x
k
−
1
)
)
x_k=Pc(x_{k-1}-t_{k-1}\nabla f(x_{k-1}))
xk=Pc(xk−1−tk−1∇f(xk−1))
近端线性
x
k
=
arg
min
x
{
f
(
x
k
−
1
)
+
<
x
−
x
k
−
1
,
∇
f
(
x
k
−
1
)
>
+
1
2
t
k
−
1
∥
x
−
x
k
−
1
∥
2
}
x_k=\mathop{\arg\min}\limits_{x}\{f(x_{k-1})+<x-x_{k-1},\nabla f(x_{k-1})>+\frac{1}{2t_{k-1}}\|x-x_{k-1}\|^2\}
xk=xargmin{f(xk−1)+<x−xk−1,∇f(xk−1)>+2tk−11∥x−xk−1∥2}
f
(
x
k
−
1
)
+
<
x
−
x
k
−
1
,
∇
f
(
x
k
−
1
)
>
+
1
2
t
k
−
1
∥
x
−
x
k
−
1
∥
2
=
1
2
t
k
−
1
∥
x
−
(
x
k
−
1
−
t
k
−
1
∇
f
(
x
k
−
1
)
)
∥
2
+
D
f(x_{k-1})+<x-x_{k-1},\nabla f(x_{k-1})>+\frac{1}{2t_{k-1}}\|x-x_{k-1}\|^2=\frac{1}{2t_{k-1}}\|x-(x_{k-1}-t_{k-1}\nabla f(x_{k-1}))\|^2+D
f(xk−1)+<x−xk−1,∇f(xk−1)>+2tk−11∥x−xk−1∥2=2tk−11∥x−(xk−1−tk−1∇f(xk−1))∥2+D
D为常数
x
k
=
arg
min
x
{
1
2
t
k
∥
x
−
(
x
k
−
1
−
t
k
−
1
∇
f
(
x
k
−
1
)
)
∥
2
}
x_k=\mathop{\arg\min}\limits_{x}\{\frac{1}{2t_k}\|x-(x_{k-1}-t_{k-1}\nabla f(x_{k-1}))\|^2\}
xk=xargmin{2tk1∥x−(xk−1−tk−1∇f(xk−1))∥2}
对无约束问题
m
i
n
f
(
x
)
+
g
(
x
)
,
x
∈
R
n
minf(x)+g(x),x\in R^n
minf(x)+g(x),x∈Rn
x
k
+
1
=
arg
min
x
{
f
(
x
k
)
+
<
x
−
x
k
,
∇
f
(
x
k
)
>
+
1
2
t
k
∥
x
−
x
k
∥
2
+
g
(
x
)
}
x_{k+1}=\mathop{\arg\min}\limits_{x}\{f(x_k)+<x-x_k,\nabla f(x_k)>+\frac{1}{2t_k}\|x-x_k\|^2+g(x)\}
xk+1=xargmin{f(xk)+<x−xk,∇f(xk)>+2tk1∥x−xk∥2+g(x)}
x
k
+
1
=
arg
min
x
{
1
2
∥
x
−
(
x
k
−
t
k
∇
f
(
x
k
)
)
∥
2
+
t
k
g
(
x
)
}
x_{k+1}=\mathop{\arg\min}\limits_{x}\{\frac{1}{2}\|x-(x_k-t_k\nabla f(x_k))\|^2+t_kg(x)\}
xk+1=xargmin{21∥x−(xk−tk∇f(xk))∥2+tkg(x)}
即
x
k
+
1
=
P
r
o
x
t
k
g
(
x
k
−
t
k
∇
f
(
x
k
)
)
x_{k+1}=Prox_{t_kg}(x_k-t_k\nabla f(x_k))
xk+1=Proxtkg(xk−tk∇f(xk))
基追问题
m
i
n
∥
x
∥
1
s
.
t
.
A
x
=
b
\mathop{min}\quad \|x\|_1 \\s.t. \quad Ax=b
min∥x∥1s.t.Ax=b
其中
A
∈
R
m
×
n
,
b
∈
R
n
A\in R^{m\times n},b\in R^{n}
A∈Rm×n,b∈Rn
可分离两块凸优化模型
m
i
n
θ
1
(
x
)
+
θ
2
(
x
)
s
.
t
.
A
x
+
B
y
=
c
\mathop{min}\quad\theta_1(x)+\theta_2(x)\\ s.t. \quad Ax+By=c
minθ1(x)+θ2(x)s.t.Ax+By=c
其中
A
∈
R
m
×
n
,
B
∈
R
m
×
q
,
c
∈
R
m
A\in R^{m\times n},B\in R^{m\times q},c\in R^m
A∈Rm×n,B∈Rm×q,c∈Rm,
θ
1
:
R
n
→
(
−
∞
,
∞
]
\theta_1:R^n\rightarrow(-\infty,\infty]
θ1:Rn→(−∞,∞]和
θ
2
:
R
q
→
(
−
∞
,
∞
]
\theta_2:R^q\rightarrow(-\infty,\infty]
θ2:Rq→(−∞,∞]为两个正常闭和凸函数
迭代收缩阈值算法(ISTA,Iterative Shrinkage-Thresholding Algorithm)
ISTA
取
f
(
x
)
=
1
2
∥
A
x
−
b
∥
2
,
g
(
x
)
=
λ
∥
x
∥
1
f(x)=\frac{1}{2}\|Ax-b\|^2,g(x)=\lambda \|x\|_1
f(x)=21∥Ax−b∥2,g(x)=λ∥x∥1
∇
f
(
x
k
)
=
A
T
(
A
x
−
b
)
,
t
k
=
1
L
k
\nabla f(x_k)=A^T(Ax-b),t_k=\frac{1}{L_k}
∇f(xk)=AT(Ax−b),tk=Lk1
得
x
k
+
1
=
P
r
o
x
1
L
k
λ
∥
∥
1
(
x
k
−
1
L
k
A
T
(
A
x
k
−
b
)
)
x_{k+1}=Prox_{\frac{1}{L_k}\lambda \|\|_1}(x_k-\frac{1}{L_k}A^T(Ax_k-b))
xk+1=ProxLk1λ∥∥1(xk−Lk1AT(Axk−b))
L
L
L是
∇
f
(
x
)
\nabla f(x)
∇f(x)的
L
i
p
s
c
h
i
t
z
Lipschitz
Lipschitz常数,
L
>
λ
m
a
x
(
A
T
A
)
L>\lambda_{max}(A^TA)
L>λmax(ATA)
由
T
λ
t
k
=
P
r
o
x
t
k
λ
∥
∥
1
=
P
r
o
x
λ
t
k
\Tau _{\lambda t_k}=Prox_{t_k\lambda\|\|_1}=Prox_{\lambda t_k}
Tλtk=Proxtkλ∥∥1=Proxλtk得
x
k
+
1
=
T
λ
L
k
(
x
k
−
1
L
k
A
T
(
A
x
k
−
b
)
)
x_{k+1}=T_{\frac{\lambda}{L_k}}(x_k-\frac{1}{L_k}A^T(Ax_k-b))
xk+1=TLkλ(xk−Lk1AT(Axk−b))
收缩阈值算子(Shrinkage-Thresholding operator )
T
λ
(
x
)
=
s
g
n
(
x
)
m
a
x
(
∣
x
∣
−
λ
,
0
)
=
m
a
x
(
0
,
x
−
λ
)
+
m
i
n
(
0
,
x
+
λ
)
\Tau _{\lambda}(x)=sgn(x)max(|x|-\lambda,0)=max(0, x -\lambda) + min(0, x +\lambda)
Tλ(x)=sgn(x)max(∣x∣−λ,0)=max(0,x−λ)+min(0,x+λ)
x
k
x_k
xk收敛速度
O
(
1
t
)
O(\frac{1}{t})
O(t1)
matlab核心代码
T = @(tau, x) max(0, x - tau) + min(0, x + tau);% 收缩阈值函数
x_new = T(lambda/L0, x_old - (1/L0)*A'*(A*x_old-b));
FISTA改进:
快速迭代收缩阈值算法(FISTA,Fast Iterative Shrinkage-Thresholding Algorithm)
s
t
e
p
0
:
step\quad 0:
step0:
y
1
=
x
0
,
t
1
=
1
y_1=x_0,t_1=1
y1=x0,t1=1
s
t
e
p
k
(
k
≥
1
)
:
step\quad k(k\ge1):
stepk(k≥1):
x
k
+
1
=
T
λ
L
k
(
y
k
−
1
L
k
A
T
(
A
x
k
−
b
)
)
x_{k+1}=T_{\frac{\lambda}{L_k}}(y_k-\frac{1}{L_k}A^T(Ax_k-b))
xk+1=TLkλ(yk−Lk1AT(Axk−b))
t
k
+
1
=
1
+
1
+
4
t
k
2
2
t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}
tk+1=21+1+4tk2
y
k
+
1
=
x
k
+
(
t
k
−
1
t
k
+
1
)
(
x
k
−
x
k
−
1
)
y_{k+1}=x_k+(\frac{t_k-1}{t_{k+1}})(x_k-x_{k-1})
yk+1=xk+(tk+1tk−1)(xk−xk−1)
x
k
x_k
xk收敛速度
O
(
1
t
2
)
O(\frac{1}{t^2})
O(t21)
广义外梯度法(GEM,Generalized Extragradient Method)
外梯度法(EM,Extragradient Method)
广义外梯度法(Generalized Extragradient Method)
由外梯度法(EM,Extragradient Method)推广得到,其中EM采用投影算子(
P
c
Pc
Pc,projection operator),适用于求解
M
V
I
(
θ
,
F
,
C
)
MVI(\theta,F,C)
MVI(θ,F,C);GEM采用邻近算子(
P
r
o
x
Prox
Prox,proximity operator),适用于求解
M
G
V
I
(
θ
,
F
)
MGVI(\theta,F)
MGVI(θ,F),其他步骤基本相同,都是预估校正算法
GEM
l
1
l_1
l1正则化最小二乘问题,由KKT条件,易知
θ
(
x
)
=
λ
∥
x
∥
1
,
F
(
x
)
=
A
T
(
A
x
−
b
)
\theta(x)=\lambda\|x\|_1,F(x)=A^T(Ax-b)
θ(x)=λ∥x∥1,F(x)=AT(Ax−b)
β
k
≤
ν
L
,
ν
∈
(
0
,
1
)
,
L
\beta^k\le\frac{\nu}{L},\nu\in(0,1),L
βk≤Lν,ν∈(0,1),L为
F
(
x
)
F(x)
F(x)的
L
i
p
s
c
h
i
t
z
Lipschitz
Lipschitz常数
GEM是预估校正算法
预测:
x
~
k
=
P
r
o
x
β
k
θ
(
x
k
−
β
k
F
(
x
k
)
)
\widetilde{x}_k=Prox_{\beta^k\theta}(x_k-\beta^kF(x_k))
x
k=Proxβkθ(xk−βkF(xk))
校正:
x
k
+
1
=
P
r
o
x
β
k
θ
(
x
k
−
β
k
F
(
x
~
k
)
)
x_{k+1}=Prox_{\beta^k\theta}(x_k-\beta^kF(\widetilde{x}_k))
xk+1=Proxβkθ(xk−βkF(x
k))
邻近收缩算法 (PCA,proximity and contraction algorithms)
收缩阈值算子(
T
\Tau
T),投影算子(
P
C
P_C
PC)是特殊的邻近算子(
P
r
o
x
Prox
Prox)
邻近算子具有收缩性质,也就是生成的序列
{
x
k
}
\{x_k\}
{xk}会收敛
邻近点算法(PPA,proximity point algorithms)
邻近梯度算法(PGA,proximity gradient algorithms)
使用邻近算子(
P
r
o
x
Prox
Prox)作为近似梯度
变分不等式
θ
(
x
~
k
)
−
θ
(
x
∗
)
+
(
x
~
k
−
x
∗
)
T
F
(
x
∗
)
≥
0
\theta(\tilde{x}^k) -\theta(x^*) +(\tilde{x}^k - x^*)^T F(x^*) \ge 0
θ(x~k)−θ(x∗)+(x~k−x∗)TF(x∗)≥0
单调
(
x
~
k
−
x
∗
)
T
(
F
(
x
~
k
)
−
F
(
x
∗
)
)
≥
0
(\tilde{x}^k - x^*)^T(F(\tilde{x}^k) -F(x^*))\ge 0
(x~k−x∗)T(F(x~k)−F(x∗))≥0
P G A a 1 PGA_{a1} PGAa1
F
(
x
)
=
M
x
+
q
F(x)=Mx+q
F(x)=Mx+q
M半正定,不需要对称
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,并且
γ
∈
(
0
,
2
)
,
α
k
∗
≥
1
∥
I
+
β
k
M
T
∥
2
2
\gamma \in (0,2),\alpha ^*_k\ge\frac{1}{\|I+\beta ^kM^T\|^2_2}
γ∈(0,2),αk∗≥∥I+βkMT∥221
预测器:选择
β
k
>
0
\beta ^k>0
βk>0,且
x
~
k
=
P
r
o
x
β
k
θ
(
x
k
−
β
k
F
(
x
k
)
)
\widetilde{x}^k=Prox_{\beta ^k\theta}(x^k-\beta ^kF(x^k))
x
k=Proxβkθ(xk−βkF(xk));
校正器:设
d
(
x
k
,
x
~
k
,
β
k
)
=
(
I
+
β
k
M
T
)
(
x
k
−
x
~
k
)
d(x^k,\widetilde{x}^k,\beta ^k)=(I+\beta ^kM^T)(x^k-\widetilde{x}^k)
d(xk,x
k,βk)=(I+βkMT)(xk−x
k),并且计算
α
k
∗
=
∥
x
k
−
x
~
k
∥
2
∥
d
(
x
k
,
x
~
k
,
β
k
)
∥
2
\alpha ^*_k=\frac{\|x^k-\widetilde{x}^k\|^2}{\|d(x^k,\widetilde{x}^k,\beta ^k)\|^2}
αk∗=∥d(xk,x
k,βk)∥2∥xk−x
k∥2。设置
x
k
+
1
=
x
k
−
γ
α
k
∗
d
(
x
k
,
x
~
k
,
β
k
)
x^{k+1}=x^k-\gamma\alpha ^*_kd(x^k,\widetilde{x}^k,\beta ^k)
xk+1=xk−γαk∗d(xk,x
k,βk)。
P G A a 2 PGA_{a2} PGAa2
F
(
x
)
=
M
x
+
q
F(x)=Mx+q
F(x)=Mx+q
M对称半正定,
G
=
I
+
β
M
G=I+\beta M
G=I+βM
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,
β
>
0
\beta >0
β>0,并且
γ
∈
(
0
,
2
)
\gamma \in (0,2)
γ∈(0,2),
α
k
∗
≥
1
λ
m
a
x
(
G
)
\alpha ^*_k \ge \frac{1}{\lambda_{max}(G)}
αk∗≥λmax(G)1
预测器:
x
~
k
=
P
r
o
x
β
θ
(
x
k
−
β
F
(
x
k
)
)
\widetilde{x}^k=Prox_{\beta\theta}(x^k-\beta F(x^k))
x
k=Proxβθ(xk−βF(xk));
校正器:设
d
(
x
k
,
x
~
k
)
=
x
k
−
x
~
k
d(x^k,\widetilde{x}^k)=x^k-\widetilde{x}^k
d(xk,x
k)=xk−x
k,并且计算
α
k
∗
=
∥
x
k
−
x
~
k
∥
2
∥
d
(
x
k
,
x
~
k
)
∥
G
2
\alpha ^*_k=\frac{\|x^k-\widetilde{x}^k\|^2}{\|d(x^k,\widetilde{x}^k)\|^2_G}
αk∗=∥d(xk,x
k)∥G2∥xk−x
k∥2,设置
x
k
+
1
=
x
k
−
γ
α
k
∗
d
(
x
k
,
x
~
k
)
x^{k+1}=x^k-\gamma\alpha ^*_kd(x^k,\widetilde{x}^k)
xk+1=xk−γαk∗d(xk,x
k)。
P G A b 1 PGA_{b1} PGAb1
F ( x ) F(x) F(x)单调
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,并且
γ
∈
(
0
,
2
)
\gamma \in (0,2)
γ∈(0,2) ,
α
k
∗
≥
1
2
\alpha ^*_k \ge\frac{1}{2}
αk∗≥21
预测器:选择
β
k
>
0
\beta ^k>0
βk>0,使得
β
k
∥
F
(
x
k
)
−
F
(
x
~
k
)
∥
≤
ν
∥
x
k
−
x
~
k
∥
\beta ^k\|F(x^k)-F(\widetilde{x}^k)\|\le\nu\|x^k-\widetilde{x}^k\|
βk∥F(xk)−F(x
k)∥≤ν∥xk−x
k∥,其中
x
~
k
=
P
r
o
x
β
k
θ
(
x
k
−
β
k
F
(
x
k
)
)
\widetilde{x}^k=Prox_{\beta ^k\theta}(x^k-\beta ^kF(x^k))
x
k=Proxβkθ(xk−βkF(xk));
校正器:设置
d
(
x
k
,
x
~
k
,
β
k
)
=
(
x
k
−
x
~
k
)
−
β
k
(
F
(
x
k
)
−
F
(
x
~
k
)
)
d(x^k,\widetilde{x}^k,\beta ^k)=(x^k-\widetilde{x}^k)-\beta ^k(F(x^k)-F(\widetilde{x}^k))
d(xk,x
k,βk)=(xk−x
k)−βk(F(xk)−F(x
k)),并且计算
α
k
∗
=
(
x
k
−
x
~
k
)
T
d
(
x
k
,
x
~
k
,
β
k
)
∥
d
(
x
k
,
x
~
k
,
β
k
)
∥
2
\alpha^*_k=\frac{(x^k-\widetilde{x}^k)^Td(x^k,\widetilde{x}^k,\beta ^k)}{\|d(x^k,\widetilde{x}^k,\beta ^k)\|^2}
αk∗=∥d(xk,x
k,βk)∥2(xk−x
k)Td(xk,x
k,βk),设置
x
k
+
1
=
x
k
−
γ
α
k
∗
d
(
x
k
,
x
~
k
,
β
k
)
x^{k+1}=x^k-\gamma\alpha^*_kd(x^k,\widetilde{x}^k,\beta ^k)
xk+1=xk−γαk∗d(xk,x
k,βk)。
P G A b 2 PGA_{b2} PGAb2
F
(
x
)
=
A
x
+
c
F(x)=Ax+c
F(x)=Ax+c
A对称半正定,
G
=
I
−
β
M
G=I-\beta M
G=I−βM,G对称正定
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,
0
<
β
<
1
λ
m
a
x
(
A
)
0<\beta <\frac{1}{\lambda_{max}(A)}
0<β<λmax(A)1,并且
γ
∈
(
0
,
2
)
\gamma \in (0,2)
γ∈(0,2)
预测器:
x
~
k
=
P
r
o
x
β
θ
(
x
k
−
β
F
(
x
k
)
)
\widetilde{x}^k=Prox_{\beta\theta}(x^k-\beta F(x^k))
x
k=Proxβθ(xk−βF(xk));
校正器:设
d
(
x
k
,
x
~
k
)
=
x
k
−
x
~
k
d(x^k,\widetilde{x}^k)=x^k-\widetilde{x}^k
d(xk,x
k)=xk−x
k,设置
x
k
+
1
=
x
k
−
γ
d
(
x
k
,
x
~
k
)
x^{k+1}=x^k-\gamma d(x^k,\widetilde{x}^k)
xk+1=xk−γd(xk,x
k)。
交替方向乘子法(ADMM,alternating direction method of multipliers)
m
i
n
x
,
y
,
z
θ
1
(
x
)
+
θ
2
(
x
)
+
θ
3
(
x
)
min_{x,y,z}\theta_1(x)+\theta_2(x)+\theta_3(x)
minx,y,zθ1(x)+θ2(x)+θ3(x)
s
.
t
.
f
(
x
,
y
,
z
)
=
0
s.t. f(x,y,z)=0
s.t.f(x,y,z)=0
拉格朗日函数
L
(
x
,
y
,
z
,
λ
)
=
θ
1
(
x
)
+
θ
2
(
x
)
+
θ
3
(
x
)
+
λ
f
(
x
,
y
,
z
)
L(x,y,z,\lambda)=\theta_1(x)+\theta_2(x)+\theta_3(x)+\lambda f(x,y,z)
L(x,y,z,λ)=θ1(x)+θ2(x)+θ3(x)+λf(x,y,z)
λ
\lambda
λ为拉格朗日乘子
增广拉格朗日函数
L
(
x
,
y
,
z
,
λ
)
=
θ
1
(
x
)
+
θ
2
(
x
)
+
θ
3
(
x
)
+
λ
f
(
x
,
y
,
z
)
+
ρ
2
∥
f
(
x
,
y
,
z
)
∥
2
2
L(x,y,z,\lambda)=\theta_1(x)+\theta_2(x)+\theta_3(x)+\lambda f(x,y,z)+\frac{\rho}{2} \|f(x,y,z)\|_2^2
L(x,y,z,λ)=θ1(x)+θ2(x)+θ3(x)+λf(x,y,z)+2ρ∥f(x,y,z)∥22
λ
\lambda
λ为拉格朗日乘子,
ρ
\rho
ρ罚因子
ADMM
m
i
n
x
,
y
,
z
θ
1
(
x
)
+
θ
2
(
x
)
min_{x,y,z}\theta_1(x)+\theta_2(x)
minx,y,zθ1(x)+θ2(x)
s
.
t
.
A
x
+
B
y
=
c
s.t. Ax+By=c
s.t.Ax+By=c
初始化:
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,
y
0
∈
R
q
y^0 \in R^q
y0∈Rq,
λ
0
∈
R
m
\lambda^0 \in R^m
λ0∈Rm,并且
ρ
>
0.
\rho>0.
ρ>0.
一般步骤:对
k
=
0
,
1
,
k=0,1,
k=0,1,…执行以下步骤:
(
a
)
x
k
+
1
∈
arg
min
{
θ
1
(
x
)
+
ρ
2
∥
A
x
+
B
y
k
−
c
+
1
ρ
λ
k
∥
2
:
x
∈
R
n
}
;
(a)x^{k+1}\in \mathop{\arg\min}\{\theta_1(x)+\frac{\rho}{2}\|Ax+By^k-c+\frac{1}{\rho}\lambda^k\|^2:x\in R^n\};
(a)xk+1∈argmin{θ1(x)+2ρ∥Ax+Byk−c+ρ1λk∥2:x∈Rn};
(
b
)
y
k
+
1
∈
arg
min
{
θ
2
(
y
)
+
ρ
2
∥
A
x
k
+
1
+
B
y
−
c
+
1
ρ
λ
k
∥
2
:
x
∈
R
q
}
;
(b)y^{k+1}\in \mathop{\arg\min}\{\theta_2(y)+\frac{\rho}{2}\|Ax^{k+1}+By-c+\frac{1}{\rho}\lambda^k\|^2:x\in R^q\};
(b)yk+1∈argmin{θ2(y)+2ρ∥Axk+1+By−c+ρ1λk∥2:x∈Rq};
(
c
)
λ
k
+
1
=
λ
k
+
ρ
(
A
x
k
+
1
+
B
y
k
+
1
−
c
)
.
(c)\lambda^{k+1}=\lambda^k+\rho(Ax^{k+1}+By^{k+1}-c).
(c)λk+1=λk+ρ(Axk+1+Byk+1−c).
交替方向线性近似乘子法(AD-LPMM,linearized proximal)
交替方向近似乘子法(AD-PMM,proximal)的一个特例
初始化:
x
0
∈
R
n
x^0 \in R^n
x0∈Rn,
y
0
∈
R
q
y^0 \in R^q
y0∈Rq,
λ
0
∈
R
m
\lambda^0 \in R^m
λ0∈Rm,
ρ
>
0
\rho>0
ρ>0,
α
≥
ρ
λ
m
a
x
(
A
T
A
)
\alpha\ge\rho\lambda_{max}(A^TA)
α≥ρλmax(ATA),
β
≥
ρ
λ
m
a
x
(
B
T
B
)
.
\beta\ge\rho\lambda_{max}(B^TB).
β≥ρλmax(BTB).
一般步骤:对
k
=
0
,
1
,
k=0,1,
k=0,1,…执行以下步骤:
(
a
)
x
k
+
1
=
P
r
o
x
1
α
θ
1
[
x
k
+
ρ
α
A
T
(
A
x
k
+
B
y
k
−
c
+
1
ρ
λ
k
)
]
;
(a)x^{k+1}=Prox_{\frac{1}{\alpha}\theta_1}[x^k+\frac{\rho}{\alpha}A^T(Ax^k+By^k-c+\frac{1}{\rho}\lambda^k)];
(a)xk+1=Proxα1θ1[xk+αρAT(Axk+Byk−c+ρ1λk)];
(
b
)
y
k
+
1
=
P
r
o
x
1
β
θ
2
[
y
k
+
ρ
β
B
T
(
A
x
k
+
1
+
B
y
k
−
c
+
1
ρ
λ
k
)
]
;
(b)y^{k+1}=Prox_{\frac{1}{\beta}\theta_2}[y^k+\frac{\rho}{\beta}B^T(Ax^{k+1}+By^k-c+\frac{1}{\rho}\lambda^k)];
(b)yk+1=Proxβ1θ2[yk+βρBT(Axk+1+Byk−c+ρ1λk)];
(
c
)
λ
k
+
1
=
λ
k
+
ρ
(
A
x
k
+
1
+
B
y
k
+
1
−
c
)
.
(c)\lambda^{k+1}=\lambda^k+\rho(Ax^{k+1}+By^{k+1}-c).
(c)λk+1=λk+ρ(Axk+1+Byk+1−c).