Introduction
Basis pursuit problem:

$$\min\ \theta(x)\quad \text{s.t.}\quad Ax=b$$
where $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, and $\theta:\mathbb{R}^n\rightarrow(-\infty,\infty]$ is a closed proper convex function.
Reformulation
Consider the Lagrangian of the basis pursuit problem

$$L(x,\lambda)=\theta(x)-\lambda^T(Ax-b).$$

Then $(x^*,\lambda^*)\in\mathbb{R}^n\times\mathbb{R}^m$ is a saddle point of the basis pursuit problem if and only if
$$\max_{\lambda}L(x^*,\lambda)\le L(x^*,\lambda^*)\le \min_{x}L(x,\lambda^*),$$
which is equivalent to

$$\theta(x)-\theta(x^*)+\begin{bmatrix}x-x^*\\\lambda-\lambda^*\end{bmatrix}^T\left[\begin{pmatrix}0&-A^T\\A&0\end{pmatrix}\begin{pmatrix}x^*\\\lambda^*\end{pmatrix}+\begin{pmatrix}0\\-b\end{pmatrix}\right]\ge 0\quad\forall\,(x,\lambda)\in\mathbb{R}^n\times\mathbb{R}^m.$$

The $x$-block of this inequality is the right-hand saddle inequality, $\theta(x)-\theta(x^*)-(x-x^*)^TA^T\lambda^*\ge 0$, and the $\lambda$-block is the left-hand one, $(\lambda-\lambda^*)^T(Ax^*-b)\ge 0$.
Here, for basis pursuit, $\theta(x)=\|x\|_1$ and

$$F(x,\lambda)=\begin{bmatrix}0&-A^T\\A&0\end{bmatrix}\begin{bmatrix}x\\\lambda\end{bmatrix}+\begin{bmatrix}0\\-b\end{bmatrix}=\begin{bmatrix}-A^T\lambda\\Ax-b\end{bmatrix}.$$
Writing $u=(x,\lambda)$, $\theta(u):=\theta(x)$ and $F(u):=F(x,\lambda)$, the problem becomes the variational inequality (VI): find $u^*$ such that

$$\theta(u)-\theta(u^*)+(u-u^*)^TF(u^*)\ge 0\quad\forall\,u.$$
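As a quick check of the block form of $F$, the following minimal NumPy sketch (the helper names `vi_operator` and `F` are illustrative, not from the original) builds $M$ and $q$ with $F(u)=Mu+q$, confirms that $Mu+q=(-A^T\lambda,\,Ax-b)$, and verifies that $M$ is skew-symmetric. Skew-symmetry gives $(u-v)^T(F(u)-F(v))=0$, which is why $F$ is monotone and the VI machinery applies.

```python
import numpy as np


def vi_operator(A: np.ndarray, b: np.ndarray):
    """Return M and q such that F(u) = M u + q for u = (x, lambda)."""
    m, n = A.shape
    M = np.block([[np.zeros((n, n)), -A.T],
                  [A, np.zeros((m, m))]])
    q = np.concatenate([np.zeros(n), -b])
    return M, q


def F(u: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Evaluate F(x, lambda) = (-A^T lambda, A x - b) without forming M."""
    n = A.shape[1]
    x, lam = u[:n], u[n:]
    return np.concatenate([-A.T @ lam, A @ x - b])


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
u = rng.standard_normal(8)  # u = (x, lambda) with n = 5, m = 3
M, q = vi_operator(A, b)
print(np.allclose(M @ u + q, F(u, A, b)))  # True: both forms agree
print(np.allclose(M.T, -M))                # True: M is skew-symmetric
print(abs(u @ M @ u) < 1e-10)              # True: u^T M u = 0
```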
AD-LPMM
Let $y=Ax-b$ and consider
$$L(x,y,\lambda,\rho)=\theta(x)-\lambda^T(y-b)+\frac{\rho}{2}\|y\|^2=\theta(x)+\frac{\rho}{2}\Big\|y-\frac{\lambda}{\rho}\Big\|^2+c,$$

where $c=\lambda^Tb-\frac{\|\lambda\|^2}{2\rho}$ is a constant (it depends on neither $x$ nor $y$).
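The completing-the-square step can be checked numerically. The sketch below (random data, assumed purely for illustration) compares $-\lambda^T(y-b)+\frac{\rho}{2}\|y\|^2$ with $\frac{\rho}{2}\|y-\lambda/\rho\|^2+c$ for $c=\lambda^Tb-\|\lambda\|^2/(2\rho)$. The $\theta(x)$ term is common to both sides and is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
m, rho = 4, 2.5
y = rng.standard_normal(m)
lam = rng.standard_normal(m)
b = rng.standard_normal(m)

# Left-hand side without theta(x): -lambda^T (y - b) + (rho/2) ||y||^2
lhs = -lam @ (y - b) + 0.5 * rho * (y @ y)
# Completed square plus the constant c = lambda^T b - ||lambda||^2 / (2 rho)
c = lam @ b - (lam @ lam) / (2 * rho)
rhs = 0.5 * rho * np.sum((y - lam / rho) ** 2) + c
print(np.isclose(lhs, rhs))  # True
```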
Recall the proximal operator of $f$:

$$\mathrm{Prox}_f:x\mapsto\arg\min\Big\{f(y)+\frac{1}{2}\|x-y\|^2:\ y\in\mathbb{R}^n\Big\}$$
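For $\theta=\|\cdot\|_1$ this operator has a closed form: $\mathrm{Prox}_{t\|\cdot\|_1}(v)_i=\operatorname{sign}(v_i)\max(|v_i|-t,0)$ (soft-thresholding), so $\mathrm{Prox}_{\theta/\rho}$ corresponds to $t=1/\rho$. The sketch below (the helper name `prox_l1` is hypothetical) implements it and checks it against a brute-force grid minimization of $t|y|+\frac12(v_i-y)^2$ for each component.

```python
import numpy as np


def prox_l1(v: np.ndarray, t: float) -> np.ndarray:
    """Prox of f = t*||.||_1: componentwise soft-thresholding at level t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


v = np.array([3.0, -0.5, 1.2, -2.0])
t = 1.0

# Brute-force check on a fine grid: minimize t*|y| + 0.5*(v_i - y)^2 per component
grid = np.linspace(-5.0, 5.0, 100001)
brute = np.array([grid[np.argmin(t * np.abs(grid) + 0.5 * (vi - grid) ** 2)] for vi in v])
print(np.allclose(prox_l1(v, t), brute, atol=1e-3))  # True
```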
$$\arg\min L(x,y,\lambda,\rho)=\arg\min\Big\{\frac{\theta(x)}{\rho}+\frac{1}{2}\Big\|\Big(x+y-\frac{\lambda}{\rho}\Big)-x\Big\|^2\Big\}=\mathrm{Prox}_{\theta/\rho}\Big[x+y-\frac{\lambda}{\rho}\Big].$$
Initialization: $x^0\in\mathbb{R}^n$, $\lambda^0\in\mathbb{R}^m$, $\rho>0$, and $\lambda_{\max}(A^TA)\le1$.
General step: for $k=0,1,\dots$ execute the following steps:
(a) $x^{k+1}=\mathrm{Prox}_{\theta/\alpha}\Big[x^k+\frac{\alpha}{\rho}A^T\Big(y^k-\frac{\lambda^k}{\rho}\Big)\Big]$;
(b) $y^{k+1}=\mathrm{Prox}_{\theta/\beta}\Big[y^k+\frac{\beta}{\rho}\Big(Ax^{k+1}-\frac{\lambda^k}{\rho}\Big)\Big]$;
(c) $\lambda^{k+1}=\lambda^k+\rho(Ax^{k+1}-b-y^{k+1})$.
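The following is not a transcription of steps (a) to (c). It is a minimal NumPy sketch of a standard linearized ADMM (AD-LPMM-style) scheme for basis pursuit under an assumed splitting $Ax=z$, $z\in\{b\}$, with the assumed sign convention $+\lambda^T(Ax-z)$ in the augmented Lagrangian. It uses the linearization constant $\alpha=1$, which is valid when $\lambda_{\max}(A^TA)\le 1$ as in the initialization, and a hypothetical helper `rescale` that enforces this condition by dividing $A$ and $b$ by $\|A\|_2$ (this leaves the feasible set unchanged). The names `rescale`, `prox_l1` and `linearized_admm_bp` are illustrative.

```python
import numpy as np


def rescale(A, b):
    """Divide A and b by ||A||_2 so lambda_max(A^T A) = 1; {x : Ax = b} is unchanged."""
    s = np.linalg.norm(A, 2)  # largest singular value of A
    return A / s, b / s


def prox_l1(v, t):
    """Prox of t*||.||_1 (soft-thresholding at level t)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def linearized_admm_bp(A, b, rho=1.0, iters=5000):
    """Hypothetical sketch: min ||x||_1 s.t. Ax = z, z = b.

    Augmented Lagrangian (assumed sign convention):
        ||x||_1 + lam^T (Ax - z) + (rho/2) ||Ax - z||^2.
    Assumes lambda_max(A^T A) <= 1, so the proximal term (rho/2)||x - x^k||^2
    dominates the linearized quadratic (linearization constant alpha = 1).
    """
    m, n = A.shape
    x, lam = np.zeros(n), np.zeros(m)
    z = b.copy()  # z-step: projection onto {b}, so z stays equal to b
    for _ in range(iters):
        # x-step: linearize the quadratic at x^k, then apply the l1 prox
        grad = A.T @ (A @ x - z + lam / rho)
        x = prox_l1(x - grad, 1.0 / rho)
        # multiplier step on the constraint residual Ax - z
        lam = lam + rho * (A @ x - z)
    return x, lam


# Usage on a trivial instance (A = I), where the unique feasible point is x = b.
A, b = rescale(np.eye(3), np.array([2.0, 0.0, -1.5]))
x, lam = linearized_admm_bp(A, b, rho=1.0, iters=100)
print(np.allclose(x, b))  # True
```

Because the $z$-step is a projection onto the single point $\{b\}$, $z$ never changes, so in this sketch the scheme reduces to a linearized augmented Lagrangian iteration in $(x,\lambda)$.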
GEM
Initialization: $x^0\in\mathbb{R}^n$, $\lambda^0\in\mathbb{R}^m$, $\beta>0$, and $\nu,\mu\in(0,1)$ with $\mu<\nu$; $\beta$ is adjusted by an adaptive rule.
(1) $\widetilde{x}^k=\mathrm{Prox}_{\beta\theta}(x^k+\beta A^T\lambda^k)$;
(2) $\widetilde{\lambda}^k=\lambda^k-\beta(Ax^k-b)$;
(3) $r_k=\beta\begin{Vmatrix}\begin{pmatrix}A^T(\lambda^k-\widetilde{\lambda}^k)\\A(x^k-\widetilde{x}^k)\end{pmatrix}\end{Vmatrix}\bigg/\begin{Vmatrix}\begin{pmatrix}x^k-\widetilde{x}^k\\\lambda^k-\widetilde{\lambda}^k\end{pmatrix}\end{Vmatrix}$;
(4) If $r_k>\nu$:
(5) $\qquad\beta=\frac{2}{3}\beta\min\{1,\frac{1}{r_k}\}$;
(6) $\qquad$ go to (1).
(7) $x^{k+1}=\mathrm{Prox}_{\beta\theta}(x^k+\beta A^T\widetilde{\lambda}^k)$;
(8) $\lambda^{k+1}=\lambda^k-\beta(A\widetilde{x}^k-b)$;
(9) If $r_k\le\mu$:
(10) $\qquad\beta=1.5\beta$.
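A minimal NumPy sketch of the GEM predictor-corrector loop above, assuming $\theta=\|\cdot\|_1$ so that $\mathrm{Prox}_{\beta\theta}$ is soft-thresholding at level $\beta$. It uses $r_k\le\mu$ as the test for enlarging $\beta$ (the only place $\mu$ can enter). The default values $\beta=1$, $\nu=0.9$, $\mu=0.3$, the fixed iteration count, and the early return when $u^k=\widetilde{u}^k$ (a fixed point of the predictor solves the VI) are assumptions of this sketch. The name `gem_bp` is hypothetical.

```python
import numpy as np


def prox_l1(v, t):
    """Prox of t*||.||_1 (soft-thresholding at level t)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def gem_bp(A, b, beta=1.0, nu=0.9, mu=0.3, iters=2000):
    """Hypothetical sketch of GEM steps (1)-(10) with theta = ||.||_1."""
    m, n = A.shape
    x, lam = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        while True:
            # (1)-(2): predictor u~ = (x~, lambda~)
            xt = prox_l1(x + beta * (A.T @ lam), beta)
            lt = lam - beta * (A @ x - b)
            dx, dl = x - xt, lam - lt
            denom = np.linalg.norm(np.concatenate([dx, dl]))
            if denom == 0.0:
                return x, lam  # safeguard: u^k = u~^k already solves the VI
            # (3): r_k = beta * ||F(u^k) - F(u~^k)|| / ||u^k - u~^k||
            r = beta * np.linalg.norm(np.concatenate([A.T @ dl, A @ dx])) / denom
            if r <= nu:
                break
            # (4)-(6): shrink beta and redo the predictor
            beta = 2.0 / 3.0 * beta * min(1.0, 1.0 / r)
        # (7)-(8): corrector uses the predictor values lambda~ and x~
        x = prox_l1(x + beta * (A.T @ lt), beta)
        lam = lam - beta * (A @ xt - b)
        # (9)-(10): enlarge beta when the ratio is small
        if r <= mu:
            beta = 1.5 * beta
    return x, lam


# Usage on a trivial instance (A = I), where the unique feasible point is x = b.
A = np.eye(3)
b = np.array([2.0, 0.0, -1.5])
x, lam = gem_bp(A, b)
print(np.allclose(x, b, atol=1e-6))
```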
PGA
$PGA_{a1}$
Initialization: $x^0\in\mathbb{R}^n$, $\lambda^0\in\mathbb{R}^m$, $\beta>0$, $\nu,\mu\in(0,1)$, $\gamma\in(0,2)$.
(1) $\widetilde{x}^k=\mathrm{Prox}_{\beta\theta}(x^k+\beta A^T\lambda^k)$;
(2) $\widetilde{\lambda}^k=\lambda^k-\beta(Ax^k-b)$;
(3) $r_k=\beta\begin{Vmatrix}\begin{pmatrix}A^T(\lambda^k-\widetilde{\lambda}^k)\\A(x^k-\widetilde{x}^k)\end{pmatrix}\end{Vmatrix}\bigg/\begin{Vmatrix}\begin{pmatrix}x^k-\widetilde{x}^k\\\lambda^k-\widetilde{\lambda}^k\end{pmatrix}\end{Vmatrix}$;
(4) If $r_k>\nu$:
(5) $\qquad\beta=\frac{2}{3}\beta\min\{1,\frac{1}{r_k}\}$;
(6) $\qquad$ go to (1).
(7) $\alpha^*_k=\begin{Vmatrix}\begin{pmatrix}x^k-\widetilde{x}^k\\\lambda^k-\widetilde{\lambda}^k\end{pmatrix}\end{Vmatrix}^2\bigg/\begin{Vmatrix}\begin{pmatrix}x^k-\widetilde{x}^k+\beta A^T(\lambda^k-\widetilde{\lambda}^k)\\\lambda^k-\widetilde{\lambda}^k-\beta A(x^k-\widetilde{x}^k)\end{pmatrix}\end{Vmatrix}^2$;
(8) $x^{k+1}=x^k-\gamma\alpha^*_k\big(x^k-\widetilde{x}^k+\beta A^T(\lambda^k-\widetilde{\lambda}^k)\big)$;
(9) $\lambda^{k+1}=\lambda^k-\gamma\alpha^*_k\big(\lambda^k-\widetilde{\lambda}^k-\beta A(x^k-\widetilde{x}^k)\big)$;
(10) If $r_k\le\mu$:
(11) $\qquad\beta=1.5\beta$.
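A minimal NumPy sketch of $PGA_{a1}$ as listed above. The predictor and the adaptive $\beta$ rule are the same as in GEM, but the new iterate moves along $-d$, where $d=(u^k-\widetilde{u}^k)-\beta\big(F(u^k)-F(\widetilde{u}^k)\big)$, with step $\gamma\alpha^*_k$ and $\alpha^*_k=\|u^k-\widetilde{u}^k\|^2/\|d\|^2$. The sketch assumes $\theta=\|\cdot\|_1$, uses $r_k\le\mu$ as the enlargement test, and takes the default values $\gamma=1.8$, $\nu=0.9$, $\mu=0.3$ and a fixed iteration count as illustrative choices. The name `pga_a1_bp` is hypothetical.

```python
import numpy as np


def prox_l1(v, t):
    """Prox of t*||.||_1 (soft-thresholding at level t)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def pga_a1_bp(A, b, beta=1.0, nu=0.9, mu=0.3, gamma=1.8, iters=2000):
    """Hypothetical sketch of PGA_a1 with theta = ||.||_1."""
    m, n = A.shape
    x, lam = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        while True:
            # predictor, identical to GEM steps (1)-(2)
            xt = prox_l1(x + beta * (A.T @ lam), beta)
            lt = lam - beta * (A @ x - b)
            dx, dl = x - xt, lam - lt
            e2 = dx @ dx + dl @ dl  # ||u^k - u~^k||^2
            if e2 == 0.0:
                return x, lam  # safeguard: u^k already solves the VI
            r = beta * np.linalg.norm(np.concatenate([A.T @ dl, A @ dx])) / np.sqrt(e2)
            if r <= nu:
                break
            beta = 2.0 / 3.0 * beta * min(1.0, 1.0 / r)  # shrink and retry
        # direction d = (u - u~) - beta * (F(u) - F(u~))
        d_x = dx + beta * (A.T @ dl)
        d_l = dl - beta * (A @ dx)
        alpha = e2 / (d_x @ d_x + d_l @ d_l)  # ||d||^2 >= e2 > 0 here
        # relaxed update u^{k+1} = u^k - gamma * alpha * d
        x = x - gamma * alpha * d_x
        lam = lam - gamma * alpha * d_l
        if r <= mu:
            beta = 1.5 * beta
    return x, lam


# Usage on a trivial instance (A = I), where the unique feasible point is x = b.
A = np.eye(3)
b = np.array([2.0, 0.0, -1.5])
x, lam = pga_a1_bp(A, b)
print(np.allclose(x, b, atol=1e-6))
```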
$PGA_{b1}$
All steps are the same as in $PGA_{a1}$ except the step-size formula (step (7) of $PGA_{a1}$), which becomes

$$\alpha^*_k=\frac{\begin{bmatrix}x^k-\widetilde{x}^k\\\lambda^k-\widetilde{\lambda}^k\end{bmatrix}^T\begin{bmatrix}x^k-\widetilde{x}^k+\beta A^T(\lambda^k-\widetilde{\lambda}^k)\\\lambda^k-\widetilde{\lambda}^k-\beta A(x^k-\widetilde{x}^k)\end{bmatrix}}{\begin{Vmatrix}\begin{pmatrix}x^k-\widetilde{x}^k+\beta A^T(\lambda^k-\widetilde{\lambda}^k)\\\lambda^k-\widetilde{\lambda}^k-\beta A(x^k-\widetilde{x}^k)\end{pmatrix}\end{Vmatrix}^2}.$$
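For this particular $F$ the two step sizes coincide in exact arithmetic. Write $e=u^k-\widetilde{u}^k$, so that the direction is $d=e-\beta Me$. Because $M$ is skew-symmetric, $e^TMe=0$, and therefore the $PGA_{b1}$ numerator $e^Td$ equals the $PGA_{a1}$ numerator $\|e\|^2$. For basis pursuit, then, the two variants should differ only through rounding. The sketch below checks this numerically on random vectors, which stand in for $x^k-\widetilde{x}^k$ and $\lambda^k-\widetilde{\lambda}^k$ (an assumption made purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, beta = 4, 7, 0.5
A = rng.standard_normal((m, n))
dx = rng.standard_normal(n)  # stands for x^k - x~^k
dl = rng.standard_normal(m)  # stands for lambda^k - lambda~^k

e = np.concatenate([dx, dl])
d = np.concatenate([dx + beta * (A.T @ dl), dl - beta * (A @ dx)])
alpha_a1 = (e @ e) / (d @ d)  # PGA_a1: ||e||^2 / ||d||^2
alpha_b1 = (e @ d) / (d @ d)  # PGA_b1: e^T d / ||d||^2
print(np.isclose(alpha_a1, alpha_b1))  # True
```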