ADMM
对偶下降
m i n i m i z e f ( x ) s u b j e c t t o A x = b minimize\ \ f(x) \\ subject\ \ to \ \ Ax=b minimize f(x)subject to Ax=b
其中, x ∈ R n x\in R^n x∈Rn, A ∈ R m × n A\in R^{m×n} A∈Rm×n, f : R n → R f:R^n \to R f:Rn→R
该问题的拉格朗日式如下
L
(
x
,
y
)
=
f
(
x
)
+
y
T
(
A
x
−
b
)
L(x,y)=f(x)+y^T(Ax-b)
L(x,y)=f(x)+yT(Ax−b)
对偶方程如下:
g
(
y
)
=
*
i
n
f
x
L
(
x
,
y
)
=
−
*
s
u
p
x
∈
d
o
m
f
(
−
y
T
A
x
−
f
(
x
)
)
−
b
T
y
=
−
f
∗
(
−
A
T
y
)
−
b
T
y
g(y)=\operatorname*{inf}_{x}L(x,y)=-\operatorname*{sup}_{x\in domf}(-y^TAx-f(x))-b^Ty=-f^*(-A^Ty)-b^Ty
g(y)=*infxL(x,y)=−*supx∈domf(−yTAx−f(x))−bTy=−f∗(−ATy)−bTy
为什么这样做呢?
min
x
L
(
x
,
y
)
=
d
∗
≤
p
∗
=
min
x
f
(
x
)
+
I
(
A
x
−
b
)
w
h
e
r
e
I
(
u
)
=
{
0
u=0
∞
u
≠
0
\min_{x}L(x,y)=d^*\le p^*=\min_{x} f(x)+I(Ax-b) \\ where \ \ I(u)=\begin{cases} 0 & \text {u=0}\\[2ex] \infty & \text {u} \neq {0} \end{cases}
xminL(x,y)=d∗≤p∗=xminf(x)+I(Ax−b)where I(u)=⎩⎨⎧0∞u=0u̸=0
对偶问题最优化
m
a
x
i
m
i
z
e
g
(
y
)
maximize \ \ g(y)
maximize g(y)
令
y
∗
=
*
a
r
g
m
a
x
y
g
(
y
)
y*=\operatorname*{argmax}_{y}\ \ g(y)
y∗=*argmaxy g(y),
x
∗
=
a
r
g
m
i
n
x
L
(
x
,
y
∗
)
x*=argmin_x\ \ L(x,y*)
x∗=argminx L(x,y∗)
如果强对偶成立,则 d ∗ = p ∗ d*=p* d∗=p∗。 x ∗ x* x∗是原问题的最优解
对偶下降算法步骤如下:
x k + 1 : = a r g m i n x L ( x , y k ) y k + 1 : = y k + α k ( A x k + 1 − b ) x^{k+1}:=argmin_xL(x,y^k) \\ y^{k+1}:=y^k+\alpha^k(Ax^{k+1}-b) xk+1:=argminxL(x,yk)yk+1:=yk+αk(Axk+1−b)
先最小化 L ( x , y k ) L(x,y^k) L(x,yk),后最大化 L ( x k + 1 , y ) L(x^{k+1},y) L(xk+1,y)
特点:
- α k \alpha^k αk选择合适,且满足一定条件下, x k x^k xk, y k y^k yk收敛到最优点
- 然而,对许多例子不成立。比如 f ( x ) = B x f(x)=Bx f(x)=Bx,在更新 x x x时不存在最小值
对偶分解
每次迭代需要广播和集合两个步骤: x x x更新可以独立进行, y y y更新要集中进行
增广拉格朗日乘子法
L ρ ( x , y ) = f ( x ) + y T ( A x − b ) + ( ρ / 2 ) ∥ A x − b ∥ 2 2 L_\rho(x,y)=f(x)+y^T(Ax-b)+(\rho /2)\left\lVert Ax-b \right\rVert_2^2 Lρ(x,y)=f(x)+yT(Ax−b)+(ρ/2)∥Ax−b∥22
这等价于如下问题原拉格朗日:
m
i
n
i
m
i
z
e
f
(
x
)
+
(
ρ
/
2
)
∥
A
x
−
b
∥
2
2
s
u
b
j
e
c
t
t
o
A
x
=
b
minimize \ \ f(x)+(\rho /2)\left\lVert Ax-b \right\rVert_2^2 \\ subject\ to \ \ Ax=b
minimize f(x)+(ρ/2)∥Ax−b∥22subject to Ax=b
算法步骤:
x k + 1 : = a r g m i n x L ρ ( x , y k ) y k + 1 : = y k + ρ ( A x k + 1 − b ) x^{k+1}:=argmin_x\ \ L_\rho(x,y^k) \\ y^{k+1}:=y^k+\rho(Ax^{k+1}-b) xk+1:=argminx Lρ(x,yk)yk+1:=yk+ρ(Axk+1−b)
为什么选择 ρ \rho ρ作为迭代步长?
最优条件必须满足
A
x
∗
−
b
=
0
∇
f
(
x
∗
)
+
A
T
y
∗
=
0
Ax^*-b=0 \\ \nabla f(x^*)+A^Ty^*=0
Ax∗−b=0∇f(x∗)+ATy∗=0
根据定义,
x
k
+
1
x^{k+1}
xk+1最小化
L
ρ
(
x
,
y
k
)
L_\rho(x,y^k)
Lρ(x,yk),因此
0
=
∇
x
L
ρ
(
x
k
+
1
,
y
k
)
=
∇
x
f
(
x
k
+
1
)
+
A
T
(
y
k
+
ρ
(
A
x
k
+
1
−
b
)
)
=
∇
x
f
(
x
k
+
1
)
+
A
T
y
k
+
1
0 =\nabla_xL_\rho(x^{k+1},y^k)\\ =\nabla_xf(x^{k+1})+A^T(y^k+\rho(Ax^{k+1}-b))\\ = \nabla_xf(x^{k+1})+A^Ty^{k+1}
0=∇xLρ(xk+1,yk)=∇xf(xk+1)+AT(yk+ρ(Axk+1−b))=∇xf(xk+1)+ATyk+1
随着迭代的进行,只需要原条件的残差收敛到0,即可
评价:
- 大大增加了收敛性
- 不能进行分解
ADMM算法
m i n i m i z e f ( x ) + g ( z ) s u b j e c t t o A x + B z = c minimize \ \ f(x)+g(z) \\ subject\ to \ \ Ax+Bz=c minimize f(x)+g(z)subject to Ax+Bz=c
最优解表示如下:
p
∗
=
i
n
f
{
f
(
x
)
+
g
(
z
)
∣
A
x
+
B
z
=
c
}
p^*=inf\{ f(x)+g(z)\ |\ Ax+Bz=c \}
p∗=inf{f(x)+g(z) ∣ Ax+Bz=c}
增广拉格朗日:
L
ρ
(
x
,
y
)
=
f
(
x
)
+
g
(
z
)
+
y
T
(
A
x
+
B
z
−
c
)
+
(
ρ
/
2
)
∥
A
x
+
B
z
−
c
∥
2
2
L_\rho(x,y)=f(x)+g(z)+y^T(Ax+Bz-c)+(\rho /2)\left\lVert Ax+Bz-c \right\rVert_2^2
Lρ(x,y)=f(x)+g(z)+yT(Ax+Bz−c)+(ρ/2)∥Ax+Bz−c∥22
算法步骤:
x
k
+
1
:
=
*
a
r
g
m
i
n
x
L
ρ
(
x
,
z
k
,
y
k
)
z
k
+
1
:
=
*
a
r
g
m
i
n
z
L
ρ
(
x
k
+
1
,
z
,
y
k
)
y
k
+
1
:
=
y
k
+
ρ
(
A
x
k
+
1
+
B
z
k
+
1
−
c
)
x^{k+1}:=\operatorname*{argmin}_{x}L_\rho(x,z^k,y^k)\\ z^{k+1}:=\operatorname*{argmin}_{z}L_\rho(x^{k+1},z,y^k)\\ y^{k+1}:=y^k+\rho (Ax^{k+1}+Bz^{k+1}-c)
xk+1:=*argminxLρ(x,zk,yk)zk+1:=*argminzLρ(xk+1,z,yk)yk+1:=yk+ρ(Axk+1+Bzk+1−c)
归一化形式:
令
r
=
A
x
+
B
z
−
c
r=Ax+Bz-c
r=Ax+Bz−c,则
y
T
r
+
(
ρ
/
2
)
∥
r
∥
2
2
=
(
ρ
/
2
)
∥
r
+
u
∥
2
2
−
(
ρ
/
2
)
∥
u
∥
2
2
y^Tr+(\rho/2)\left\lVert r \right\rVert_2^2 = (\rho/2)\left\lVert r+u \right\rVert_2^2-(\rho/2)\left\lVert u \right\rVert_2^2
yTr+(ρ/2)∥r∥22=(ρ/2)∥r+u∥22−(ρ/2)∥u∥22
其中,
u
=
(
1
/
ρ
)
y
u=(1/\rho)y
u=(1/ρ)y是归一化对偶变量
归一化算法步骤如下:
x
k
+
1
=
*
a
r
g
m
i
n
x
(
f
(
x
)
+
(
ρ
/
2
)
∥
A
x
+
B
z
k
−
c
+
u
k
∥
2
2
)
z
k
+
1
=
*
a
r
g
m
i
n
z
(
g
(
z
)
+
(
ρ
/
2
)
∥
A
x
k
+
1
+
B
z
−
c
+
u
k
∥
2
2
)
u
k
+
1
=
u
k
+
A
x
k
+
1
+
B
z
k
+
1
−
c
x^{k+1}=\operatorname*{argmin}_{x}(f(x)+(\rho/2)\left\lVert Ax+Bz^k-c+u^k \right\rVert_2^2)\\ z^{k+1}=\operatorname*{argmin}_{z}(g(z)+(\rho/2)\left\lVert Ax^{k+1}+Bz-c+u^k \right\rVert_2^2)\\ u^{k+1}=u^k+Ax^{k+1}+Bz^{k+1}-c
xk+1=*argminx(f(x)+(ρ/2)∥∥Ax+Bzk−c+uk∥∥22)zk+1=*argminz(g(z)+(ρ/2)∥∥Axk+1+Bz−c+uk∥∥22)uk+1=uk+Axk+1+Bzk+1−c
可见,
u
k
=
u
0
+
∑
j
=
i
k
r
j
r
k
=
A
x
k
+
B
z
k
−
c
u^k=u^0+\sum_{j=i}^{k}r^j \\ r^k = Ax^k+Bz^k-c
uk=u0+j=i∑krjrk=Axk+Bzk−c
收敛性
- 假设1: e p i f = { ( x , t ) ∈ R n × R ∣ f ( x ) ≤ t } epi \ f=\{(x,t)\in R^n×R \ |\ f(x)\le t \} epi f={(x,t)∈Rn×R ∣ f(x)≤t}是闭的且非空的凸集
- 假设2:存在 ( x ∗ , z ∗ , y ∗ ) (x^*,z^*,y^*) (x∗,z∗,y∗)使得下式成立
L 0 ( x ∗ , z ∗ , y ) ≤ L 0 ( x ∗ , z ∗ , y ∗ ) ≤ L 0 ( x , z , y ∗ ) L_0(x^*,z^*,y)\le L_0(x^*,z^*,y^*) \le L_0(x,z,y^*) L0(x∗,z∗,y)≤L0(x∗,z∗,y∗)≤L0(x,z,y∗)
满足上述条件下,有
- 残差收敛: r k → 0 a s k → ∞ r^k \to 0 \ as \ k \to ∞ rk→0 as k→∞
- 目标函数收敛: f ( x k ) + g ( z k ) → p ∗ a s k → ∞ f(x^k)+g(z^k) \to p^* \ as \ k \to ∞ f(xk)+g(zk)→p∗ as k→∞。其中, p ∗ p^* p∗是最优值
- 对偶变量收敛: y k → y ∗ a s k → ∞ y^k \to y^* \ as \ k \to ∞ yk→y∗ as k→∞。其中, y ∗ y^* y∗是对偶最优点
注意, x k , y k x^k,y^k xk,yk不一定收敛到最优值,仅在上述条件满足下。
最优条件
- 原问题的可行性
A x ∗ + B z ∗ − c = 0 Ax^*+Bz^*-c=0 Ax∗+Bz∗−c=0
- 对偶可行性
0 = ∇ f ( x ∗ ) + A T y ∗ 0=\nabla f(x^*)+A^Ty^* \\ 0=∇f(x∗)+ATy∗
0 = ∇ g ( z ∗ ) + B T y ∗ 0=\nabla g(z^*)+B^Ty^* 0=∇g(z∗)+BTy∗
因为
z
k
+
1
z^{k+1}
zk+1最小化
L
ρ
(
x
k
+
1
,
z
,
y
k
)
L_\rho (x^{k+1},z,y^k)
Lρ(xk+1,z,yk),所以有
0
=
∇
g
(
z
k
+
1
)
+
B
T
y
k
+
ρ
B
T
(
A
x
k
+
1
+
B
z
k
+
1
−
c
)
=
∇
g
(
z
k
+
1
+
+
B
T
y
k
+
ρ
B
T
r
k
+
1
)
=
∇
g
(
z
k
+
1
)
+
B
T
y
k
+
1
0 =\nabla g(z^{k+1})+B^Ty^k+\rho B^T(Ax^{k+1}+Bz^{k+1}-c) \\ =\nabla g(z^{k+1}++B^Ty^k+\rho B^Tr^{k+1}) \\ = \nabla g(z^{k+1})+B^Ty^{k+1}
0=∇g(zk+1)+BTyk+ρBT(Axk+1+Bzk+1−c)=∇g(zk+1++BTyk+ρBTrk+1)=∇g(zk+1)+BTyk+1
所以,式(25)总是成立的。
因为
x
k
+
1
x^{k+1}
xk+1最小化
L
ρ
(
x
,
z
k
,
y
k
)
L_\rho (x,z^k,y^k)
Lρ(x,zk,yk),所以有
0
=
∇
f
(
x
k
+
1
)
+
A
T
y
k
+
ρ
A
T
(
A
x
k
+
1
+
B
z
k
−
c
)
=
∇
f
(
x
k
+
1
+
+
A
T
(
y
k
+
ρ
r
k
+
1
+
ρ
B
(
z
k
−
z
k
+
1
)
)
)
=
∇
f
(
x
k
+
1
)
+
A
T
y
k
+
1
+
ρ
A
T
B
(
z
k
−
z
k
+
1
)
0 =\nabla f(x^{k+1})+A^Ty^k+\rho A^T(Ax^{k+1}+Bz^{k}-c) \\ =\nabla f(x^{k+1}++A^T(y^k+\rho r^{k+1}+\rho B(z^k-z^{k+1}))) \\ = \nabla f(x^{k+1})+A^T y^{k+1}+\rho A^TB(z^k-z^{k+1})
0=∇f(xk+1)+ATyk+ρAT(Axk+1+Bzk−c)=∇f(xk+1++AT(yk+ρrk+1+ρB(zk−zk+1)))=∇f(xk+1)+ATyk+1+ρATB(zk−zk+1)
因此,
s
k
+
1
=
ρ
A
T
B
(
z
k
+
1
−
z
k
)
s^{k+1}=\rho A^TB(z^{k+1}-z^k)
sk+1=ρATB(zk+1−zk)可以看做式(24)的残差
停止条件
收敛条件得出:
f
(
x
k
)
+
g
(
z
k
)
−
p
∗
≤
−
(
y
k
)
T
r
k
+
(
x
k
−
x
∗
)
T
s
k
f(x^k)+g(z^k)-p^*\le -(y^k)^Tr^k+(x^k-x^*)^Ts^k
f(xk)+g(zk)−p∗≤−(yk)Trk+(xk−x∗)Tsk
假设
∣
∣
x
k
−
x
∗
∣
∣
≤
d
||x^k-x^*||\le d
∣∣xk−x∗∣∣≤d,则
f
(
x
k
)
+
g
(
z
k
)
−
p
∗
≤
−
(
y
k
)
T
r
k
+
d
∣
∣
s
k
∣
∣
2
≤
∣
∣
y
k
∣
∣
2
∣
∣
r
k
∣
∣
2
+
d
∣
∣
s
k
∣
∣
2
f(x^k)+g(z^k)-p^*\le -(y^k)^Tr^k+d||s^k||_2 \le ||y^k||_2||r^k||_2+d||s^k||_2
f(xk)+g(zk)−p∗≤−(yk)Trk+d∣∣sk∣∣2≤∣∣yk∣∣2∣∣rk∣∣2+d∣∣sk∣∣2
因此,终止条件设为:
∣
∣
r
k
∣
∣
2
≤
ϵ
p
r
i
a
n
d
∣
∣
s
k
∣
∣
2
≤
ϵ
d
u
a
l
||r^k||_2\le \epsilon^{pri} \ \ and \ \ ||s^k||_2\le \epsilon^{dual}
∣∣rk∣∣2≤ϵpri and ∣∣sk∣∣2≤ϵdual
其中,
ϵ
p
r
i
=
p
ϵ
a
b
s
+
ϵ
r
e
l
m
a
x
{
∣
∣
A
x
k
∣
∣
2
,
∣
∣
B
z
k
∣
∣
2
,
∣
∣
c
∣
∣
2
}
ϵ
d
u
a
l
=
n
ϵ
a
b
s
+
ϵ
r
e
l
∣
∣
A
T
y
k
∣
∣
2
\epsilon^{pri} = \sqrt{p}\epsilon^{abs}+\epsilon^{rel}max\{ ||Ax^k||_2,||Bz^k||_2 ,||c||_2\} \\ \epsilon^{dual} = \sqrt{n}\epsilon^{abs}+\epsilon^{rel}||A^Ty^k||_2
ϵpri=pϵabs+ϵrelmax{∣∣Axk∣∣2,∣∣Bzk∣∣2,∣∣c∣∣2}ϵdual=nϵabs+ϵrel∣∣ATyk∣∣2
其中,
p
,
n
p,n
p,n是各自向量的维度
ADMM变种
变化的惩罚参数
ρ k + 1 : = { τ i n c r ρ k , if ∣ ∣ r k ∣ ∣ 2 > μ ∣ ∣ s k ∣ ∣ 2 ρ k / τ d e c r , if ∣ ∣ s k ∣ ∣ 2 > μ ∣ ∣ r k ∣ ∣ 2 ρ k , o t h e r w i s e \rho^{k+1}:=\begin{cases} \tau^{incr}\rho^k, & \text{if} ||r^k||_2>\mu ||s^k||_2 \\ \rho^k/\tau^{decr}, & \text{if} ||s^k||_2>\mu ||r^k||_2 \\ \rho^k, &otherwise \end{cases} ρk+1:=⎩⎪⎨⎪⎧τincrρk,ρk/τdecr,ρk,if∣∣rk∣∣2>μ∣∣sk∣∣2if∣∣sk∣∣2>μ∣∣rk∣∣2otherwise
其中,一般 μ = 10 > 1 , τ i n c r = 2 > 1 , τ d e c r = 2 > 1 \mu=10>1,\tau^{incr}=2>1,\tau^{decr}=2>1 μ=10>1,τincr=2>1,τdecr=2>1
直观地, ρ \rho ρ变大,会使得 r k r^k rk变小
广义增广
( ρ / 2 ) ∣ ∣ r ∣ ∣ 2 2 → ( 1 / 2 ) r T P r (\rho/2)||r||_2^2 \to (1/2)r^TPr (ρ/2)∣∣r∣∣22→(1/2)rTPr
其中,P是正定的。这可以看做一个新的标准ADMM
。不过,
r
=
0
→
F
r
=
0
r=0 \to Fr=0
r=0→Fr=0。其中,
F
T
F
=
P
F^TF=P
FTF=P
超松弛
在更新z和y时,
A
x
k
+
1
→
α
k
A
x
k
+
1
−
(
1
−
α
k
)
(
B
z
k
−
c
)
Ax^{k+1} \to \alpha^kAx^{k+1}-(1-\alpha^k)(Bz^k-c)
Axk+1→αkAxk+1−(1−αk)(Bzk−c)
一般,选择超松弛参数
α
k
∈
[
1.5
,
1.8
]
\alpha^k \in [1.5,1.8]
αk∈[1.5,1.8]
常见形式
下面只考虑
x
x
x更新步骤,其他类似
x
+
=
a
r
g
m
i
n
x
(
f
(
x
)
+
ρ
/
2
∣
∣
A
x
−
v
∣
∣
2
2
)
w
h
e
r
e
v
=
−
B
z
+
c
−
u
x^+=argmin_x(f(x)+\rho/2||Ax-v||_2^2)\\ where \ \ v=-Bz+c-u
x+=argminx(f(x)+ρ/2∣∣Ax−v∣∣22)where v=−Bz+c−u
二次目标函数
f ( x ) = ( 1 / 2 ) x T P x + q T x + r f(x)=(1/2)x^TPx+q^Tx+r f(x)=(1/2)xTPx+qTx+r
则
x
+
=
(
P
+
ρ
A
T
A
)
−
1
(
ρ
A
T
v
−
q
)
x^+=(P+\rho A^TA)^{-1}(\rho A^Tv-q)
x+=(P+ρATA)−1(ρATv−q)
式(43)可以利用稀疏性、保存重复的分解等来加快速度
元素软阈值
当
f
(
x
)
=
λ
∣
∣
x
∣
∣
1
,
A
=
I
f(x)=\lambda ||x||_1,A=I
f(x)=λ∣∣x∣∣1,A=I时。对于每个标量
x
i
x_i
xi的更新为
x
i
+
:
=
a
r
g
m
i
n
x
i
(
λ
∣
x
i
∣
+
(
ρ
/
2
)
(
x
i
−
v
i
)
2
)
x_i^+:=argmin_{x_i}(\lambda|x_i|+(\rho/2)(x_i-v_i)^2)
xi+:=argminxi(λ∣xi∣+(ρ/2)(xi−vi)2)
尽管,
f
(
x
)
f(x)
f(x)不可导。但是存在简单的解析解:
x
i
+
:
=
S
λ
/
ρ
(
v
i
)
x_i^+:=S_{\lambda/\rho}(v_i)
xi+:=Sλ/ρ(vi)
软阈值操作符
S
S
S:
S
k
(
a
)
=
{
a
−
k
a
>
k
0
∣
a
∣
≤
k
a
+
k
a
<
−
k
S_k(a)=\begin{cases} a-k & a>k\\ 0 & |a|\le k \\ a+k & a<-k \end{cases}
Sk(a)=⎩⎪⎨⎪⎧a−k0a+ka>k∣a∣≤ka<−k
或者表示为
S
k
(
a
)
=
(
a
−
k
)
+
−
(
−
a
−
k
)
+
S_k(a)=(a-k)_+-(-a-k)_+
Sk(a)=(a−k)+−(−a−k)+
这也是shrinkage operator
,如下
S
k
(
a
)
=
(
1
−
k
/
∣
a
∣
)
+
a
,
f
o
r
a
≠
0
S_k(a)=(1-k/|a|)_+a,\ \ for \ \ a \ne 0
Sk(a)=(1−k/∣a∣)+a, for a̸=0
一阶范数问题
最小化
∣
∣
A
x
−
b
∣
∣
1
||Ax-b||_1
∣∣Ax−b∣∣1,转化为ADMM形式
m
i
n
i
m
i
z
e
∣
∣
z
∣
∣
1
s
u
b
j
e
c
t
t
o
A
x
−
z
=
b
minimize \ \ ||z||_1 \\ subject\ \ to \ \ Ax-z=b
minimize ∣∣z∣∣1subject to Ax−z=b
上面式子中,
f
=
0
,
g
=
∣
∣
∗
∣
∣
1
f=0,g=||*||_1
f=0,g=∣∣∗∣∣1。假设
A
T
A
A^TA
ATA可逆,则ADMM算法步骤如下:
x
k
+
1
=
(
A
T
A
)
−
1
A
T
(
b
+
z
k
−
u
k
)
z
k
+
1
=
S
1
/
ρ
(
A
x
k
+
1
−
b
+
u
k
)
u
k
+
1
=
u
k
+
A
x
k
+
1
−
z
k
+
1
−
b
x^{k+1} = (A^TA)^{-1}A^T(b+z^k-u^k) \\ z^{k+1} = S_{1/\rho}(Ax^{k+1}-b+u^k) \\ u^{k+1} = u^k+Ax^{k+1}-z^{k+1}-b
xk+1=(ATA)−1AT(b+zk−uk)zk+1=S1/ρ(Axk+1−b+uk)uk+1=uk+Axk+1−zk+1−b
openMVG
代码
eigen
线性代数
template <typename Linear_SolverT>
inline void Compute
(
const Eigen::SparseMatrix<double>& spd_mat,
Linear_SolverT * linear_solver
)
{
linear_solver->compute(spd_mat);
}
template <typename Linear_SolverT>
inline void Compute
(
const Eigen::MatrixXd& spd_mat,
Linear_SolverT * linear_solver
)
{
linear_solver->compute(spd_mat.sparseView());
}
迭代参数
struct Options {
int max_num_iterations = 1000;
// Rho is the augmented Lagrangian parameter.
double rho = 1.0;
// Alpha is the over-relaxation parameter (typically between 1.0 and 1.8).
double alpha = 1.0;
double absolute_tolerance = 1e-4;
double relative_tolerance = 1e-2;
};
私有成员
Options options_;
// Matrix A where || Ax - b ||_1 is the problem we are solving.
MatrixType a_;
// Cholesky linear solver.
#ifdef EIGEN_MPL2_ONLY
using Linear_Solver_T = Eigen::SparseLU<Eigen::SparseMatrix<double>>;
#else
// Since our linear system will be a SPD matrix we can
// utilize the Cholesky factorization.
using Linear_Solver_T = Eigen::SimplicialLLT<Eigen::SparseMatrix<double>>;
#endif
Linear_Solver_T linear_solver_;
Eigen::VectorXd Shrinkage
(
const Eigen::VectorXd& vec, const double kappa
) const
{
Eigen::ArrayXd zero_vec(vec.size());
zero_vec.setZero();
return zero_vec.max( vec.array() - kappa) -
zero_vec.max(-vec.array() - kappa);
}
初始化
L1Solver
(
const Options& options,
const MatrixType& mat
)
: options_(options), a_(mat)
{
// Analyze the sparsity pattern once. Only the values of the entries will be
// changed with each iteration.
const MatrixType spd_mat = a_.transpose() * a_;
l1_solver_internal::Compute(spd_mat, &linear_solver_);
}
核心
bool Solve
(
const Eigen::VectorXd& rhs, // b = rhs
Eigen::VectorXd* solution
)
{
// Since constructor was called before we check Compute status
if (linear_solver_.info() != Eigen::Success)
{
std::cerr << "Cannot compute the matrix factorization" << std::endl;
return false;
}
Eigen::VectorXd& x = *solution;
Eigen::VectorXd z(a_.rows()), u(a_.rows());
z.setZero();
u.setZero();
Eigen::VectorXd a_times_x(a_.rows()), z_old(z.size()), ax_hat(a_.rows());
// Precompute some convergence terms.
const double rhs_norm = rhs.norm();
const double primal_abs_tolerance_eps =
std::sqrt(a_.rows()) * options_.absolute_tolerance;
const double dual_abs_tolerance_eps =
std::sqrt(a_.cols()) * options_.absolute_tolerance;
for (int i = 0; i < options_.max_num_iterations; ++i)
{
// Update x.
x.noalias() = linear_solver_.solve(a_.transpose() * (rhs + z - u));
a_times_x.noalias() = a_ * x;
ax_hat.noalias() = options_.alpha * a_times_x;
ax_hat.noalias() += (1.0 - options_.alpha) * (z + rhs);
// Update z and set z_old.
std::swap(z, z_old);
z.noalias() = Shrinkage(ax_hat - rhs + u, 1.0 / options_.rho);
// Update u.
u.noalias() += ax_hat - z - rhs;
// Compute the convergence terms.
const double r_norm = (a_times_x - z - rhs).norm();
const double s_norm =
(-options_.rho * a_.transpose() * (z - z_old)).norm();
const double max_norm =
std::max({a_times_x.norm(), z.norm(), rhs_norm});
const double primal_eps =
primal_abs_tolerance_eps + options_.relative_tolerance * max_norm;
const double dual_eps =
dual_abs_tolerance_eps +
options_.relative_tolerance *
(options_.rho * a_.transpose() * u).norm();
// Log the result to the screen.
// std::ostringstream os;
// os << "Iteration: " << i << "\n"
// << "R norm: " << r_norm << "\n"
// << "S norm: " << s_norm << "\n"
// << "Primal eps: " << primal_eps << "\n"
// << "Dual eps: " << dual_eps << std::endl;
// std::cout << os.str() << std::endl;
// Determine if the minimizer has converged.
if (r_norm < primal_eps && s_norm < dual_eps)
{
return true;
}
}
return false;
}