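Optimization Study Notes: Alternating Direction Method of Multipliers (3)

8.6 Alternating Direction Method of Multipliers (Continued, Part 2)

8.6.5 Convergence Analysis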

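This section discusses the convergence of the alternating direction method of multipliers (ADMM), iteration (8.6.5)-(8.6.7), on problem (8.6.1). We first introduce the necessary assumptions.

Assumption 8.3 (1) $f_1(x)$ and $f_2(x)$ are closed convex functions, and every ADMM subproblem has a unique solution;
(2) the solution set of the original problem (8.6.1) is nonempty, and Slater's condition holds.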

The conditions in Assumption 8.3 are quite basic. Convexity of $f_1$ and $f_2$ ensures that the problem to be solved is convex, and uniqueness of each subproblem's solution ensures that the iteration is well defined. When Slater's condition holds, the KKT pairs of the original problem correspond to its optimal solutions, so the KKT conditions are a convenient tool for studying convergence.
Since the solution set of the original problem is nonempty, let $(x_1^*,x_2^*,y^*)$ be a KKT pair, that is, a triple satisfying condition (8.6.8):
$$-A_1^\mathrm{T}y^*\in\partial f_1(x_1^*),\quad-A_2^\mathrm{T}y^*\in\partial f_2(x_2^*),\quad A_1x_1^*+A_2x_2^*=b.$$
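To make condition (8.6.8) concrete, here is a minimal sketch on a hypothetical scalar instance that does not appear in the original notes: $f_1(x_1)=\tfrac12(x_1-c_1)^2$, $f_2(x_2)=\tfrac12(x_2-c_2)^2$, with constraint $a_1x_1+a_2x_2=b$. Both functions are differentiable, so the two inclusions become equations and the KKT pair has a closed form. The function name `kkt_pair` and all numeric values are illustrative assumptions.

```python
def kkt_pair(a1, a2, b, c1, c2):
    """KKT pair of  min 0.5*(x1-c1)^2 + 0.5*(x2-c2)^2  s.t.  a1*x1 + a2*x2 = b.

    Stationarity reads x1 - c1 = -a1*y and x2 - c2 = -a2*y; substituting
    both into the constraint leaves one linear equation for y.
    """
    y = (a1 * c1 + a2 * c2 - b) / (a1 * a1 + a2 * a2)
    return c1 - a1 * y, c2 - a2 * y, y


# Hypothetical data, chosen only for illustration.
a1, a2, b, c1, c2 = 1.0, 2.0, 3.0, 4.0, -1.0
x1s, x2s, ys = kkt_pair(a1, a2, b, c1, c2)

# Residuals of the three conditions in (8.6.8); each should be close to zero.
res_f1 = (x1s - c1) + a1 * ys      # grad f1(x1*) + A1^T y*
res_f2 = (x2s - c2) + a2 * ys      # grad f2(x2*) + A2^T y*
res_eq = a1 * x1s + a2 * x2s - b   # primal feasibility
print(x1s, x2s, ys, res_f1, res_f2, res_eq)
```

For nonsmooth $f_1$ or $f_2$ the inclusions generally cannot be solved in closed form like this; the sketch only illustrates what the three conditions assert.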

Our ultimate goal is to prove that the ADMM sequence $\{(x_1^k,x_2^k,y^k)\}$ converges to a KKT pair of the original problem. We therefore introduce the following notation for the error between the current iterate and the KKT pair:
$$(e_1^k,e_2^k,e_y^k)\stackrel{\text{def}}{=}(x_1^k,x_2^k,y^k)-(x_1^*,x_2^*,y^*).$$

To simplify the proofs that follow, we further introduce the auxiliary quantities
$$\begin{aligned}
&u^{k}=-A_{1}^{\mathrm{T}}\left[y^{k}+(1-\tau)\rho(A_{1}e_{1}^{k}+A_{2}e_{2}^{k})+\rho A_{2}(x_{2}^{k-1}-x_{2}^{k})\right]\\
&v^{k}=-A_{2}^{\mathrm{T}}\left[y^{k}+(1-\tau)\rho(A_{1}e_{1}^{k}+A_{2}e_{2}^{k})\right]\\
&\Psi_{k}=\frac{1}{\tau\rho}\|e_{y}^{k}\|^{2}+\rho\|A_{2}e_{2}^{k}\|^{2}\\
&\Phi_{k}=\Psi_{k}+\max(1-\tau,1-\tau^{-1})\rho\|A_{1}e_{1}^{k}+A_{2}e_{2}^{k}\|^{2}
\end{aligned}\qquad(8.6.39)$$
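As a small illustration of the last two definitions in (8.6.39), the sketch below evaluates $\Psi_k$ and $\Phi_k$ for scalar data. The function name `merit` and the sample numbers are assumptions made here for illustration; in the matrix case the squares would be replaced by squared Euclidean norms.

```python
def merit(e1, e2, ey, a1, a2, tau, rho):
    """Return (Psi_k, Phi_k) from (8.6.39) for scalar errors e1, e2, ey."""
    residual = a1 * e1 + a2 * e2                       # A1 e1 + A2 e2
    psi = ey * ey / (tau * rho) + rho * (a2 * e2) ** 2
    phi = psi + max(1.0 - tau, 1.0 - 1.0 / tau) * rho * residual ** 2
    return psi, phi


psi, phi = merit(e1=1.0, e2=2.0, ey=3.0, a1=1.0, a2=1.0, tau=0.5, rho=2.0)
print(psi, phi)  # 17.0 26.0
```

Note that the weight $\max(1-\tau,1-\tau^{-1})$ is nonnegative for every $\tau>0$ and vanishes at $\tau=1$, in which case $\Phi_k=\Psi_k$.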

With this notation, we have the following result.

Lemma 8.7 Let $\{(x_1^k,x_2^k,y^k)\}$ be a sequence generated by the alternating direction method of multipliers. Then for every $k\geqslant 1$,
$$u^k\in\partial f_1(x_1^k),\quad v^k\in\partial f_2(x_2^k),\qquad(8.6.40)$$
$$\begin{aligned}\Phi_{k}-\Phi_{k+1}\geqslant{}&\min(\tau,1+\tau-\tau^{2})\rho\|A_{2}(x_{2}^{k}-x_{2}^{k+1})\|^{2}\\&+\min(1,1+\tau^{-1}-\tau)\rho\|A_{1}e_{1}^{k+1}+A_{2}e_{2}^{k+1}\|^{2}.\end{aligned}\qquad(8.6.41)$$
Proof.
We first prove the two claims in (8.6.40). By the ADMM iteration, $x_1^{k+1}$ satisfies
$$0\in\partial f_1(x_1^{k+1})+A_1^\mathrm{T}y^k+\rho A_1^\mathrm{T}(A_1x_1^{k+1}+A_2x_2^k-b).$$

Substituting $y^k=y^{k+1}-\tau\rho(A_1x_1^{k+1}+A_2x_2^{k+1}-b)$ into this inclusion to eliminate $y^k$ gives
$$-A_{1}^{\mathrm{T}}\Big(y^{k+1}+(1-\tau)\rho(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b)+\rho A_{2}(x_{2}^{k}-x_{2}^{k+1})\Big)\in\partial f_{1}(x_{1}^{k+1}).$$

Substituting $b=A_1x_1^*+A_2x_2^*$, so that $A_1x_1^{k+1}+A_2x_2^{k+1}-b=A_1e_1^{k+1}+A_2e_2^{k+1}$, the left-hand side is exactly $u^{k+1}$ as defined in (8.6.39). Hence $u^{k+1}\in\partial f_1(x_1^{k+1})$, that is, $u^k\in\partial f_1(x_1^k)$ for all $k\geqslant 1$.
Similarly, $x_2^{k+1}$ satisfies
$$0\in\partial f_2(x_2^{k+1})+A_2^\mathrm{T}y^k+\rho A_2^\mathrm{T}(A_1x_1^{k+1}+A_2x_2^{k+1}-b).$$

Using the same expression for $y^k$ to eliminate it, we obtain
$$-A_2^\mathrm{T}\Big(y^{k+1}+(1-\tau)\rho(A_1x_1^{k+1}+A_2x_2^{k+1}-b)\Big)\in\partial f_2(x_2^{k+1}).$$

By the definition of $v^k$, this gives $v^k\in\partial f_2(x_2^k)$.
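The inclusions (8.6.40) can be checked numerically on a toy problem. The sketch below assumes a hypothetical scalar instance, $f_i(x_i)=\tfrac12(x_i-c_i)^2$ with constraint $a_1x_1+a_2x_2=b$, for which each subproblem has a closed-form solution and the subdifferential reduces to the gradient $x_i-c_i$. The update order (first $x_1$, then $x_2$, then $y$ with step $\tau\rho$) follows the optimality conditions used in the proof above; the function name and data are assumptions.

```python
def check_subgradients(tau, rho=1.0, iters=20):
    """Run scalar ADMM and return the largest violation of (8.6.40)."""
    a1, a2, b, c1, c2 = 1.0, 2.0, 3.0, 4.0, -1.0   # hypothetical data
    x1 = x2 = y = 0.0
    worst = 0.0
    for _ in range(iters):
        x2_old = x2
        # x1-step: solve (x1 - c1) + a1*y + rho*a1*(a1*x1 + a2*x2 - b) = 0
        x1 = (c1 - a1 * y - rho * a1 * (a2 * x2 - b)) / (1.0 + rho * a1 * a1)
        # x2-step: the analogous stationarity equation with the fresh x1
        x2 = (c2 - a2 * y - rho * a2 * (a1 * x1 - b)) / (1.0 + rho * a2 * a2)
        r = a1 * x1 + a2 * x2 - b          # equals A1 e1 + A2 e2
        y = y + tau * rho * r              # multiplier step
        # u and v as defined in (8.6.39), evaluated at the new iterate
        u = -a1 * (y + (1.0 - tau) * rho * r + rho * a2 * (x2_old - x2))
        v = -a2 * (y + (1.0 - tau) * rho * r)
        # f1 and f2 are smooth here, so the subdifferential is the gradient.
        worst = max(worst, abs(u - (x1 - c1)), abs(v - (x2 - c2)))
    return worst


for tau in (0.5, 1.0, 1.6):
    print(tau, check_subgradients(tau))
```

The printed violations should be at the level of floating-point rounding, reflecting that (8.6.40) is an algebraic identity rather than an asymptotic statement.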
Next we prove inequality (8.6.41). From the optimality conditions for $(x_1^*,x_2^*,y^*)$ and relation (8.6.40),
$$\begin{aligned}&u^{k+1}\in\partial f_{1}(x_{1}^{k+1}),\quad-A_{1}^{\mathrm{T}}y^{*}\in\partial f_{1}(x_{1}^{*}),\\&v^{k+1}\in\partial f_{2}(x_{2}^{k+1}),\quad-A_{2}^{\mathrm{T}}y^{*}\in\partial f_{2}(x_{2}^{*}).\end{aligned}$$

By the monotonicity of the subdifferential of a convex function,
$$\begin{aligned}&\left\langle u^{k+1}+A_{1}^{\mathrm{T}}y^{*},x_{1}^{k+1}-x_{1}^{*}\right\rangle\geqslant 0,\\&\left\langle v^{k+1}+A_{2}^{\mathrm{T}}y^{*},x_{2}^{k+1}-x_{2}^{*}\right\rangle\geqslant 0.\end{aligned}$$

Adding these two inequalities, using the definitions of $u^{k+1}$ and $v^{k+1}$, and noting the identity
$$A_1x_1^{k+1}+A_2x_2^{k+1}-b=(\tau\rho)^{-1}(y^{k+1}-y^k)=(\tau\rho)^{-1}(e_y^{k+1}-e_y^k),\qquad(8.6.42)$$
we compute

$$\begin{aligned}
&\left\langle u^{k+1}+A_{1}^{\mathrm{T}}y^{*},x_{1}^{k+1}-x_{1}^{*}\right\rangle+\left\langle v^{k+1}+A_{2}^{\mathrm{T}}y^{*},x_{2}^{k+1}-x_{2}^{*}\right\rangle\\
={}&\left\langle -A_{1}^{\mathrm{T}}\left[y^{k+1}+(1-\tau)\rho(A_{1}e_{1}^{k+1}+A_{2}e_{2}^{k+1})+\rho A_{2}(x_{2}^{k}-x_{2}^{k+1})\right]+A_{1}^{\mathrm{T}}y^{*},x_{1}^{k+1}-x_{1}^{*}\right\rangle\\
&+\left\langle -A_{2}^{\mathrm{T}}\left[y^{k+1}+(1-\tau)\rho(A_{1}e_{1}^{k+1}+A_{2}e_{2}^{k+1})\right]+A_{2}^{\mathrm{T}}y^{*},x_{2}^{k+1}-x_{2}^{*}\right\rangle\\
={}&\left\langle-A_1^{\mathrm{T}}e_y^{k+1},x_{1}^{k+1}-x_{1}^{*}\right\rangle+\left\langle -A_{1}^{\mathrm{T}}\left[(1-\tau)\rho(A_{1}e_{1}^{k+1}+A_{2}e_{2}^{k+1})\right],x_{1}^{k+1}-x_{1}^{*}\right\rangle\\
&+\left\langle-A_2^{\mathrm{T}}e_y^{k+1},x_{2}^{k+1}-x_{2}^{*}\right\rangle+\left\langle -A_{2}^{\mathrm{T}}\left[(1-\tau)\rho(A_{1}e_{1}^{k+1}+A_{2}e_{2}^{k+1})\right],x_{2}^{k+1}-x_{2}^{*}\right\rangle\\
&+\left\langle -A_{1}^{\mathrm{T}}\left[\rho A_{2}(x_{2}^{k}-x_{2}^{k+1})\right],x_{1}^{k+1}-x_{1}^{*}\right\rangle\\
={}&\left\langle-A_1^{\mathrm{T}}e_y^{k+1},x_{1}^{k+1}-x_{1}^{*}\right\rangle+\left\langle-A_2^{\mathrm{T}}e_y^{k+1},x_{2}^{k+1}-x_{2}^{*}\right\rangle\\
&+\left\langle -A_{1}^{\mathrm{T}}\left[(1-\tau)\rho(A_1x_1^{k+1}+A_2x_2^{k+1}-b)\right],x_{1}^{k+1}-x_{1}^{*}\right\rangle\\
&+\left\langle -A_{2}^{\mathrm{T}}\left[(1-\tau)\rho(A_1x_1^{k+1}+A_2x_2^{k+1}-b)\right],x_{2}^{k+1}-x_{2}^{*}\right\rangle\\
&+\left\langle -A_{1}^{\mathrm{T}}\left[\rho A_{2}(x_{2}^{k}-x_{2}^{k+1})\right],x_{1}^{k+1}-x_{1}^{*}\right\rangle\\
={}&\frac{1}{\tau\rho}\left\langle e_{y}^{k+1},e_{y}^{k}-e_{y}^{k+1}\right\rangle-(1-\tau)\rho\|A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\|^{2}\\
&+\left\langle -A_{1}^{\mathrm{T}}\left[\rho A_{2}(x_{2}^{k}-x_{2}^{k+1})\right],x_{1}^{k+1}-x_{1}^{*}\right\rangle.
\end{aligned}$$

Finally, writing $A_1e_1^{k+1}=(A_1x_1^{k+1}+A_2x_2^{k+1}-b)-A_2e_2^{k+1}$ in the last inner product, we obtain
$$\begin{aligned}&\frac{1}{\tau\rho}\left\langle e_{y}^{k+1},e_{y}^{k}-e_{y}^{k+1}\right\rangle-(1-\tau)\rho\|A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\|^{2} \\&+\rho\left\langle A_{2}(x_{2}^{k+1}-x_{2}^{k}),A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\right\rangle \\&-\rho\left\langle A_{2}(x_{2}^{k+1}-x_{2}^{k}),A_{2}e_{2}^{k+1}\right\rangle\geqslant 0.\end{aligned}\qquad(8.6.43)$$

Inequality (8.6.43) still differs in form from inequality (8.6.41); the main difference lies in the term
$$\rho\left\langle A_2(x_2^{k+1}-x_2^k),A_1x_1^{k+1}+A_2x_2^{k+1}-b\right\rangle.$$

We now bound this term from above. For convenience, introduce the notation
$$\begin{aligned}\nu^{k+1}&=y^{k+1}+(1-\tau)\rho(A_1x_1^{k+1}+A_2x_2^{k+1}-b), \\M^{k+1}&=(1-\tau)\rho\left\langle A_2(x_2^{k+1}-x_2^k),A_1x_1^k+A_2x_2^k-b\right\rangle.\end{aligned}$$

Since $-A_2^\mathrm{T}\nu^{k+1}=v^{k+1}$, relation (8.6.40) gives $-A_2^\mathrm{T}\nu^{k+1}\in\partial f_2(x_2^{k+1})$ and $-A_2^\mathrm{T}\nu^k\in\partial f_2(x_2^k)$. Monotonicity of the subdifferential then yields
$$\left\langle-A_2^\mathrm{T}(\nu^{k+1}-\nu^k),x_2^{k+1}-x_2^k\right\rangle\geqslant 0.\qquad(8.6.44)$$

From these relations, together with (8.6.42), we finally obtain
$$\begin{aligned}&\rho\left\langle A_{2}(x_{2}^{k+1}-x_{2}^{k}),A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\right\rangle \\={}&(1-\tau)\rho\left\langle A_{2}(x_{2}^{k+1}-x_{2}^{k}),A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\right\rangle+\left\langle A_{2}(x_{2}^{k+1}-x_{2}^{k}),y^{k+1}-y^{k}\right\rangle \\={}&M^{k+1}+\left\langle\nu^{k+1}-\nu^{k},A_{2}(x_{2}^{k+1}-x_{2}^{k})\right\rangle\\\leqslant{}&M^{k+1}.\end{aligned}$$
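The step that introduces $M^{k+1}$ packs in a short computation that may be worth spelling out. The following is a sketch using only the definitions of $\nu^{k}$ and $M^{k+1}$; the shorthands $r^k$ and $\Delta^{k+1}$ are introduced here for brevity and are not part of the original notation.

```latex
\begin{aligned}
r^{k} &:= A_1x_1^{k}+A_2x_2^{k}-b,\qquad \Delta^{k+1}:=A_2(x_2^{k+1}-x_2^{k}),\\
\nu^{k+1}-\nu^{k} &= (y^{k+1}-y^{k})+(1-\tau)\rho\,(r^{k+1}-r^{k}),\\
\langle \nu^{k+1}-\nu^{k},\Delta^{k+1}\rangle
 &= \langle y^{k+1}-y^{k},\Delta^{k+1}\rangle
   +(1-\tau)\rho\langle r^{k+1},\Delta^{k+1}\rangle
   -\underbrace{(1-\tau)\rho\langle r^{k},\Delta^{k+1}\rangle}_{=\,M^{k+1}} .
\end{aligned}
```

Moving $M^{k+1}$ to the left-hand side reproduces the equality in question, and (8.6.44) says precisely that $\langle\nu^{k+1}-\nu^{k},\Delta^{k+1}\rangle\leqslant 0$.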
With this term bounded, inequality (8.6.43) can be relaxed to
$$\frac{1}{\tau\rho}\left\langle e_{y}^{k+1},e_{y}^{k}-e_{y}^{k+1}\right\rangle-(1-\tau)\rho\|A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\|^{2}+M^{k+1}-\rho\left\langle A_{2}(x_{2}^{k+1}-x_{2}^{k}),A_{2}e_{2}^{k+1}\right\rangle\geqslant 0.$$

This inequality contains inner products. Multiplying it by 2 and applying the identity
$$\langle a,b\rangle=\frac{1}{2}(\|a\|^2+\|b\|^2-\|a-b\|^2)=\frac{1}{2}(\|a+b\|^2-\|a\|^2-\|b\|^2)$$

together with (8.6.42), we further obtain
$$\begin{aligned}&\frac{1}{\tau\rho}(\|e_{y}^{k}\|^{2}-\|e_{y}^{k+1}\|^{2})-(2-\tau)\rho\|A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\|^{2}\\&+2M^{k+1}-\rho\|A_{2}(x_{2}^{k+1}-x_{2}^{k})\|^{2}-\rho\|A_{2}e_{2}^{k+1}\|^{2}+\rho\|A_{2}e_{2}^{k}\|^{2}\geqslant 0.\end{aligned}\qquad(8.6.45)$$
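For readers who want to verify (8.6.45) term by term, the following worked expansion shows how the two inner products are converted into norms. It relies only on the identity for $\langle a,b\rangle$, on (8.6.42), and on $A_2(x_2^{k+1}-x_2^k)=A_2e_2^{k+1}-A_2e_2^k$.

```latex
\begin{aligned}
\frac{2}{\tau\rho}\langle e_y^{k+1},e_y^{k}-e_y^{k+1}\rangle
 &=\frac{1}{\tau\rho}\left(\|e_y^{k}\|^2-\|e_y^{k+1}\|^2-\|e_y^{k+1}-e_y^{k}\|^2\right)\\
 &=\frac{1}{\tau\rho}\left(\|e_y^{k}\|^2-\|e_y^{k+1}\|^2\right)
   -\tau\rho\,\|A_1x_1^{k+1}+A_2x_2^{k+1}-b\|^2,\\
-2\rho\langle A_2(x_2^{k+1}-x_2^{k}),A_2e_2^{k+1}\rangle
 &=-\rho\|A_2(x_2^{k+1}-x_2^{k})\|^2-\rho\|A_2e_2^{k+1}\|^2+\rho\|A_2e_2^{k}\|^2 .
\end{aligned}
```

The coefficient $-(2-\tau)\rho$ in (8.6.45) then arises as $-2(1-\tau)\rho-\tau\rho$.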

At this point, apart from the term $M^{k+1}$, every term of (8.6.45) also appears in inequality (8.6.41). How $M^{k+1}$ is bounded depends on the sign of its coefficient $1-\tau$, so we treat the two ranges of $\tau$ separately.
Case 1: $\tau\in(0,1]$. Here $1-\tau\geqslant 0$, and by the basic inequality $2\langle a,b\rangle\leqslant\|a\|^2+\|b\|^2$,
$$2\left\langle A_2(x_2^{k+1}-x_2^k),A_1x_1^k+A_2x_2^k-b\right\rangle\leqslant\|A_{2}(x_{2}^{k+1}-x_{2}^{k})\|^{2}+\|A_{1}x_{1}^{k}+A_{2}x_{2}^{k}-b\|^{2}.$$

Substituting into inequality (8.6.45) gives
$$\begin{aligned}&\frac{1}{\tau\rho}\|e_{y}^{k}\|^{2}+\rho\|A_{2}e_{2}^{k}\|^{2}+(1-\tau)\rho\|A_{1}e_{1}^{k}+A_{2}e_{2}^{k}\|^{2}\\&-\left[\frac{1}{\tau\rho}\|e_{y}^{k+1}\|^{2}+\rho\|A_{2}e_{2}^{k+1}\|^{2}+(1-\tau)\rho\|A_{1}e_{1}^{k+1}+A_{2}e_{2}^{k+1}\|^{2}\right]\\&\geqslant\rho\|A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\|^{2}+\tau\rho\|A_{2}(x_{2}^{k+1}-x_{2}^{k})\|^{2}.\end{aligned}\qquad(8.6.46)$$

Case 2: $\tau>1$. Here $1-\tau<0$, and by the basic inequality $-2\langle a,b\rangle\leqslant\tau\|a\|^2+\tau^{-1}\|b\|^2$,
$$-2\left\langle A_{2}(x_{2}^{k+1}-x_{2}^{k}),A_{1}x_{1}^{k}+A_{2}x_{2}^{k}-b\right\rangle\leqslant\tau\|A_{2}(x_{2}^{k+1}-x_{2}^{k})\|^{2}+\frac{1}{\tau}\|A_{1}x_{1}^{k}+A_{2}x_{2}^{k}-b\|^{2}.$$

Substituting into inequality (8.6.45) in the same way gives
$$\begin{aligned}&\frac{1}{\tau\rho}\|e_{y}^{k}\|^{2}+\rho\|A_{2}e_{2}^{k}\|^{2}+\left(1-\frac{1}{\tau}\right)\rho\|A_{1}e_{1}^{k}+A_{2}e_{2}^{k}\|^{2}\\&-\left[\frac{1}{\tau\rho}\|e_{y}^{k+1}\|^{2}+\rho\|A_{2}e_{2}^{k+1}\|^{2}+\left(1-\frac{1}{\tau}\right)\rho\|A_{1}e_{1}^{k+1}+A_{2}e_{2}^{k+1}\|^{2}\right]\\&\geqslant\left(1+\frac{1}{\tau}-\tau\right)\rho\|A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b\|^{2}+(1+\tau-\tau^{2})\rho\|A_{2}(x_{2}^{k+1}-x_{2}^{k})\|^{2}.\end{aligned}\qquad(8.6.47)$$

Combining (8.6.46) and (8.6.47) yields inequality (8.6.41). Note that both coefficients on the right-hand side of (8.6.41) are positive only when $\tau\in\left(0,\dfrac{1+\sqrt{5}}{2}\right)$.
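The threshold $\frac{1+\sqrt5}{2}$ comes from a one-line quadratic computation, sketched here for completeness. For $\tau\in(0,1]$ both coefficients in (8.6.41) reduce to $\tau$ and $1$, so only the case $\tau>1$ constrains the step size.

```latex
\begin{aligned}
1+\tau-\tau^{2}>0
 &\iff \tau^{2}-\tau-1<0
 \iff \frac{1-\sqrt5}{2}<\tau<\frac{1+\sqrt5}{2},\\
1+\frac1\tau-\tau>0
 &\iff \frac{1+\tau-\tau^{2}}{\tau}>0
 \iff 1+\tau-\tau^{2}>0\qquad(\tau>0).
\end{aligned}
```

Both conditions therefore hold simultaneously exactly on $0<\tau<\frac{1+\sqrt5}{2}\approx 1.618$.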

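In Lemma 8.7, relation (8.6.40) follows directly from the optimality conditions of the subproblems together with the KKT conditions. The intuition behind inequality (8.6.41) is that $\Phi_k$, a certain measure of the iterate error, is monotone and bounded.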
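The monotone decrease in (8.6.41) can be observed numerically. The sketch below assumes a hypothetical scalar instance ($f_i(x_i)=\tfrac12(x_i-c_i)^2$, constraint $a_1x_1+a_2x_2=b$) with closed-form subproblem solutions, and reports the smallest value of the left-hand side minus the right-hand side of (8.6.41) over the recorded iterations. The function name `lemma_gap` and the data are assumptions for illustration.

```python
def lemma_gap(tau, rho=1.0, iters=40):
    """Smallest value of LHS - RHS in (8.6.41) over k >= 1 (scalar ADMM)."""
    a1, a2, b, c1, c2 = 1.0, 2.0, 3.0, 4.0, -1.0        # hypothetical data
    ys = (a1 * c1 + a2 * c2 - b) / (a1 * a1 + a2 * a2)  # KKT multiplier y*
    x2s = c2 - a2 * ys                                  # KKT point x2*

    def phi(x1, x2, y):
        r = a1 * x1 + a2 * x2 - b                       # A1 e1 + A2 e2
        psi = (y - ys) ** 2 / (tau * rho) + rho * (a2 * (x2 - x2s)) ** 2
        return psi + max(1.0 - tau, 1.0 - 1.0 / tau) * rho * r * r

    x1 = x2 = y = 0.0
    trace = []                                          # trace[0] is iterate k = 1
    for _ in range(iters):
        x1 = (c1 - a1 * y - rho * a1 * (a2 * x2 - b)) / (1.0 + rho * a1 * a1)
        x2 = (c2 - a2 * y - rho * a2 * (a1 * x1 - b)) / (1.0 + rho * a2 * a2)
        y = y + tau * rho * (a1 * x1 + a2 * x2 - b)
        trace.append((x1, x2, y))

    worst = float("inf")
    for (p1, p2, py), (q1, q2, qy) in zip(trace, trace[1:]):
        r_next = a1 * q1 + a2 * q2 - b
        rhs = (min(tau, 1.0 + tau - tau * tau) * rho * (a2 * (p2 - q2)) ** 2
               + min(1.0, 1.0 + 1.0 / tau - tau) * rho * r_next ** 2)
        worst = min(worst, phi(p1, p2, py) - phi(q1, q2, qy) - rhs)
    return worst


for tau in (0.3, 1.0, 1.5, 1.6):
    print(tau, lemma_gap(tau))
```

Up to rounding error the reported gap should be nonnegative for each tested $\tau$ in $\left(0,\frac{1+\sqrt5}{2}\right)$. This is a sanity check on one instance, not a substitute for the proof.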

Theorem 8.16 Under Assumption 8.3, suppose in addition that $A_1$ and $A_2$ have full column rank. If $\tau\in\left(0,\dfrac{1+\sqrt{5}}{2}\right)$, then the sequence $\left\{(x_{1}^{k},x_{2}^{k},y^{k})\right\}$ converges to a KKT pair of the original problem.

Proof.
Lemma 8.7 shows that $\{\Phi_k\}$ is a bounded sequence. By the definition of $\Phi_k$ in (8.6.39),
$$\Phi_k=\Psi_k+\max(1-\tau,1-\tau^{-1})\rho\|A_1e_1^k+A_2e_2^k\|^2.$$

Since $\max(1-\tau,1-\tau^{-1})\geqslant 0$ for every $\tau>0$, boundedness of $\Phi_k$ implies boundedness of $\Psi_k$. By the definition of $\Psi_k$,
$$\Psi_k=\frac{1}{\tau\rho}\|e_y^k\|^2+\rho\|A_2e_2^k\|^2,$$

and together with (8.6.41), which bounds $\|A_1e_1^{k+1}+A_2e_2^{k+1}\|^2$ by a multiple of $\Phi_k-\Phi_{k+1}$, it follows that
$$\|e_y^k\|,\quad\|A_2e_2^k\|,\quad\|A_1e_1^k+A_2e_2^k\|$$

are all bounded. From the inequality
$$\|A_1e_1^k\|\leqslant\|A_1e_1^k+A_2e_2^k\|+\|A_2e_2^k\|$$

we further deduce that $\{\|A_1e_1^k\|\}$ is a bounded sequence. Since $A_1^\mathrm{T}A_1\succ 0$ and $A_2^\mathrm{T}A_2\succ 0$ (full column rank), this boundedness implies that $\{(x_1^k,x_2^k,y^k)\}$ is a bounded sequence.
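The passage from boundedness of $\|A_ie_i^k\|$ to boundedness of $e_i^k$ uses the following standard estimate, sketched here with $\lambda_{\min}$ denoting the smallest eigenvalue (a notation not used in the original text).

```latex
\|A_i e_i^{k}\|^{2}
 =(e_i^{k})^{\mathrm T}A_i^{\mathrm T}A_i\,e_i^{k}
 \geqslant\lambda_{\min}(A_i^{\mathrm T}A_i)\,\|e_i^{k}\|^{2}
\quad\Longrightarrow\quad
\|e_i^{k}\|\leqslant\frac{\|A_i e_i^{k}\|}{\sqrt{\lambda_{\min}(A_i^{\mathrm T}A_i)}},
\qquad i=1,2 .
```

Full column rank is exactly what makes $\lambda_{\min}(A_i^{\mathrm T}A_i)>0$; without it, $A_ie_i^k$ could stay bounded while $e_i^k$ drifts along the null space of $A_i$.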
Another direct consequence of (8.6.41) is that the infinite series
$$\sum\limits_{k=0}^{\infty}\|A_{1}e_{1}^{k}+A_{2}e_{2}^{k}\|^{2},\quad \sum\limits_{k=0}^{\infty}\|A_{2}(x_{2}^{k+1}-x_{2}^{k})\|^{2}$$

both converge, which shows that
$$\begin{aligned}&\|A_1e_1^k+A_2e_2^k\|=\|A_1x_1^k+A_2x_2^k-b\|\to 0,\\&\|A_2(x_2^{k+1}-x_2^k)\|\to 0.\end{aligned}\qquad(8.6.48)$$
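The summability behind (8.6.48) follows by telescoping (8.6.41). A sketch, with $c_1$ and $c_2$ as shorthands introduced here for the two coefficients in (8.6.41):

```latex
\begin{aligned}
&\sum_{k=1}^{K}\Big[c_1\rho\|A_2(x_2^{k}-x_2^{k+1})\|^{2}
   +c_2\rho\|A_1e_1^{k+1}+A_2e_2^{k+1}\|^{2}\Big]
 \leqslant\sum_{k=1}^{K}(\Phi_k-\Phi_{k+1})
 =\Phi_1-\Phi_{K+1}\leqslant\Phi_1,\\
&c_1:=\min(\tau,1+\tau-\tau^{2})>0,\qquad
 c_2:=\min(1,1+\tau^{-1}-\tau)>0 .
\end{aligned}
```

Because the bound $\Phi_1$ does not depend on $K$ and $c_1,c_2>0$ for $\tau\in\left(0,\frac{1+\sqrt5}{2}\right)$, both series have bounded partial sums, so their terms tend to zero.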

We now derive convergence.
We first prove convergence along a subsequence of the iterates. Since $\{(x_1^k,x_2^k,y^k)\}$ is bounded, it has a convergent subsequence; let
$$(x_1^{k_j},x_2^{k_j},y^{k_j})\to(x_1^\infty,x_2^\infty,y^\infty).$$

Recall the definitions of $u^k$ and $v^k$ from (8.6.39):
$$\begin{aligned}&u^{k+1}=-A_1^\mathrm{T}\left[y^{k+1}+(1-\tau)\rho(A_1e_1^{k+1}+A_2e_2^{k+1})+\rho A_2(x_2^k-x_2^{k+1})\right],\\&v^{k+1}=-A_2^\mathrm{T}\left[y^{k+1}+(1-\tau)\rho(A_1e_1^{k+1}+A_2e_2^{k+1})\right].\end{aligned}$$

As $k\to\infty$, since $\|A_2(x_2^{k+1}-x_2^k)\|\to 0$ and $\|A_1e_1^k+A_2e_2^k\|\to 0$ by (8.6.48), the corresponding subsequences of $\{u^k\}$ and $\{v^k\}$ also converge:
$$u^{\infty}\stackrel{\mathrm{def}}{=}\lim_{j\to\infty}u^{k_{j}}=-A_{1}^{\mathrm{T}}y^{\infty},\quad v^{\infty}\stackrel{\mathrm{def}}{=}\lim_{j\to\infty}v^{k_{j}}=-A_{2}^{\mathrm{T}}y^{\infty}.\qquad(8.6.49)$$

By (8.6.40), for every $k\geqslant 1$ we have $u^k\in\partial f_1(x_1^k)$ and $v^k\in\partial f_2(x_2^k)$. Since the graph of the subgradient mapping is a closed set (Theorem 2.19), it follows that
$$-A_1^\mathrm{T}y^\infty\in\partial f_1(x_1^\infty),\quad-A_2^\mathrm{T}y^\infty\in\partial f_2(x_2^\infty).$$
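For intuition, here is a sketch of the standard closed-graph argument for $f_1$; it may differ in detail from the textbook's proof of Theorem 2.19, which is not reproduced in these notes. It uses the subgradient inequality along the subsequence and the lower semicontinuity of the closed function $f_1$.

```latex
\begin{aligned}
f_1(z)&\geqslant f_1(x_1^{k_j})+\langle u^{k_j},\,z-x_1^{k_j}\rangle
 \qquad\text{for all } z \text{ and all } j,\\
f_1(z)&\geqslant \liminf_{j\to\infty}f_1(x_1^{k_j})+\langle u^{\infty},\,z-x_1^{\infty}\rangle
 \geqslant f_1(x_1^{\infty})+\langle u^{\infty},\,z-x_1^{\infty}\rangle .
\end{aligned}
```

The last line is the subgradient inequality at $x_1^\infty$, that is, $u^\infty\in\partial f_1(x_1^\infty)$; the same reasoning applies to $f_2$ and $v^\infty$.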

From the first relation in (8.6.48),
$$\lim\limits_{j\to\infty}\|A_1x_1^{k_j}+A_2x_2^{k_j}-b\|=\|A_1x_1^{\infty}+A_2x_2^{\infty}-b\|=0.$$

This shows that $(x_1^\infty,x_2^\infty,y^\infty)$ is a KKT pair of the original problem. Since the analysis above holds for any KKT pair, every occurrence of $(x_1^*,x_2^*,y^*)$ may be replaced by $(x_1^\infty,x_2^\infty,y^\infty)$.
To establish convergence of the full sequence $\{(x_1^k,x_2^k,y^k)\}$, note that $\Phi_k$ is monotonically decreasing and that, along the subsequence $\left\{\Phi_{k_j}\right\}$,
$$\begin{aligned}&\lim_{j\to\infty}\Phi_{k_{j}}\\={}&\lim\limits_{j\to\infty}\left(\frac{1}{\tau\rho}\|e_{y}^{k_{j}}\|^{2}+\rho\|A_{2}e_{2}^{k_{j}}\|^{2}+\max\left\{1-\tau,1-\frac{1}{\tau}\right\}\rho\|A_{1}e_{1}^{k_{j}}+A_{2}e_{2}^{k_{j}}\|^{2}\right)\\={}&0.\end{aligned}$$

A monotone sequence converges as soon as one of its subsequences converges, and to the same limit. Hence $\lim\limits_{k\to\infty}\Phi_k=0$, and it follows immediately that
$$\begin{aligned}&0\leqslant\limsup_{k\to\infty}\frac{1}{\tau\rho}\|e_{y}^{k}\|^{2}\leqslant\limsup_{k\to\infty}\Phi_{k}=0,\\&0\leqslant\limsup_{k\to\infty}\rho\|A_{2}e_{2}^{k}\|^{2}\leqslant\limsup_{k\to\infty}\Phi_{k}=0,\\&0\leqslant\limsup_{k\to\infty}\left\{\max\left\{1-\tau,1-\frac{1}{\tau}\right\}\rho\|A_{1}e_{1}^{k}+A_{2}e_{2}^{k}\|^{2}\right\}\leqslant\limsup_{k\to\infty}\Phi_{k}=0.\end{aligned}$$

This shows that
$$\|e_y^k\|\to 0,\quad\|A_2e_2^k\|\to 0,\quad\|A_1e_1^k+A_2e_2^k\|\to 0$$

(the last limit is also given directly by (8.6.48), which covers the case $\tau=1$ where the coefficient in $\Phi_k$ vanishes). Furthermore,
$$0\leqslant\limsup\limits_{k\to\infty}\|A_1e_1^k\|\leqslant\lim\limits_{k\to\infty}\left(\|A_2e_2^k\|+\|A_1e_1^k+A_2e_2^k\|\right)=0.$$

Since $A_1^\mathrm{T}A_1\succ 0$ and $A_2^\mathrm{T}A_2\succ 0$, we finally obtain convergence of the full sequence:
$$(x_1^k,x_2^k,y^k)\to(x_1^\infty,x_2^\infty,y^\infty).$$
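To see the theorem at work on a nonsmooth example, the sketch below runs ADMM on a hypothetical scalar problem that is not taken from the notes: minimize $\tfrac12(x_1-c)^2+\lambda|x_2|$ subject to $x_1-x_2=0$, so that $A_1=1$, $A_2=-1$, $b=0$. The $x_2$ subproblem is solved by soft thresholding. For $c=3$, $\lambda=1$ the KKT pair is $x_1^*=x_2^*=2$, $y^*=1$. Function names and parameter values are assumptions for illustration.

```python
def soft_threshold(v, kappa):
    """Proximal map of kappa*|.|: shrink v toward zero by kappa."""
    return max(v - kappa, 0.0) - max(-v - kappa, 0.0)


def admm_l1(c, lam, tau, rho=1.0, iters=200):
    """ADMM for  min 0.5*(x1-c)^2 + lam*|x2|  s.t.  x1 - x2 = 0."""
    x1 = x2 = y = 0.0
    for _ in range(iters):
        # x1-step: solve (x1 - c) + y + rho*(x1 - x2) = 0
        x1 = (c - y + rho * x2) / (1.0 + rho)
        # x2-step: minimize lam*|x2| + (rho/2)*(x2 - (x1 + y/rho))^2
        x2 = soft_threshold(x1 + y / rho, lam / rho)
        # multiplier step with step-size factor tau
        y = y + tau * rho * (x1 - x2)
    return x1, x2, y


# Step sizes inside (0, (1 + sqrt(5)) / 2), the range covered by Theorem 8.16.
for tau in (0.5, 1.0, 1.6):
    print(tau, admm_l1(c=3.0, lam=1.0, tau=tau))
```

Each run should end close to $(2, 2, 1)$. The theorem makes no statement about $\tau\geqslant\frac{1+\sqrt5}{2}$, so behavior outside the tested range is not covered by this sketch.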

Reference textbook: 《最优化:建模、算法与理论》