Preface
Original paper: Harnessing Smoothness to Accelerate Distributed Optimization.
These are my notes on Qu & Li (2018) [1], summarizing the convergence proof of the discrete-time distributed gradient tracking algorithm.
Problem Statement and Algorithm
Problem Statement
$$\min_{x\in\mathbb R^d} f(x)=\frac{1}{n}\sum_{i=1}^n f_i(x)$$
Assumption 1: each $f_i$ is $\alpha$-strongly convex and $\beta$-smooth, i.e.,
$$\begin{aligned} f_i(y)-f_i(x)&\geq \nabla f_i(x)^T(y-x)+\frac{\alpha}{2}\|y-x\|^2\\ \|\nabla f_i(x)-\nabla f_i(y)\|&\leq \beta\|x-y\| \end{aligned}$$
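As a quick numerical sanity check (mine, not part of the paper), both inequalities can be verified for a quadratic $f(x)=\frac{1}{2}x^THx$, whose strong-convexity and smoothness constants are the extreme eigenvalues of $H$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
B = rng.standard_normal((d, d))
H = B @ B.T + np.eye(d)                  # symmetric positive definite Hessian
alpha = np.linalg.eigvalsh(H).min()      # strong-convexity constant
beta = np.linalg.eigvalsh(H).max()       # smoothness constant

f = lambda x: 0.5 * x @ H @ x
grad = lambda x: H @ x

for _ in range(100):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    gap = f(y) - f(x)
    # strong convexity: lower quadratic bound
    assert gap >= grad(x) @ (y - x) + 0.5 * alpha * np.sum((y - x) ** 2) - 1e-9
    # smoothness: Lipschitz gradient
    assert np.linalg.norm(grad(x) - grad(y)) <= beta * np.linalg.norm(x - y) + 1e-9
print("both inequalities of Assumption 1 hold")
```

For quadratics the two inequalities hold with equality in the worst-case directions (the eigenvectors of $H$), which is why these are the tight constants.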
Discrete-Time Distributed Optimization Algorithm
$$\begin{aligned} x_i(t+1)&=\sum_{j=1}^nw_{ij}x_j(t)-\eta s_i(t)\\ s_i(t+1)&=\sum_{j=1}^nw_{ij}s_j(t)+\nabla f_i(x_i(t+1))-\nabla f_i(x_i(t)) \end{aligned}\qquad (1)$$
where $x_i(t)\in \mathbb R^{1\times d}$ and $s_i(t)\in \mathbb R^{1\times d}$ are written as row vectors, and $s_i(0)=\nabla f_i(x_i(0))$. As we will see, $s_i(t)$ tracks the average gradient $\frac{1}{n}\sum_{i=1}^n\nabla f_i(x_i(t))$. This algorithm is called distributed gradient tracking (DGT).
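To make the iteration concrete, here is a minimal NumPy sketch of DGT (1) on a toy problem; the ring network, the quadratic local objectives, and the step size are illustrative choices of mine, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3

# Ring network; W is symmetric and doubly stochastic.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

# Local objectives f_i(x) = 0.5 (x - b_i)^T Q_i (x - b_i):
# strongly convex and smooth, so Assumption 1 holds.
Q = [np.diag(rng.uniform(1.0, 2.0, d)) for _ in range(n)]
b = rng.standard_normal((n, d))
grad = lambda i, x: Q[i] @ (x - b[i])

# Minimizer of (1/n) sum_i f_i: solve (sum Q_i) x = sum Q_i b_i.
x_star = np.linalg.solve(sum(Q), sum(Qi @ bi for Qi, bi in zip(Q, b)))

# DGT iteration (1): x and s hold one row per agent.
eta = 0.02
x = rng.standard_normal((n, d))
s = np.vstack([grad(i, x[i]) for i in range(n)])   # s_i(0) = grad f_i(x_i(0))
for _ in range(2000):
    x_new = W @ x - eta * s
    s = W @ s + np.vstack([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)])
    x = x_new

err = np.max(np.abs(x - x_star))
print(f"max deviation from x*: {err:.2e}")
```

Every row of `x` converges to the same global minimizer, even though each agent only ever sees its own gradient and its neighbors' iterates.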
Convergence Proof
Assume $W$ is a doubly stochastic matrix. Algorithm (1) then takes the matrix form
$$\begin{aligned} x(t+1)&=W x(t)-\eta s(t)\\ s(t+1)&=W s(t)+\nabla(t+1)-\nabla (t) \end{aligned}\qquad (2)$$
where $s(0)=\nabla(0)$, and $\nabla(t)\in\mathbb R^{n\times d}$ stacks the gradients $\nabla f_i(x_i(t))$ as rows; likewise $x\in\mathbb R^{n\times d}$ and $s\in\mathbb R^{n\times d}$ stack the row vectors $x_i(t)$ and $s_i(t)$.
Let $\bar x(t)=(1/n)1_n^T x(t)$, $\bar s(t)=(1/n)1_n^T s(t)$, and $g(t)=(1/n)1_n^T \nabla(t)$.
Lemma 1 (Lemma 7 of [1]): the following identities hold:
- $\bar s(t+1)=\bar s(t)+g(t+1)-g(t)=g(t+1)$
- $\bar x(t+1)=\bar x(t)-\eta \bar s(t)=\bar x(t)-\eta g(t)$
Proof: multiplying both sides of (2) by $(1/n)1_n^T$ gives $\bar s(t+1)=\bar s(t)+g(t+1)-g(t)$; since $s(0)=\nabla(0)$, induction yields $\bar s(t)=g(t)$. The second identity follows the same way: $\bar x(t+1)=\bar x(t)-\eta\bar s(t)=\bar x(t)-\eta g(t)$.
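The two identities are easy to confirm numerically; the sketch below (with a doubly stochastic $W$ and quadratic local gradients of my own choosing) checks them along a DGT trajectory:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 2

# Doubly stochastic W: lazy walk on a ring (self loop + both neighbors).
I = np.eye(n)
W = 0.4 * I + 0.3 * np.roll(I, 1, axis=0) + 0.3 * np.roll(I, -1, axis=0)

# Quadratic local gradients: grad f_i(x) = Q_i x + c_i, stacked row-wise.
Q = [np.diag(rng.uniform(1.0, 3.0, d)) for _ in range(n)]
c = rng.standard_normal((n, d))
nabla = lambda x: np.vstack([Q[i] @ x[i] + c[i] for i in range(n)])

eta = 0.05
x = rng.standard_normal((n, d))
g_prev = nabla(x)
s = g_prev.copy()                       # s(0) = grad(0)
for _ in range(50):
    xbar = x.mean(axis=0)
    x = W @ x - eta * s
    # Lemma 1: bar x(t+1) = bar x(t) - eta * g(t)
    assert np.allclose(x.mean(axis=0), xbar - eta * g_prev.mean(axis=0))
    g = nabla(x)
    s = W @ s + g - g_prev              # update (2)
    g_prev = g
    # Lemma 1: bar s(t) = g(t)
    assert np.allclose(s.mean(axis=0), g.mean(axis=0))
print("both identities of Lemma 1 hold along the trajectory")
```

The identities rely only on $1_n^TW=1_n^T$ (column stochasticity) and the initialization $s(0)=\nabla(0)$, which is why they hold exactly at every step.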
We now prove convergence. The idea is to separately bound the gradient-tracking error, the consensus error, and the optimality error, and then combine the three bounds to establish convergence of the overall error.
Step 1: consider the gradient-tracking error $\|s(k)-1_ng(k)\|$. We have
$$s(k)-1_ng(k)=[W s(k-1)-1_ng(k-1)]+[\nabla(k)-\nabla(k-1)]-1_n[g(k)-g(k-1)].\qquad (3)$$
Let $\sigma\in(0,1)$ denote the spectral norm of $W-(1/n)1_n1_n^T$. Then for any $\omega\in \mathbb R^n$,
$$\|W\omega-(1/n)1_n1_n^T \omega\|=\|(W-(1/n)1_n1_n^T)(\omega-(1/n)1_n1_n^T \omega)\|\leq \sigma\|\omega-(1/n)1_n1_n^T \omega\|.\qquad (4)$$
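A short check of (4): the equality holds because $W\cdot(1/n)1_n1_n^T=(1/n)1_n1_n^T$ for doubly stochastic $W$, and the inequality is just the definition of the spectral norm. A numerical sketch (the ring weights below are an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
I = np.eye(n)
# Lazy random walk on a ring: symmetric and doubly stochastic.
W = 0.5 * I + 0.25 * np.roll(I, 1, axis=0) + 0.25 * np.roll(I, -1, axis=0)
J = np.ones((n, n)) / n                  # averaging projector (1/n) 1 1^T

sigma = np.linalg.norm(W - J, 2)         # spectral norm of W - (1/n) 1 1^T
assert 0 < sigma < 1                     # holds for a connected graph

for _ in range(100):
    w = rng.standard_normal(n)
    lhs = np.linalg.norm(W @ w - J @ w)
    rhs = sigma * np.linalg.norm(w - J @ w)
    assert lhs <= rhs + 1e-12            # inequality (4)
print(f"sigma = {sigma:.2f}; inequality (4) holds on random vectors")
```

Smaller $\sigma$ means faster mixing of the network, which is the quantity that enters every contraction estimate below.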
Taking norms in (3) and using (4) together with Lemma 1, we obtain
$$\begin{aligned} \|s(k)-1_ng(k)\|&\leq \|Ws(k-1)-1_ng(k-1)\|+\|[\nabla(k)-\nabla(k-1)]-1_n[g(k)-g(k-1)]\|\\ &\leq \sigma\|s(k-1)-1_ng(k-1)\|+\|[\nabla(k)-\nabla(k-1)]-1_n[g(k)-g(k-1)]\|. \end{aligned}\qquad (5)$$
For the second term on the right-hand side of (5), we have
$$\begin{aligned} \|[\nabla(k)-\nabla(k-1)]-1_n[g(k)-g(k-1)]\|^2&=\|\nabla(k)-\nabla(k-1)\|^2+\|(1/n)1_n1_n^T(\nabla (k)-\nabla(k-1))\|^2\\ &\quad -2\langle \nabla(k)-\nabla(k-1),(1/n)1_n1_n^T(\nabla (k)-\nabla(k-1))\rangle\\ &\leq \|\nabla(k)-\nabla(k-1)\|^2, \end{aligned}$$

where $\langle\cdot,\cdot\rangle$ is the trace (Frobenius) inner product; since $(1/n)1_n1_n^T$ is an orthogonal projection, the last two terms combine to $-\|(1/n)1_n1_n^T(\nabla(k)-\nabla(k-1))\|^2\leq 0$.
Substituting this into (5) and applying Assumption 1 gives
$$\|s(k)-1_ng(k)\|\leq \sigma\|s(k-1)-1_ng(k-1)\|+\beta\|x(k)-x(k-1)\|.\qquad (6)$$
Step 2: similarly, by (4) the consensus error satisfies
$$\begin{aligned} \|x(k)-1_n\bar x(k)\|&= \|Wx(k-1)-\eta s(k-1)-1_n\bar x(k-1)+\eta1_n g(k-1)\|\\ &\leq \sigma\|x(k-1)-1_n\bar x(k-1)\|+\eta\|s(k-1)-1_ng(k-1)\| \end{aligned}\qquad (7)$$
Let $h(t)=\nabla f(\bar x(t))\in\mathbb R^{1\times d}$ denote the gradient of $f=\frac{1}{n}\sum_{i=1}^n f_i$ at $\bar x(t)$.
Step 3: consider the optimality error $\|\bar x-x^*\|$. By Lemma 1,
$$\bar x(k)=\bar x(k-1)-\eta h(k-1)-\eta[g(k-1)-h(k-1)].\qquad (8)$$
Lemma 2 (Lemma 3.11 of [2]): if $f:\mathbb R^d\to\mathbb R$ satisfies Assumption 1, then for all $x,y\in\mathbb R^d$,

$$(\nabla f(x)-\nabla f(y))^T(x-y)\geq \frac{\alpha\beta}{\alpha+\beta}\|x-y\|^2+\frac{1}{\alpha+\beta}\|\nabla f(x)-\nabla f(y)\|^2.$$
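Lemma 2 can be spot-checked numerically on a quadratic, for which $\alpha$ and $\beta$ are the extreme Hessian eigenvalues (a sanity check of mine, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
B = rng.standard_normal((d, d))
H = B @ B.T + np.eye(d)                  # Hessian of f(x) = 0.5 x^T H x
alpha, beta = np.linalg.eigvalsh(H)[[0, -1]]
grad = lambda x: H @ x

for _ in range(100):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    lhs = (grad(x) - grad(y)) @ (x - y)
    rhs = (alpha * beta / (alpha + beta)) * np.sum((x - y) ** 2) \
        + np.linalg.norm(grad(x) - grad(y)) ** 2 / (alpha + beta)
    assert lhs >= rhs - 1e-9             # coercivity inequality of Lemma 2
print("Lemma 2 holds on random pairs (x, y)")
```

For a quadratic the inequality reduces, eigenvalue by eigenvalue, to $(\lambda-\alpha)(\lambda-\beta)\leq 0$, which holds because every Hessian eigenvalue lies in $[\alpha,\beta]$.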
Lemma 3: for any $x\in\mathbb R^d$, let $x^+=x-\eta \nabla f(x)$ with $0<\eta<\frac{2}{\beta}$, where $f$ satisfies Assumption 1. Then

$$\|x^+-x^*\|\leq \lambda \|x-x^*\|$$

where $\lambda =\max(|1-\eta \alpha|,|1-\eta \beta|)$.
Proof: if $0<\eta\leq \frac{2}{\alpha+\beta}$ (recall $\alpha\leq\beta$), then $\frac{2}{\eta}-\alpha\geq\beta$ and $|1-\eta\alpha|\geq |1-\eta\beta|$, so $\lambda=|1-\eta\alpha|$. Since $\frac{2}{\eta}-\alpha\geq\beta$, $f$ is also $(\frac{2}{\eta}-\alpha)$-smooth, and applying Lemma 2 with smoothness constant $\frac{2}{\eta}-\alpha$ gives
$$\begin{aligned} \|x-x^*-\eta \nabla f(x)\|^2&=\|x-x^*\|^2+\eta^2\|\nabla f(x)\|^2-2\eta \nabla f(x)^T(x-x^*)\\ &\leq \left(1-2\eta\frac{\alpha(\frac{2}{\eta}-\alpha)}{\alpha+(\frac{2}{\eta}-\alpha)}\right)\|x-x^*\|^2+\left(\eta^2-\frac{2\eta}{\alpha+(\frac{2}{\eta}-\alpha)}\right)\|\nabla f(x)\|^2\\ &=(1-\eta \alpha)^2\|x-x^*\|^2\\ &=\lambda^2\|x-x^*\|^2 \end{aligned}$$
If instead $\frac{2}{\alpha+\beta}<\eta<\frac{2}{\beta}$, then $|1-\eta\beta|\geq|1-\eta\alpha|$ and $\lambda=|1-\eta\beta|$; the same analysis goes through with the roles of $\alpha$ and $\beta$ exchanged.
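The contraction claimed by Lemma 3 is easy to observe numerically for a quadratic objective (an illustrative check; the spectrum below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
# Quadratic with prescribed spectrum: alpha = 1, beta = 5, so 2/beta = 0.4.
eigs = np.array([1.0, 2.0, 3.5, 5.0])
alpha, beta = eigs.min(), eigs.max()
Qmat, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = Qmat @ np.diag(eigs) @ Qmat.T        # Hessian of f(x) = 0.5 x^T H x
x_star = np.zeros(d)                     # its unique minimizer

for eta in [0.05, 2 / (alpha + beta), 0.35]:   # all inside (0, 2/beta)
    lam = max(abs(1 - eta * alpha), abs(1 - eta * beta))
    for _ in range(50):
        x = rng.standard_normal(d)
        x_plus = x - eta * H @ x         # one gradient step from x
        assert np.linalg.norm(x_plus - x_star) <= lam * np.linalg.norm(x - x_star) + 1e-12
print("||x+ - x*|| <= lambda ||x - x*|| verified on random points")
```

For a quadratic this is exact linear algebra: $x^+-x^*=(I-\eta H)(x-x^*)$ and $\|I-\eta H\|=\max_i|1-\eta\lambda_i|\leq\lambda$.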
By Lemma 3 and (8),
$$\begin{aligned} \|\bar x(k)-x^*\|&\leq \lambda \|\bar x(k-1)-x^*\|+\eta \|g(k-1)-h(k-1)\|\\ &\leq \lambda \|\bar x(k-1)-x^*\|+(\eta\beta/\sqrt{n}) \|x(k-1)-1_n\bar x(k-1)\| \end{aligned}\qquad(9)$$
where $\lambda =\max(|1-\eta \alpha|,|1-\eta \beta|)$.
Step 4: we bound $\|x(k)-x(k-1)\|$. By Assumption 1 (using $\nabla f(x^*)=0$),
$$\|h(k-1) \|=\|\nabla f(\bar x(k-1)) \|\leq \beta\|\bar x(k-1)-x^* \|.$$
By the same technique as in (9),
$$\begin{aligned} \|s(k-1)\|&\leq \|s(k-1)-1_ng(k-1) \|+\|1_ng(k-1)-1_n h(k-1) \|+\|1_n h(k-1)\|\\ &\leq \|s(k-1)-1_ng(k-1) \|+\beta\|x(k-1)-1_n \bar x(k-1) \|+\beta \sqrt{n}\|\bar x(k-1)-x^*\|. \end{aligned}$$
Therefore,
$$\begin{aligned} \|x(k)-x(k-1) \|&=\|Wx(k-1)-x(k-1)-\eta s(k-1) \|\\ &=\|(W-I)(x(k-1)-1_n\bar x(k-1))-\eta s(k-1) \|\\ &\leq 2\|(x(k-1)-1_n\bar x(k-1))\|+\eta\|s(k-1) \|\\ &\leq \eta\|s(k-1)-1_ng(k-1) \|\\ &\quad+(\eta\beta+2)\|x(k-1)-1_n \bar x(k-1) \|+\eta\beta \sqrt{n}\|\bar x(k-1)-x^*\|. \end{aligned}$$
Substituting this into (6) yields
$$\begin{aligned} \|s(k)-1_n g(k)\|&\leq (\sigma+\beta \eta)\|s(k-1)-1_n g(k-1) \|\\ &\quad +\beta(\eta \beta+2)\|x(k-1)-1_n \bar x(k-1) \|+\eta\beta^2 \sqrt{n}\|\bar x(k-1)-x^*\| \end{aligned}\qquad (10)$$
Combining (7), (9), and (10): stacking the three error norms as $z(k)=\big(\|s(k)-1_ng(k)\|,\ \|x(k)-1_n\bar x(k)\|,\ \sqrt{n}\|\bar x(k)-x^*\|\big)^T$, the three inequalities can be written entrywise as

$$z(k)\leq G(\eta)\,z(k-1),\qquad G(\eta)=\begin{bmatrix}\sigma+\beta\eta & \beta(\eta\beta+2) & \eta\beta^2\\ \eta & \sigma & 0\\ 0 & \eta\beta & \lambda\end{bmatrix}.$$
Since $z(k)$ and $G(\eta)$ are entrywise nonnegative, the inequality can be iterated directly to obtain

$$z(k)\leq G(\eta)^kz(0).$$
Since $G(\eta)$ is nonnegative and $G(\eta)^2$ is entrywise positive, every entry of $G(\eta)^k$ is $O(\rho(G(\eta))^k)$, where $\rho(\cdot)$ denotes the spectral radius [3].
Hence every entry of $z(k)$ converges at the rate $\rho(G(\eta))^k$; whenever $\eta$ is small enough that $\rho(G(\eta))<1$, DGT converges linearly.
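To see the condition $\rho(G(\eta))<1$ in action, one can assemble $G(\eta)$ from the coefficients of (7), (9) (scaled by $\sqrt n$), and (10) and scan the spectral radius over $\eta$; the constants $\sigma,\alpha,\beta$ below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Illustrative problem/network constants (placeholders).
sigma, alpha, beta = 0.6, 1.0, 4.0

def G(eta):
    """Coefficient matrix of (7), (9) scaled by sqrt(n), and (10)."""
    lam = max(abs(1 - eta * alpha), abs(1 - eta * beta))
    return np.array([
        [sigma + beta * eta, beta * (eta * beta + 2), eta * beta ** 2],
        [eta,                sigma,                   0.0],
        [0.0,                eta * beta,              lam],
    ])

rho = lambda M: max(abs(np.linalg.eigvals(M)))

etas = np.linspace(1e-4, 0.05, 200)
rhos = [rho(G(e)) for e in etas]
best = etas[int(np.argmin(rhos))]
print(f"min rho(G(eta)) = {min(rhos):.4f} at eta = {best:.4f}")
```

At $\eta=0$ the spectral radius is exactly $1$ (the optimality error does not contract without a gradient step), and it dips below $1$ for a range of small positive step sizes, which is precisely the linear-convergence regime.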
[1] Qu, G., & Li, N. (2018). Harnessing smoothness to accelerate distributed optimization. IEEE Transactions on Control of Network Systems, 5(3), 1245–1260. https://doi.org/10.1109/TCNS.2017.2698261
[2] Bubeck, S. (2015). Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3–4), 231–357. https://doi.org/10.1561/2200000050
[3] Horn, R. A., & Johnson, C. R. (2012). Matrix Analysis. Cambridge, UK: Cambridge University Press.