VAMP from the Ground Up (Part-3: Mathematical Foundations of State Evolution Analysis, Continued)

A General Convergence Result

For any dimension $N$, let $\boldsymbol V \in \mathbb R^{N \times N}$ be an orthogonal matrix and $\boldsymbol u_0 \in \mathbb R^N$ an initial vector, and let two "disturbance" vectors be given:
$$\boldsymbol w^p = (w^p_1, \ldots, w^p_N), \quad \boldsymbol w^q = (w^q_1, \ldots, w^q_N)$$
where $w^p_n \in \mathbb R^{n_p}$ and $w^q_n \in \mathbb R^{n_q}$. Consider the following iteration:
$$
\begin{aligned}
\boldsymbol p_k &= \boldsymbol V \boldsymbol u_k \\
\alpha_{1k} &= \langle \boldsymbol f_p'(\boldsymbol p_k, \boldsymbol w^p, \gamma_{1k}) \rangle, \quad \gamma_{2k} = \Gamma_1(\gamma_{1k}, \alpha_{1k}) \\
\boldsymbol v_k &= C_1(\alpha_{1k}) \left[ \boldsymbol f_p(\boldsymbol p_k, \boldsymbol w^p, \gamma_{1k}) - \alpha_{1k} \boldsymbol p_k \right] \\
\boldsymbol q_k &= \boldsymbol V^T \boldsymbol v_k \\
\alpha_{2k} &= \langle \boldsymbol f_q'(\boldsymbol q_k, \boldsymbol w^q, \gamma_{2k}) \rangle, \quad \gamma_{1,k+1} = \Gamma_2(\gamma_{2k}, \alpha_{2k}) \\
\boldsymbol u_{k+1} &= C_2(\alpha_{2k}) \left[ \boldsymbol f_q(\boldsymbol q_k, \boldsymbol w^q, \gamma_{2k}) - \alpha_{2k} \boldsymbol q_k \right]
\end{aligned} \tag{91}
$$

The vector $\boldsymbol u_0$ and the scalar $\gamma_{10}$ are initial values, $\langle \boldsymbol x \rangle = \frac{1}{N} \sum_{n=1}^N x_n$ denotes the empirical average, and the functions $\boldsymbol f_p, \boldsymbol f_q$ are componentwise separable, i.e.
$$
\begin{aligned}
\left[ \boldsymbol f_p(\boldsymbol p, \boldsymbol w^p, \gamma_1) \right]_n &= f_p(p_n, w^p_n, \gamma_1) \\
\left[ \boldsymbol f_q(\boldsymbol q, \boldsymbol w^q, \gamma_2) \right]_n &= f_q(q_n, w^q_n, \gamma_2)
\end{aligned} \tag{92}
$$
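As a concrete reference, here is a minimal NumPy sketch of iteration (91). The denoiser $f(x, w, \gamma) = \tanh(\gamma x) + w$ (used for both $f_p$ and $f_q$), $C_1 = C_2 = 1/(1-\alpha)$, and $\Gamma_1 = \Gamma_2$ keeping $\gamma$ fixed are hypothetical choices for illustration, not taken from the text; the Haar matrix is drawn with the standard QR-with-sign-correction construction.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from the Haar (uniform) distribution."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # column sign fix makes the QR factor exactly Haar


def run_iteration(V, u0, wp, wq, gamma10, f_p, df_p, f_q, df_q, C1, C2, Gamma1, Gamma2, n_iter):
    """Iteration (91); the scalar functions act elementwise, as in (92)."""
    u, g1 = u0, gamma10
    trace = []
    for _ in range(n_iter):
        p = V @ u
        a1 = np.mean(df_p(p, wp, g1))              # alpha_1k = <f_p'(p_k, w^p, gamma_1k)>
        g2 = Gamma1(g1, a1)
        v = C1(a1) * (f_p(p, wp, g1) - a1 * p)     # Onsager-corrected output
        q = V.T @ v
        a2 = np.mean(df_q(q, wq, g2))              # alpha_2k = <f_q'(q_k, w^q, gamma_2k)>
        g1_next = Gamma2(g2, a2)
        u_next = C2(a2) * (f_q(q, wq, g2) - a2 * q)
        trace.append({"u": u, "p": p, "v": v, "q": q, "alpha1": a1, "alpha2": a2})
        u, g1 = u_next, g1_next
    return trace


# Hypothetical choices for illustration only (not from the text)
def f(x, w, g):
    return np.tanh(g * x) + w


def df(x, w, g):
    return g * (1.0 - np.tanh(g * x) ** 2)       # derivative in x


def C(a):
    return 1.0 / (1.0 - a)


def Gam(g, a):
    return g                                      # keeps gamma fixed


rng = np.random.default_rng(0)
N = 500
V = haar_orthogonal(N, rng)
u0 = rng.standard_normal(N)
wp, wq = 0.5 * rng.standard_normal(N), 0.5 * rng.standard_normal(N)
trace = run_iteration(V, u0, wp, wq, 1.0, f, df, f, df, C, C, Gam, Gam, n_iter=3)
for k, t in enumerate(trace):
    print(k, t["alpha1"], t["alpha2"], np.mean(t["p"] ** 2), np.mean(t["q"] ** 2))
```

Because $\boldsymbol V$ is orthogonal, $\|\boldsymbol p_k\| = \|\boldsymbol u_k\|$ and $\|\boldsymbol q_k\| = \|\boldsymbol v_k\|$ hold exactly at every iteration, which the stored trace makes easy to confirm.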

We assume that $\boldsymbol u_0, \boldsymbol w^p, \boldsymbol w^q$ are deterministic sequences whose components converge empirically:
$$\lim_{N \to \infty} \{ u_{0n} \} \overset{PL(2)}{=} U_0 \tag{93}$$

$$\lim_{N \to \infty} \{ w^p_n \} \overset{PL(2)}{=} W^p, \quad \lim_{N \to \infty} \{ w^q_n \} \overset{PL(2)}{=} W^q \tag{94}$$

In addition, we assume that the initial constant $\gamma_{10}$ converges:
$$\lim_{N \to \infty} \gamma_{10} = \bar\gamma_{10} \tag{95}$$

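We assume that $\boldsymbol V$ is Haar distributed, i.e. uniformly distributed over the set of $N \times N$ orthogonal matrices, and independent of $\boldsymbol u_0, \boldsymbol w^p, \boldsymbol w^q$. Since $\boldsymbol u_0, \boldsymbol w^p, \boldsymbol w^q$ are deterministic, the only random quantity is the matrix $\boldsymbol V$.
Under these assumptions, define the state evolution (SE) equations, initialized with $\tau_{10} = \mathbb E[U_0^2]$ and $\bar\gamma_{10}$ from (95):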
$$
\begin{aligned}
\bar\alpha_{1k} &= \mathbb E\left[ f_p'(P_k, W^p, \bar\gamma_{1k}) \right] \\
\tau_{2k} &= C_1^2(\bar\alpha_{1k}) \left\{ \mathbb E\left[ f_p^2(P_k, W^p, \bar\gamma_{1k}) \right] - \bar\alpha_{1k}^2 \tau_{1k} \right\} \\
\bar\gamma_{2k} &= \Gamma_1(\bar\gamma_{1k}, \bar\alpha_{1k}) \\
\bar\alpha_{2k} &= \mathbb E\left[ f_q'(Q_k, W^q, \bar\gamma_{2k}) \right] \\
\tau_{1,k+1} &= C_2^2(\bar\alpha_{2k}) \left\{ \mathbb E\left[ f_q^2(Q_k, W^q, \bar\gamma_{2k}) \right] - \bar\alpha_{2k}^2 \tau_{2k} \right\} \\
\bar\gamma_{1,k+1} &= \Gamma_2(\bar\gamma_{2k}, \bar\alpha_{2k})
\end{aligned} \tag{96}
$$

where the expectations are taken over the random variables
$$P_k \sim \mathcal N(0, \tau_{1k}), \quad Q_k \sim \mathcal N(0, \tau_{2k}),$$
with $P_k$ independent of $W^p$ and $Q_k$ independent of $W^q$.
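Since (96) involves only scalar expectations, it can be evaluated numerically. The sketch below is one way to do so, assuming $W^p$ and $W^q$ are represented by samples (for instance the entries of $\boldsymbol w^p, \boldsymbol w^q$) and using probabilists' Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`) for the Gaussian variable. The function names and the linear test denoiser are hypothetical.

```python
import numpy as np

# Probabilists' Gauss-Hermite rule: weight exp(-x^2/2); the raw weights sum to sqrt(2*pi)
NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(40)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)   # now E[h(Z)] ~= WEIGHTS @ h(NODES) for Z ~ N(0, 1)


def gauss_expect(h, tau, w_samples):
    """E[h(X, W)] with X ~ N(0, tau) independent of W; W is represented by samples."""
    x = np.sqrt(tau) * NODES[:, None]                      # shape (n_nodes, 1)
    vals = np.broadcast_to(h(x, w_samples[None, :]), (NODES.size, w_samples.size))
    return float(np.mean(WEIGHTS @ vals))                  # quadrature in X, average over W


def se_half_step(f, df, C, Gamma, tau_in, gamma_in, w_samples):
    """One half of (96): (tau_in, gamma_in) -> (alpha_bar, tau_out, gamma_out)."""
    alpha = gauss_expect(lambda x, w: df(x, w, gamma_in), tau_in, w_samples)
    second = gauss_expect(lambda x, w: f(x, w, gamma_in) ** 2, tau_in, w_samples)
    tau_out = C(alpha) ** 2 * (second - alpha ** 2 * tau_in)
    return alpha, tau_out, Gamma(gamma_in, alpha)


def state_evolution(f_p, df_p, f_q, df_q, C1, C2, Gamma1, Gamma2, tau10, gamma10, wp, wq, n_iter):
    tau1, g1, history = tau10, gamma10, []
    for k in range(n_iter):
        a1, tau2, g2 = se_half_step(f_p, df_p, C1, Gamma1, tau1, g1, wp)
        a2, tau1_next, g1_next = se_half_step(f_q, df_q, C2, Gamma2, tau2, g2, wq)
        history.append({"k": k, "alpha1": a1, "tau1": tau1, "gamma1": g1,
                        "alpha2": a2, "tau2": tau2, "gamma2": g2})
        tau1, g1 = tau1_next, g1_next
    return history


# Hypothetical linear denoiser f(x, w, gamma) = gamma * x + w, for which alpha_bar = gamma
def lin(x, w, g):
    return g * x + w


def dlin(x, w, g):
    return g


def C(a):
    return 1.0 / (1.0 - a)


def Gam(g, a):
    return g


rng = np.random.default_rng(1)
wp, wq = 0.5 * rng.standard_normal(1000), 0.3 * rng.standard_normal(1000)
hist = state_evolution(lin, dlin, lin, dlin, C, C, Gam, Gam, 1.0, 0.5, wp, wq, n_iter=2)
print(hist[0]["tau2"], 4.0 * np.mean(wp ** 2))   # tau_2 = C(0.5)^2 E[W^2] = 4 E[W^2]
```

For the linear denoiser one has $\bar\alpha = \gamma$ and, because $X$ is independent of $W$, $\tau_{\text{out}} = C(\gamma)^2\{\gamma^2\tau + \mathbb E[W^2] - \gamma^2\tau\} = C(\gamma)^2\,\mathbb E[W^2]$, which the quadrature reproduces up to rounding.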

Theorem 4: Consider the iteration (91) and the state evolution equations (96). Suppose that, for every iteration $k$, the following three conditions hold:
(1) For $i=1,2$, the functions
$$C_i(\alpha_i), \quad \Gamma_i(\gamma_i, \alpha_i)$$

are continuous at the point $(\gamma_i, \alpha_i) = (\bar\gamma_{ik}, \bar\alpha_{ik})$ given by the state evolution;
(2) At $\gamma_1 = \bar\gamma_{1k}$, the function $f_p(p, w^p, \gamma_1)$ and its derivative $f_p'(p, w^p, \gamma_1)$ are uniformly Lipschitz continuous in $(p, w^p)$;
(3) At $\gamma_2 = \bar\gamma_{2k}$, the function $f_q(q, w^q, \gamma_2)$ and its derivative $f_q'(q, w^q, \gamma_2)$ are uniformly Lipschitz continuous in $(q, w^q)$.
Then the following hold:
(a) For any fixed $k$, the components of $(\boldsymbol w^p, \boldsymbol p_0, \ldots, \boldsymbol p_k)$ almost surely converge empirically:
$$\lim_{N \to \infty} \{ (w^p_n, p_{0n}, \ldots, p_{kn}) \} \overset{PL(2)}{=} (W^p, P_0, \ldots, P_k) \tag{97}$$

where $W^p$ is the random variable in the limit (94), and $(P_0, \ldots, P_k)$ is a zero-mean Gaussian random vector independent of $W^p$ with $\mathbb E[P_k^2] = \tau_{1k}$. Moreover,
$$\lim_{N \to \infty} (\alpha_{1k}, \gamma_{1k}) = (\bar\alpha_{1k}, \bar\gamma_{1k}) \tag{98}$$

(b) For any fixed $k$, the components of $(\boldsymbol w^q, \boldsymbol q_0, \ldots, \boldsymbol q_k)$ almost surely converge empirically:
$$\lim_{N \to \infty} \{ (w^q_n, q_{0n}, \ldots, q_{kn}) \} \overset{PL(2)}{=} (W^q, Q_0, \ldots, Q_k) \tag{99}$$

where $W^q$ is the random variable in the limit (94), and $(Q_0, \ldots, Q_k)$ is a zero-mean Gaussian random vector independent of $W^q$ with $\mathbb E[Q_k^2] = \tau_{2k}$. Moreover,
$$\lim_{N \to \infty} (\alpha_{2k}, \gamma_{2k}) = (\bar\alpha_{2k}, \bar\gamma_{2k}) \tag{100}$$
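Theorem 4 can be sanity-checked numerically. The sketch below runs (91) with a Haar-distributed $\boldsymbol V$ and compares $\frac{1}{N}\|\boldsymbol p_k\|^2$ and $\frac{1}{N}\|\boldsymbol q_k\|^2$ with $\tau_{1k}$ and $\tau_{2k}$ from (96). The denoiser $\tanh(\gamma x) + w$, $C_i(\alpha) = 1/(1-\alpha)$, a fixed $\gamma$, $N = 2000$ and the Gaussian test data are all hypothetical choices; the expectations over $W^p, W^q$ use the empirical entries of $\boldsymbol w^p, \boldsymbol w^q$. This is an illustration under those assumptions, not a proof, and finite-$N$ fluctuations are to be expected.

```python
import numpy as np

# Hypothetical choices (not from the text): f_p = f_q = tanh(g x) + w, C_1 = C_2 = 1/(1 - a),
# and Gamma_1 = Gamma_2 = identity in gamma, so gamma stays equal to g for all k.
def f(x, w, g):
    return np.tanh(g * x) + w


def df(x, w, g):
    # derivative in x; "+ 0.0 * w" only broadcasts the result to the shape of w
    return g * (1.0 - np.tanh(g * x) ** 2) + 0.0 * w


def C(a):
    return 1.0 / (1.0 - a)


def haar_orthogonal(n, rng):
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(60)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)   # E[h(Z)] ~= WEIGHTS @ h(NODES), Z ~ N(0, 1)


def expect(h, tau, w):
    """E[h(X, W)] with X ~ N(0, tau) independent of W; W uses the empirical samples w."""
    return float(np.mean(WEIGHTS @ h(np.sqrt(tau) * NODES[:, None], w[None, :])))


def empirical(V, u0, wp, wq, g, n_iter):
    """Run (91); record ((1/N)||p_k||^2, (1/N)||q_k||^2)."""
    u, out = u0, []
    for _ in range(n_iter):
        p = V @ u
        a1 = np.mean(df(p, wp, g))
        v = C(a1) * (f(p, wp, g) - a1 * p)
        q = V.T @ v
        a2 = np.mean(df(q, wq, g))
        u = C(a2) * (f(q, wq, g) - a2 * q)
        out.append((np.mean(p ** 2), np.mean(q ** 2)))
    return out


def predicted(tau1, wp, wq, g, n_iter):
    """Run the state evolution (96); record (tau_1k, tau_2k)."""
    out = []
    for _ in range(n_iter):
        a1 = expect(lambda x, w: df(x, w, g), tau1, wp)
        tau2 = C(a1) ** 2 * (expect(lambda x, w: f(x, w, g) ** 2, tau1, wp) - a1 ** 2 * tau1)
        a2 = expect(lambda x, w: df(x, w, g), tau2, wq)
        out.append((tau1, tau2))
        tau1 = C(a2) ** 2 * (expect(lambda x, w: f(x, w, g) ** 2, tau2, wq) - a2 ** 2 * tau2)
    return out


rng = np.random.default_rng(0)
N, n_iter, g = 2000, 3, 1.0
u0 = rng.standard_normal(N)
wp, wq = 0.5 * rng.standard_normal(N), 0.5 * rng.standard_normal(N)
V = haar_orthogonal(N, rng)
emp = empirical(V, u0, wp, wq, g, n_iter)
pred = predicted(np.mean(u0 ** 2), wp, wq, g, n_iter)   # tau_10 = (1/N)||u0||^2
for k, ((ep, eq), (t1, t2)) in enumerate(zip(emp, pred)):
    print(f"k={k}: |p|^2/N={ep:.3f} vs tau1={t1:.3f}   |q|^2/N={eq:.3f} vs tau2={t2:.3f}")
```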

Proof of Theorem 4

Proof Strategy

We use induction. For iteration indices $k \geq 0$ and $l \geq -1$, define the hypothesis $H_{k,l}$ as:
part (a) of Theorem 4 holds for all iterations up to $k$;
part (b) of Theorem 4 holds for all iterations up to $l$ (for $l = -1$ this part is vacuous).
The induction consists of three steps:
(1) $H_{0,-1}$ holds;
(2) if $H_{k,k-1}$ holds, then $H_{k,k}$ holds;
(3) if $H_{k,k}$ holds, then $H_{k+1,k}$ holds.

Verifying the Base Case

To show that $H_{0,-1}$ holds, we must show that (97) and (98) hold for $k=0$. We apply Lemma 5 for each dimension $N$ with $\boldsymbol U = \boldsymbol I_N$ and $\boldsymbol x = \boldsymbol p_0$. Since $\boldsymbol p_0 = \boldsymbol V \boldsymbol u_0$ with $\boldsymbol V$ Haar distributed (hence orthogonal), (93) gives
$$\lim_{N \to \infty} \frac{1}{N} \|\boldsymbol p_0\|^2 = \lim_{N \to \infty} \frac{1}{N} \|\boldsymbol u_0\|^2 = \mathbb E[U_0^2] = \tau_{10}$$

Moreover, since $\boldsymbol p_0 = \boldsymbol U \boldsymbol p_0$, Lemma 5 implies that the components of $\boldsymbol p_0$ converge empirically:
$$\lim_{N \to \infty} \{ p_{0n} \} \overset{PL(2)}{=} P_0 \sim \mathcal N(0, \tau_{10})$$

Combining this with (94),
$$\lim_{N \to \infty} \{ (w^p_n, p_{0n}) \} \overset{PL(2)}{=} (W^p, P_0)$$

where $W^p$ is independent of $P_0$. This shows that (97) holds for $k=0$.
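The base case can be illustrated numerically: even when $\boldsymbol u_0$ is far from Gaussian, $\boldsymbol p_0 = \boldsymbol V \boldsymbol u_0$ keeps the normalized energy $\tau_{10}$ and has Gaussian-looking entries. The sketch below (the size $N$ and the choice of $\boldsymbol u_0$ are hypothetical) puts all of the energy of $\boldsymbol u_0$ on one coordinate and compares the empirical fourth moment of $\boldsymbol p_0$ with the Gaussian value $3\tau_{10}^2$.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Haar-distributed n x n orthogonal matrix via QR with sign correction."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


rng = np.random.default_rng(0)
N = 2000
u0 = np.zeros(N)
u0[0] = np.sqrt(N)              # deliberately non-Gaussian: all energy on one coordinate
tau10 = np.mean(u0 ** 2)        # (1/N)||u0||^2 = 1
V = haar_orthogonal(N, rng)
p0 = V @ u0                     # equals sqrt(N) times the first column of V

print(np.mean(p0 ** 2), tau10)              # equal up to rounding: V preserves the norm
print(np.mean(u0 ** 4), np.mean(p0 ** 4))   # N for u0, versus roughly 3 * tau10**2 for p0
```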
By (95), $\gamma_{10} \to \bar\gamma_{10}$ as $N \to \infty$. Moreover, at $\gamma_1 = \bar\gamma_{10}$ the derivative $f_p'(p, w^p, \gamma_1)$ is uniformly Lipschitz continuous in $(p, w^p)$. Therefore
$$\alpha_{10} = \langle \boldsymbol f_p'(\boldsymbol p_0, \boldsymbol w^p, \gamma_{10}) \rangle \to \mathbb E\left[ f_p'(P_0, W^p, \bar\gamma_{10}) \right] = \bar\alpha_{10}$$

This shows that (98) holds for $k=0$.

The Induction Step

This part proves $H_{k,k-1} \Longrightarrow H_{k,k}$; the proof of $H_{k,k} \Longrightarrow H_{k+1,k}$ is analogous. We therefore fix $k$ and assume that $H_{k,k-1}$ holds.
Since $\Gamma_1(\gamma_1, \alpha_1)$ is continuous at $(\bar\gamma_{1k}, \bar\alpha_{1k})$, combining (98) with $\bar\gamma_{2k} = \Gamma_1(\bar\gamma_{1k}, \bar\alpha_{1k})$ from (96) gives
$$\lim_{N \to \infty} \gamma_{2k} = \lim_{N \to \infty} \Gamma_1(\gamma_{1k}, \alpha_{1k}) = \bar\gamma_{2k}$$

In addition, by $H_{k,k-1}$, for every $l \in \{0, \ldots, k\}$ the components of $(\boldsymbol w^p, \boldsymbol p_l)$ almost surely converge empirically:
$$\lim_{N \to \infty} \{ (w^p_n, p_{ln}) \} \overset{PL(2)}{=} (W^p, P_l)$$

where $P_l \sim \mathcal N(0, \tau_{1l})$ and $\tau_{1l}$ is given by the state evolution. Since $f_p(\cdot)$ is Lipschitz continuous, $C_1(\alpha_1)$ is continuous at $\bar\alpha_{1l}$, and $(\alpha_{1l}, \gamma_{1l}) \to (\bar\alpha_{1l}, \bar\gamma_{1l})$ by (98), it follows that
$$\lim_{N \to \infty} \{ (w^p_n, p_{ln}, v_{ln}) \} \overset{PL(2)}{=} (W^p, P_l, V_l)$$

where $V_l$ is the random variable
$$V_l = \mathrm{g}_p(P_l, W^p, \bar\gamma_{1l}, \bar\alpha_{1l}) \tag{101}$$

with
$$\mathrm{g}_p(p, w^p, \gamma_1, \alpha_1) \coloneqq C_1(\alpha_1) \left[ f_p(p, w^p, \gamma_1) - \alpha_1 p \right] \tag{102}$$

Similarly, for every $l \in \{0, \ldots, k-1\}$,

$$\lim_{N \to \infty} \{ (w^q_n, q_{ln}, u_{l+1,n}) \} \overset{PL(2)}{=} (W^q, Q_l, U_{l+1})$$

where $U_{l+1}$ is the random variable
$$U_{l+1} = \mathrm{g}_q(Q_l, W^q, \bar\gamma_{2l}, \bar\alpha_{2l}) \tag{103}$$

with
$$\mathrm{g}_q(q, w^q, \gamma_2, \alpha_2) \coloneqq C_2(\alpha_2) \left[ f_q(q, w^q, \gamma_2) - \alpha_2 q \right] \tag{104}$$

Define
$$\boldsymbol U_k \coloneqq [\boldsymbol u_0, \ldots, \boldsymbol u_k] \in \mathbb R^{N \times (k+1)}$$

and define $\boldsymbol V_k, \boldsymbol P_k, \boldsymbol Q_k$ analogously. Let $G_k$ denote the tuple of these matrices ($G_k$ can loosely be read as everything the algorithm has produced up to iteration $k$):
$$G_k \coloneqq \{ \boldsymbol U_k, \boldsymbol P_k, \boldsymbol V_k, \boldsymbol Q_{k-1} \} \tag{105}$$

We also use $G_k$ to denote the $\sigma$-algebra generated by these variables (informally, the information they carry). The set (105) contains all outputs of algorithm (91) in iteration $k$ up to, but not including, the step $\boldsymbol q_k = \boldsymbol V^T \boldsymbol v_k$.
Define
$$\boldsymbol A_k \coloneqq [\boldsymbol P_k, \boldsymbol V_{k-1}], \quad \boldsymbol B_k \coloneqq [\boldsymbol U_k, \boldsymbol Q_{k-1}] \tag{106}$$

Since $\boldsymbol p_l = \boldsymbol V \boldsymbol u_l$ and, by the orthogonality of $\boldsymbol V$, $\boldsymbol v_l = \boldsymbol V \boldsymbol q_l$, iteration (91) gives
$$\boldsymbol A_k = \boldsymbol V \boldsymbol B_k \tag{107}$$

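By Lemma 4,
$$\boldsymbol V |_{G_k} \overset{d}{=} \boldsymbol A_k (\boldsymbol A_k^T \boldsymbol A_k)^{-1} \boldsymbol B_k^T + \boldsymbol U_{\boldsymbol A_k^{\perp}} \tilde{\boldsymbol V} \boldsymbol U^T_{\boldsymbol B_k^{\perp}} \tag{108}$$

where $\boldsymbol U_{\boldsymbol A_k^{\perp}}$ and $\boldsymbol U_{\boldsymbol B_k^{\perp}}$ have orthonormal columns spanning the orthogonal complements of the column spaces of $\boldsymbol A_k$ and $\boldsymbol B_k$, and $\tilde{\boldsymbol V}$ is a Haar-distributed orthogonal matrix independent of $G_k$.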
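The two algebraic properties behind (108) can be checked numerically, assuming $\boldsymbol A_k$ has full column rank: if $\boldsymbol A = \boldsymbol V_0 \boldsymbol B$ for some orthogonal $\boldsymbol V_0$, then the right-hand side of (108) is again orthogonal and satisfies $\boldsymbol A = \boldsymbol V \boldsymbol B$. The sketch below only checks these two properties, not the conditional distribution itself; the sizes and the helper names are hypothetical.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Haar-distributed n x n orthogonal matrix via QR with sign correction."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))


def orth_complement(M):
    """Orthonormal basis of the orthogonal complement of range(M); M has full column rank."""
    q, _ = np.linalg.qr(M, mode="complete")
    return q[:, M.shape[1]:]


rng = np.random.default_rng(0)
N, r = 50, 5                         # r plays the role of the 2k+1 columns of A_k
V0 = haar_orthogonal(N, rng)         # some orthogonal matrix consistent with the data
B = rng.standard_normal((N, r))      # stand-in for B_k
A = V0 @ B                           # the constraint A_k = V B_k from (107)

U_A_perp, U_B_perp = orth_complement(A), orth_complement(B)
V_tilde = haar_orthogonal(N - r, rng)                     # fresh Haar matrix on the complement
V_cond = A @ np.linalg.solve(A.T @ A, B.T) + U_A_perp @ V_tilde @ U_B_perp.T   # RHS of (108)

orth_err = np.abs(V_cond.T @ V_cond - np.eye(N)).max()   # should be at rounding level
constraint_err = np.abs(V_cond @ B - A).max()            # should be at rounding level
print(orth_err, constraint_err)
```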

Substituting (108) into the step $\boldsymbol q_k = \boldsymbol V^T \boldsymbol v_k$ of iteration $k$ of (91) gives
$$\boldsymbol q_k = \boldsymbol V^T \boldsymbol v_k = \boldsymbol q_k^{\text{det}} + \boldsymbol q_k^{\text{ran}} \tag{109}$$

where $\boldsymbol q_k^{\text{det}}$ is the term that is deterministic given $G_k$,
$$\boldsymbol q_k^{\text{det}} = \boldsymbol B_k (\boldsymbol A_k^T \boldsymbol A_k)^{-1} \boldsymbol A_k^T \boldsymbol v_k \tag{110}$$

and $\boldsymbol q_k^{\text{ran}}$ is the random term
$$\boldsymbol q_k^{\text{ran}} = \boldsymbol U_{\boldsymbol B_k^{\perp}} \tilde{\boldsymbol V}^T \boldsymbol U^T_{\boldsymbol A_k^{\perp}} \boldsymbol v_k \tag{111}$$

Lemmas 6, 7 and 8 below describe the asymptotic distributions of (110) and (111).

Lemma 6: Under $H_{k,k-1}$, there exist constants $\beta_{k0}, \ldots, \beta_{k,k-1}$ such that the components of $(\boldsymbol w^q, \boldsymbol q_0, \ldots, \boldsymbol q_{k-1}, \boldsymbol q_k^{\text{det}})$ converge empirically:
$$\lim_{N \to \infty} \{ (w^q_n, q_{0n}, \ldots, q_{k-1,n}, q_{kn}^{\text{det}}) \} \overset{PL(2)}{=} (W^q, Q_0, \ldots, Q_{k-1}, Q_k^{\text{det}}) \tag{112}$$

where $Q_l$, $l = 0, \ldots, k-1$, are the Gaussian random variables from (99), and
$$Q_k^{\text{det}} = \beta_{k0} Q_0 + \cdots + \beta_{k,k-1} Q_{k-1} \tag{113}$$

Proof: By definition (106),
$$\boldsymbol A_k^T \boldsymbol A_k = \begin{bmatrix} \boldsymbol P_k^T \boldsymbol P_k & \boldsymbol P_k^T \boldsymbol V_{k-1} \\ \boldsymbol V_{k-1}^T \boldsymbol P_k & \boldsymbol V_{k-1}^T \boldsymbol V_{k-1} \end{bmatrix}$$

For $\boldsymbol P_k^T \boldsymbol P_k$ we have
$$\lim_{N \to \infty} \frac{1}{N} \left[ \boldsymbol P_k^T \boldsymbol P_k \right]_{ij} = \lim_{N \to \infty} \frac{1}{N} \boldsymbol p_i^T \boldsymbol p_j = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N p_{in} p_{jn} \overset{(b)}{=} \mathbb E[P_i P_j] \overset{(c)}{=} [\boldsymbol Q^p_k]_{ij}$$

where (b) follows from (97), and in (c) $\boldsymbol Q^p_k$ denotes the covariance matrix of $(P_0, \ldots, P_k)$.
Similarly, we define
$$\lim_{N \to \infty} \frac{1}{N} \boldsymbol V_{k-1}^T \boldsymbol V_{k-1} = \boldsymbol Q^v_{k-1}$$
i.e. $\boldsymbol Q^v_{k-1}$ is the second-moment matrix of $(V_0, \ldots, V_{k-1})$, $[\boldsymbol Q^v_{k-1}]_{ij} = \mathbb E[V_i V_j]$.

For the cross terms of $\boldsymbol A_k^T \boldsymbol A_k$, with $i \leq k-1$ and $j \leq k$:
$$
\begin{aligned}
\mathbb E[V_i P_j] &\overset{(a)}{=} \mathbb E\left[ \mathrm{g}_p(P_i, W^p, \bar\gamma_{1i}, \bar\alpha_{1i}) P_j \right] \\
&\overset{(b)}{=} \mathbb E\left[ \mathrm{g}_p'(P_i, W^p, \bar\gamma_{1i}, \bar\alpha_{1i}) \right] \mathbb E[P_i P_j] \\
&\overset{(c)}{=} C_1(\bar\alpha_{1i}) \, \mathbb E[P_i P_j] \left[ \mathbb E\left[ f_p'(P_i, W^p, \bar\gamma_{1i}) \right] - \bar\alpha_{1i} \right] \\
&\overset{(d)}{=} 0
\end{aligned}
$$

where (a) follows from (101); (b) from Stein's lemma, since $(P_i, P_j)$ is jointly Gaussian, zero-mean and independent of $W^p$; (c) from differentiating (102) with respect to $p$; and (d) from the definition of $\bar\alpha_{1i}$ in (96).
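The decorrelation step above, namely that the Onsager-corrected output $V_i$ is uncorrelated with every $P_j$ even though $f_p(P_i, \cdot)$ is not, can be checked by Monte Carlo. In the sketch below the correlation $\rho$, the denoiser $\tanh(p) + w$, the distribution of $W^p$ and $C_1(\alpha) = 1/(1-\alpha)$ are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 1_000_000
rho = 0.7                                          # hypothetical E[P_i P_j], both with unit variance
P_i = rng.standard_normal(M)
P_j = rho * P_i + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(M)
W = 0.5 * rng.standard_normal(M)                   # disturbance, independent of (P_i, P_j)

f = np.tanh(P_i) + W                               # hypothetical f_p(P_i, W^p, gamma = 1)
alpha_bar = np.mean(1.0 - np.tanh(P_i) ** 2)       # alpha_bar_1i = E[f_p'(P_i, W^p, gamma)]
V_i = (f - alpha_bar * P_i) / (1.0 - alpha_bar)    # g_p in (102) with hypothetical C_1(a) = 1/(1-a)

raw_corr = np.mean(f * P_j)            # Stein: approximately rho * alpha_bar, clearly nonzero
corrected_corr = np.mean(V_i * P_j)    # approximately 0, as in step (d)
print(raw_corr, rho * alpha_bar, corrected_corr)
```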
Therefore,
$$\lim_{N \to \infty} \frac{1}{N} \boldsymbol A_k^T \boldsymbol A_k \overset{a.s.}{=} \begin{bmatrix} \boldsymbol Q^p_k & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol Q^v_{k-1} \end{bmatrix} \tag{114}$$

A similar computation (the same Stein argument gives $\mathbb E[P_l V_k] = 0$) yields
$$\lim_{N \to \infty} \frac{1}{N} \boldsymbol A_k^T \boldsymbol v_k = \begin{bmatrix} \boldsymbol 0 \\ \boldsymbol b^v_k \end{bmatrix} \tag{115}$$
where
$$\boldsymbol b^v_k = \left[ \mathbb E[V_0 V_k], \mathbb E[V_1 V_k], \ldots, \mathbb E[V_{k-1} V_k] \right]^T \tag{116}$$

Since $(\boldsymbol A_k^T \boldsymbol A_k)^{-1} \boldsymbol A_k^T \boldsymbol v_k = \left( \frac{1}{N} \boldsymbol A_k^T \boldsymbol A_k \right)^{-1} \frac{1}{N} \boldsymbol A_k^T \boldsymbol v_k$, it follows that
$$\lim_{N \to \infty} (\boldsymbol A_k^T \boldsymbol A_k)^{-1} \boldsymbol A_k^T \boldsymbol v_k \overset{a.s.}{=} \begin{bmatrix} \boldsymbol 0 \\ \boldsymbol \beta_k \end{bmatrix} \tag{117}$$

where
$$\boldsymbol \beta_k \coloneqq [\boldsymbol Q^v_{k-1}]^{-1} \boldsymbol b^v_k$$

Hence
$$
\begin{aligned}
\boldsymbol q_k^{\text{det}} &= \boldsymbol B_k (\boldsymbol A_k^T \boldsymbol A_k)^{-1} \boldsymbol A_k^T \boldsymbol v_k \\
&= [\boldsymbol U_k, \boldsymbol Q_{k-1}] \begin{bmatrix} \boldsymbol 0 \\ \boldsymbol \beta_k \end{bmatrix} + \boldsymbol \xi \\
&= \sum_{l=0}^{k-1} \beta_{kl} \boldsymbol q_l + \boldsymbol \xi
\end{aligned} \tag{118}
$$

where $\boldsymbol \xi$ is the error between the finite-$N$ value and its limit:
$$\boldsymbol \xi = \boldsymbol B_k \boldsymbol s, \quad \boldsymbol s \coloneqq (\boldsymbol A_k^T \boldsymbol A_k)^{-1} \boldsymbol A_k^T \boldsymbol v_k - \begin{bmatrix} \boldsymbol 0 \\ \boldsymbol \beta_k \end{bmatrix} \tag{119}$$

By (117), $\boldsymbol s \to \boldsymbol 0$ almost surely, and under $H_{k,k-1}$ the normalized column norms of $\boldsymbol B_k$ have finite limits (both $\frac{1}{N}\|\boldsymbol u_l\|^2 = \frac{1}{N}\|\boldsymbol p_l\|^2$ and $\frac{1}{N}\|\boldsymbol q_l\|^2$ converge). Hence $\frac{1}{N}\|\boldsymbol \xi\|^2 \leq \|\boldsymbol s\|^2 \cdot \frac{1}{N}\|\boldsymbol B_k\|_F^2 \to 0$, and therefore
$$\lim_{N \to \infty} \{ (w^q_n, q_{0n}, \ldots, q_{k-1,n}, q_{kn}^{\text{det}}) \} \overset{PL(2)}{=} (W^q, Q_0, \ldots, Q_{k-1}, Q_k^{\text{det}})$$

Lemma 7: Under $H_{k,k-1}$,
$$\lim_{N \to \infty} \frac{1}{N} \left\| \boldsymbol U^T_{\boldsymbol A_k^{\perp}} \boldsymbol v_k \right\|^2 = \rho_k \tag{120}$$

Proof: Writing $\boldsymbol U_{\boldsymbol A_k}$ for a matrix with orthonormal columns spanning the column space of $\boldsymbol A_k$,
$$
\begin{aligned}
\left\| \boldsymbol U^T_{\boldsymbol A_k^{\perp}} \boldsymbol v_k \right\|^2 &= \boldsymbol v_k^T \boldsymbol U_{\boldsymbol A_k^{\perp}} \boldsymbol U^T_{\boldsymbol A_k^{\perp}} \boldsymbol v_k \\
&= \boldsymbol v_k^T (\boldsymbol I - \boldsymbol U_{\boldsymbol A_k} \boldsymbol U^T_{\boldsymbol A_k}) \boldsymbol v_k \\
&= \boldsymbol v_k^T \boldsymbol v_k - \boldsymbol v_k^T \boldsymbol A_k (\boldsymbol A_k^T \boldsymbol A_k)^{-1} \boldsymbol A_k^T \boldsymbol v_k \\
\Longrightarrow \lim_{N \to \infty} \frac{1}{N} \left\| \boldsymbol U^T_{\boldsymbol A_k^{\perp}} \boldsymbol v_k \right\|^2 &= \mathbb E[V_k^2] - (\boldsymbol b^v_k)^T [\boldsymbol Q^v_{k-1}]^{-1} \boldsymbol b^v_k \quad (\coloneqq \rho_k)
\end{aligned}
$$
where the limit in the last line follows from (114), (115) and (116).
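The projection identity used in the proof of Lemma 7 is purely algebraic and holds for any full-column-rank $\boldsymbol A_k$. The sketch below uses random stand-ins for $\boldsymbol A_k$ and $\boldsymbol v_k$ (hypothetical sizes) and obtains $\boldsymbol U_{\boldsymbol A_k^{\perp}}$ from a complete QR decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 200, 5                              # r plays the role of the 2k+1 columns of A_k
A = rng.standard_normal((N, r))            # stand-in for A_k (full column rank)
v = rng.standard_normal(N)                 # stand-in for v_k

q, _ = np.linalg.qr(A, mode="complete")
U_perp = q[:, r:]                          # orthonormal basis of the complement of range(A)

lhs = np.sum((U_perp.T @ v) ** 2)                          # ||U_{A^perp}^T v||^2
rhs = v @ v - v @ A @ np.linalg.solve(A.T @ A, A.T @ v)    # v^T v - v^T A (A^T A)^{-1} A^T v
print(lhs, rhs)
```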

Lemma 8: Under $H_{k,k-1}$,
$$\lim_{N \to \infty} \{ (w^q_n, q_{0n}, \ldots, q_{k-1,n}, q^{\text{ran}}_{kn}) \} \overset{PL(2)}{=} (W^q, Q_0, \ldots, Q_{k-1}, U_k) \tag{121}$$

where $U_k \sim \mathcal N(0, \rho_k)$ is independent of $(W^q, Q_0, \ldots, Q_{k-1})$.
Proof (a direct application of Lemma 5): let $\boldsymbol x_k = \tilde{\boldsymbol V}^T \boldsymbol U^T_{\boldsymbol A_k^{\perp}} \boldsymbol v_k$, so that
$$\boldsymbol q^{\text{ran}}_k = \boldsymbol U_{\boldsymbol B_k^{\perp}} \boldsymbol x_k$$
Since $\tilde{\boldsymbol V}$ is orthogonal, $\frac{1}{N}\|\boldsymbol x_k\|^2 = \frac{1}{N}\|\boldsymbol U^T_{\boldsymbol A_k^{\perp}} \boldsymbol v_k\|^2 \to \rho_k$ by Lemma 7. Lemma 5 then shows directly that the components of $\boldsymbol q^{\text{ran}}_k$ converge empirically to a Gaussian $U_k \sim \mathcal N(0, \rho_k)$, independent of $(W^q, Q_0, \ldots, Q_{k-1})$.

Since $\boldsymbol q_k = \boldsymbol q_k^{\text{det}} + \boldsymbol q_k^{\text{ran}}$, combining Lemmas 6 and 8 gives
$$
\begin{aligned}
\lim_{N \to \infty} \{ (w^q_n, q_{0n}, \ldots, q_{k-1,n}, q_{kn}) \}
&= \lim_{N \to \infty} \{ (w^q_n, q_{0n}, \ldots, q_{k-1,n}, q^{\text{det}}_{kn} + q^{\text{ran}}_{kn}) \} \\
&\overset{PL(2)}{=} (W^q, Q_0, \ldots, Q_{k-1}, Q_k)
\end{aligned}
$$

where $Q_k$ is the random variable
$$Q_k = \beta_{k0} Q_0 + \cdots + \beta_{k,k-1} Q_{k-1} + U_k$$

Since $(Q_0, \ldots, Q_{k-1})$ is jointly Gaussian and independent of $W^q$, and $U_k$ is Gaussian and independent of $(W^q, Q_0, \ldots, Q_{k-1})$, the vector $(Q_0, \ldots, Q_k)$ is jointly Gaussian, zero-mean, and independent of $W^q$. This establishes (99) at iteration $k$.

Finally, we verify the state evolution relation $\mathbb E[Q_k^2] = \tau_{2k}$:
$$
\begin{aligned}
\mathbb E[Q_k^2] &\overset{(a)}{=} \lim_{N \to \infty} \frac{1}{N} \|\boldsymbol q_k\|^2 \overset{(b)}{=} \lim_{N \to \infty} \frac{1}{N} \|\boldsymbol v_k\|^2 \\
&\overset{(c)}{=} \mathbb E\left[ \mathrm{g}_p^2(P_k, W^p, \bar\gamma_{1k}, \bar\alpha_{1k}) \right] \\
&\overset{(d)}{=} C_1^2(\bar\alpha_{1k}) \, \mathbb E\left[ \left( f_p(P_k, W^p, \bar\gamma_{1k}) - \bar\alpha_{1k} P_k \right)^2 \right] \\
&= C_1^2(\bar\alpha_{1k}) \left\{ \mathbb E\left[ f_p^2(P_k, W^p, \bar\gamma_{1k}) \right] - 2\bar\alpha_{1k} \mathbb E\left[ P_k f_p(P_k, W^p, \bar\gamma_{1k}) \right] + \bar\alpha_{1k}^2 \mathbb E[P_k^2] \right\} \\
&\overset{(e)}{=} C_1^2(\bar\alpha_{1k}) \left\{ \mathbb E\left[ f_p^2(P_k, W^p, \bar\gamma_{1k}) \right] - 2\bar\alpha_{1k} \tau_{1k} \mathbb E\left[ f_p'(P_k, W^p, \bar\gamma_{1k}) \right] + \bar\alpha_{1k}^2 \tau_{1k} \right\} \\
&\overset{(f)}{=} C_1^2(\bar\alpha_{1k}) \left\{ \mathbb E\left[ f_p^2(P_k, W^p, \bar\gamma_{1k}) \right] - \bar\alpha_{1k}^2 \tau_{1k} \right\} \\
&\overset{(g)}{=} \tau_{2k}
\end{aligned}
$$

where (a) holds because the components of $\boldsymbol q_k$ converge empirically (with second moments) to $Q_k$; (b) because $\boldsymbol q_k = \boldsymbol V^T \boldsymbol v_k$ in (91) with $\boldsymbol V$ orthogonal; (c) by (101); (d) by (102); (e) by Stein's lemma, $\mathbb E[P_k f_p(P_k, W^p, \bar\gamma_{1k})] = \tau_{1k} \mathbb E[f_p'(P_k, W^p, \bar\gamma_{1k})]$ since $P_k$ is independent of $W^p$, together with $\mathbb E[P_k^2] = \tau_{1k}$; (f) by the definition of $\bar\alpha_{1k}$ in (96); and (g) by the definition of $\tau_{2k}$ in (96). Hence $\mathbb E[Q_k^2] = \tau_{2k}$. In addition, $\lim_{N \to \infty} \gamma_{2k} = \bar\gamma_{2k}$ was shown above, and since $f_q'$ is uniformly Lipschitz continuous at $\gamma_2 = \bar\gamma_{2k}$, $\alpha_{2k} = \langle \boldsymbol f_q'(\boldsymbol q_k, \boldsymbol w^q, \gamma_{2k}) \rangle \to \mathbb E[f_q'(Q_k, W^q, \bar\gamma_{2k})] = \bar\alpha_{2k}$, which gives (100). This completes the proof of $H_{k,k-1} \Longrightarrow H_{k,k}$.
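Steps (c) to (f) of the last derivation can be checked with quadrature: for a fixed $\tau_{1k}$, $\mathbb E[\mathrm{g}_p^2]$ computed directly should agree with the closed form $C_1^2(\bar\alpha_{1k})\{\mathbb E[f_p^2] - \bar\alpha_{1k}^2\tau_{1k}\}$, and Stein's lemma should give $\mathbb E[P_k f_p] = \tau_{1k}\mathbb E[f_p']$. The denoiser, $C_1$, the value of $\tau_{1k}$ and the samples standing in for $W^p$ below are hypothetical.

```python
import numpy as np

NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(80)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)      # E[h(Z)] ~= WEIGHTS @ h(NODES), Z ~ N(0, 1)

rng = np.random.default_rng(0)
w = 0.5 * rng.standard_normal(500)            # samples standing in for W^p
tau1, gamma = 1.3, 1.0                        # hypothetical tau_1k and gamma_bar_1k
P = np.sqrt(tau1) * NODES[:, None]            # quadrature nodes for P_k ~ N(0, tau1), shape (80, 1)


def E(vals):
    """Expectation over P_k (quadrature) and W^p (sample average)."""
    return float(np.mean(WEIGHTS @ np.broadcast_to(vals, (NODES.size, w.size))))


f = np.tanh(gamma * P) + w                    # hypothetical f_p(P_k, W^p, gamma), shape (80, 500)
df = gamma * (1.0 - np.tanh(gamma * P) ** 2)  # its derivative in p, shape (80, 1)
alpha = E(df)                                 # alpha_bar_1k
C1 = 1.0 / (1.0 - alpha)                      # hypothetical C_1(alpha)

direct = E((C1 * (f - alpha * P)) ** 2)                   # E[g_p^2], step (c)
closed_form = C1 ** 2 * (E(f ** 2) - alpha ** 2 * tau1)   # step (f), i.e. tau_2k in (96)
stein_lhs, stein_rhs = E(P * f), tau1 * alpha             # step (e): E[P f_p] = tau E[f_p']
print(direct, closed_form)
print(stein_lhs, stein_rhs)
```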

Summary of This Part

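Building on the previous part, the main result here is Theorem 4. It states that certain quantities in iteration (91), namely $\boldsymbol p_k$ and $\boldsymbol q_k$, remain asymptotically zero-mean Gaussian (in the sense of empirical convergence) at every iteration, which is important for understanding the state evolution analysis of VAMP later on. As is evident, however, the proof is tedious and rests entirely on Part-2 (which itself requires some familiarity with matrix properties). Readers who find this material hard going can skip Part-2 and Part-3 and focus on Part-1 and the later parts. Perhaps working through all of this is just a personal fixation of mine, but at the very least, deriving and writing it up gave me a few modest insights.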
