An Introduction to the Approximate Message Passing (AMP) Algorithm

ChookStalker

Last modified: 2023-11-13 19:26:24

Tags: algorithms, probability theory, machine learning, signal processing

First published: 2023-11-11 23:35:35

Article link: https://blog.csdn.net/qq_53305859/article/details/134355434

1 Introduction

The Approximate Message Passing (AMP) algorithm is derived from the message passing algorithm, also known as the sum-product algorithm (SPA) or belief propagation (BP), through a series of assumptions and simplifications, including the central limit theorem (CLT) and Taylor series expansion [@zou_concise_2022].

2 Preliminaries

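This section briefly reviews the preliminaries: the SPA, the CLT, the product of two Gaussian densities, the Taylor series, and the result for the partial derivative of a posterior mean.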

2.1 Sum-Product Algorithm (SPA)

The message passing algorithm is so named because messages are passed along the edges between the nodes of a factor graph.

Consider the linear estimation model: $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w} \quad(1)$
where $\mathbf{y} \in \mathbb{C}^M$ is the known observed data, $\mathbf{H} \in \mathbb{C}^{M\times N}$ is the known measurement matrix, $\mathbf{x}\in \mathbb{C}^N$ is the signal to be estimated, whose prior distribution $p(\mathbf{x})$ is known, and $\mathbf{w} \in \mathbb{C}^M$ is complex additive white Gaussian noise, i.e., $\mathbf{w} \sim \mathcal{CN}(\mathbf{w};\mathbf{0},\sigma^2 \mathbf{I})$.

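From (1), there is a global probability density function (PDF) $p(\mathbf{y,x|H})$; for simplicity we omit $\mathbf{H}$ and write $p(\mathbf{y,x})$. We assume that the entries of $\mathbf{x}$ are independent a priori and, since the noise is white, that the entries of $\mathbf{y}$ are conditionally independent given $\mathbf{x}$. The global PDF then factorizes as: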
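$\begin{aligned} p(\mathbf{y,x}) & = p(\mathbf{y|x})p(\mathbf{x}) \quad(2) \\ & = \prod_{a=1}^M p(y_a |\mathbf{x}) \prod_{i=1}^N p(x_i) \quad(3)\\ & = \prod_a f_a(\mathbf{x}) \quad(4) \end{aligned}$ Eq. (2) is the definition of conditional probability, Eq. (3) follows from the independence assumptions, and Eq. (4) is the generic factorized form. In (4), the argument $\mathbf{x}$ of each factor $f_a$ denotes only the variables connected to that factor, not the full signal $\mathbf{x}\in \mathbb{C}^N$ to be estimated.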

The message sent from variable node $x_i$ to factor node $f_a(\mathbf{x})$ is: $\mu_{i \rightarrow a}(x_i) \propto \prod_{b \neq a} \mu_{i \leftarrow b}(x_i) \quad(5)$

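The message sent from factor node $f_a(\mathbf{x})$ to variable node $x_i$ is: $\mu_{i \leftarrow a}(x_i) \propto \int_{\mathbf{x}_{\backslash i}} f_a(\mathbf{x}) \prod_{j \neq i}\mu_{j \rightarrow a}(x_j) \, {\rm d} \mathbf{x}_{\backslash i} \quad(6)$
The exclusions $b \neq a$ and $j \neq i$ ensure that a node never sends back the information it received from the recipient: only extrinsic information is passed, so a node does not keep reinforcing its own belief. A message is essentially a PDF; since it is not normalized here, we write $\propto$ (proportional to).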

For the $M \times N$ model considered here, the factor graph is:

Figure 1: Factor graph

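This figure omits the observed variables $\mathbf{y}$; they could also be drawn and shaded to indicate that they are observed. The graph contains only two types of nodes, variable nodes and factor nodes (also called check nodes), and nodes of the same type are never connected to each other. Messages are passed back and forth between the two types of nodes.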

For the graph above, Eqs. (5) and (6) become: $\begin{aligned} \mu_{i \rightarrow a}(x_i) & \propto p(x_i) \prod_{b \neq a} \mu_{i \leftarrow b}(x_i) \quad(7) \\ \mu_{i \leftarrow a}(x_i) & \propto \int_{\mathbf{x}_{\backslash i}} p(y_a|\mathbf{x}) \prod_{j \neq i}^N \mu_{j \rightarrow a}(x_j) \, {\rm d} \mathbf{x}_{\backslash i} \quad(8) \end{aligned}$
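Note that (5) and (6) are the most general form, corresponding to the factorization (4), whereas (7) and (8) are specialized to (3). In (7) and (8), messages are exchanged only between variable node $x_i$ and factor nodes $p(y_a|\mathbf{x})$; there is no message passing between $x_i$ and $p(x_i)$. This is because $p(x_i)$ is a leaf node connected only to $x_i$: it can only send $x_i$ the message $p(x_i)$, which never needs updating. Put differently, since only extrinsic information is passed, even if $p(x_i)$ were "updated", it would still send the original prior $p(x_i)$.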

Because this factor graph is not a tree (on which the SPA yields exact marginals) but contains loops, the messages are iterated between the two types of nodes and only an approximate solution is obtained. We therefore rewrite (7) and (8) as: $\begin{aligned} \mu_{i \rightarrow a}^{t+1}(x_i) & \propto p(x_i) \prod_{b \neq a} \mu_{i \leftarrow b}^{t}(x_i) \quad(9) \\ \mu_{i \leftarrow a}^{t}(x_i) & \propto \int_{\mathbf{x}_{\backslash i}} p(y_a|\mathbf{x}) \prod_{j \neq i}^N \mu_{j \rightarrow a}^{t}(x_j) \, {\rm d} \mathbf{x}_{\backslash i} \quad(10) \end{aligned}$ where $t$ denotes the iteration index.

The desired posterior can then be obtained from: $p^{t+1}(x_i|\mathbf{y}) \propto p(x_i) \prod_{a=1}^{M} \mu_{i \leftarrow a }^t (x_i) \quad(11)$
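That is, all messages sent to $x_i$ by its neighboring factor nodes are combined. This is called the marginal (belief) of $x_i$ on the factor graph; it is not a marginal distribution in the usual sense, but (an approximation of) the posterior distribution $p(x_i|\mathbf{y})$.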

This concludes our brief discussion of the SPA.

2.2 Central Limit Theorem (CLT)

We consider the simplest case of independent and identically distributed (i.i.d.) random variables, i.e., the Lindeberg-Lévy central limit theorem.

Theorem 1 (i.i.d. central limit theorem).
Suppose $\{X_n\}^N_{n=1}$ are $N$ i.i.d. samples with $\mathbb{E}(X_n) = m$ and ${\rm Var}(X_n) = \sigma^2 > 0$. When $N$ is sufficiently large, the sample mean $\bar{X}_N = \frac{1}{N}\sum_{n=1}^{N}X_n$ is approximately distributed as $\mathcal{N}(m, \frac{\sigma^2}{N})$.

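Equivalently, in standardized form, $\frac{\bar{X}_N - m}{\sigma / \sqrt{N} }$ is approximately $\mathcal{N}(0, 1)$. Similarly, for complex-valued samples that are proper (real and imaginary parts uncorrelated with equal variances), it is approximately $\mathcal{CN}(0, 1)$.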
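To make Theorem 1 concrete, here is a small Monte Carlo sketch. It assumes i.i.d. Uniform(0, 1) samples (so $m = 1/2$ and $\sigma^2 = 1/12$), which is an illustrative choice not taken from the original; the function name `standardized_means` is likewise hypothetical. If the CLT holds, the standardized sample means should have mean near 0, variance near 1, and about 95% of them should fall within $\pm 1.96$.

```python
import math
import random
import statistics

def standardized_means(n_samples, n_trials, seed=0):
    """Draw n_trials sample means of n_samples i.i.d. Uniform(0, 1) variables
    and standardize each one as (mean - m) / (sigma / sqrt(N))."""
    rng = random.Random(seed)
    m, sigma = 0.5, math.sqrt(1.0 / 12.0)   # mean and std of Uniform(0, 1)
    out = []
    for _ in range(n_trials):
        xbar = sum(rng.random() for _ in range(n_samples)) / n_samples
        out.append((xbar - m) / (sigma / math.sqrt(n_samples)))
    return out

z = standardized_means(n_samples=50, n_trials=20000)
frac_inside = sum(abs(v) < 1.96 for v in z) / len(z)
print("mean of z:", statistics.fmean(z))            # should be close to 0
print("variance of z:", statistics.pvariance(z))     # should be close to 1
print("P(|z| < 1.96):", frac_inside)                 # N(0, 1) gives about 0.95
```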

2.3 Product of Two Gaussian Densities

Next, we state the lemma for the product of two Gaussian densities:

Theorem 2 (Gaussian reproduction lemma).
For two Gaussian densities $\mathcal{N}(x;a,A)$ and $\mathcal{N}(x;b,B)$, their product is:
$\mathcal{N}(x;a,A)\mathcal{N}(x;b,B) = \mathcal{N}(x;c,C)\mathcal{N}(0;a-b,A+B) \propto \mathcal{N}(x;c,C) \quad(12)$ where $\frac{1}{C} = \frac{1}{A} + \frac{1}{B}$ and $\frac{c}{C} = \frac{a}{A} + \frac{b}{B}$.

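The proof is straightforward: complete the square in the exponent of the product to bring it into Gaussian form, from which the mean and variance can be read off explicitly. Dividing both sides by $\mathcal{N}(0;a-b,A+B)$ normalizes the result into a proper PDF. The complex-valued case is analogous:
$\mathcal{CN}(x;a,A)\mathcal{CN}(x;b,B) = \mathcal{CN}(x;c,C)\mathcal{CN}(0;a-b,A+B) \propto \mathcal{CN}(x;c,C)\quad(13)$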
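As a quick numerical sanity check of (12) and (13), the sketch below evaluates both sides at a few points. The specific means, variances, and evaluation points are arbitrary illustrative values, and `product_params` is a hypothetical helper that just encodes $1/C = 1/A + 1/B$ and $c/C = a/A + b/B$. For the complex case it uses the circularly-symmetric density $\mathcal{CN}(x;m,v) = \frac{1}{\pi v}\exp(-|x-m|^2/v)$.

```python
import math

def normal_pdf(x, mean, var):
    """Real Gaussian density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def cn_pdf(x, mean, var):
    """Circularly-symmetric complex Gaussian density CN(x; mean, var)."""
    return math.exp(-abs(x - mean) ** 2 / var) / (math.pi * var)

def product_params(a, A, b, B):
    """(c, C) from Eq. (12): 1/C = 1/A + 1/B and c/C = a/A + b/B."""
    C = 1.0 / (1.0 / A + 1.0 / B)
    return C * (a / A + b / B), C

# Real case, Eq. (12).
a, A, b, B = 1.0, 2.0, -0.5, 0.5
c, C = product_params(a, A, b, B)
real_pairs = [(normal_pdf(x, a, A) * normal_pdf(x, b, B),
               normal_pdf(x, c, C) * normal_pdf(0.0, a - b, A + B))
              for x in (-2.0, 0.0, 0.7, 3.0)]

# Complex case, Eq. (13): same formulas for c and C, with complex means.
ac, bc = 1.0 + 0.5j, -0.3 - 0.2j
cc, Cc = product_params(ac, A, bc, B)
complex_pairs = [(cn_pdf(x, ac, A) * cn_pdf(x, bc, B),
                  cn_pdf(x, cc, Cc) * cn_pdf(0.0, ac - bc, A + B))
                 for x in (0.0, 1.0j, 0.4 - 0.1j, -1.5 + 2.0j)]

for lhs, rhs in real_pairs + complex_pairs:
    print(f"lhs = {lhs:.6e}   rhs = {rhs:.6e}")
```

Note that $\mathcal{N}(0;a-b,A+B)$ does not depend on $x$, which is why it can be dropped under $\propto$.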

2.4 First-Order Taylor Series Expansion

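For a function $f(x, y)$ of two variables that is differentiable (with continuous partial derivatives), the first-order Taylor expansion is:
$f(x+\Delta x ,y+\Delta y ) \approx f(x,y) + \Delta x f'_x(x,y)+ \Delta y f'_y(x,y)\quad(14)$ where $f'_x$ and $f'_y$ denote the partial derivatives with respect to $x$ and $y$, respectively.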

2.5 Partial Derivative of the Posterior Mean

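For an arbitrary bounded non-negative function $f(x)$ (which acts as a weighting), define the distribution $\mathcal{P}(x) = \frac{f(x)\mathcal{CN}(x;m,v)}{\int f(x)\mathcal{CN}(x;m,v) {\rm d}x}$, with mean $\mathbb{E}(x) = \int x \mathcal{P}(x) {\rm d}x$ and variance ${\rm Var}(x) = \int |x - \mathbb{E}(x)|^2 \mathcal{P}(x) {\rm d}x$. We then have the result below. It is written with the real-valued derivative of the Gaussian with respect to its mean, which contributes the factor $\frac{x-m}{v}$; in the complex case, $\partial/\partial m$ is the Wirtinger derivative, $\frac{x-m}{v}$ becomes $\frac{(x-m)^*}{v}$, and the final result is the same: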
$\begin{split} \frac{\partial \int x \mathcal{P}(x) {\rm d}x }{\partial m} & = \frac{\int x \frac{x-m}{v} f(x) \mathcal{CN}(x;m,v){\rm d}x \cdot \int f(x) \mathcal{CN}(x;m,v){\rm d}x }{[\int f(x)\mathcal{CN}(x;m,v){\rm d}x ]^2} \\ & \quad - \frac{\int x f(x) \mathcal{CN}(x;m,v){\rm d}x \cdot \int \frac{x-m}{v} f(x) \mathcal{CN}(x;m,v){\rm d}x }{[\int f(x)\mathcal{CN}(x;m,v){\rm d}x ]^2} \\ & = \frac{{\rm Var}(x)}{v} \end{split}\quad(15)$
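The identity (15) can be checked numerically with a central finite difference. The sketch below uses the real-valued Gaussian $\mathcal{N}(x;m,v)$ (matching the real-form derivative above) and, as an assumption for illustration, a discrete 4-PAM prior on $\{-3,-1,1,3\}$ with equal weights in place of a generic $f(x)$; the names `SUPPORT` and `posterior_moments` are hypothetical.

```python
import math

SUPPORT = (-3.0, -1.0, 1.0, 3.0)   # hypothetical 4-PAM prior with equal weights

def posterior_moments(m, v):
    """Mean and variance of P(x) ∝ f(x) N(x; m, v) for the discrete prior above."""
    w = [math.exp(-(x - m) ** 2 / (2 * v)) for x in SUPPORT]
    norm = sum(w)                                   # normalizing constant
    mean = sum(x * wi for x, wi in zip(SUPPORT, w)) / norm
    var = sum((x - mean) ** 2 * wi for x, wi in zip(SUPPORT, w)) / norm
    return mean, var

m, v, eps = 0.4, 0.8, 1e-6
mean_plus, _ = posterior_moments(m + eps, v)
mean_minus, _ = posterior_moments(m - eps, v)
numeric = (mean_plus - mean_minus) / (2 * eps)     # d E(x) / dm by central difference
_, var = posterior_moments(m, v)
print("finite difference d E(x)/dm:", numeric)
print("Var(x) / v                 :", var / v)     # Eq. (15): should match
```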

3 Derivation of AMP

Returning to the SPA model and Eqs. (9) and (10), we now simplify them step by step.

3.1 Messages from Factor Nodes to Variable Nodes

For $p(y_a|\mathbf{x})$ in (10), we no longer omit $\mathbf{H}$, and we let $\mathbf{H}_{a\sim}$ denote its $a$-th row:
$\begin{split} p(y_a|\mathbf{H,x}) & = p_{w_a}(y_a - \mathbf{H}_{a\sim}\mathbf{x} ) \\ & = \mathcal{CN}(y_a;\mathbf{H}_{a\sim}\mathbf{x} , \sigma^2 ) \\ & = \mathcal{CN}(w_a = y_a - \mathbf{H}_{a\sim}\mathbf{x}; 0, \sigma^2 )\quad(16)\\ & = \frac{1}{\pi \sigma^2}\exp( \frac{-|y_a - \mathbf{H}_{a\sim}\mathbf{x}|^2}{\sigma^2} ) \end{split}$
where $p_{w_a}(\cdot)$ is the PDF of the random variable $w_a$, and $\mathbf{H}_{a\sim}\mathbf{x}$ can also be written as $\sum_{k=1}^{N}h_{ak}x_k$.

To simplify (10), we rewrite it as: $\begin{split} \mu_{i \leftarrow a}^{t}(x_i) & \propto \int_{\mathbf{x}_{\backslash i}} p(y_a|\mathbf{H}_{a\sim}\mathbf{x}) \prod_{j \neq i}^N \mu_{j \rightarrow a}^t(x_j) \, {\rm d} \mathbf{x}_{\backslash i} \\ & \propto \int_{\mathbf{x}_{\backslash i}} \int_{z_a} p(y_a|z_a) \delta(z_a - \sum_{k=1}^{N}h_{ak}x_k) {\rm d}z_a \prod_{j \neq i}^N \mu_{j \rightarrow a}^t(x_j) \, {\rm d} \mathbf{x}_{\backslash i}\\ & \propto \int_{z_a} p(y_a|z_a) \mathbb{E}\left\{\delta \left( z_a - \sum_{j \neq i}^{N}h_{aj}x_j - h_{ai}x_i \right) \right\} {\rm d}z_a \quad(17) \end{split}$
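where the expectation is taken with respect to $\prod_{j \neq i}^N \mu_{j \rightarrow a}^t(x_j)$. Here we introduced the auxiliary variable $z_a = \mathbf{H}_{a\sim}\mathbf{x}$. For each $j \neq i$, let $\xi_{j \rightarrow a}^{t}$ be a random variable distributed according to $\mu_{j \rightarrow a}^{t}(x_j)$, with mean $\hat{x}_{j \rightarrow a}^{t}$ and variance $\hat{v}_{j \rightarrow a}^{t}$, and define the associated random variable $\zeta_{i \leftarrow a}^{t} = h_{ai}x_i + \sum_{j \neq i} h_{aj}\xi_{j \rightarrow a}^{t}$ (for fixed $x_i$), which plays the role of $z_a$ in (17). As $N \to \infty$, the CLT implies that $\zeta_{i \leftarrow a}^{t}$ converges to a Gaussian random variable with mean and variance: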
$\mathbb{E}(\zeta_{i \leftarrow a}^{t}) = Z^t_{i \leftarrow a}+h_{ai}x_i , \quad {\rm Var}(\zeta_{i \leftarrow a}^{t}) = V_{i \leftarrow a}^t\quad(18)$
where $Z^t_{i \leftarrow a} = \sum_{j \neq i} h_{aj}\hat{x}^t_{j \rightarrow a}, \quad V_{i \leftarrow a}^t = \sum_{j \neq i} |h_{aj}|^2 \hat{v}^t_{j \rightarrow a } \quad(19)$
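In fact, (19) can be viewed as the estimate of the contribution of $\mathbf{x}_{\backslash i}$ (all entries except $x_i$) to $y_a$, together with the variance of that estimate.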
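The Gaussian approximation (18)-(19) can be illustrated by Monte Carlo. In the sketch below, the row $\mathbf{H}_{a\sim}$ is drawn i.i.d. $\mathcal{CN}(0,1/N)$ so that $|h_{aj}|^2 = O(1/N)$, and the incoming messages $\mu^t_{j\rightarrow a}$ are taken to be BPSK beliefs on $\{-1,+1\}$; these choices, the seed, and all variable names are assumptions for illustration. The check compares the empirical mean and variance of $\zeta^t_{i\leftarrow a}$ with (18)-(19), and uses the fact that for $\mathcal{CN}(\mu, V)$ the quantity $|\zeta-\mu|^2/V$ is Exp(1)-distributed, so roughly $1 - 1/e \approx 0.632$ of the samples should satisfy $|\zeta-\mu|^2 < V$.

```python
import math
import random

rng = random.Random(1)
N, i, x_i = 200, 0, 1.0
sd = math.sqrt(0.5 / N)
# Hypothetical row H_{a~} with i.i.d. CN(0, 1/N) entries, so |h_aj|^2 = O(1/N).
h = [complex(rng.gauss(0, sd), rng.gauss(0, sd)) for _ in range(N)]
# Hypothetical incoming messages mu_{j->a}: BPSK beliefs with P(x_j = +1) = p[j].
p = [rng.random() for _ in range(N)]
x_hat = [2 * pj - 1 for pj in p]        # mean of xi_{j->a}
v_hat = [1 - xh ** 2 for xh in x_hat]   # variance of xi_{j->a}

# Eq. (19): Gaussian parameters predicted by the CLT (sums over j != i).
Z = sum(h[j] * x_hat[j] for j in range(N) if j != i)
V = sum(abs(h[j]) ** 2 * v_hat[j] for j in range(N) if j != i)
pred_mean = Z + h[i] * x_i              # Eq. (18)

def draw_zeta():
    """One draw of zeta_{i<-a} = h_ai x_i + sum_{j != i} h_aj xi_{j->a}."""
    s = h[i] * x_i
    for j in range(N):
        if j != i:
            s += h[j] * (1.0 if rng.random() < p[j] else -1.0)
    return s

samples = [draw_zeta() for _ in range(5000)]
emp_mean = sum(samples) / len(samples)
emp_var = sum(abs(s - emp_mean) ** 2 for s in samples) / len(samples)
# For CN(mean, V), |zeta - mean|^2 / V ~ Exp(1), so P(|zeta - mean|^2 < V) = 1 - 1/e.
frac = sum(abs(s - pred_mean) ** 2 < V for s in samples) / len(samples)
print("mean:", pred_mean, "vs empirical", emp_mean)
print("variance:", V, "vs empirical", emp_var)
print("P(|zeta - mean|^2 < V):", frac, "vs 1 - 1/e =", 1 - math.exp(-1))
```

The mean and variance in (18)-(19) hold exactly for any $N$; only the Gaussian shape relies on the CLT.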

Under this Gaussian approximation, we replace $\mathbb{E}\{\delta ( z_a - \sum_{j \neq i}^{N}h_{aj}x_j - h_{ai}x_i )\}$ in (17) with $\mathcal{CN}(z_a ; h_{ai}x_i + Z^t_{i \leftarrow a}, V_{i \leftarrow a}^t)$. Then, by the Gaussian product lemma, $\mu_{i \leftarrow a}^{t}(x_i)$ is approximated as:
$\begin{split} \mu_{i \leftarrow a}^{t}(x_i) & \propto \int_{z_a} p(y_a|z_a)\mathcal{CN}(z_a ; h_{ai}x_i + Z^t_{i \leftarrow a}, V_{i \leftarrow a}^t) {\rm d}z_a \\ & \propto \int_{z_a} \mathcal{CN}(y_a; z_a , \sigma^2 ) \mathcal{CN}(z_a ; h_{ai}x_i + Z^t_{i \leftarrow a}, V_{i \leftarrow a}^t) {\rm d}z_a \\ & \propto \int_{z_a} \mathcal{CN}(z_a; y_a , \sigma^2 ) \mathcal{CN}(z_a ; h_{ai}x_i + Z^t_{i \leftarrow a}, V_{i \leftarrow a}^t) {\rm d}z_a \\ & \propto \mathcal{CN}(0; y_a - h_{ai}x_i - Z^t_{i \leftarrow a} , \sigma^2 + V_{i \leftarrow a}^t) \\ & \propto \mathcal{CN}(x_i;\frac{y_a - Z_{i \leftarrow a}^t}{h_{ai}},\frac{\sigma^2 + V_{i \leftarrow a}^t}{|h_{ai}|^2})\quad(20) \end{split}$
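Between the second and the third $\propto$, we used the symmetry of the Gaussian density in its argument and its mean, $\mathcal{CN}(y_a; z_a, \sigma^2) = \mathcal{CN}(z_a; y_a, \sigma^2)$, so that both complex Gaussians are expressed in the variable $z_a$ and the Gaussian product lemma (13) applies. Integrating over $z_a$ then leaves only the factor $\mathcal{CN}(0; y_a - h_{ai}x_i - Z^t_{i \leftarrow a}, \sigma^2 + V_{i \leftarrow a}^t)$, which is finally rewritten as a Gaussian in $x_i$.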

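Note that the message is now a complex Gaussian, and computing it requires only the means and variances of the incoming messages. A Gaussian message is fully described by two sufficient statistics, which makes passing parameters convenient. We define the mean and variance of this message as: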
$\hat{x}_{i \leftarrow a}^t = \frac{y_a - Z_{i \leftarrow a}^t}{h_{ai}}; \quad \hat{v}_{i \leftarrow a}^t = \frac{\sigma^2 + V_{i \leftarrow a}^t}{|h_{ai}|^2}\quad(21)$
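The last step of (20) can be checked numerically: for any $x_i$, the ratio between $\mathcal{CN}(0; y_a - h_{ai}x_i - Z^t_{i\leftarrow a}, \sigma^2 + V^t_{i\leftarrow a})$ and $\mathcal{CN}(x_i; \hat{x}^t_{i\leftarrow a}, \hat{v}^t_{i\leftarrow a})$ from (21) should be the same constant, namely $1/|h_{ai}|^2$. The scalar values below are arbitrary illustrative numbers for one edge $(a, i)$, not taken from the original.

```python
import math

def cn_pdf(x, mean, var):
    """Circularly-symmetric complex Gaussian density CN(x; mean, var)."""
    return math.exp(-abs(x - mean) ** 2 / var) / (math.pi * var)

# Hypothetical scalar values for one edge (a, i).
y_a, h_ai, Z, V, sigma2 = 0.8 - 0.3j, 0.12 + 0.05j, 0.5 + 0.1j, 0.6, 0.1

# Eq. (21): mean and variance of the Gaussian message mu_{i<-a}(x_i).
x_msg = (y_a - Z) / h_ai
v_msg = (sigma2 + V) / abs(h_ai) ** 2

ratios = []
for x in (0.0, 1.0 + 1.0j, x_msg, x_msg + 0.5):
    lhs = cn_pdf(0.0, y_a - h_ai * x - Z, sigma2 + V)   # fourth line of Eq. (20)
    rhs = cn_pdf(x, x_msg, v_msg)                         # last line of Eq. (20)
    ratios.append(lhs / rhs)
    print(f"x = {x}: ratio = {lhs / rhs:.6f}")
print("1/|h_ai|^2 =", 1 / abs(h_ai) ** 2)                 # the constant ratio
```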

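Note that we do not strictly distinguish between ";" and "|" in the PDFs here. Strictly speaking, "|" would be more rigorous, but we use ";" to make the mean and variance explicit. Also, if some entry $h_{ai}$ of $\mathbf{H}$ is zero, (21) involves a division by zero. This problem is avoided in the subsequent derivation, where $h_{ai}$ only appears as $h_{ai}^*$ and $|h_{ai}|^2$ in numerators (see (23) and (24)).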

3.2 Messages from Variable Nodes to Factor Nodes

Returning to (9), we first analyze $\prod_{b \neq a}^{M} \mu_{i \leftarrow b}^t (x_i)$:
$\begin{split} \prod_{b \neq a}^{M} \mu_{i \leftarrow b}^t (x_i) & = \prod_{b \neq a}^{M} \mathcal{CN}(x_i;\hat{x}_{i \leftarrow b}^t , \hat{v}_{i \leftarrow b}^t ) \\ & \propto \mathcal{CN}(x_i;\hat{r}_{i \rightarrow a}^t , \hat{\Sigma}_{i \rightarrow a}^t )\quad(22) \end{split}$ where $\begin{aligned} \frac{1}{\hat{\Sigma}_{i \rightarrow a}^t} & = \sum_{b \neq a}^{M} \frac{1}{\hat{v}_{i \leftarrow b}^t} \\ & = \sum_{b \neq a}^{M} \frac{|h_{bi}|^2 }{\sigma^2 + V_{i \leftarrow b}^t} \quad(23)\\ \frac{\hat{r}_{i \rightarrow a}^t}{\hat{\Sigma}_{i \rightarrow a}^t} & = \sum_{b \neq a}^{M} \frac{\hat{x}_{i \leftarrow b}^t}{ \hat{v}_{i \leftarrow b}^t } \\ & = \sum_{b \neq a}^{M} \frac{y_b - Z_{i \leftarrow b}^t}{h_{bi}} \frac{|h_{bi}|^2 }{\sigma^2 + V_{i \leftarrow b}^t}\\ & = \sum_{b \neq a}^{M} \frac{h_{bi}^* (y_b - Z_{i \leftarrow b}^t) }{\sigma^2 + V_{i \leftarrow b}^t} \quad(24) \end{aligned}$
This follows directly from repeated application of (13). We then combine it with the prior $p(x_i)$: $\begin{split} \mu_{i \rightarrow a}^{t+1}(x_i) & \propto p(x_i) \prod_{b \neq a}^M \mu_{i \leftarrow b}^t(x_i) \\ & \propto p(x_i)\mathcal{CN}(x_i ;\hat{r}_{i \rightarrow a}^t , \hat{\Sigma}_{i \rightarrow a}^t )\quad(25) \end{split}$
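To see that (22)-(24) are consistent, the sketch below computes $\hat{\Sigma}^t_{i\rightarrow a}$ and $\hat{r}^t_{i\rightarrow a}$ in two ways: by first forming each Gaussian message via (21) and multiplying them with (13), and by the division-free right-hand sides of (23)-(24). All numerical inputs ($h_{bi}$, $y_b - Z^t_{i\leftarrow b}$, $V^t_{i\leftarrow b}$, $\sigma^2$, the number of factors) are random placeholders chosen for illustration.

```python
import math
import random

rng = random.Random(2)
sigma2, M = 0.1, 8
# Hypothetical per-factor quantities for b != a: h_bi, y_b - Z_{i<-b}, V_{i<-b}.
h = [complex(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(M)]
resid = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]
V = [0.5 + rng.random() for _ in range(M)]

# Route 1: form each Gaussian message via Eq. (21), then multiply them (Eq. (13)).
x_msg = [r / hb for r, hb in zip(resid, h)]
v_msg = [(sigma2 + Vb) / abs(hb) ** 2 for hb, Vb in zip(h, V)]
Sigma_1 = 1 / sum(1 / v for v in v_msg)
r_1 = Sigma_1 * sum(x / v for x, v in zip(x_msg, v_msg))

# Route 2: the division-free forms on the right of Eqs. (23) and (24).
Sigma_2 = 1 / sum(abs(hb) ** 2 / (sigma2 + Vb) for hb, Vb in zip(h, V))
r_2 = Sigma_2 * sum(hb.conjugate() * r / (sigma2 + Vb) for hb, r, Vb in zip(h, resid, V))

print("Sigma:", Sigma_1, Sigma_2, math.isclose(Sigma_1, Sigma_2))
print("r:    ", r_1, r_2)
```

Route 2 never divides by $h_{bi}$, which is the property noted after (21).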
We do not need the exact form of $\mu_{i \rightarrow a}^{t+1}(x_i)$; only its mean and variance are needed, and we approximate it by a new complex Gaussian $\mu_{i \rightarrow a}^{t+1}(x_i) \propto \mathcal{CN}(x_i;\hat{x}^{t+1}_{i \rightarrow a}, \hat{v}^{t+1}_{i \rightarrow a})$, where
$\begin{aligned} \hat{x}^{t+1}_{i \rightarrow a} & = \mathbb{E}(x_i) \notag \\ & = \int_{x_i} x_i \frac{1}{C}p(x_i) \mathcal{CN}(x_i ;\hat{r}_{i \rightarrow a}^t , \hat{\Sigma}_{i \rightarrow a}^t) {\rm d}x_i \notag \\ & \overset{def}{=} F(x_i;\hat{r}_{i \rightarrow a}^t , \hat{\Sigma}_{i \rightarrow a}^t) \quad(26) \\ \hat{v}^{t+1}_{i \rightarrow a} & = {\rm Var}(x_i) \notag \\ & = \int_{x_i} |x_i - \hat{x}^{t+1}_{i \rightarrow a}|^2 \frac{1}{C}p(x_i) \mathcal{CN}(x_i ;\hat{r}_{i \rightarrow a}^t , \hat{\Sigma}_{i \rightarrow a}^t) {\rm d}x_i \notag \\ & \overset{def}{=} G(x_i;\hat{r}_{i \rightarrow a}^t , \hat{\Sigma}_{i \rightarrow a}^t) \quad(27) \end{aligned}$ where $C = \int_{x_i} p(x_i) \mathcal{CN} (x_i;\hat{r}_{i \rightarrow a}^t , \hat{\Sigma}_{i \rightarrow a}^t){\rm d}x_i$ is the normalizing constant. When $p(x_i)$ is a known distribution, these two integrals can be computed in closed form or approximated directly, at $\mathcal{O}(1)$ cost per message.
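As an example of an $\mathcal{O}(1)$ computation of $F$ and $G$, consider a hypothetical BPSK prior $p(x_i) = \frac{1}{2}\delta(x_i-1) + \frac{1}{2}\delta(x_i+1)$ (an assumption for illustration; the original does not fix a prior). With the likelihood $\mathcal{CN}(x_i; r, \Sigma)$, the posterior odds of $+1$ versus $-1$ are $\exp(4\,{\rm Re}(r)/\Sigma)$, so $F = \tanh(2\,{\rm Re}(r)/\Sigma)$ and $G = 1 - F^2$. The sketch compares this closed form with explicit normalization over the support; the function names are hypothetical.

```python
import math

def bpsk_denoiser(r, Sigma):
    """F and G of Eqs. (26)-(27) for a hypothetical equiprobable BPSK prior on {-1, +1}.

    With likelihood CN(x; r, Sigma), the posterior odds of +1 vs -1 are
    exp(4 Re(r) / Sigma), which gives the closed form below.
    """
    mean = math.tanh(2 * r.real / Sigma)
    return mean, 1 - mean ** 2

def direct_moments(r, Sigma, support=(-1.0, 1.0)):
    """Same moments by explicit normalization over the prior's support."""
    w = [math.exp(-abs(x - r) ** 2 / Sigma) for x in support]
    C = sum(w)                                    # normalizing constant C
    mean = sum(x * wi for x, wi in zip(support, w)) / C
    var = sum(abs(x - mean) ** 2 * wi for x, wi in zip(support, w)) / C
    return mean, var

cases = [(0.3 + 0.2j, 0.5), (-1.1 - 0.4j, 0.2), (0.05j, 1.0)]
for r, Sigma in cases:
    print(bpsk_denoiser(r, Sigma), direct_moments(r, Sigma))
```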

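At this point, both types of messages have been derived. What we have done so far is an approximation of loopy belief propagation (LBP): by representing every message through its mean and variance, message passing is turned into the passing of statistics, which greatly reduces the number of parameters. Up to this point, all messages have been Gaussianized.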

3.3 Further Simplification

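Each of the messages above requires $\mathcal{O}(N)$ or $\mathcal{O}(M)$ operations (the two are of the same order; we write $\mathcal{O}(N)$ below), and each iteration involves $2MN$ messages, so the complexity of the algorithm above is about $\mathcal{O}(N^3)$ per iteration. Since cubic complexity is still too high, AMP simplifies both types of messages further.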

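We assume that $|h_{ij}|$ is of order $O(1/\sqrt{N})$, so $|h_{ij}|^2$ is of order $O(1/N)$; we also assume that $x_j$ is of order $O(1)$, so its estimates $\hat{x}_j$ and $\hat{v}_j$ are also $O(1)$. Under these conventions, $y_a$ is $O(1)$, and so are $Z_a$ and $V_a$. Therefore, in the large-system limit, any isolated term of order $O(1/\sqrt{N})$ or smaller, and any sum whose total is of order $O(1/\sqrt{N})$ or smaller, is treated as infinitesimal and neglected.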

Recall from (11) the posterior $p^{t+1}(x_i|\mathbf{y})$ that we want. We define:
$\begin{aligned} \hat{\Sigma}_{i}^t & = \left(\sum_{a=1}^{M} \frac{1}{\hat{v}_{i \leftarrow a}^t}\right)^{-1} \notag \\ & = \left(\sum_{a=1}^{M} \frac{|h_{ai}|^2 }{\sigma^2 + V_{i \leftarrow a}^t}\right)^{-1} \quad(28) \\ \hat{r}_{i}^t & = \hat{\Sigma}_{i}^t \sum_{a = 1}^{M} \frac{h_{ai}^* (y_a - Z_{i \leftarrow a}^t) }{\sigma^2 + V_{i \leftarrow a}^t} \quad(29) \end{aligned}$ so that the product $\prod_{a=1}^{M} \mu^t_{i \leftarrow a}(x_i)$ in (11) is proportional to $\mathcal{CN}(x_i; \hat{r}_i^t,\hat{\Sigma}_i^t )$. Accordingly, the approximate posterior mean and variance of $p^{t+1}(x_i|\mathbf{y})$ are:
$\begin{aligned} \hat{x}^{t+1}_{i} & = F(x_i;\hat{r}_{i}^t , \hat{\Sigma}_{i}^t) \quad(30) \\ \hat{v}^{t+1}_{i} & = G(x_i;\hat{r}_{i}^t , \hat{\Sigma}_{i}^t) \quad(31) \end{aligned}$

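Similarly, we define $\begin{aligned} Z_a^t & = \sum_{i = 1}^{N} h_{ai} \hat{x}^t_{i \rightarrow a} \quad(32) \\ V_a^t & = \sum_{i = 1}^{N} |h_{ai}|^2 \hat{v}^t_{i \rightarrow a} \approx V_{i \leftarrow a}^t \quad(33) \end{aligned}$ where $\approx$ neglects an infinitesimal term: the difference $V_a^t - V_{i \leftarrow a}^t = |h_{ai}|^2 \hat{v}^t_{i \rightarrow a}$ is $O(1/N)$.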

Applying a first-order Taylor expansion to $\hat{x}^{t+1}_{i \rightarrow a}$ in (26) around $(\hat{r}_{i}^t, \hat{\Sigma}_{i}^t)$ gives:
$\hat{x}^{t+1}_{i \rightarrow a} \approx \hat{x}^{t+1}_{i} + \Delta r \frac{\partial}{\partial r} F(x_i;\hat{r}_{i}^t , \hat{\Sigma}_{i}^t) + \Delta \Sigma \frac{\partial}{\partial \Sigma} F(x_i;\hat{r}_{i}^t , \hat{\Sigma}_{i}^t)\quad(34)$
where $\begin{aligned} \Delta \Sigma & = \hat{\Sigma}^t_{i \rightarrow a } - \hat{\Sigma}^t_{i} \notag \\ & = \left(\sum_{b \neq a}^{M} \frac{|h_{bi}|^2 }{\sigma^2 + V_{i \leftarrow b}^t}\right)^{-1} - \left(\sum_{b=1}^{M} \frac{|h_{bi}|^2 }{\sigma^2 + V_{i \leftarrow b}^t}\right)^{-1} \notag \\ & = \frac{ \frac{|h_{ai}|^2 }{\sigma^2 + V_{i \leftarrow a}^t} }{\left(\sum_{b \neq a}^{M} \frac{|h_{bi}|^2 }{\sigma^2 + V_{i \leftarrow b}^t}\right)\left(\sum_{b=1}^{M} \frac{|h_{bi}|^2 }{\sigma^2 + V_{i \leftarrow b}^t}\right)} \notag \\ & \approx \frac{ \frac{|h_{ai}|^2 }{\sigma^2 + V_{a}^t} }{\left(\sum_{b \neq a}^{M} \frac{|h_{bi}|^2 }{\sigma^2 + V_{i \leftarrow b}^t}\right)\left(\sum_{b=1}^{M} \frac{|h_{bi}|^2 }{\sigma^2 + V_{i \leftarrow b}^t}\right)} \notag \\ & \approx 0 \quad(35) \\ \Delta r & = \hat{r}^t_{i \rightarrow a } - \hat{r}^t_{i} \notag \\ & = \hat{\Sigma}^t_{i \rightarrow a }\sum_{b \neq a}^{M} \frac{h_{bi}^* (y_b - Z_{i \leftarrow b}^t) }{\sigma^2 + V_{i \leftarrow b}^t} - \hat{\Sigma}_{i}^t \sum_{b = 1}^{M} \frac{h_{bi}^* (y_b - Z_{i \leftarrow b}^t) }{\sigma^2 + V_{i \leftarrow b}^t} \notag\\ & \approx -\hat{\Sigma}_{i}^t \frac{h_{ai}^* (y_a - Z_{i \leftarrow a}^t) }{\sigma^2 + V_{i \leftarrow a}^t} \notag\\ & \approx -\hat{\Sigma}_{i}^t \frac{h_{ai}^* (y_a - Z_{i \leftarrow a}^t) }{\sigma^2 + V_{a}^t} \quad(36) \end{aligned}$
Here we used the approximations $V_a^t = V_{i \leftarrow a}^t + O(1/N)$ and $\hat{\Sigma}_i^t =\hat{\Sigma}_{i \rightarrow a}^t + O(1/N)$. Applying the result (15) for the partial derivative of the posterior mean (with $m \to r$ and $v \to \Sigma$), we obtain $\frac{\partial}{\partial r} F(x_i;r,\hat{\Sigma}^t_i)|_{r = \hat{r}^t_i} = \frac{\hat{v}_i^{t+1}}{\hat{\Sigma}_i^t}$. Since $\Delta \Sigma \approx 0$, the last term of (34) vanishes, and (34) simplifies to: $\hat{x}^{t+1}_{i \rightarrow a} \approx \hat{x}_i^{t+1} - \frac{h_{ai}^*(y_a - Z^t_{i \leftarrow a})}{\sigma^2+V_a^t}\hat{v}_i^{t+1} \quad(37)$

Similarly, a first-order Taylor expansion of (27), again dropping the $\Delta \Sigma$ term, gives: $\hat{v}^{t+1}_{i \rightarrow a} \approx \hat{v}_i^{t+1} + \Delta r \frac{\partial}{\partial r}G(x_i;\hat{r}_i^t,\hat{\Sigma}^t_i)\quad(38)$
Substituting (36) and (38), with the iteration index shifted from $t+1$ to $t$, into (33) yields: $\begin{split} V_a^t & \approx \sum_{i=1}^{N} |h_{ai}|^2 \left( \hat{v}_i^t -\hat{\Sigma}_{i}^{t-1} \frac{h_{ai}^* (y_a - Z_{i \leftarrow a}^{t-1}) }{\sigma^2 + V_{a}^{t-1}} \times \frac{\partial}{\partial r}G(x_i;\hat{r}_i^{t-1},\hat{\Sigma}^{t-1}_i) \right) \\ & \approx \sum_{i=1}^{N} |h_{ai}|^2 \hat{v}_i^t - \sum_{i=1}^{N} |h_{ai}|^2 \left( \sum_{b=1}^{M}\frac{|h_{bi}|^2}{\sigma^2 + V^{t-1}_{i \leftarrow b}} \right)^{-1} \frac{h_{ai}^* (y_a - Z_{i \leftarrow a}^{t-1})}{\sigma^2 + V_{a}^{t-1}} \times \frac{\partial}{\partial r}G(x_i;\hat{r}_i^{t-1},\hat{\Sigma}^{t-1}_i) \\ & = \sum_{i=1}^{N} |h_{ai}|^2 \hat{v}_i^t + O(1/\sqrt{N}) \\ & \approx \sum_{i=1}^{N} |h_{ai}|^2 \hat{v}_i^t \quad(39) \end{split}$
where the $O(1/\sqrt{N})$ term arises from the extra factor $h_{ai}^*$: each of the $N$ summands is $O(N^{-3/2})$. Next, substituting (37), again with the index shifted from $t+1$ to $t$, into (32) gives: $\begin{split} Z_a^t & \approx \sum_{i=1}^{N} h_{ai} \hat{x}_i^t - \sum_{i=1}^{N} \frac{|h_{ai}|^2(y_a - Z_{i \leftarrow a}^{t-1})}{\sigma^2 + V_{a}^{t-1}} \hat{v}_{i}^{t} \\ & \approx \sum_{i=1}^{N} h_{ai} \hat{x}_i^t - \sum_{i=1}^{N} \frac{|h_{ai}|^2\hat{v}_{i}^{t}(y_a - Z_{a}^{t-1} + h_{ai}\hat{x}_i^{t-1})}{\sigma^2 + V_{a}^{t-1}} \\ & \approx \sum_{i=1}^{N} h_{ai} \hat{x}_i^t - \frac{V_a^t(y_a - Z_{a}^{t-1})}{\sigma^2 + V_{a}^{t-1}} \quad(40) \end{split}$
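The second line uses $Z_{i \leftarrow a}^{t-1} = Z_a^{t-1} - h_{ai}\hat{x}^{t-1}_{i \rightarrow a} \approx Z_a^{t-1} - h_{ai}\hat{x}^{t-1}_i$. The last step uses (39) and drops the sum containing $h_{ai}\hat{x}_i^{t-1}$, which is $O(1/\sqrt{N})$ because $h_{ai}\hat{x}_i^{t-1}=O(1/\sqrt{N})$. Finally, substituting $Z_{i \leftarrow a}^{t} \approx Z_a^t - h_{ai}\hat{x}_i^{t}$ (from (32) and (37)) and $V_{i \leftarrow a}^t \approx V_a^t$ (from (33)) into (29) gives: $\begin{split} \hat{r}_{i}^t & \approx \hat{\Sigma}_{i}^t \sum_{a = 1}^{M} \frac{h_{ai}^* (y_a - Z_{a}^{t} + h_{ai}\hat{x}_i^{t}) }{\sigma^2 + V_{a}^t} \\ & \approx \hat{x}^t_i + \hat{\Sigma}_i^t \sum_{a=1}^{M} \frac{h_{ai}^* (y_a - Z_{a}^{t} )}{\sigma^2 + V_{a}^t} \quad(41) \end{split}$ where the last step uses $\hat{\Sigma}_{i}^t \approx \left(\sum_{a=1}^{M} \frac{|h_{ai}|^2}{\sigma^2 + V_{a}^t}\right)^{-1}$. This completes the derivation of AMP.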
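For reference, the updates derived above can be collected into a single iteration. This is a summary assembled from (28)-(31) and (39)-(41), with $V_{i \leftarrow a}^t$ replaced by $V_a^t$ in (28) as justified by (33); it is not a reproduction of the pseudo-algorithm figure below. The initialization ($\hat{x}_i^0$ and $\hat{v}_i^0$ set to the prior mean and variance, and the last term of $Z_a^0$ set to zero) is an assumption.

```latex
\begin{aligned}
V_a^t &= \sum_{i=1}^{N} |h_{ai}|^2 \hat{v}_i^t, \\
Z_a^t &= \sum_{i=1}^{N} h_{ai}\hat{x}_i^t - \frac{V_a^t\,(y_a - Z_a^{t-1})}{\sigma^2 + V_a^{t-1}}, \\
\hat{\Sigma}_i^t &= \Big(\sum_{a=1}^{M} \frac{|h_{ai}|^2}{\sigma^2 + V_a^t}\Big)^{-1}, \\
\hat{r}_i^t &= \hat{x}_i^t + \hat{\Sigma}_i^t \sum_{a=1}^{M} \frac{h_{ai}^*\,(y_a - Z_a^t)}{\sigma^2 + V_a^t}, \\
\hat{x}_i^{t+1} &= F(x_i; \hat{r}_i^t, \hat{\Sigma}_i^t), \qquad
\hat{v}_i^{t+1} = G(x_i; \hat{r}_i^t, \hat{\Sigma}_i^t).
\end{aligned}
```

Every line is a sum over one index of $\mathbf{H}$, so one iteration costs $\mathcal{O}(MN)$ instead of the $\mathcal{O}(N^3)$ discussed at the start of Section 3.3.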

4 AMP Pseudo-Algorithm and Summary

Finally, the algorithm is summarized here:
[Figure: AMP pseudo-algorithm]

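Note that the superscript $T$ here does not denote transpose but the result after the $T$-th iteration. It should also be emphasized that (30) and (31) are exactly the posterior mean and variance; the pseudo-algorithm writes them in Bayesian form, and they are computed with respect to the posterior distribution. $\hat{r}_{i}^t$ and $\hat{\Sigma}_{i}^t$ are the mean and variance of the likelihood (a complex Gaussian). Some references also introduce a residual term (usually denoted $\hat{s}_a^t$) for $(y_a - Z_a^t)/(\sigma^2+V_a^t)$, among others.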
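A minimal pure-Python sketch of the resulting iteration, written with the residual $\hat{s}_a^t = (y_a - Z_a^t)/(\sigma^2+V_a^t)$ mentioned above, might look as follows. Several choices here are assumptions for illustration and are not taken from the original pseudo-algorithm: an equiprobable BPSK prior on $\{-1,+1\}$ (so $F = \tanh(2\,{\rm Re}(r)/\Sigma)$, $G = 1 - F^2$), an i.i.d. $\mathcal{CN}(0, 1/M)$ matrix, a zero residual at the first iteration, a fixed number of iterations, and all function names. As in (33), $V_a^t$ is used in place of $V_{i\leftarrow a}^t$ when computing $\hat{\Sigma}_i^t$.

```python
import math
import random

def bpsk_denoiser(r, Sigma):
    """Hypothetical prior: x_i in {-1, +1} equiprobable; returns (F, G) of Eqs. (26)-(27)."""
    mean = math.tanh(2 * r.real / Sigma)
    return mean, 1 - mean ** 2

def amp(y, H, sigma2, n_iter=20):
    """AMP iteration of Eqs. (28)-(31), (39)-(41), written with the residual s_a."""
    M, N = len(H), len(H[0])
    x_hat = [0.0] * N   # prior mean of the BPSK prior
    v_hat = [1.0] * N   # prior variance of the BPSK prior
    s = [0.0] * M       # s_a^{t-1} = (y_a - Z_a^{t-1}) / (sigma^2 + V_a^{t-1}); zero at t = 0
    for _ in range(n_iter):
        V = [sum(abs(H[a][i]) ** 2 * v_hat[i] for i in range(N)) for a in range(M)]      # (39)
        Z = [sum(H[a][i] * x_hat[i] for i in range(N)) - V[a] * s[a] for a in range(M)]  # (40)
        s = [(y[a] - Z[a]) / (sigma2 + V[a]) for a in range(M)]                          # residual
        for i in range(N):
            Sigma = 1 / sum(abs(H[a][i]) ** 2 / (sigma2 + V[a]) for a in range(M))     # (28)
            r = x_hat[i] + Sigma * sum(H[a][i].conjugate() * s[a] for a in range(M))   # (41)
            x_hat[i], v_hat[i] = bpsk_denoiser(r, Sigma)                                # (30), (31)
    return x_hat, v_hat

def complex_gauss(rng, var):
    """One draw from CN(0, var)."""
    sd = math.sqrt(var / 2)
    return complex(rng.gauss(0, sd), rng.gauss(0, sd))

rng = random.Random(0)
M, N, sigma2 = 200, 100, 0.01
H = [[complex_gauss(rng, 1 / M) for _ in range(N)] for _ in range(M)]
x = [rng.choice((-1.0, 1.0)) for _ in range(N)]
y = [sum(H[a][i] * x[i] for i in range(N)) + complex_gauss(rng, sigma2) for a in range(M)]

x_hat, _ = amp(y, H, sigma2)
errors = sum((1.0 if xh >= 0 else -1.0) != xi for xh, xi in zip(x_hat, x))
mse = sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x)) / N
print("symbol errors:", errors, "out of", N, " MSE:", mse)
```

Inside the loop, each of the three list comprehensions and the per-$i$ update touches every entry of $\mathbf{H}$ once, so one iteration costs $\mathcal{O}(MN)$. The term `V[a] * s[a]` is the correction $V_a^t (y_a - Z_a^{t-1})/(\sigma^2 + V_a^{t-1})$ from (40).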