DOA Estimation: Off-Grid DOA Estimation Based on Sparse Bayesian Inference

This article introduces the application of sparse Bayesian learning to direction-of-arrival estimation, in particular an algorithm for handling off-grid signals. It details an iterative algorithm built on an off-grid spatial model, which performs inference using a Laplace prior on the signals and their joint sparsity across snapshots. Starting from Bayesian estimation and a noise model, the joint probability density of the signal and noise models is formulated, and the hyperparameters are updated with the EM method. Finally, simulation results are provided to verify the effectiveness of the method.


Preface

        Sparse Bayesian learning is an important class of Bayesian statistical optimization algorithms developed on the basis of Bayesian theory. The work discussed here was published by Zai Yang et al. in January 2013 in IEEE TSP. From a Bayesian point of view, it is an iterative algorithm based on an off-grid spatial model, which performs inference by assuming that the signals at all snapshots share a Laplace prior and are jointly sparse across snapshots. This post mainly supplements the derivations of some parts of that TSP paper; since this is my first contact with sparse Bayesian methods, the content may be somewhat verbose and may contain errors.

P.S. The site's built-in LaTeX editor acts up from time to time, so some formulas may render at odd sizes.

Off-Grid Estimation Model

        Grid partitioning:

        Define the prior grid over the angular domain as:

\tilde{\theta}=\{\tilde{\theta}_1,\cdots,\tilde{\theta}_N\}

        The grid is uniform over [-\pi/2,\pi/2]. Clearly, the number of grid points N, the number of sensors M and the number of sources K should satisfy:

N\gg M> K

        For an incident angle to be estimated, \theta_k \notin {\{\tilde{\theta}_1,\cdots,\tilde{\theta}_N\}}, its steering vector can be approximated by the following first-order Taylor expansion in the angular domain:

        \textbf{a}(\theta_k)\approx\textbf{a}\left(\tilde{\theta}_{n_k}\right)+\textbf{b}\left(\tilde{\theta}_{n_k}\right)\left(\theta_k-\tilde{\theta}_{n_k}\right)

        where \tilde{\theta}_{n_k} and n_k denote the angle and the index of the grid point closest to the k-th incident angle, and \textbf{b}\left(\tilde{\theta}_{n_k}\right) is the derivative of the nominal steering vector at that grid point. Taking the broadside direction (perpendicular to the array axis) as 0°, the m-th element of the steering vector can be written as:

\textbf{a}\left(\theta \right)_{m} = e^{j2\pi (m-1)d\sin{\theta}/ \lambda}

        Differentiating this element with respect to \theta gives:

\textbf{b}\left(\theta \right)_{m} = \frac{\partial \textbf{a}\left(\theta \right)_{m} }{\partial \theta}=j2 \pi (m-1) d \cos{\theta}\, e^{j2\pi (m-1)d\sin{\theta}/ \lambda}/ \lambda
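As a quick numerical sanity check of this derivative (a minimal MATLAB sketch with assumed values, not part of the paper), b(\theta) can be compared against a central finite difference of a(\theta):

% Minimal sketch (assumed half-wavelength ULA with M = 8 sensors): compare the
% analytic derivative b(theta) with a central finite difference of a(theta).
M = 8; d_over_lambda = 0.5; m = (0:M-1)';
a = @(theta) exp(1i*2*pi*m*d_over_lambda*sin(theta));        % steering vector a(theta)
b = @(theta) 1i*2*pi*m*d_over_lambda*cos(theta).*a(theta);   % analytic derivative
theta0 = 0.3; h = 1e-6;
b_fd = (a(theta0 + h) - a(theta0 - h))/(2*h);                % central difference
disp(norm(b(theta0) - b_fd));                                % should be very small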

        Therefore, the first-order Taylor approximation of the set of nominal steering vectors on this prior grid is:

\boldsymbol\Phi(\boldsymbol\beta)=\boldsymbol A+\boldsymbol B\text{diag}(\boldsymbol\beta)

        where \boldsymbol A = [\boldsymbol{a}(\tilde{\theta}_1),\cdots,\boldsymbol{a}(\tilde{\theta}_N)] is the nominal array manifold on the grid, \boldsymbol B = [\boldsymbol{b}(\tilde{\theta}_1),\cdots,\boldsymbol{b}(\tilde{\theta}_N)] is its derivative, and \boldsymbol{\beta}=\left[\beta_{1},\cdots,\beta_{N}\right]^{T}\in\left[-\frac{1}{2}r,\frac{1}{2}r\right]^{N} collects the off-grid offsets (error weights) at the grid points, with r = \tilde{\theta}_{n}-\tilde{\theta}_{n-1} the grid spacing.

Hence the usual signal model

\textbf{y}(t)=\textbf{A}(\theta)\textbf{s}(t)+\textbf{e}(t), t=1,\cdots,T

can be re-expressed as

\boldsymbol{Y}=\Phi(\beta)\boldsymbol{X}+\boldsymbol{E}

where \boldsymbol{X}=[\boldsymbol{x}(1),\cdots,\boldsymbol{x}(T)], \boldsymbol{Y}=[\boldsymbol{y}(1),\cdots,\boldsymbol{y}(T)] and \boldsymbol{E}=[\boldsymbol{e}(1),\cdots,\boldsymbol{e}(T)]. Since this model is to be estimated in a Bayesian framework, \boldsymbol{X} is first assumed to be jointly (row-)sparse.
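To make the off-grid model concrete, the following minimal sketch (assumed half-wavelength ULA and a 2° grid; not the paper's code) builds \boldsymbol A, \boldsymbol B and \boldsymbol\Phi(\boldsymbol\beta) and checks how much the first-order expansion improves the approximation of a steering vector that falls between grid points:

% Minimal sketch: build A, B and Phi(beta) for a coarse grid (assumed parameters).
M = 8; m = (0:M-1)'; d_over_lambda = 0.5;
grid_deg = -90:2:90; N = numel(grid_deg);
A = exp(1i*2*pi*d_over_lambda*(m*sind(grid_deg)));             % nominal steering matrix
B = (1i*2*pi*d_over_lambda*(m*cosd(grid_deg))).*A;             % derivative w.r.t. angle (rad)
theta_k = 17.3;                                                % a DOA that is off the grid (deg)
[~, nk] = min(abs(grid_deg - theta_k));                        % nearest grid point
beta = zeros(N,1); beta(nk) = (theta_k - grid_deg(nk))*pi/180; % off-grid offset in radians
Phi = A + B*diag(beta);                                        % first-order off-grid model
a_true = exp(1i*2*pi*d_over_lambda*(m*sind(theta_k)));
fprintf('on-grid error %.3f, off-grid error %.3f\n', ...
    norm(a_true - A(:,nk)), norm(a_true - Phi(:,nk)));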

Sparse Bayesian Estimation

For the derivation and detailed steps of sparse Bayesian learning, the following references may be consulted:

Sparse Bayesian Learning Explained: Computing the Evidence and the Posterior

Derivation of Sparse Bayesian Learning        

Derivation of the Sparse Bayesian Learning (SBL) Algorithm

Noise Model

        Under the circular (proper) signal assumption, a complex Gaussian variable Z = X+iY with X \sim \mathcal{N}(\mu_x,\sigma_x^2) and Y \sim \mathcal{N}(\mu_y,\sigma_y^2) has \sigma^2 = \sigma_x^2 = \sigma_y^2, so that \mu_z = \mu_x+i\mu_y and \sigma_z^2 = \sigma_x^2+\sigma_y^2 = 2\sigma^2. A complex Gaussian random vector \boldsymbol{u}\sim \mathcal{CN}(\boldsymbol{\mu},\boldsymbol{\Sigma}) with mean \boldsymbol{\mu} and covariance matrix \boldsymbol{\Sigma} then has the probability density function below (it follows from the real Gaussian case, or see any reference on the complex Gaussian distribution):

\mathcal{CN}(\boldsymbol{u}|\boldsymbol{\mu},\boldsymbol{\Sigma})=\frac{1}{\pi^N|\boldsymbol{\Sigma}|}\exp\left\{-(\boldsymbol{u}-\boldsymbol{\mu})^H\boldsymbol{\Sigma}^{-1}(\boldsymbol{u}-\boldsymbol{\mu})\right\}

Hence the joint probability density function of the noise model can be written as:

p(\boldsymbol{E}|\alpha_0)=\prod\limits_{t=1}^T\mathcal{CN}\left(\boldsymbol{e}(t)|\boldsymbol{0},\alpha_0^{-1}\boldsymbol{I}\right)=\prod\limits_{t=1}^T\frac{1}{\pi^M|\alpha_0^{-1}\boldsymbol{I}|}\exp\{ -\alpha_0\boldsymbol{e}^H(t)\boldsymbol{e}(t)\}

where \alpha_{0}=\sigma^{-2} is the noise precision and T is the number of snapshots.

        Conditioned on \boldsymbol{X}, which is assumed to be jointly sparse, the likelihood of the array output \boldsymbol{Y} is: p(\boldsymbol{Y}|\boldsymbol{X},\alpha_0,\boldsymbol{\beta})=\prod\limits_{t=1}^T \mathcal{CN}\left(\boldsymbol{y}(t)|\boldsymbol{\Phi}(\boldsymbol{\beta})\boldsymbol{x}(t),\alpha_0^{-1}\boldsymbol{I}\right)=\prod _{t=1}^T\frac{1}{\pi^M|\alpha_0^{-1}\boldsymbol{I}|}\exp\{-\alpha_0[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]^H[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]\}

Since the noise precision is unknown, a \Gamma distribution is used as a hyperprior on it (for Gaussian samples the conjugate prior of the precision is the \Gamma distribution):

p(\alpha_0;c,d)=\Gamma(\alpha_0|c,d)=[\Gamma( c)]^{-1}d^c\alpha_0^{c-1}\text{exp}\{-d\alpha_0\},c,d\rightarrow 0

Sparse Signal Model

        Similarly, since the signals (and interference) are also given a sparse prior, a two-stage hierarchical prior is used to model the input signal \boldsymbol{X} and its power precisions:

p(\boldsymbol{X}|\boldsymbol{\alpha})=\prod\limits_{t=1}^T\mathcal{CN}(\boldsymbol{x}(t)|\boldsymbol{0},\boldsymbol{\Lambda})=\prod _{t=1}^T\frac{1}{\pi^N|\boldsymbol{\Lambda}|}\exp\{-\boldsymbol{x}(\boldsymbol{t})^H\boldsymbol{\Lambda}^{-1}\boldsymbol{x}(\boldsymbol{t})\}

p(\boldsymbol{\alpha};\rho)=\prod\limits_{n=1}^N\Gamma(\alpha_n|1,\rho)=\prod _{n=1}^{N}[\Gamma(1)]^{-1}\rho\exp(-\rho\alpha_n)

where \rho>0, \boldsymbol\alpha\in\mathbb{R}^N and \boldsymbol{\Lambda}=\text{diag}(\boldsymbol{\alpha}).

Joint Probability Density Function

p(\boldsymbol{X},\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})=p(\boldsymbol{Y}|\boldsymbol{X},\alpha_{0},\boldsymbol{\beta})p(\boldsymbol{X}|\boldsymbol{\alpha})p(\boldsymbol{\alpha})p(\boldsymbol{\beta})p(\alpha_0)\\=\Big[\prod _{t=1}^T\frac{1}{\pi^M|\alpha_0^{-1}\boldsymbol{I}|}\exp\{-\alpha_0[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]^H[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]\}\Big]\Big[\prod _{t=1}^T\frac{1}{\pi^N|\boldsymbol{\Lambda}|}\exp\{-\boldsymbol{x}^H(t)\boldsymbol{\Lambda}^{-1}\boldsymbol{x}(t)\}\Big]\\\quad\times[\Gamma(c)]^{-1}d^{c}\alpha_0^{c-1}\exp(-d\alpha_0)\prod _{n=1}^{N}[\Gamma(1)]^{-1}\rho\exp(-\rho\alpha_n)\,p(\boldsymbol{\beta})

Posterior Inference

Once the priors are specified, the posterior follows from Bayes' rule:

p(\boldsymbol{X},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y})p(\boldsymbol{Y})=p(\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{\alpha},\alpha_{0},\boldsymbol{\beta})p(\boldsymbol{X},\boldsymbol{\alpha},\alpha_{0},\boldsymbol{\beta}) \\~\Rightarrow p(\boldsymbol{X},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y})=\frac{p(\boldsymbol{Y}|\boldsymbol{X},\boldsymbol{\alpha},\alpha_{0},\boldsymbol{\beta})p(\boldsymbol{X},\boldsymbol{\alpha},\alpha_{0},\boldsymbol{\beta})}{p(\boldsymbol{Y})}

However,

p(\boldsymbol{Y}) = \int\int\int\int p(\boldsymbol{Y}|\boldsymbol{X},\alpha_0,\boldsymbol{\beta})p(\boldsymbol{X}|\boldsymbol{\alpha})p(\boldsymbol{\alpha})p(\alpha_0)p(\boldsymbol{\beta})\,d\boldsymbol{X}\,d\boldsymbol{\alpha}\,d\alpha_0\,d\boldsymbol{\beta}

is very hard to obtain in practice, so we start from the posterior distribution of the sparse signal \boldsymbol{X} instead.

        First, decompose the posterior conditioned on \boldsymbol{Y}; by Bayes' rule,

p(\boldsymbol{X},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y})=\frac{p(\boldsymbol{X},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{Y})}{p(\boldsymbol{Y})}=\frac{p(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})p(\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})}{p(\boldsymbol{Y})} \\~~=p(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})p(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y})

        Based on this decomposition, we first deal with p(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}). By Bayes' rule and the law of total probability,

p(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})=\frac{p(\boldsymbol{Y}|\boldsymbol{X},\alpha_{0},\boldsymbol{\beta})p(\boldsymbol{X}|\boldsymbol{\alpha})}{p(\boldsymbol{Y}|\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})}=\frac{p(\boldsymbol{Y}|\boldsymbol{X},\alpha_{0},\boldsymbol{\beta})p(\boldsymbol{X}|\boldsymbol{\alpha})}{\int p(\boldsymbol{Y}|\boldsymbol{X},\alpha_{0},\boldsymbol{\beta})p(\boldsymbol{X}|\boldsymbol{\alpha})d \boldsymbol{X}}

The numerator has already been specified above:

\prod _{t_1=1}^T\frac{\alpha_0^M}{\pi^M} \exp\left \{ -\alpha_0\left [ \boldsymbol{y}(t_1)- \boldsymbol{\Phi x}(t_1) \right ]^H \left [ \boldsymbol{y}(t_1)- \boldsymbol{\Phi x}(t_1) \right ] \right \}\prod _{t_2=1}^T \frac{1}{ \pi^N\left | \boldsymbol{\Lambda} \right |} \exp\left \{ -\boldsymbol{x}^H(t_2) \boldsymbol{\Lambda}^{-1} \boldsymbol{x}(t_2) \right \}

=\frac{\alpha_0^{MT}}{\pi^{(M+N)T}\left | \boldsymbol{\Lambda} \right |^T}\exp \left \{ - \alpha_0\sum _{t_1=1}^T\left [ \boldsymbol{y}(t_1)- \boldsymbol{\Phi x}(t_1) \right ]^H \left [ \boldsymbol{y}(t_1)- \boldsymbol{\Phi x}(t_1) \right ] - \sum _{t_2 = 1}^T\boldsymbol{x}^H(t_2) \boldsymbol{\Lambda}^{-1} \boldsymbol{x}(t_2) \right \}

=\prod _{t=1}^T\frac{\alpha_0^M}{\pi^{M+N}\left | \boldsymbol{\Lambda} \right |} \exp\left \{ -\alpha_0 \left( \boldsymbol{y}^H(t)\boldsymbol{y}(t) -\boldsymbol{y}^H(t)\boldsymbol{\Phi x}(t) -\boldsymbol{x}^H(t)\boldsymbol{\Phi}^H \boldsymbol{y}(t) \right)-\boldsymbol{x}^H(t)(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1})\boldsymbol{x}(t)\right \}

Now consider the denominator. Since both factors in the integrand factorize over snapshots, the integral splits into a product of per-snapshot integrals:

\int p(\boldsymbol{Y}|\boldsymbol{X},\alpha_{0},\boldsymbol{\beta})p(\boldsymbol{X}|\boldsymbol{\alpha})d \boldsymbol{X} \\=\prod _{t=1}^T\int \frac{\alpha_0^M}{\pi^M} \exp\left \{ -\alpha_0\left [ \boldsymbol{y}(t)- \boldsymbol{\Phi x}(t) \right ]^H \left [ \boldsymbol{y}(t)- \boldsymbol{\Phi x}(t) \right ] \right \} \frac{1}{ \pi^N\left | \boldsymbol{\Lambda} \right |} \exp\left \{ -\boldsymbol{x}^H(t) \boldsymbol{\Lambda}^{-1} \boldsymbol{x}(t) \right \} d \boldsymbol{x}(t)\\= \prod _{t=1}^T\frac{\alpha_0^M}{\pi^{M+N}\left | \boldsymbol{\Lambda} \right |} \int \exp\left \{ - \alpha_0\left [ \boldsymbol{y}(t)- \boldsymbol{\Phi x}(t) \right ]^H \left [ \boldsymbol{y}(t)- \boldsymbol{\Phi x}(t) \right ] - \boldsymbol{x}^H(t) \boldsymbol{\Lambda}^{-1} \boldsymbol{x}(t) \right \}d\boldsymbol{x}(t)

=\prod _{t=1}^T\frac{\alpha_0^M}{\pi^{M+N}\left | \boldsymbol{\Lambda} \right |} \int \exp\left \{ -\alpha_0 \left( \boldsymbol{y}^H(t)\boldsymbol{y}(t) -\boldsymbol{y}^H(t)\boldsymbol{\Phi x}(t) -\boldsymbol{x}^H(t)\boldsymbol{\Phi}^H \boldsymbol{y}(t) \right)-\boldsymbol{x}^H(t)(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1})\boldsymbol{x}(t) \right \} d\boldsymbol{x}(t)

Using the property of the Gaussian integral, namely

\int_{\omega}e^{-(A\omega+b)^2}\mathrm{d}\omega=\boldsymbol{C}

we take the first derivative of the expression inside the exponent with respect to \boldsymbol{X}, substitute the resulting stationary point back in (i.e. complete the square), and the remaining \boldsymbol{X}-dependent Gaussian factor integrates to a constant, which yields the marginalized probability density.

Let:

\boldsymbol{L} (\boldsymbol{X}) =-\alpha_0 ( \boldsymbol{Y}^H\boldsymbol{Y} -\boldsymbol{Y}^H\boldsymbol{\Phi X} -\boldsymbol{X}^H\boldsymbol{\Phi}^H \boldsymbol{Y} )-\boldsymbol{X}^H(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1})\boldsymbol{X}

\boldsymbol{L} (\boldsymbol{X})的一阶导求零点,可得:

        \boldsymbol{X}= \alpha_0(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1})^{-1}\boldsymbol{\Phi}^H\boldsymbol{Y}

\boldsymbol{L} (\boldsymbol{X})可进一步转化为:

\boldsymbol{L} (\boldsymbol{X})\big|_{\boldsymbol{X}=\alpha_0(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1})^{-1}\boldsymbol{\Phi}^H\boldsymbol{Y}}= -\alpha_0 \boldsymbol{Y}^H(\boldsymbol{I}-\alpha_0\boldsymbol{\Phi}(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1})^{-1}\boldsymbol{\Phi}^H)\boldsymbol{Y}

Matching this with the structure of the complex Gaussian density, the denominator follows the complex Gaussian distribution

p(\boldsymbol{Y}|\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}) =\prod _{t=1}^T \mathcal{CN}\left(\boldsymbol{y}(t)\,\middle|\,\boldsymbol{0},\;\alpha_0^{-1}\left[\boldsymbol{I}-\alpha_0\boldsymbol{\Phi}(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1})^{-1}\boldsymbol{\Phi}^H\right]^{-1}\right)

Simplifying the covariance further with the Woodbury matrix identity gives:

p(\boldsymbol{Y}|\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}) =\prod _{t=1}^T \mathcal{CN}(\boldsymbol{y}(t)|0,\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H)

At this point the means and covariances of the complex Gaussians in both the numerator and the denominator of the posterior are available; substituting them into Bayes' rule gives:

p(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}) = \frac{\left | \boldsymbol{I+\alpha_0\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H} \right |}{\pi^N\left | \Lambda \right |} \exp\left \{ -\alpha_0 ( \boldsymbol{Y}^H\boldsymbol{Y} -\boldsymbol{Y}^H\boldsymbol{\Phi X} -\boldsymbol{X}^H\boldsymbol{\Phi}^H \boldsymbol{Y} )-\boldsymbol{X}^H(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1})\boldsymbol{X}+\boldsymbol{Y}^H( \alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H)^{-1}\boldsymbol{Y} \right \}

Consider the expression inside the exponent and denote it by \boldsymbol{L}'(\boldsymbol{X}); then

\boldsymbol{L}'(\boldsymbol{X})=-\alpha_0 ( \boldsymbol{Y}^H\boldsymbol{Y} -\boldsymbol{Y}^H\boldsymbol{\Phi X} -~\boldsymbol{X}^H\boldsymbol{\Phi}^H \boldsymbol{Y} )-~\boldsymbol{X}^H(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+~\boldsymbol{\Lambda}^{-1})\boldsymbol{X}+~\boldsymbol{Y}^H( \alpha_0^{-1}\boldsymbol{I}+~\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H)^{-1}\boldsymbol{Y}

The zero of the first derivative of \boldsymbol{L}'(\boldsymbol{X}), i.e. \boldsymbol{X}\big|_{\frac{\partial \boldsymbol{L}'(\boldsymbol{X})}{\partial\boldsymbol{X} }=0}, and the negative inverse of its second derivative, -\left [ {\frac{\partial^2 \boldsymbol{L}'(\boldsymbol{X})}{\partial\boldsymbol{X} ^2}} \right ]^{-1}, are respectively the mean and covariance of the complex Gaussian posterior:

p(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}) =\prod_{t=1}^T \mathcal{CN}\left (\boldsymbol{x}(t)|\alpha_0\left(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1}\right)^{-1}\boldsymbol{\Phi}^H\boldsymbol{y}(t), \left(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1}\right)^{-1}\right )

These correspond to Eqs. (15) and (16) of the paper.

Following the notation in the paper, define:

\boldsymbol{\Sigma} =\left(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1}\right)^{-1}=\boldsymbol{\Lambda}-\boldsymbol{\Lambda}\boldsymbol{\Phi}^H(\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H)^{-1}\boldsymbol{\Phi}\boldsymbol{\Lambda}

\boldsymbol{\mu}(t) = \alpha_0\boldsymbol{\Sigma}\boldsymbol{\Phi}^H\boldsymbol{y}(t)

\boldsymbol{\Sigma}代入\boldsymbol{\mu}即可得到其另外一种形式,有:

\begin{aligned} \boldsymbol{\mu}(t) &= \alpha_0\left [ \boldsymbol{\Lambda}-\boldsymbol{\Lambda}\boldsymbol{\Phi}^H(\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H)^{-1}\boldsymbol{\Phi}\boldsymbol{\Lambda} \right ]\boldsymbol{\Phi}^H\boldsymbol{y}(t)\\&=\boldsymbol{\Lambda\Phi}^H\left [ \alpha_0\boldsymbol{I}-\alpha_0\boldsymbol{I}\boldsymbol{\Sigma}_y^{-1}\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H \right ]\boldsymbol{y}(t)\\&=\boldsymbol{\Lambda\Phi}^H\left [ \alpha_0\boldsymbol{I}-\alpha_0\boldsymbol{I}\boldsymbol{\Sigma}_y^{-1}(\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H+\alpha_0^{-1}\boldsymbol{I})+\boldsymbol{\Sigma}_y^{-1} \right ]\boldsymbol{y}(t)\\&=\boldsymbol{\Lambda\Phi}^H\boldsymbol{\Sigma}_y^{-1}\boldsymbol{y}(t)\end{aligned}
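The two equivalent forms of \boldsymbol{\Sigma} and \boldsymbol{\mu}(t) above are easy to check numerically. A minimal sketch with random data (all sizes and values assumed, purely illustrative):

% Minimal sketch: verify that the direct and Woodbury forms of Sigma and mu agree.
M = 8; N = 91; T = 10; alpha_0 = 5;
Phi = (randn(M,N) + 1i*randn(M,N))/sqrt(2);
alpha = rand(N,1) + 0.1;
Lambda = diag(alpha);
Y = (randn(M,T) + 1i*randn(M,T))/sqrt(2);
Sigma_direct = inv(alpha_0*(Phi'*Phi) + inv(Lambda));
C = eye(M)/alpha_0 + Phi*Lambda*Phi';                 % Sigma_y = alpha_0^{-1} I + Phi*Lambda*Phi'
Sigma_wood = Lambda - Lambda*Phi'*(C\(Phi*Lambda));
mu_direct = alpha_0*Sigma_direct*Phi'*Y;
mu_wood   = Lambda*Phi'*(C\Y);
disp(norm(Sigma_direct - Sigma_wood, 'fro'));         % both differences should be tiny
disp(norm(mu_direct - mu_wood, 'fro'));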

Hyperparameter Updates

        We now need to update \alpha_0, \boldsymbol{\alpha} and \boldsymbol{\beta}. There are usually two ways to do so: through the type-II maximum likelihood function, or through the EM method. In the paper the author uses EM, i.e. the expectation of the log joint density under the posterior is maximized in order to update \alpha_0 and \boldsymbol{\alpha}.

Updating the Hyperparameters with EM

        The overall factor graph of the algorithm model is shown in the original post (figure not reproduced here).

        First, a brief introduction to EM. The EM algorithm is an iterative optimization strategy; each iteration consists of two steps, an expectation step (E-step) and a maximization step (M-step), hence the name Expectation-Maximization Algorithm. Its overall procedure can be summarized as follows (a small numerical example follows the list):

  • Input: observed data x=(x_1,...,x_n), the joint distribution p(x,z;\theta), the conditional distribution p(z|x;\theta), where z=(z_1,...,z_n) are the hidden variables, and the maximum number of iterations J. (In this generic model the parameters to be estimated are \theta = (\theta_1,...,\theta_n).)
  • Randomly initialize the model parameters \theta to \theta^{(0)}.
  • E-step: compute the conditional expectation of the log joint distribution:

\theta^{(t)}\rightarrow p(z|x,\theta)\rightarrow E_{z|x,\theta^{(t)}}[\ln p(x,z|\theta)]

  • M-step: maximize this conditional expectation:

\theta^{(t+1)} = \arg\max_\theta E_{z|x,\theta^{(t)}}[\ln p(x,z|\theta)]
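Before applying EM to the DOA model, here is a minimal self-contained toy example of the two steps (a scalar linear-Gaussian model with an assumed latent signal; purely illustrative and unrelated to the array model):

% Toy EM example: y(t) = x(t) + e(t), with latent x ~ N(0,1) and e ~ N(0,1/alpha0).
% EM estimates the noise precision alpha0 from y alone.
rng(0); T = 20000; alpha0_true = 4;
x = randn(1,T); y = x + randn(1,T)/sqrt(alpha0_true);
alpha0 = 1;                                   % initial guess
for iter = 1:500
    s = 1/(1 + alpha0);                       % E-step: posterior variance of x(t) given y(t)
    m = alpha0*s*y;                           %         posterior mean of x(t) given y(t)
    alpha0 = T/sum((y - m).^2 + s);           % M-step: maximize E{ln p(x,y | alpha0)}
end
fprintf('estimated alpha0 = %.2f (true value 4)\n', alpha0);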

        When maximizing the posterior, note that in the decomposition

p(\boldsymbol{X},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y})=\frac{p(\boldsymbol{X},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{Y})}{p(\boldsymbol{Y})}=\frac{p(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})p(\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})}{p(\boldsymbol{Y})} \\~~=p(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})p(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y})

the factor p(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y}) on the right-hand side is likewise hard to obtain, but it is proportional to the joint probability density, since:

p(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y})\propto p(\boldsymbol{Y}|\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})p(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})= p(\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})=\int p(\boldsymbol{Y},\boldsymbol{X},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta})\,d\boldsymbol{X}

Although p(\boldsymbol{Y}) was said above to be hard to obtain, it is a constant with respect to the hidden variables and the hyperparameters, so the proportionality above holds; the same fact is also what makes maximizing the type-II likelihood over the hyperparameters in the next section a valid objective. Therefore, maximizing the posterior density is equivalent to maximizing the joint density with \boldsymbol{X} treated as a hidden variable, whose log-likelihood is:

\begin{aligned}&\ln p(\boldsymbol{X},\boldsymbol{Y},\alpha_0,\boldsymbol{\alpha},\boldsymbol{\beta})\\=&-(M+N)T\ln\pi+MT\ln\alpha_0-\ln\Gamma(c)+c\ln d+(c-1)\ln\alpha_0-d\alpha_0-N\ln\Gamma(1)+N\ln\rho\\&-\sum_{t=1}^T\left(\alpha_0[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]^H[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]+\boldsymbol{x}^H(t)\boldsymbol{\Lambda}^{-1}\boldsymbol{x}(t)\right)-\sum_{n=1}^N\left(T\ln\alpha_{n}+\rho\alpha_n\right)\end{aligned}

Maximizing the corresponding expectations (the M-step) then gives:

\begin{aligned}\alpha_0^{new} &= \arg\max_{\alpha_0}E_{p(\boldsymbol{X}|\boldsymbol{Y},\alpha_0,\boldsymbol{\alpha},\boldsymbol{\beta})}\{\ln [p(\boldsymbol{Y}|\boldsymbol{X},\alpha_0,\boldsymbol{\beta})p(\alpha_0)]\}\;\Rightarrow\;\alpha_0^{new}=\frac{MT+c-1}{E\{\|\boldsymbol{Y}-\boldsymbol{\Phi}\boldsymbol{X}\|_F^2\}+d}\end{aligned}

\begin{aligned}\alpha_n^{new} &= \arg\max_{\alpha_n}E_{p(\boldsymbol{X}|\boldsymbol{Y},\alpha_0,\boldsymbol{\alpha},\boldsymbol{\beta})}\{\ln[p(\boldsymbol{X}|\boldsymbol{\alpha})p(\boldsymbol{\alpha})]\}\;\Rightarrow\;\alpha_{n}^{new}=\frac{\sqrt{T^2+4\rho E\left\{\|\boldsymbol{X}^{n}\|_{2}^{2}\right\}}-T}{2\rho},\quad n=1,\cdots,N\end{aligned}

where:

E\left\{\|\boldsymbol{X}^{n}\|_{2}^{2}\right\}=\sum _{t=1}^TE\{ |x_{n}(t)|^2 \}=\sum _{t=1}^T\left(|\mu_{n}(t)|^{2}+\Sigma_{nn}\right)=\|\boldsymbol{\mu}^{n}\|_{2}^{2}+T\Sigma_{nn}

E\left\{\|\boldsymbol{Y}-\boldsymbol{\Phi X}\|_{F}^{2}\right\}=\|\boldsymbol{Y}-\boldsymbol{\Phi\mu}\|_F^2+T\,tr(\boldsymbol{\Sigma\Phi}^H\boldsymbol{\Phi})=\|\boldsymbol{Y}-\boldsymbol{\Phi \mu}\|_{F}^{2}+T\alpha_0^{-1}\sum_{n=1}^{N}(1-\alpha_n^{-1}\Sigma_{nn})
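Both expectations are straightforward to evaluate from the posterior moments \boldsymbol{\mu} and \boldsymbol{\Sigma}. A minimal sketch with random data (sizes and hyperparameters assumed; variable names are illustrative and only loosely match the OGSBI code further below):

% Minimal sketch: the expectations entering the alpha_n / alpha_0 updates.
M = 8; N = 91; T = 10; alpha_0 = 5; rho = 1e-2; c = 1e-4; d = 1e-4;
Phi = (randn(M,N) + 1i*randn(M,N))/sqrt(2);
alpha = rand(N,1) + 0.1;
Y = (randn(M,T) + 1i*randn(M,T))/sqrt(2);
Sigma = inv(alpha_0*(Phi'*Phi) + diag(1./alpha));        % posterior covariance
mu    = alpha_0*Sigma*Phi'*Y;                            % posterior mean (N x T)
E_Xn_sq = sum(abs(mu).^2, 2) + T*real(diag(Sigma));      % E{||X^n||_2^2}, one value per row n
E_resid = norm(Y - Phi*mu,'fro')^2 + T*real(trace(Sigma*(Phi'*Phi)));  % E{||Y - Phi*X||_F^2}
alpha_new  = (sqrt(T^2 + 4*rho*E_Xn_sq) - T)/(2*rho);    % elementwise update of alpha_n
alpha0_new = (M*T + c - 1)/(E_resid + d);                % update of alpha_0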

The remaining work then centers on the term tr(\boldsymbol{\Sigma\Phi}^H\boldsymbol{\Phi}). Following MacKay's Bayesian Interpolation, this term satisfies:

tr(\boldsymbol{\Sigma\Phi}^H\boldsymbol{\Phi})=\alpha_0^{-1}\sum_{n=1}^N (1-\alpha_n^{-1}\Sigma_{nn})=\alpha_0^{-1}\sum_{n=1}^N \gamma_n

Proof:

\boldsymbol{\Sigma} =\left(\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1}\right)^{-1}\Rightarrow \boldsymbol{\Sigma} ^{-1} = ~\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi}+~\boldsymbol{\Lambda}^{-1}\Rightarrow~ \boldsymbol{\Phi}^H\boldsymbol{\Phi}=~\alpha_0^{-1}(\boldsymbol{\Sigma}^{-1}-~\boldsymbol{\Lambda}^{-1})\\\Rightarrow tr(\boldsymbol{\Sigma\Phi}^H\boldsymbol{\Phi})=tr(\alpha_0^{-1}\boldsymbol{\Sigma}(\boldsymbol{\Sigma}^{-1}-~\boldsymbol{\Lambda}^{-1}))=~tr(\alpha_0^{-1}\boldsymbol{I}-~\alpha_0^{-1}\boldsymbol\Lambda^{-1}\boldsymbol{\Sigma})=~\alpha_0^{-1}\sum_{n=1}^N (1-~\alpha_n^{-1}\Sigma_{nn})

which completes the proof.
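The identity is also easy to confirm numerically with random quantities (a minimal sketch, all values assumed):

% Minimal sketch: check tr(Sigma*Phi'*Phi) = alpha_0^{-1} * sum_n (1 - Sigma_nn/alpha_n).
M = 8; N = 20; alpha_0 = 3;
Phi = (randn(M,N) + 1i*randn(M,N))/sqrt(2);
alpha = rand(N,1) + 0.1;
Sigma = inv(alpha_0*(Phi'*Phi) + diag(1./alpha));
lhs = real(trace(Sigma*(Phi'*Phi)));
rhs = sum(1 - real(diag(Sigma))./alpha)/alpha_0;
disp(abs(lhs - rhs));                          % should be at machine-precision level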

Updating the Hyperparameters via the Type-II Maximum Likelihood Function

Following the approach in Tipping's paper, take the logarithm of the marginalized probability density:

\begin{aligned}&\ln [p(\boldsymbol{Y}|\boldsymbol{\alpha},\alpha_0,\boldsymbol{\beta})p(\boldsymbol{\alpha};\rho)p(\alpha_0)]\\=&-MT\ln \pi - T\ln|\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H|-\sum_{t=1}^T\boldsymbol{y}^H(t)(\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H)^{-1}\boldsymbol{y}(t)\\&-N\ln\Gamma(1)+N\ln \rho-\rho\sum_{n=1}^N\alpha_n-\ln\Gamma(c)+c\ln d+(c-1) \ln \alpha_0-d\alpha_0\\=&- T\ln|\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H|-\sum_{t=1}^T\boldsymbol{y}^H(t)(\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H)^{-1}\boldsymbol{y}(t)-\rho\sum_{n=1}^N\alpha_n+(c-1) \ln \alpha_0-d\alpha_0+C\end{aligned}

First, recall the determinant identity:

|\mathbf{A}||\beta^{-1}\mathbf{I}+\mathbf{\Phi}\mathbf{A}^{-1}\mathbf{\Phi}^{\text{T}}|=|\beta^{-1}\mathbf{I}||\mathbf{A}+~\beta\mathbf{\Phi}^{\text{T}}\mathbf{\Phi}|\Rightarrow ~|\beta^{-1}\mathbf{I}+~\mathbf{\Phi}\mathbf{A}^{-1}\mathbf{\Phi}^{\text{T}}|=~\frac{|\beta^{-1}\mathbf{I}||\mathbf{A}+\beta\mathbf{\Phi}^{\text{T}}\mathbf{\Phi}|}{|\mathbf{A}|}

so the log-determinant term can be rewritten as:

\ln|\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H|=-M \ln \alpha_0-\ln |\boldsymbol{\Sigma}|+\ln|\boldsymbol{\Lambda}|

Applying the Woodbury matrix identity to the second term gives:

\begin{aligned}\sum_{t=1}^T\boldsymbol{y}^H(t)(\alpha_0^{-1}\boldsymbol{I}+\boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^H)^{-1}\boldsymbol{y}(t)&=\sum_{t=1}^T\boldsymbol{y}^H(t)(\alpha_0\boldsymbol{I}-\alpha_0\boldsymbol{\Phi}(\boldsymbol{\Lambda}^{-1}+\alpha_0\boldsymbol{\Phi}^H\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^H\alpha_0)\boldsymbol{y}(t)\\&=\sum_{t=1}^T\alpha_0\boldsymbol{y}^H(t)\boldsymbol{y}(t)-\alpha_0\boldsymbol{y}^H(t)\boldsymbol{\Phi}\boldsymbol{\Sigma}\boldsymbol{\Phi}^H\alpha_0\boldsymbol{y}(t)=\sum_{t=1}^T\alpha_0\boldsymbol{y}^H(t)(\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t))\\&=\sum_{t=1}^T\alpha_0\left \| \boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)\right \|_2^2+\alpha_0\boldsymbol{\mu}^H(t)\boldsymbol{\Phi}^H\boldsymbol{y}(t)-\alpha_0\boldsymbol{\mu}^H(t)\boldsymbol{\Phi}^H\boldsymbol{\Phi}\boldsymbol{\mu}(t)\\&=\sum_{t=1}^T\alpha_0\left \| \boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)\right \|_2^2+\boldsymbol{\mu}^H(t)\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}(t)-\alpha_0\boldsymbol{\mu}^H(t)\boldsymbol{\Phi}^H\boldsymbol{\Phi}\boldsymbol{\mu}(t)\\&=\sum_{t=1}^T\alpha_0\left \| \boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)\right \|_2^2+\boldsymbol{\mu}^H(t)\boldsymbol{\Lambda}^{-1}\boldsymbol{\mu}(t)\end{aligned}

The objective then becomes:

\ln [p(\boldsymbol{Y}|\boldsymbol{\alpha},\alpha_0,\boldsymbol{\beta})p(\boldsymbol{\alpha};\rho)p(\alpha_0)]=MT \ln \alpha_0+T\ln |\boldsymbol{\Sigma}|-T\ln|\boldsymbol{\Lambda}|-\sum_{t=1}^T\left\{\alpha_0\left \| \boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)\right \|_2^2+\boldsymbol{\mu}^H(t)\boldsymbol{\Lambda}^{-1}\boldsymbol{\mu}(t) \right\}-\rho\sum_{n=1}^N\alpha_n+(c-1) \ln \alpha_0-d\alpha_0+C

The part that requires a little care is the derivative of \ln |\boldsymbol{\Sigma}|; the following identity (Jacobi's formula) holds:

\frac{d\ln |\boldsymbol{A}|}{dt}=\frac{1}{|\boldsymbol{A}|}\frac{d|\boldsymbol{A}|}{dt}=tr\!\left(\boldsymbol{A}^{-1}\frac{d\boldsymbol{A}}{dt}\right)

Therefore:

\frac{\partial \ln |\boldsymbol{\Sigma}|}{\partial \alpha_n} = -\frac{\partial \ln |\boldsymbol{\Sigma}^{-1}|}{\partial \alpha_n}=-tr\!\left(\boldsymbol{\Sigma}\frac{\partial \boldsymbol{\Sigma}^{-1} }{\partial \alpha_n}\right)=\frac{1}{\alpha_n^2}\Sigma_{nn}

\frac{d \ln [p(\boldsymbol{Y}|\boldsymbol{\alpha},\alpha_0,\boldsymbol{\beta})p(\boldsymbol{\alpha};\rho)p(\alpha_0)]}{d\alpha_n} =T\frac{\Sigma_{nn}}{\alpha_n^2}-\frac{T}{\alpha_n}+\frac{\|\boldsymbol{\mu}^n\|_2^2}{\alpha_n^2}-\rho=0

Solving this directly gives

\alpha_n^{new} = \frac{-T+\sqrt{T^2+4\rho (\|\boldsymbol{\mu}^n\|_2^2+T\Sigma_{nn})}}{2\rho }

\frac{d \ln [p(\boldsymbol{Y}|\boldsymbol{\alpha},\alpha_0,\boldsymbol{\beta})p(\boldsymbol{\alpha};\rho)p(\alpha_0)]}{d\alpha_0} =\frac{MT}{\alpha_0}-T\,tr(\boldsymbol{\Sigma\Phi}^H\boldsymbol{\Phi})-\|\boldsymbol{Y}-\boldsymbol{\Phi\mu}\|_F^2+\frac{c-1}{\alpha_0}-d=0\\\Rightarrow \alpha_0^{new} =\frac{MT+c-1}{\|\boldsymbol{Y}-\boldsymbol{\Phi\mu}\|_F^2+T\,tr(\boldsymbol{\Sigma\Phi}^H\boldsymbol{\Phi})+d}

        For the error-weight (off-grid offset) vector \boldsymbol{\beta}, we need to maximize E\{\ln p(\boldsymbol{Y}|\boldsymbol{X},\alpha_0,\boldsymbol{\beta})p(\boldsymbol{\beta})\}, which (with a uniform prior on \boldsymbol{\beta}) is equivalent to maximizing E\{\ln p(\boldsymbol{Y}|\boldsymbol{X},\alpha_0,\boldsymbol{\beta})\}. Substituting the density gives:

E\{\ln p(\boldsymbol{Y}|\boldsymbol{X},\alpha_0,\boldsymbol{\beta})\}=E\left\{\sum _{t=1}^T -M\ln\pi +M\ln\alpha_0-\alpha_0(\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t))^H(\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t))\right\}

i.e. it is equivalent to minimizing E\{\sum_{t=1}^T\|\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)\|_2^2\}. Expanding this expression gives:

\begin{aligned}E\left\{\sum _{t=1}^T\|\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)\|_2^2\right\} &=\sum _{t=1}^TE\{\|\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)+\boldsymbol{\Phi}\boldsymbol{\mu}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)\|_2^2\}\\&=\sum _{t=1}^T\|\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)\|_2^2+E\{[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)]^H[\boldsymbol{\Phi}\boldsymbol{\mu}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]\}\\&\quad+E\{ [\boldsymbol{\Phi}\boldsymbol{\mu}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]^H[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)] \}+ E\{\|\boldsymbol{\Phi}\boldsymbol{\mu}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)\|_2^2 \} \\&=\sum _{t=1}^T \left\{\|\boldsymbol{y}(t)-\left(\boldsymbol{A}+\boldsymbol{B}\text{diag}(\boldsymbol{\beta})\right)\boldsymbol{\mu}(t)\|_2^2+Tr\left\{\left(\boldsymbol{A}+\boldsymbol{B}\text{diag}(\boldsymbol{\beta})\right)\boldsymbol{\Sigma}\left(\boldsymbol{A}+\boldsymbol{B}\text{diag}(\boldsymbol{\beta})\right)^H\right\}\right\}\end{aligned}

Expanding the terms in the second and third lines above and taking expectations gives:

\begin{aligned} E\{[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)]^H[\boldsymbol{\Phi}\boldsymbol{\mu}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]\}=E\{ [\boldsymbol{\Phi}\boldsymbol{\mu}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)]^H[\boldsymbol{y}(t)-\boldsymbol{\Phi}\boldsymbol{\mu}(t)] \}=0\end{aligned}

\begin{aligned}E\{\|\boldsymbol{\Phi}\boldsymbol{\mu}(t)-\boldsymbol{\Phi}\boldsymbol{x}(t)\|_2^2 \}&=E\{\boldsymbol{\mu}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{\mu}(t)-\boldsymbol{\mu}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{x}(t)-\boldsymbol{x}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{\mu}(t)+\boldsymbol{x}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{x}(t)\} \\ &=-\boldsymbol{\mu}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{\mu}(t)+E\{Tr(\boldsymbol{x}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{x}(t))\}=-\boldsymbol{\mu}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{\mu}(t)+E\{Tr(\boldsymbol{\Phi}\boldsymbol x(t)\boldsymbol{x}^H(t)\boldsymbol{\Phi}^H)\}\\ &=-\boldsymbol{\mu}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{\mu}(t)+Tr(\boldsymbol{\Phi}E[\boldsymbol x(t)\boldsymbol{x}^H(t)]\boldsymbol{\Phi}^H)=-\boldsymbol{\mu}^H(t)\boldsymbol \Phi^H\boldsymbol{\Phi}\boldsymbol{\mu}(t)+Tr(\boldsymbol{\Phi}[\boldsymbol \Sigma+\boldsymbol{\mu}(t)\boldsymbol{\mu}^H(t)]\boldsymbol{\Phi}^H)=Tr(\boldsymbol{\Phi\Sigma\Phi}^H)\end{aligned}

The remaining expression thus consists of two parts. First, introduce the identity relating the trace to the Hadamard product:

Tr\left\{\mathrm{diag}^H(\boldsymbol{u})\boldsymbol{Q}\mathrm{diag}(\boldsymbol{v})\boldsymbol{R}^T\right\}=\boldsymbol{u}^H(\boldsymbol{Q}\odot\boldsymbol{R})\boldsymbol{v}
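This identity can be checked numerically with random matrices (a minimal sketch, dimensions assumed):

% Minimal sketch: check Tr{diag(u)' * Q * diag(v) * R.'} = u' * (Q .* R) * v.
n = 6;
u = randn(n,1) + 1i*randn(n,1); v = randn(n,1) + 1i*randn(n,1);
Q = randn(n) + 1i*randn(n);     R = randn(n) + 1i*randn(n);
lhs = trace(diag(u)' * Q * diag(v) * R.');
rhs = u' * (Q .* R) * v;
disp(abs(lhs - rhs));                          % should be at machine-precision level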

Combining this with the cyclic property of the trace and the expansion of the squared 2-norm of a complex vector,

\|\boldsymbol{a}-\boldsymbol{b}\|_2^2 = \|\boldsymbol{a}\|_2^2-2\Re(\boldsymbol{a}^H\boldsymbol{b})+\|\boldsymbol{b}\|_2^2

we obtain, for the first part:

\|\boldsymbol{y}-\left(\boldsymbol{A}+\boldsymbol{B}\text{diag}(\boldsymbol{\beta})\right)\boldsymbol{\mu}\|_2^2 =\|(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{\mu})-\boldsymbol{B}\text{diag}(\boldsymbol{\mu})\boldsymbol{\beta}\|_2^2\\=\boldsymbol{\beta}^T\text{diag}(\overline{\boldsymbol{\mu}})\boldsymbol{B}^H\boldsymbol{B}\,\text{diag}(\boldsymbol{\mu})\boldsymbol{\beta}-2\Re\left\{\text{diag}(\overline{\boldsymbol{\mu}})\boldsymbol{B}^H(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{\mu})\right\}^T\boldsymbol{\beta}+(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{\mu})^H(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{\mu}) \\=\boldsymbol{\beta}^{T}(\overline{\boldsymbol{B}^{H}\boldsymbol{B}}\odot\boldsymbol{\mu\mu}^{H})\boldsymbol{\beta}-2\Re\left\{\mathrm{diag}(\overline{\boldsymbol{\mu}})\boldsymbol{B}^H(\boldsymbol{y}-\boldsymbol{A}\boldsymbol{\mu})\right\}^T\boldsymbol{\beta}+C_1

For the second part,

Tr\{(\boldsymbol{A}+\boldsymbol{B}\text{diag}(\boldsymbol{\beta}))\boldsymbol{\Sigma}(\boldsymbol{A}+\boldsymbol{B}\text{diag}(\boldsymbol{\beta}))^H\}\\=2\Re\{Tr\{\boldsymbol{B}^H\boldsymbol{A}\boldsymbol{\Sigma}\,\text{diag}(\boldsymbol{\beta})\}\}+Tr\{\text{diag}(\boldsymbol{\beta})\boldsymbol{\Sigma}\,\text{diag}(\boldsymbol{\beta})\boldsymbol{B}^H\boldsymbol{B}\}+C_2 \\=2\Re\left\{\operatorname{diag}(\boldsymbol{B}^H\boldsymbol{A}\boldsymbol{\Sigma})\right\}^T\boldsymbol{\beta}+\boldsymbol{\beta}^T(\boldsymbol{\Sigma}\odot\overline{\boldsymbol{B}^H\boldsymbol{B}})\boldsymbol{\beta}+C_2

Putting the two parts together (and averaging over the snapshots), we have

E\left\{\frac{1}{T}\sum\limits_{t=1}^T\|\boldsymbol{y}(t)-\left(\boldsymbol{A}+\boldsymbol{B}\text{diag}(\boldsymbol{\beta})\right)\boldsymbol{x}(t)\|_2^2\right\} =\boldsymbol{\beta}^T\boldsymbol{P}\boldsymbol{\beta}-2\boldsymbol{v}^T\boldsymbol{\beta}+C

\boldsymbol{P}=\Re\left\{\overline{\boldsymbol{B}^{H}\boldsymbol{B}}\odot(\underline{\boldsymbol{U}}\cdot\underline{\boldsymbol{U}}^{H}+\boldsymbol{\Sigma})\right\}

 \quad\boldsymbol{v}=\Re\left\{\frac{1}{T}\sum_{t=1}^T\text{diag}\left(\overline{\boldsymbol{\mu}(t)}\right)\boldsymbol{B}^H\left(\boldsymbol{y}(t)-\boldsymbol{A}\boldsymbol{\mu}(t)\right)\right\}-~\Re\left\{\mathrm{diag}(\boldsymbol{B}^H\boldsymbol{A}\boldsymbol{\Sigma})\right\}

where \odot denotes the Hadamard product, \overline{(\cdot)} denotes complex conjugation, and C_1, C_2 are terms that do not involve \boldsymbol{\beta} at all.

\underline{\boldsymbol{U}}=\frac{\boldsymbol{U}}{\sqrt{T}}=\frac{1}{\sqrt{T}}[\boldsymbol{\mu}(1),\cdots,\boldsymbol{\mu}(T)]=\frac{1}{\sqrt{T}}\alpha_{0}\boldsymbol{\Sigma}\boldsymbol{\Phi}^H\boldsymbol{Y}

It also follows that (\overline{\boldsymbol{B}^{H}\boldsymbol{B}}\odot\boldsymbol{\mu\mu}^{H}) and (\boldsymbol{\Sigma}\odot\overline{\boldsymbol{B}^H\boldsymbol{B}}) are both positive semidefinite, so the update

\boldsymbol{\beta}^{new}=\text{arg}\min\limits_{\boldsymbol{\beta}\in\left[-\frac{1}{2}r,\frac{1}{2}r\right]^N}\left\{\boldsymbol{\beta}^T\boldsymbol{P}\boldsymbol{\beta}-2\boldsymbol{v}^T\boldsymbol{\beta}\right\}

is a convex problem; its minimizer is the zero of the gradient:

\frac{\partial}{\partial\boldsymbol{\beta}}\{\boldsymbol{\beta}^T\boldsymbol{P}\boldsymbol{\beta}-2\boldsymbol{v}^T\boldsymbol{\beta}\}=2(\boldsymbol{P\beta}-\boldsymbol{v})=0\Rightarrow \boldsymbol{\beta}^{new}=\boldsymbol{P}^{-1}\boldsymbol{v}\in \left[-\frac{1}{2}r,\frac{1}{2}r\right]^{N}

provided that \boldsymbol{P} is invertible.
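When \boldsymbol{P} is well conditioned, the update is therefore just a least-squares solve followed by clipping to the admissible interval. A minimal sketch (here \boldsymbol{P} and \boldsymbol{v} are filled with assumed stand-in values; the per-coordinate fallback described next is what the off_grid_operation function in the code below implements):

% Minimal sketch: direct beta update with clipping to [-r/2, r/2].
N = 10; r = 2*pi/180;                           % assumed grid spacing of 2 degrees (in radians)
R0 = randn(N); P = R0*R0' + 1e-3*eye(N);        % symmetric positive definite stand-in for P
v = 0.01*randn(N,1);                            % stand-in for v
beta = P \ v;                                   % unconstrained minimizer of beta'*P*beta - 2*v'*beta
beta = min(max(beta, -r/2), r/2);               % clip each entry back into the grid interval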

        If during the iterations \boldsymbol{\beta} is updated to values outside its admissible range, or \boldsymbol{P} turns out not to be invertible, the author instead updates the elements of \boldsymbol{\beta} one at a time.

        First, let:

f(\boldsymbol{\beta}) = \boldsymbol{\beta}^T\boldsymbol{P}\boldsymbol{\beta}-2\boldsymbol{v}^T\boldsymbol{\beta}=\sum _{i=1}^N \sum _{j=1}^N p_{ij}\beta_i\beta_j-2\sum _{i=1}^N v_i\beta_i

Writing this in terms of a single \beta_i gives:

f(\beta_i)=2\sum\limits_{j=1,j\neq i}^{N}p_{ij}\beta_i\beta_j^{(k)}+p_{ii}\Big(\beta_i\Big)^2-2v_i\beta_i+C

where C collects the terms independent of \beta_i. The above is a quadratic function of \beta_i, and setting its derivative to zero gives:

\check{\beta}_n=\frac{v_n-\left(\boldsymbol{P}_n\right)_{-n}^T\boldsymbol{\beta}_{-n}}{P_{nn}}

Mapping any \check{\beta}_n that falls outside the admissible range back into the interval gives:

\beta_n^{new}=\begin{cases}\check\beta_n,&\text{if}~\check\beta_n\in\left[-\frac{1}{2}r,\frac{1}{2}r\right]\\ -\frac{1}{2}r,&\text{if}~\check\beta_n<-\frac{1}{2}r\\ \frac{1}{2}r,&\text{otherwise}\end{cases}

In the concrete implementation the author additionally applies an SVD-based (PCA-style) dimensionality reduction to the data, which reduces the computational load to some extent.

Computing the Grid Weights

        By computing a weight for each grid point, an equivalent spatial power spectrum over the candidate directions is obtained.

\begin{aligned}\hat{P}_n=E\{P_n\}&=\frac{1}{T}E\left\{\|\boldsymbol{X}^{n}\|_2^2\right\}=\frac{1}{T}\left(\|E\{\boldsymbol{X}^{n}\}\|_{2}^{2}+E\left\{\|\boldsymbol{X}^{n}-E\{\boldsymbol{X}^{n}\}\|_{2}^{2}\right\}\right)\\ &=\frac{1}{T}\left(\|\hat{\boldsymbol{U}}^{n}\boldsymbol{V}_{1}^{H}\|_2^2+Tr\left\{\hat{\Sigma}_{nn}\boldsymbol{V}_{1}\boldsymbol{V}_{1}^{H}\right\}\right)=\frac{\|\hat{\boldsymbol{U}}^{n}\|_{2}^{2}}{T}+\frac{K\hat{\Sigma}_{nn}}{T}\end{aligned}

where \boldsymbol{V}_1 collects the right singular vectors of the received data associated with the signal subspace, \hat{\boldsymbol{U}}^{n} is the n-th row of the posterior mean in the SVD-reduced domain, and K is the number of retained signal dimensions.

Overall Algorithm Flow

1. Input: received data, number of sources, array geometry, grid resolution.
2. Initialize \boldsymbol{\alpha}, \boldsymbol{\beta}, \alpha_0, \rho.
3. Update the mean and covariance of the posterior distribution of \boldsymbol{X}.
4. Update \boldsymbol{\alpha}, \boldsymbol{\beta}, \alpha_0.
5. If \boldsymbol{\alpha} has converged or the preset maximum number of iterations is reached, compute the equivalent spatial power spectrum; otherwise return to step 3.
6. Peak-search the resulting spatial spectrum on the grid to obtain the estimated angles.

Simulation

Some of the simulation settings are as follows:

Angles: -30°, -5°, 45°
SNR: 20 dB
INR: 20 dB
Number of snapshots: 100
Prior grid resolution: 2°
Number of sensors: 10

The code is as follows:

clc;
clear;
%% Initialization and parameter settings
array_num = 10;%% number of sensors
snapshot_num = 100;%% number of snapshots
source_aoa = [-30,-5,45];%% source directions of arrival (deg)
c = 340;%% propagation speed
f = 1000;%% frequency
lambda = c/f;%% wavelength
d = 0.5*lambda;%% inter-element spacing
source_num = length(source_aoa);%% number of sources
sig_nr = [20,20,20];%% SNR / INR (dB)
reso_num = 91;%% number of grid points
%% Steering vectors and source signals
X = zeros(source_num,snapshot_num);
A = exp(-1i*(0:array_num-1)'*2*pi*(d/lambda)*sind(source_aoa));%% array steering matrix
for ik = 1:length(sig_nr)
     X(ik,:) = sqrt(10^(sig_nr(ik)/10))*(randn(1,snapshot_num)+randn(1,snapshot_num)*1i)/sqrt(2);
end
n = (randn(array_num,snapshot_num)+randn(array_num,snapshot_num)*1i)/sqrt(2);
Y = A*X+n;
% [~,~,D_svd] = svd(Y,'econ');
% Y = Y*D_svd(:,1:source_num);%% SVD-based dimensionality reduction
%% Assemble the inputs of the OGSBI routine
params.Y = Y;
params.reso_num = reso_num;
params.maxiter = 2000;%% maximum number of iterations
params.tolerance = 1e-4;%% convergence tolerance
params.sigma2 = mean(var(Y))/100;%% rough noise-variance estimate
res = OGSBI(params);
xp_rec = res.reso_grid;
x_rec = res.mu;
% x_rec = res.mu * D_svd(:,1:size(res.mu,2))';
xpower_rec = mean(abs(x_rec).^2,2) + real(diag(res.Sigma)) * source_num / snapshot_num;
xpower_rec = abs(xpower_rec)/max(xpower_rec);
[xp_rec,x_index] = sort(xp_rec,'ascend');
xpower_rec = xpower_rec(x_index);
figure();
plot(xp_rec,10*log10(xpower_rec));xlabel("Angle/°");ylabel("Normalized power/dB");
hold on;
plot(source_aoa,zeros(size(source_aoa)),'bo');%% mark the true DOAs at 0 dB
hold off;

function res = OGSBI(params)
%% Parse inputs and initialize
Y = params.Y;
reso_num = params.reso_num;
reso_grid = linspace(-90,90,reso_num)';
reso = 180/(reso_num-1);
[array_num, snapshot_num] = size(Y);
r = reso*pi/180;
maxiter = params.maxiter;
tol = params.tolerance;
index_b = randperm(length(reso_grid),array_num)';%% indices of the currently largest entries of alpha (random at start); the first-order off-grid refinement is applied at these grid points
converged = false;%% convergence flag (Boolean)
iter = 0;
A = exp(-1i*(0:array_num-1)'*pi*sind(reso_grid'));
B = -1i*pi*(0:array_num-1)'*cosd(reso_grid').*A;
alpha = mean(abs(A'*Y), 2);
beta = zeros(reso_num,1);
c_sigma0_init = 1e-4;
d_sigma0_init = 1e-4;
c_gamma_init = 1;
d_gamma_init = 1e-4;
alpha_0 = 0.01;
alpha_0_seq = zeros(maxiter,1);%% history of the noise precision over the iterations
Phi = A;
while ~converged
    iter = iter+1;
    Phi(:,index_b) = exp(-1i*(0:array_num-1)'*pi*sind(reso_grid(index_b)'));%% refresh the steering vectors at the grid points shifted in the previous iteration
    B(:,index_b) = -1i*pi*(0:array_num-1)'*cosd(reso_grid(index_b)').*Phi(:,index_b);%% derivative of the steering vector at the shifted grid points
    alpha_last = alpha;%% alpha from the previous iteration
%% Update the mean and covariance of the posterior of X
    C = 1/alpha_0*eye(array_num)+Phi*diag(alpha)*Phi';
    Cinv = inv(C);%% inverse of the inner matrix in the Woodbury form of (16)
    Sigma = diag(alpha)-diag(alpha)*Phi'*Cinv*Phi*diag(alpha);%% (16), Woodbury form
    mu = alpha_0*Sigma*Phi'*Y;%%(15)
%% Update alpha
    musq = mean(abs(mu).^2,2);
    alpha = musq+real(diag(Sigma));
    for ik = 1:reso_num
        alpha(ik) = (-snapshot_num+sqrt(snapshot_num^2+4*d_gamma_init*(mu(ik,:)*mu(ik,:)'+snapshot_num*Sigma(ik,ik))))/(2*d_gamma_init);
    end
%% Update alpha_0
    alpha_0 = (snapshot_num*array_num+c_sigma0_init-1)/(norm(Y-Phi*mu,'fro')^2+snapshot_num*trace(Phi*Sigma*Phi')+d_sigma0_init);%% (18); for the norm/trace terms see the expectations given between (18) and (19)
    alpha_0_seq(iter) = alpha_0;
%% Check the stopping criterion
    if norm(alpha-alpha_last)/norm(alpha_last) < tol || iter >= maxiter
        converged = true;
    end%% stop once alpha has converged or the iteration limit is reached
%% Update beta
    [beta,index_b] = off_grid_operation(Y,alpha,array_num,mu,Sigma,Phi,B,beta,r);
    reso_grid(index_b) = reso_grid(index_b)+beta(index_b)*180/pi;
end
res.mu = mu;
res.Sigma = Sigma;
res.beta = beta;
res.alpha = alpha;
res.iter = iter;
res.sigma2 = 1/alpha_0;
res.sigma2seq = 1./alpha_0_seq(1:iter);
res.reso_grid = reso_grid';
end

function [beta,index_b] = off_grid_operation(Y,gamma,iter_size,mu,Sigma,Phi,B,beta,r)
    reso_num = size(B,2);
    snapshot_num = size(Y,2);
    [~, index_b] = sort(gamma, 'descend');
    index_b = index_b(1:iter_size);%% grid points selected for the first-order (off-grid) update
    temp = beta;
    beta = zeros(reso_num,1);
    beta(index_b) = temp(index_b);
    BHB = B'*B;
    P = real(conj(BHB(index_b,index_b)).*(mu(index_b,:)*mu(index_b,:)'+snapshot_num*Sigma(index_b,index_b)));%%(20)
    v = zeros(length(index_b), 1);%%(21)
    for t = 1:snapshot_num
        v = v+real(conj(mu(index_b,t)).*(B(:,index_b)'*(Y(:,t)-Phi*mu(:,t))));
    end
    v = v-snapshot_num*real(diag(B(:,index_b)'*Phi*Sigma(:,index_b)));
    eigP = svd(P);
    eigP = sort(eigP,'descend');
    if eigP(end)/eigP(1) > 1e-5 || any(diag(P) == 0)
        for n = 1:iter_size
            temp_beta = beta(index_b);
            temp_beta(n) = 0;
            beta(index_b(n)) = (v(n)-P(n,:)*temp_beta)/P(n,n);%%(26.1)
            if abs(beta(index_b(n))) > r/2%%(26.2)
                beta(index_b(n)) = r/2*sign(beta(index_b(n)));
            end
            if P(n,n) == 0
                beta(index_b(n)) = 0;
            end
        end
    else
        beta = zeros(reso_num,1);
        beta(index_b) = P\v;
    end  
end

The simulation results are as follows:

Angle estimation: (the resulting spatial-spectrum plot appears in the original post)

 Selected references:

M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res., vol. 1, pp. 211–244, 2001.

Z. Yang, L. Xie and C. Zhang. Off-grid direction of arrival estimation using sparse Bayesian inference. IEEE Transactions on Signal Processing, vol. 61, no. 1, pp. 38–43, Jan. 2013.
