Introduction
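In the previous article, I derived the Cramér-Rao Lower Bound (CRLB) for both a scalar parameter and a vector parameter. For a vector parameter, the CRLB states that the covariance matrix of any unbiased estimator satisfies $C_{\hat\theta} \geq \mathrm{FIM}^{-1}$, where $\geq$ means that $C_{\hat\theta} - \mathrm{FIM}^{-1}$ is positive semidefinite. In this article, I derive the explicit form of the FIM (Fisher Information Matrix) when the data form a real Gaussian vector and when they form a complex Gaussian vector.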
FIM for Real Gaussian Vectors
- Suppose we have a data set $X=(x_1,x_2,\ldots,x_n)^T$ and want to estimate $k$ parameters $\theta=(\theta_1,\theta_2,\ldots,\theta_k)^T$ with some estimation method.
- Each sample in the data set is a real Gaussian random variable, and the samples are jointly Gaussian, so the vector $X$ satisfies $X\sim \mathcal{N}(\mu(\theta),C(\theta))$ with probability density $$p(X|\theta)=\frac{1}{(2\pi)^{n/2}[\det C(\theta)]^{1/2}}\exp\Big(-\frac{1}{2}(X-\mu(\theta))^TC(\theta)^{-1}(X-\mu(\theta))\Big)$$
For brevity, in the rest of the derivation I write $\mu(\theta)$ and $C(\theta)$ simply as $\mu$ and $C$; keep in mind that both are functions of the parameter vector $\theta$.
- Taking the logarithm of the density and then the first-order partial derivative with respect to $\theta_i$, we get: $$\log p(X|\theta)=-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log\det C-\frac{1}{2}(X-\mu)^TC^{-1}(X-\mu)$$ $$\frac{\partial \log p(X|\theta)}{\partial \theta_i}=0-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)+\frac{1}{2}\frac{\partial \mu^T}{\partial \theta_i}C^{-1}(X-\mu)-\frac{1}{2}(X-\mu)^T\frac{\partial \big[C^{-1}(X-\mu)\big]}{\partial \theta_i}$$
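As a numerical companion to the log-density above, here is a minimal NumPy sketch (my own illustration; the function name `gaussian_logpdf` is hypothetical) that evaluates $\log p(X|\theta)$ once $\mu$ and $C$ have been evaluated at some $\theta$. It uses `slogdet` and a linear solve instead of forming $\det C$ and $C^{-1}$ explicitly, which is numerically safer.

```python
import numpy as np

def gaussian_logpdf(x, mu, C):
    """log p(x|theta) for x ~ N(mu, C); mu = mu(theta) and C = C(theta) are already evaluated."""
    n = x.shape[0]
    y = x - mu
    sign, logdet = np.linalg.slogdet(C)      # stable log(det C)
    if sign <= 0:
        raise ValueError("C must be positive definite")
    quad = y @ np.linalg.solve(C, y)         # y^T C^{-1} y without forming C^{-1}
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

# Usage: with a diagonal C the result must equal a sum of 1-D normal log-densities
x = np.array([0.3, -1.2, 2.0])
mu = np.array([0.0, -1.0, 1.5])
var = np.array([1.0, 0.5, 2.0])
lp = gaussian_logpdf(x, mu, np.diag(var))
lp_1d = np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var)
print(np.isclose(lp, lp_1d))  # True
```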
Here we used the derivative of a matrix determinant, which we now derive. In this sub-derivation $\theta$ stands for any single scalar parameter and $c_{ij}$ for the $(i,j)$ element of $C$.
- By the chain rule, $\frac{\partial |C|}{\partial \theta}=\sum_{j}\sum_i\frac{\partial |C|}{\partial c_{ij}}\frac{\partial c_{ij}}{\partial \theta}$.
- Expanding the determinant along column $j$ (Laplace expansion) gives $|C|=\sum_{i=1}^n c_{ij}M_{ij}$, where $M_{ij}$ is the cofactor of $c_{ij}$. Since $M_{ij}$ does not contain $c_{ij}$, $\frac{\partial |C|}{\partial c_{ij}}=M_{ij}$.
- Therefore $\frac{\partial |C|}{\partial \theta}=|C|\sum_{j}\sum_i\frac{M_{ij}}{|C|}\frac{\partial c_{ij}}{\partial \theta}$.
- From the adjugate formula for the inverse, $\frac{M_{ij}}{|C|}=(C^{-1})_{ji}$, and $\frac{\partial c_{ij}}{\partial \theta}$ is the $(i,j)$ element of $\frac{\partial C}{\partial \theta}$. The inner sum over $i$ is therefore one diagonal element of a matrix product: $\frac{\partial |C|}{\partial \theta}=|C|\sum_j\Big(C^{-1}\frac{\partial C}{\partial \theta}\Big)_{jj}$.
- The outer sum over $j$ is the trace of that matrix, so $\frac{\partial |C|}{\partial \theta}=|C|\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta}\Big)$. Dividing by $|C|$ gives $\frac{\partial \log|C|}{\partial \theta}=\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta}\Big)$, which is exactly the trace term in the derivative above.
- For the last term of the second equation (the derivative of $\log p$), we have $$\frac{\partial \big[C^{-1}(X-\mu)\big]}{\partial \theta_i}=-C^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}(X-\mu)-C^{-1}\frac{\partial \mu}{\partial \theta_i}$$
- Substituting this back, we obtain: $$\frac{\partial \log p(X|\theta)}{\partial \theta_i}=-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)+\frac{1}{2}\frac{\partial \mu^T}{\partial \theta_i}C^{-1}(X-\mu)+\frac{1}{2}(X-\mu)^TC^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}(X-\mu)+\frac{1}{2}(X-\mu)^TC^{-1}\frac{\partial \mu}{\partial \theta_i}$$
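The determinant-derivative result (Jacobi's formula) is easy to sanity-check numerically. The sketch below assumes a hypothetical one-parameter covariance $C(\rho)_{ij}=\rho^{|i-j|}$ (an AR(1)-type matrix that is not part of the original text) and compares $|C|\operatorname{tr}(C^{-1}\partial C/\partial\rho)$ with a central finite difference. For this particular matrix $\det C=(1-\rho^2)^{n-1}$, which also gives a closed-form check of $\partial\log|C|/\partial\rho=\operatorname{tr}(C^{-1}\partial C/\partial\rho)$.

```python
import numpy as np

def kms_cov(rho, n):
    """Hypothetical example covariance C(rho)_{ij} = rho^{|i-j|} (AR(1)-type)."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return rho ** k

def kms_dcov(rho, n):
    """Element-wise derivative dC/drho = |i-j| * rho^{|i-j|-1} (zero on the diagonal)."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return k * rho ** (k - 1.0)

rho, n, h = 0.6, 5, 1e-6
C, dC = kms_cov(rho, n), kms_dcov(rho, n)

# Jacobi's formula: d|C|/drho = |C| tr(C^{-1} dC/drho), compared with a central difference
analytic = np.linalg.det(C) * np.trace(np.linalg.solve(C, dC))
numeric = (np.linalg.det(kms_cov(rho + h, n)) - np.linalg.det(kms_cov(rho - h, n))) / (2 * h)

# Dividing by |C|: d log|C| / drho = tr(C^{-1} dC/drho).
# Here det C = (1 - rho^2)^(n-1), so the exact value is -2 rho (n-1) / (1 - rho^2).
dlogdet = np.trace(np.linalg.solve(C, dC))
print(np.isclose(analytic, numeric, rtol=1e-6),
      np.isclose(dlogdet, -2 * rho * (n - 1) / (1 - rho ** 2)))
```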
Here we used the derivative of an inverse matrix, derived as follows:
- Differentiating both sides of $CC^{-1}=I$ gives $\frac{\partial C}{\partial \theta}C^{-1}+C\frac{\partial C^{-1}}{\partial \theta}=0$.
- Left-multiplying by $C^{-1}$ then immediately gives $\frac{\partial C^{-1}}{\partial \theta}=-C^{-1}\frac{\partial C}{\partial \theta}C^{-1}$.
- Note that $\frac{1}{2}(X-\mu)^TC^{-1}\frac{\partial \mu}{\partial \theta_i}=\frac{1}{2}\frac{\partial \mu^T}{\partial \theta_i}C^{-1}(X-\mu)$: both sides are scalars, each is the transpose of the other, and $C^{-1}$ is symmetric.
- The partial derivative therefore simplifies to: $$\frac{\partial \log p(X|\theta)}{\partial \theta_i}=-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)+\frac{\partial \mu^T}{\partial \theta_i}C^{-1}(X-\mu)+\frac{1}{2}(X-\mu)^TC^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}(X-\mu)$$
- By the definition of the Fisher Information Matrix, its $(i,j)$ element is: $$\begin{aligned} F_{ij}&=E\Big[\frac{\partial \log p(X|\theta)}{\partial \theta_i}\frac{\partial \log p(X|\theta)}{\partial \theta_j}\Big] \\ &=E\bigg\{\Big[-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)+\frac{\partial \mu^T}{\partial \theta_i}C^{-1}(X-\mu)+\frac{1}{2}(X-\mu)^TC^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}(X-\mu)\Big] \\ &\qquad\cdot\Big[-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)+\frac{\partial \mu^T}{\partial \theta_j}C^{-1}(X-\mu)+\frac{1}{2}(X-\mu)^TC^{-1}\frac{\partial C}{\partial \theta_j}C^{-1}(X-\mu)\Big]\bigg\} \end{aligned}$$
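To check both the simplified partial derivative (the score) and the inverse-derivative identity, the following sketch compares them with central finite differences of $\log p(X|\theta)$. The two-parameter model $\theta=(A,\rho)$ with $\mu=As$, $s_n=\cos(0.3n)$ and $C_{ij}=\rho^{|i-j|}$ is a hypothetical example chosen only for illustration; `model`, `logpdf` and `score` are likewise illustrative names.

```python
import numpy as np

def model(theta, n):
    """Hypothetical model: theta = (A, rho), mu = A*s, C_ij = rho^|i-j|.
    Returns mu, C and the lists of partials d mu/d theta_i and dC/d theta_i."""
    A, rho = theta
    idx = np.arange(n)
    s = np.cos(0.3 * idx)                              # assumed known signal shape
    k = np.abs(np.subtract.outer(idx, idx))
    mu, C = A * s, rho ** k
    dmu = [s, np.zeros(n)]                             # d mu/dA, d mu/d rho
    dC = [np.zeros((n, n)), k * rho ** (k - 1.0)]      # dC/dA,  dC/d rho
    return mu, C, dmu, dC

def logpdf(x, theta):
    mu, C, _, _ = model(theta, x.size)
    y = x - mu
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * x.size * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * y @ np.linalg.solve(C, y)

def score(x, theta):
    """-1/2 tr(C^{-1} C_i) + mu_i^T C^{-1} y + 1/2 y^T C^{-1} C_i C^{-1} y, for each i."""
    mu, C, dmu, dC = model(theta, x.size)
    Cinv = np.linalg.inv(C)
    y = x - mu
    return np.array([
        -0.5 * np.trace(Cinv @ dCi) + dmui @ Cinv @ y + 0.5 * y @ Cinv @ dCi @ Cinv @ y
        for dmui, dCi in zip(dmu, dC)
    ])

rng = np.random.default_rng(0)
theta = np.array([1.5, 0.4])
x = rng.standard_normal(6)
h = 1e-6
numeric = np.array([(logpdf(x, theta + h * e) - logpdf(x, theta - h * e)) / (2 * h)
                    for e in np.eye(2)])
print(np.allclose(score(x, theta), numeric, rtol=1e-5, atol=1e-7))

# Derivative of the inverse: dC^{-1}/drho = -C^{-1} (dC/drho) C^{-1}
_, C, _, dC = model(theta, 6)
num_inv = (np.linalg.inv(model(theta + [0, h], 6)[1])
           - np.linalg.inv(model(theta - [0, h], 6)[1])) / (2 * h)
print(np.allclose(num_inv, -np.linalg.inv(C) @ dC[1] @ np.linalg.inv(C), atol=1e-6))
```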
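Multiplied out, this expression has 9 terms. Intimidating? Don't worry, it's just notation. Let's go through them one at a time (in the order: the first term on the left times each of the three terms on the right, then the second term on the left, and so on). For convenience, let $y=X-\mu$, so that $y\sim\mathcal{N}(0,C)$.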
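- The first term is $E\Big[\Big(-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\Big)\Big(-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)\Big)\Big]=\frac{1}{4}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)$, a deterministic constant; nothing more to say.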
- The second term is $E\Big[-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\cdot\frac{\partial \mu^T}{\partial \theta_j}C^{-1}y\Big]$. It is linear in $y$, i.e. it involves only the first moment of $y$, which is zero, so this term is 0.
- The third term is $E\Big[-\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\cdot\frac{1}{2}y^TC^{-1}\frac{\partial C}{\partial \theta_j}C^{-1}y\Big]$, which takes a little more work. Since $y^TAy=\operatorname{tr}(Ayy^T)$, we have $E[y^TAy]=\operatorname{tr}(A\,E[yy^T])$, so this term becomes $$-\frac{1}{4}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}C^{-1}E[yy^T]\Big)=-\frac{1}{4}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)$$ where the last step uses $E[yy^T]=C$.
- The fourth term, like the second, involves only the first moment of $y$, so it is 0.
- The fifth term is (the first step holds because a scalar equals its own transpose): $$\begin{aligned} E\Big[\frac{\partial \mu^T}{\partial \theta_i}C^{-1}y\,\frac{\partial \mu^T}{\partial \theta_j}C^{-1}y\Big] &= E\Big[\frac{\partial \mu^T}{\partial \theta_i}C^{-1}y\Big(\frac{\partial \mu^T}{\partial \theta_j}C^{-1}y\Big)^T\Big] \\ &= \frac{\partial \mu^T}{\partial \theta_i}C^{-1}E[yy^T]C^{-1}\frac{\partial \mu}{\partial \theta_j} \\ &= \frac{\partial \mu^T}{\partial \theta_i}C^{-1}\frac{\partial \mu}{\partial \theta_j} \end{aligned}$$
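- The sixth term involves the third moment of $y$; the odd moments of a zero-mean Gaussian vanish, so this term is also 0.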
- The seventh term is the third term with $i$ and $j$ interchanged and has the same value: $-\frac{1}{4}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)$.
- The eighth term again involves the third moment of $y$, so it is also 0.
- The ninth term is $\frac{1}{4}E\Big[y^TC^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}y\,y^TC^{-1}\frac{\partial C}{\partial \theta_j}C^{-1}y\Big]$. Here we need a lemma, stated without proof: for $y\sim\mathcal{N}(0,C)$ and symmetric matrices $A$ and $B$, $E[y^TAy\,y^TBy]=\operatorname{tr}(AC)\operatorname{tr}(BC)+2\operatorname{tr}(ACBC)$. Taking $A=C^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}$ and $B=C^{-1}\frac{\partial C}{\partial \theta_j}C^{-1}$ (both symmetric), the ninth term becomes $$\frac{1}{4}\Big[\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}C\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}C^{-1}C\Big)+2\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}CC^{-1}\frac{\partial C}{\partial \theta_j}C^{-1}C\Big)\Big]=\frac{1}{4}\Big[\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)+2\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}\frac{\partial C}{\partial \theta_j}\Big)\Big]$$
- Adding up the nine terms, we finally obtain: $$\begin{aligned} F_{ij} &= \frac{1}{4}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)-\frac{1}{4}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)+\frac{\partial \mu^T}{\partial \theta_i}C^{-1}\frac{\partial \mu}{\partial \theta_j} \\ &\quad-\frac{1}{4}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)+\frac{1}{4}\Big[\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}\Big)\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_j}\Big)+2\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}\frac{\partial C}{\partial \theta_j}\Big)\Big] \\ &= \frac{\partial \mu^T}{\partial \theta_i}C^{-1}\frac{\partial \mu}{\partial \theta_j}+\frac{1}{2}\operatorname{tr}\Big(C^{-1}\frac{\partial C}{\partial \theta_i}C^{-1}\frac{\partial C}{\partial \theta_j}\Big) \end{aligned}$$
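A direct implementation of the final real-Gaussian formula might look like the sketch below (`fim_real` is a hypothetical helper name). The illustrative model, a DC level $A$ in white Gaussian noise of variance $\sigma^2$ with $\theta=(A,\sigma^2)$, gives $\partial\mu/\partial A=\mathbf{1}$ and $\partial C/\partial\sigma^2=I$, so the formula should return $F=\operatorname{diag}(N/\sigma^2,\,N/(2\sigma^4))$. The block also runs a Monte Carlo check of the fourth-moment lemma used for the ninth term; the sample size and the matrices are arbitrary choices.

```python
import numpy as np

def fim_real(C, dmu, dC):
    """F_ij = dmu_i^T C^{-1} dmu_j + 1/2 tr(C^{-1} dC_i C^{-1} dC_j) for x ~ N(mu(theta), C(theta))."""
    Cinv = np.linalg.inv(C)
    k = len(dmu)
    F = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            F[i, j] = dmu[i] @ Cinv @ dmu[j] + 0.5 * np.trace(Cinv @ dC[i] @ Cinv @ dC[j])
    return F

# Illustrative example: x[n] = A + w[n], w[n] ~ N(0, sigma2), theta = (A, sigma2)
N, sigma2 = 10, 2.0
C = sigma2 * np.eye(N)
dmu = [np.ones(N), np.zeros(N)]          # d mu/dA, d mu/d sigma2
dC = [np.zeros((N, N)), np.eye(N)]       # dC/dA,  dC/d sigma2
F = fim_real(C, dmu, dC)
print(F)  # diag(N/sigma2, N/(2 sigma2^2)) = diag(5.0, 1.25)

# Monte Carlo check of E[y^T A y y^T B y] = tr(AC)tr(BC) + 2 tr(ACBC), y ~ N(0, C)
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
Cy = M @ M.T + np.eye(3)                 # random symmetric positive definite covariance
A = np.diag([1.0, 2.0, 0.5])             # symmetric A and B
B = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]])
Y = rng.multivariate_normal(np.zeros(3), Cy, size=400_000)
mc = np.mean(np.einsum('ni,ij,nj->n', Y, A, Y) * np.einsum('ni,ij,nj->n', Y, B, Y))
lemma = np.trace(A @ Cy) * np.trace(B @ Cy) + 2 * np.trace(A @ Cy @ B @ Cy)
print(mc / lemma)  # should be close to 1
```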
FIM for Complex Gaussian Vectors
- Suppose we have a data set $\widetilde{X}=(\widetilde{x}_1,\widetilde{x}_2,\ldots,\widetilde{x}_n)^T$ and want to estimate $k$ parameters $\theta=(\theta_1,\theta_2,\ldots,\theta_k)^T$ with some estimation method.
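These $k$ parameters may be either real or complex. Since a complex number can be written in terms of its real and imaginary parts, one complex parameter is equivalent to two real parameters. To avoid confusion, from here on $\theta$ always denotes a vector of real parameters.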
- Each sample in the data set is a complex Gaussian random variable, and we take $\widetilde{X}$ to be a circularly symmetric complex Gaussian vector, $\widetilde{X}\sim \mathcal{CN}(\widetilde{\mu}(\theta),\widetilde{C}(\theta))$, with probability density $$p(\widetilde{X}|\theta)=\frac{1}{\pi^n \det\widetilde{C}(\theta)}\exp\Big(-(\widetilde{X}-\widetilde{\mu}(\theta))^H\widetilde{C}^{-1}(\theta)(\widetilde{X}-\widetilde{\mu}(\theta))\Big)$$
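For the complex case, here is a sketch of the log-density together with a sampler for $\mathcal{CN}(\widetilde\mu,\widetilde C)$; both function names are hypothetical. The sampler draws $\widetilde\mu+L(u+jv)/\sqrt2$ with $\widetilde C=LL^H$ (Cholesky) and independent real $u,v\sim\mathcal N(0,I)$, which yields $E[\widetilde y\widetilde y^H]=\widetilde C$ and $E[\widetilde y\widetilde y^T]=0$; the sample estimates of these two moments anticipate the circularity property used later in the derivation.

```python
import numpy as np

def complex_gaussian_logpdf(x, mu, C):
    """log p(x) for circularly symmetric CN(mu, C): -n log(pi) - log det C - (x-mu)^H C^{-1} (x-mu)."""
    n = x.shape[0]
    y = x - mu
    _, logdet = np.linalg.slogdet(C)          # log|det C|; det C is real and > 0 for Hermitian PD C
    quad = np.real(np.conj(y) @ np.linalg.solve(C, y))
    return -n * np.log(np.pi) - logdet - quad

def sample_cn(rng, mu, C, size):
    """Rows are draws of CN(mu, C): mu + L (u + j v)/sqrt(2), with C = L L^H and u, v ~ N(0, I)."""
    L = np.linalg.cholesky(C)
    n = mu.shape[0]
    w = (rng.standard_normal((size, n)) + 1j * rng.standard_normal((size, n))) / np.sqrt(2)
    return mu + w @ L.T                       # each row is (L w_k)^T

rng = np.random.default_rng(2)
C = np.array([[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]])   # Hermitian positive definite
Y = sample_cn(rng, np.zeros(2), C, 200_000)
cov = Y.T @ Y.conj() / len(Y)      # estimate of E[y y^H], should approach C
pcov = Y.T @ Y / len(Y)            # estimate of E[y y^T], should approach 0

# With a diagonal C, the log-density is a sum of scalar CN log-densities
x, var = np.array([1 + 1j, -0.5j]), np.array([2.0, 0.5])
lp = complex_gaussian_logpdf(x, np.zeros(2), np.diag(var))
lp_ref = np.sum(-np.log(np.pi * var) - np.abs(x) ** 2 / var)
print(np.round(cov, 2), np.round(pcov, 2), np.isclose(lp, lp_ref))
```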
For brevity, in the rest of the derivation I write $\widetilde{\mu}(\theta)$ and $\widetilde{C}(\theta)$ simply as $\widetilde{\mu}$ and $\widetilde{C}$; keep in mind that both are functions of the parameter vector $\theta$.
- Taking the logarithm of the density, we get: $$\log p(\widetilde{X}|\theta)=-n\log\pi-\log\det\widetilde{C}-(\widetilde{X}-\widetilde{\mu})^H\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})$$
- Taking the first-order partial derivative (since $\theta_i$ is real, $\frac{\partial(\widetilde{X}-\widetilde{\mu})^H}{\partial\theta_i}=-\frac{\partial\widetilde{\mu}^H}{\partial\theta_i}$), we get: $$\frac{\partial \log p(\widetilde{X}|\theta)}{\partial \theta_i}=-\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\Big)+\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})-(\widetilde{X}-\widetilde{\mu})^H\frac{\partial \big[\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})\big]}{\partial \theta_i}$$
- Expanding the last term of this expression further: $$\frac{\partial \big[\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})\big]}{\partial \theta_i}=-\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})-\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial\theta_i}$$
- We finally obtain: $$\frac{\partial \log p(\widetilde{X}|\theta)}{\partial \theta_i}=-\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\Big)+\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})+(\widetilde{X}-\widetilde{\mu})^H\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial\theta_i}+(\widetilde{X}-\widetilde{\mu})^H\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})$$
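The complex score can be checked the same way as the real one, by central finite differences with respect to the real parameters. The model below is hypothetical: a complex amplitude $a+jb$ of a known signal $s_n=e^{j0.4n}$, with covariance $\sigma^2R$ for a fixed $R_{ij}=0.5^{|i-j|}$, so that $\theta=(a,b,\sigma^2)$ is real as required above.

```python
import numpy as np

N = 5
idx = np.arange(N)
s = np.exp(1j * 0.4 * idx)                        # assumed known complex signal
R = 0.5 ** np.abs(np.subtract.outer(idx, idx))    # assumed fixed Hermitian (here real) shape

def model(theta):
    """Hypothetical model: theta = (a, b, sigma2) real, mu = (a + jb) s, C = sigma2 R."""
    a, b, sigma2 = theta
    mu = (a + 1j * b) * s
    C = sigma2 * R
    dmu = [s, 1j * s, np.zeros(N, complex)]       # d mu/da, d mu/db, d mu/d sigma2
    dC = [np.zeros((N, N)), np.zeros((N, N)), R]  # dC/da,  dC/db,  dC/d sigma2
    return mu, C, dmu, dC

def logpdf(x, theta):
    mu, C, _, _ = model(theta)
    y = x - mu
    _, logdet = np.linalg.slogdet(C)
    return -N * np.log(np.pi) - logdet - np.real(np.conj(y) @ np.linalg.solve(C, y))

def score(x, theta):
    """-tr(C^{-1} C_i) + mu_i^H C^{-1} y + y^H C^{-1} mu_i + y^H C^{-1} C_i C^{-1} y, for each i."""
    mu, C, dmu, dC = model(theta)
    Ci = np.linalg.inv(C)
    y = x - mu
    out = []
    for dmui, dCi in zip(dmu, dC):
        val = (-np.trace(Ci @ dCi) + dmui.conj() @ Ci @ y + y.conj() @ Ci @ dmui
               + y.conj() @ Ci @ dCi @ Ci @ y)
        out.append(val.real)   # imaginary parts cancel: conjugate pair plus a Hermitian form
    return np.array(out)

rng = np.random.default_rng(3)
theta = np.array([0.7, -0.2, 1.3])
x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
h = 1e-6
numeric = np.array([(logpdf(x, theta + h * e) - logpdf(x, theta - h * e)) / (2 * h)
                    for e in np.eye(3)])
print(np.allclose(score(x, theta), numeric, rtol=1e-5, atol=1e-7))
```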
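In deriving this first-order derivative we again used the derivatives of a matrix determinant and of a matrix inverse. Both were derived in the real Gaussian section and carry over unchanged to Hermitian $\widetilde{C}$, so I won't repeat them here.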
- By the definition of the Fisher Information Matrix, its $(i,j)$ element is: $$\begin{aligned} F_{ij}&=E\Big[\frac{\partial \log p(\widetilde{X}|\theta)}{\partial \theta_i}\frac{\partial \log p(\widetilde{X}|\theta)}{\partial \theta_j}\Big] \\ &= E\bigg\{\Big[-\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\Big)+\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})+(\widetilde{X}-\widetilde{\mu})^H\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial\theta_i}+(\widetilde{X}-\widetilde{\mu})^H\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})\Big] \\ &\qquad\cdot\Big[-\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_j}\Big)+\frac{\partial \widetilde{\mu}^H}{\partial \theta_j}\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})+(\widetilde{X}-\widetilde{\mu})^H\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial\theta_j}+(\widetilde{X}-\widetilde{\mu})^H\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_j}\widetilde{C}^{-1}(\widetilde{X}-\widetilde{\mu})\Big]\bigg\} \end{aligned}$$
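Wow, that multiplies out to a full 16 terms! But as we saw in the real case, many of them are zero and drop out immediately. Let $\widetilde{y}=\widetilde{X}-\widetilde{\mu}$, so that $\widetilde{y}\sim\mathcal{CN}(0,\widetilde{C})$. Rather than working through every term, I only list the facts needed for the simplification.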
- All first- and third-order moments of $\widetilde{y}$ are 0.
- As in the real case, we use $E[\widetilde{y}^HA\widetilde{y}]=\operatorname{tr}\big(A\,E[\widetilde{y}\widetilde{y}^H]\big)=\operatorname{tr}(A\widetilde{C})$.
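- The lemma used for the ninth term in the real case is needed again, in a slightly different form: $E[\widetilde{y}^HA\widetilde{y}\,\widetilde{y}^HB\widetilde{y}]=\operatorname{tr}(A\widetilde{C})\operatorname{tr}(B\widetilde{C})+\operatorname{tr}(A\widetilde{C}B\widetilde{C})$, where $A$ and $B$ must now be Hermitian (the complex counterpart of symmetric).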
- One difference from the real case: here $\widetilde{C}=E[\widetilde{y}\widetilde{y}^H]$, while circular symmetry gives $E[\widetilde{y}\widetilde{y}^T]=0$, from which $E[\widetilde{y}^*\widetilde{y}^H]=(E[\widetilde{y}\widetilde{y}^T])^*=0$. As a result, cross terms such as $E\Big[\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}\widetilde{y}\,\frac{\partial \widetilde{\mu}^H}{\partial \theta_j}\widetilde{C}^{-1}\widetilde{y}\Big]$ vanish.
- Here I (somewhat unfairly) leave the simplification of these 16 terms to you and give only the result: $$\begin{aligned} F_{ij}={}&\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\Big)\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_j}\Big)-\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\Big)\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_j}\Big) \\ &-\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\Big)\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_j}\Big)+\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial \theta_j}+\frac{\partial \widetilde{\mu}^H}{\partial \theta_j}\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial \theta_i} \\ &+\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\Big)\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_j}\Big)+\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_j}\Big) \end{aligned}$$
- Since $\widetilde{C}^{-1}$ is Hermitian, $$\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial \theta_j} = \Big(\frac{\partial \widetilde{\mu}^H}{\partial \theta_j}\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial \theta_i}\Big)^H$$ i.e. the two scalar terms are complex conjugates of each other, and therefore $$\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial \theta_j}+\frac{\partial \widetilde{\mu}^H}{\partial \theta_j}\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial \theta_i}=2\,\mathrm{Re}\Big\{\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial \theta_j}\Big\}$$
- The four trace-product terms cancel, and we finally obtain: $$F_{ij}=\operatorname{tr}\Big(\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_i}\widetilde{C}^{-1}\frac{\partial \widetilde{C}}{\partial \theta_j}\Big)+2\,\mathrm{Re}\Big\{\frac{\partial \widetilde{\mu}^H}{\partial \theta_i}\widetilde{C}^{-1}\frac{\partial \widetilde{\mu}}{\partial \theta_j}\Big\}$$
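A sketch of the complex formula follows (`fim_complex` is a hypothetical name). As an illustration it uses a complex amplitude $\alpha=a+jb$ of a known signal $s$ in complex white Gaussian noise of variance $\sigma^2$, split into real parameters $\theta=(a,b,\sigma^2)$ as described at the start of this section. Then $\partial\widetilde\mu/\partial a=s$, $\partial\widetilde\mu/\partial b=js$ and $\partial\widetilde C/\partial\sigma^2=I$, so the formula predicts $F=\operatorname{diag}(2\|s\|^2/\sigma^2,\,2\|s\|^2/\sigma^2,\,N/\sigma^4)$. With $s_n=e^{j0.3n}$, $N=8$ and $\sigma^2=2$ (arbitrary choices), this is $\operatorname{diag}(8,8,2)$.

```python
import numpy as np

def fim_complex(C, dmu, dC):
    """F_ij = tr(C^{-1} C_i C^{-1} C_j) + 2 Re{mu_i^H C^{-1} mu_j} for x ~ CN(mu(theta), C(theta)), theta real."""
    Ci = np.linalg.inv(C)
    k = len(dmu)
    F = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            F[i, j] = (np.real(np.trace(Ci @ dC[i] @ Ci @ dC[j]))
                       + 2 * np.real(dmu[i].conj() @ Ci @ dmu[j]))
    return F

# Illustrative example: x = (a + jb) s + w, w ~ CN(0, sigma2 I), theta = (a, b, sigma2)
N, sigma2 = 8, 2.0
s = np.exp(1j * 0.3 * np.arange(N))            # assumed known signal, ||s||^2 = N
C = sigma2 * np.eye(N)
dmu = [s, 1j * s, np.zeros(N, complex)]        # d mu/da, d mu/db, d mu/d sigma2
dC = [np.zeros((N, N)), np.zeros((N, N)), np.eye(N)]
F = fim_complex(C, dmu, dC)
print(np.round(F, 6))  # diag(2N/sigma2, 2N/sigma2, N/sigma2^2) = diag(8, 8, 2)
```

Compared with the real DC-level example, the noise-variance entry is $N/\sigma^4$ rather than $N/(2\sigma^4)$, consistent with each complex sample carrying two real components.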