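This section examines how the information matrix, the Hessian matrix, and the covariance matrix relate to one another, and explains the principle behind marginalization.
Consider the following simple example.
It is taken from David MacKay, "The humble Gaussian distribution" (2006), and from Lecture 4 of the 手写VIO course.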

The arrows represent constraint equations (which can also be read as observation equations):
$$
\begin{aligned}
z_1 &:\quad x_2 = v_2\\
z_2 &:\quad x_1 = w_1 x_2 + v_1\\
z_3 &:\quad x_3 = w_3 x_2 + v_3
\end{aligned}
$$
Here the $v_i$ are mutually independent, and each follows a zero-mean Gaussian distribution with variance $\sigma_i^2$.
Covariance matrix
The covariance is computed as:
$$
\begin{aligned}
\operatorname{Cov}(X,Y) &= E[(X - E[X])(Y - E[Y])]\\
&= E[XY] - 2E[X]E[Y] + E[X]E[Y]\\
&= E[XY] - E[X]E[Y]
\end{aligned}
$$
Or, writing the means as $\mu_x$ and $\mu_y$: $\operatorname{Cov}(X,Y) = E[(X - \mu_x)(Y - \mu_y)]$
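As a quick numerical sanity check of the identity above, the following is a minimal sketch that replaces each expectation with a sample mean. The sample data (`x`, `y`, their means and scales) are made up for illustration and do not come from the original example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample data: y depends linearly on x plus independent noise.
x = rng.normal(loc=1.0, scale=2.0, size=10_000)
y = 0.5 * x + rng.normal(loc=-3.0, scale=1.0, size=10_000)

# Definition: E[(X - E[X]) (Y - E[Y])], with E[.] replaced by the sample mean.
cov_definition = np.mean((x - x.mean()) * (y - y.mean()))

# Expanded form: E[XY] - E[X] E[Y].
cov_shortcut = np.mean(x * y) - x.mean() * y.mean()

# NumPy's estimator; bias=True selects the same 1/N normalization.
cov_numpy = np.cov(x, y, bias=True)[0, 1]

print(cov_definition, cov_shortcut, cov_numpy)
```

With sample means in place of expectations, the two forms agree up to floating-point rounding, so the three printed values should match closely.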
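Now compute the covariance matrix of $x_1, x_2, x_3$. Substituting $x_2 = v_2$ gives $x_1 = w_1 v_2 + v_1$ and $x_3 = w_3 v_2 + v_3$. Every $x_i$ is zero-mean, so $\Sigma_{ij} = E(x_i x_j)$, and independence of the noise terms gives $E(v_i v_j) = 0$ for $i \neq j$: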
$$
\begin{aligned}
\Sigma_{11} &= E(x_1 x_1) = E((w_1 v_2 + v_1)(w_1 v_2 + v_1))\\
&= w_1^2 E(v_2^2) + 2 w_1 E(v_1 v_2) + E(v_1^2)\\
&= w_1^2\sigma_2^2 + \sigma_1^2\\
\Sigma_{22} &= \sigma_2^2,\quad \Sigma_{33} = w_3^2\sigma_2^2 + \sigma_3^2\\
\Sigma_{12} &= E(x_1 x_2) = E((w_1 v_2 + v_1) v_2) = w_1\sigma_2^2\\
\Sigma_{13} &= E((w_1 v_2 + v_1)(w_3 v_2 + v_3)) = w_1 w_3\sigma_2^2
\end{aligned}
$$
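The derivation does not spell out $\Sigma_{23}$, although that entry appears in the final matrix. Following the same steps, and using only the independence and zero-mean assumptions already stated, it can be sketched as:

$$
\begin{aligned}
\Sigma_{23} &= E(x_2 x_3) = E(v_2 (w_3 v_2 + v_3))\\
&= w_3 E(v_2^2) + E(v_2 v_3)\\
&= w_3\sigma_2^2
\end{aligned}
$$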
This finally gives the covariance matrix:
$$
\Sigma = \begin{bmatrix}
w_1^2\sigma_2^2 + \sigma_1^2 & w_1\sigma_2^2 & w_1 w_3\sigma_2^2\\
w_1\sigma_2^2 & \sigma_2^2 & w_3\sigma_2^2\\
w_1 w_3\sigma_2^2 & w_3\sigma_2^2 & w_3^2\sigma_2^2 + \sigma_3^2
\end{bmatrix}
$$
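One way to double-check this matrix is to write the model as a linear map $x = A v$ of the independent noises, so that $\Sigma = A\,\mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2)\,A^\top$. The sketch below compares that product with the closed-form entries and with a Monte Carlo estimate. The matrix `A`, the numeric values of `w1`, `w3`, `s1`, `s2`, `s3`, and the sample size are assumptions chosen only for illustration, not values from the original example.

```python
import numpy as np

# Hypothetical weights and noise standard deviations (illustrative only).
w1, w3 = 0.5, -2.0
s1, s2, s3 = 1.0, 2.0, 0.5

# x = A v with v = (v1, v2, v3):
#   x1 = w1*v2 + v1,  x2 = v2,  x3 = w3*v2 + v3
A = np.array([[1.0, w1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, w3, 1.0]])
Sigma_v = np.diag([s1**2, s2**2, s3**2])

# Covariance of a linear map of independent noises.
Sigma_linear = A @ Sigma_v @ A.T

# Closed-form entries from the derivation in the text.
Sigma_closed = np.array([
    [w1**2 * s2**2 + s1**2, w1 * s2**2, w1 * w3 * s2**2],
    [w1 * s2**2, s2**2, w3 * s2**2],
    [w1 * w3 * s2**2, w3 * s2**2, w3**2 * s2**2 + s3**2],
])

# Monte Carlo estimate: draw the noises, push them through the model.
rng = np.random.default_rng(0)
v = rng.normal(size=(200_000, 3)) * np.array([s1, s2, s3])
x = v @ A.T
Sigma_sampled = np.cov(x, rowvar=False)

print(np.allclose(Sigma_linear, Sigma_closed))
print(np.round(Sigma_sampled, 2))
```

The matrix product and the closed form should agree exactly, while the sampled estimate only approaches them as the number of samples grows.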