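Multivariate Gaussian Distribution
Probability Density Function
We start with the probability density function of the multivariate Gaussian distribution: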
$$p(x\mid\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$$
where $x\in\mathbb{R}^p$ is the random vector, $\mu$ is the mean, $\Sigma$ is the covariance matrix, and $|\Sigma|$ is the determinant of $\Sigma$.
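The density formula can be evaluated directly. The following is a minimal sketch, assuming NumPy is available; the function name `gaussian_pdf` and the test point are illustrative choices, not part of the original text.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate Gaussian density p(x | mu, sigma) at one point."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu), computed with a linear
    # solve instead of forming the inverse explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    # Normalizing constant (2*pi)^(p/2) * |sigma|^(1/2).
    norm_const = (2.0 * np.pi) ** (p / 2.0) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / norm_const)

# Illustrative values (an assumption): a standard 2-D Gaussian at its mean.
density_at_mean = gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

For the standard 2-D Gaussian the quadratic form vanishes at the mean, so the value reduces to the normalizing constant $1/(2\pi)$.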
Because the covariance matrix $\Sigma$ is symmetric and positive definite, it admits the eigendecomposition $\Sigma=U\Lambda U^T$, where
$$U=\begin{pmatrix}u_1&u_2&\dots&u_p\end{pmatrix},\quad UU^T=I_{p\times p},\quad \Lambda=\operatorname{diag}(\lambda_i)$$
It follows that:
$$\Sigma=\begin{pmatrix}u_1&u_2&\dots&u_p\end{pmatrix}\begin{pmatrix}\lambda_1&0&\dots&0\\0&\lambda_2&\dots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\dots&\lambda_p\end{pmatrix}\begin{pmatrix}u_1^T\\u_2^T\\\vdots\\u_p^T\end{pmatrix}=\sum\limits_{i=1}^{p}\lambda_iu_iu_i^T$$
Since $U$ is orthogonal, $U^{-1}=U^T$, and the inverse is:
$$\Sigma^{-1}=(U\Lambda U^T)^{-1}=(U^T)^{-1}\Lambda^{-1}U^{-1}=(U^{-1})^{T}\Lambda^{-1}U^{-1}=U\Lambda^{-1}U^T=\sum\limits_{i=1}^{p}\frac{1}{\lambda_i}u_iu_i^T$$
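The two rank-one expansions can be checked numerically. This is a small sketch assuming NumPy; the matrix `sigma` is an illustrative symmetric positive definite example, not a value from the text.

```python
import numpy as np

# Illustrative symmetric positive definite matrix (an assumption).
sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

# np.linalg.eigh is intended for symmetric matrices; the columns of U are
# orthonormal eigenvectors u_i and eigvals holds the matching lambda_i.
eigvals, U = np.linalg.eigh(sigma)

# Rebuild sigma as the sum of lambda_i * u_i u_i^T.
sigma_rebuilt = sum(lam * np.outer(u, u) for lam, u in zip(eigvals, U.T))

# Rebuild the inverse as the sum of (1 / lambda_i) * u_i u_i^T.
sigma_inv_rebuilt = sum(np.outer(u, u) / lam for lam, u in zip(eigvals, U.T))
```

Iterating over `U.T` walks through the columns of `U`, so each `u` is one eigenvector.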
The quadratic form in the exponent of the density can then be written as:
$$(x-\mu)^T\Sigma^{-1}(x-\mu)=\sum\limits_{i=1}^{p}\frac{1}{\lambda_i}(x-\mu)^Tu_iu_i^T(x-\mu)=\sum\limits_{i=1}^{p}\frac{1}{\lambda_i}y_iy_i^T=\sum\limits_{i=1}^{p}\frac{y_i^2}{\lambda_i}$$
where $y_i=(x-\mu)^Tu_i$ is the scalar projection of $x-\mu$ onto the eigenvector $u_i$, so that $y_iy_i^T=y_i^2$.
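As a numerical check, the quadratic form computed from the projections $y_i=u_i^T(x-\mu)$ should agree with the direct computation using $\Sigma^{-1}$. A short sketch, assuming NumPy and illustrative values for `sigma`, `mu`, and `x`:

```python
import numpy as np

# Illustrative values (assumptions, not from the text).
sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
mu = np.array([1.0, -1.0])
x = np.array([2.0, 0.5])

eigvals, U = np.linalg.eigh(sigma)

# y[i] = u_i^T (x - mu): coordinates of x - mu in the eigenvector basis.
y = U.T @ (x - mu)

# Sum of y_i^2 / lambda_i.
quad_from_projections = float(np.sum(y ** 2 / eigvals))

# Direct evaluation of (x - mu)^T sigma^{-1} (x - mu).
quad_direct = float((x - mu) @ np.linalg.solve(sigma, x - mu))
```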
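When $p=2$, setting the quadratic form equal to a constant $r>0$ gives
$$\frac{y_1^2}{\lambda_1}+\frac{y_2^2}{\lambda_2}=r$$
which is the equation of an ellipse. The axes of the ellipse point along the two eigenvectors in $U$, and the semi-axis lengths are $\sqrt{r\lambda_i}$, that is, proportional to $\sqrt{\lambda_i}$. The shape of the Gaussian distribution is shown in the figure below: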
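The constant-density ellipse for $p=2$ can also be traced numerically. The sketch below assumes NumPy; `contour_points` is a hypothetical helper name, and the values of `mu`, `sigma`, and `r` are illustrative.

```python
import numpy as np

def contour_points(mu, sigma, r, num=100):
    """Return points x with (x - mu)^T sigma^{-1} (x - mu) = r, for p = 2."""
    eigvals, U = np.linalg.eigh(sigma)
    t = np.linspace(0.0, 2.0 * np.pi, num)
    # Semi-axis lengths sqrt(r * lambda_i) along each eigenvector.
    semi_axes = np.sqrt(r * eigvals)
    # Ellipse in the eigenvector coordinates y, shape (2, num).
    y = semi_axes[:, None] * np.stack([np.cos(t), np.sin(t)])
    # Rotate back to the original coordinates and shift by the mean.
    return (mu[:, None] + U @ y).T

# Illustrative values (assumptions).
mu = np.array([1.0, -1.0])
sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
points = contour_points(mu, sigma, r=2.0)
```

Each row of `points` is one point on the ellipse, so the array can be passed to any plotting routine.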
Linear Transformation of a Gaussian Distribution
Consider $Y=AX+B$.
If $X\sim N(\mu_x,\Sigma)$, then $Y\sim N(A\mu_x+B,A\Sigma A^T)$.
Proof:
$$E[Y]=E[AX+B]=AE[X]+B=A\mu_x+B$$
$$\begin{aligned}D[Y]&=E[(Y-\mu_y)(Y-\mu_y)^T]\\&=E[((AX+B)-(A\mu_x+B))((AX+B)-(A\mu_x+B))^T]\\&=E[(AX-A\mu_x)(AX-A\mu_x)^T]\\&=A\,E[(X-\mu_x)(X-\mu_x)^T]\,A^T\\&=A\Sigma A^T\end{aligned}$$
These two steps give the mean and covariance of $Y$. That $Y$ is itself Gaussian follows from the fact that an affine map of a Gaussian vector is again Gaussian.
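The transformation rule can be compared against sample statistics. This is a rough sketch assuming NumPy; `A`, `B`, `mu_x`, `sigma`, the seed, and the sample size are all illustrative choices, and the sample estimates only match the closed form up to Monte Carlo error.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the text).
mu_x = np.array([0.5, -1.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])
B = np.array([1.0, -1.0])

# Closed-form mean and covariance of Y = A X + B.
mu_y = A @ mu_x + B
sigma_y = A @ sigma @ A.T

# Empirical check: draw samples of X, transform them, and estimate moments.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu_x, sigma, size=200_000)
Y = X @ A.T + B  # each row is one sample of Y
sample_mean = Y.mean(axis=0)
sample_cov = np.cov(Y, rowvar=False)
```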
Marginal Distribution from the Joint Distribution
Assume:
$$X=\begin{pmatrix}x_a\\x_b\end{pmatrix},\quad \mu=\begin{pmatrix}\mu_a\\\mu_b\end{pmatrix},\quad \Sigma=\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix},\quad x_a\in\mathbb{R}^m,\ x_b\in\mathbb{R}^n$$
Write $x_a$ as a linear transformation of $X$:
$$x_a=\begin{pmatrix}I_m&0\end{pmatrix}\begin{pmatrix}x_a\\x_b\end{pmatrix}=AX,\quad A=\begin{pmatrix}I_m&0\end{pmatrix}$$
Then, by the linear transformation result:
$$E[x_a]=E[AX]=A\mu=\begin{pmatrix}I_m&0\end{pmatrix}\begin{pmatrix}\mu_a\\\mu_b\end{pmatrix}=\mu_a$$
$$D[x_a]=A\Sigma A^T=\begin{pmatrix}I_m&0\end{pmatrix}\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}\begin{pmatrix}I_m\\0\end{pmatrix}=\Sigma_{aa}$$
Therefore $x_a\sim N(\mu_a,\Sigma_{aa})$, and by the same argument $x_b\sim N(\mu_b,\Sigma_{bb})$.
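The selection-matrix argument translates directly into code. A minimal sketch, assuming NumPy; `marginal_of_first_block` is a hypothetical helper name and the joint parameters are illustrative.

```python
import numpy as np

def marginal_of_first_block(mu, sigma, m):
    """Marginal mean and covariance of the first m components, via A = (I_m 0)."""
    n = mu.shape[0] - m
    # Selection matrix A = (I_m 0) of shape (m, m + n).
    A = np.hstack([np.eye(m), np.zeros((m, n))])
    return A @ mu, A @ sigma @ A.T

# Illustrative joint parameters with m = 2 and n = 1 (assumptions).
mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])
mu_a, sigma_aa = marginal_of_first_block(mu, sigma, m=2)
```

In practice the same result is obtained by slicing, `mu[:m]` and `sigma[:m, :m]`; the explicit matrix `A` is kept here only to mirror the derivation.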