Gaussian Distribution
Linear Gaussian model: $z_{t}=Az_{t-1}+B+\epsilon$, where $\epsilon$ is noise.
Maximum Likelihood Estimation
- Given $X=(x_{1},x_{2},\cdots,x_{N})^{T}$, where the $x_{i}\in\mathbb{R}^{p}$ are i.i.d. with $x_{i}\sim N(\mu,\sigma^{2})$; consider the one-dimensional case $p=1$, with $\theta=(\mu,\sigma^{2})$
- Derivation: $\log P(X|\theta)=\log\prod_{i=1}^{N}P(x_{i}|\theta)=\sum_{i=1}^{N}\log P(x_{i}|\theta)=\sum_{i=1}^{N}\left[\log\frac{1}{\sqrt{2\pi}}+\log\frac{1}{\sigma}-\frac{(x_{i}-\mu)^{2}}{2\sigma^{2}}\right]$
- $\mu_{MLE}=\underset{\mu}{\arg\max}\log P(X|\theta)=\underset{\mu}{\arg\min}\sum_{i=1}^{N}(x_{i}-\mu)^{2}$, which gives $\mu_{MLE}=\frac{1}{N}\sum_{i=1}^{N}x_{i}$
- Similarly, $\sigma_{MLE}^{2}=\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu_{MLE})^{2}$
- $\mu_{MLE}$ is an unbiased estimator
- $\sigma_{MLE}^{2}$ is a biased estimator ($E[\sigma_{MLE}^{2}]=\frac{N-1}{N}\sigma^{2}$)
- The unbiased version is $\hat{\sigma}^{2}=\frac{1}{N-1}\sum_{i=1}^{N}(x_{i}-\mu_{MLE})^{2}$
- Derivation: $E[\sigma_{MLE}^{2}]=E\left[\frac{1}{N}\sum_{i=1}^{N}(x_{i}^{2}-2x_{i}\mu_{MLE}+\mu_{MLE}^{2})\right]=E\left[\frac{1}{N}\sum_{i=1}^{N}x_{i}^{2}-\mu_{MLE}^{2}\right]=E\left[\left(\frac{1}{N}\sum_{i=1}^{N}x_{i}^{2}-\mu^{2}\right)-\left(\mu_{MLE}^{2}-\mu^{2}\right)\right]=\sigma^{2}-\left(E[\mu_{MLE}^{2}]-E^{2}[\mu_{MLE}]\right)=\sigma^{2}-Var[\mu_{MLE}]=\sigma^{2}-\frac{\sigma^{2}}{N}=\frac{N-1}{N}\sigma^{2}$
- The maximum-likelihood variance estimate is biased low: it underestimates $\sigma^{2}$
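The bias can be checked numerically. A minimal simulation sketch (all parameters below, $\mu=0$, $\sigma=2$, $N=5$, are arbitrary toy values): averaging $\sigma_{MLE}^{2}$ over many samples should come out near $\frac{N-1}{N}\sigma^{2}=3.2$, not $\sigma^{2}=4$.

```python
import random

random.seed(0)
mu, sigma, N, trials = 0.0, 2.0, 5, 20000

total = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(N)]
    mu_mle = sum(xs) / N                             # mu_MLE
    total += sum((x - mu_mle) ** 2 for x in xs) / N  # sigma^2_MLE
avg = total / trials

# avg lands near (N-1)/N * sigma^2 = 3.2, noticeably below sigma^2 = 4
print(avg)
```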
Multivariate Gaussian Distribution
- $x\sim N(\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$, where $\mu=(\mu_{1},\mu_{2},\cdots,\mu_{p})^{T}$ and $\Sigma=\left(\begin{array}{ccc}\sigma_{11}&\cdots&\sigma_{1p}\\\vdots&\ddots&\vdots\\\sigma_{p1}&\cdots&\sigma_{pp}\end{array}\right)$ (the covariance matrix) is symmetric and, in general, positive semi-definite (assumed positive definite here so that $\Sigma^{-1}$ exists)
- $(x-\mu)^{T}\Sigma^{-1}(x-\mu)$: the squared Mahalanobis distance between $x$ and $\mu$
- When $\Sigma=I$, the Mahalanobis distance reduces to the Euclidean distance
- Example: for $z_{1}=(z_{11},z_{12})$, $z_{2}=(z_{21},z_{22})$ and $\Sigma=I$, $(z_{1}-z_{2})^{T}\Sigma^{-1}(z_{1}-z_{2})=(z_{11}-z_{21})^{2}+(z_{12}-z_{22})^{2}$, which is the squared Euclidean distance
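A quick numerical check of this identity (a minimal sketch; the points and the small helper are made up for illustration):

```python
def mahalanobis_sq(d, sigma_inv):
    """Squared Mahalanobis distance d^T Sigma^{-1} d for a 2-D difference d."""
    return sum(d[i] * sigma_inv[i][j] * d[j] for i in range(2) for j in range(2))

z1, z2 = (1.0, 2.0), (4.0, 6.0)
d = (z1[0] - z2[0], z1[1] - z2[1])
identity = [[1.0, 0.0], [0.0, 1.0]]

# With Sigma = I this equals the squared Euclidean distance (-3)^2 + (-4)^2 = 25
print(mahalanobis_sq(d, identity))  # 25.0
```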
- Eigendecomposition: $\Sigma=U\Lambda U^{T}$, where $UU^{T}=U^{T}U=I$ and $\Lambda=\operatorname{diag}(\lambda_{1},\cdots,\lambda_{p})$ is the eigenvalue matrix
- $\Sigma=U\Lambda U^{T}=\sum_{i=1}^{p}u_{i}\lambda_{i}u_{i}^{T}$, where $u_{i}$ is the $i$-th column of $U$
- $\Sigma^{-1}=(U\Lambda U^{T})^{-1}=U\Lambda^{-1}U^{T}=\sum_{i=1}^{p}u_{i}\lambda_{i}^{-1}u_{i}^{T}$
- $\Delta=(x-\mu)^{T}\Sigma^{-1}(x-\mu)=\sum_{i=1}^{p}(x-\mu)^{T}u_{i}\lambda_{i}^{-1}u_{i}^{T}(x-\mu)=\sum_{i=1}^{p}y_{i}\frac{1}{\lambda_{i}}y_{i}=\sum_{i=1}^{p}\frac{y_{i}^{2}}{\lambda_{i}}$, where $y_{i}=u_{i}^{T}(x-\mu)$ is the projection of $x-\mu$ onto $u_{i}$
- For $p=2$: $\Delta=\frac{y_{1}^{2}}{\lambda_{1}}+\frac{y_{2}^{2}}{\lambda_{2}}=r$. In the rotated coordinates $(y_{1},y_{2})$ this is an ellipse; varying $r$ traces out the contour lines, matching the elliptical contours of the two-dimensional Gaussian density
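The coordinate change can be verified on a concrete case. A dependency-free sketch with a hand-picked $\Sigma=\begin{pmatrix}2&1\\1&2\end{pmatrix}$, using the closed-form $2\times 2$ eigendecomposition:

```python
import math

a, b, c = 2.0, 1.0, 2.0                      # Sigma = [[a, b], [b, c]], b != 0
l1 = (a + c + math.hypot(a - c, 2 * b)) / 2  # larger eigenvalue
l2 = (a + c - math.hypot(a - c, 2 * b)) / 2  # smaller eigenvalue
# Unit eigenvectors u_i proportional to (b, l_i - a); valid because b != 0
n1 = math.hypot(b, l1 - a); u1 = (b / n1, (l1 - a) / n1)
n2 = math.hypot(b, l2 - a); u2 = (b / n2, (l2 - a) / n2)

xm = (1.0, -1.0)                             # x - mu
det = a * c - b * b
# Direct quadratic form (x-mu)^T Sigma^{-1} (x-mu) via the 2x2 inverse
delta_direct = (xm[0] * (c * xm[0] - b * xm[1])
                + xm[1] * (a * xm[1] - b * xm[0])) / det
# Same quantity via y_i = u_i^T (x-mu) and sum of y_i^2 / lambda_i
y1 = u1[0] * xm[0] + u1[1] * xm[1]
y2 = u2[0] * xm[0] + u2[1] * xm[1]
delta_eig = y1 ** 2 / l1 + y2 ** 2 / l2

print(delta_direct, delta_eig)  # both approximately 2.0
```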
- $\Sigma_{p\times p}$ has $\frac{p^{2}-p}{2}+p=\frac{p(p+1)}{2}=O(p^{2})$ free parameters, since it is symmetric
- Limitation: if the sample is actually better described by two Gaussians, fitting one large Gaussian to all of it introduces a large error
Marginal and Conditional Distributions of a Joint Gaussian
- Given $x=\left(\begin{array}{l}x_{a}\\x_{b}\end{array}\right)$, $\mu=\left(\begin{array}{l}\mu_{a}\\\mu_{b}\end{array}\right)$, $\Sigma=\left(\begin{array}{ll}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{array}\right)$, where the block dimensions satisfy $a+b=p$. Find $P(x_{a})$, $P(x_{b}|x_{a})$, $P(x_{b})$, $P(x_{a}|x_{b})$
- Completing the square (the approach used in PRML)
- Theorem: if $x\sim N(\mu,\Sigma)$ and $y=Ax+B$, then $y\sim N(A\mu+B,A\Sigma A^{T})$
- $x_{a}=(I_{m},0)\left(\begin{array}{l}x_{a}\\x_{b}\end{array}\right)$
- $E(x_{a})=(I_{m},0)\left(\begin{array}{l}\mu_{a}\\\mu_{b}\end{array}\right)=\mu_{a}$
- $var(x_{a})=(I_{m},0)\left(\begin{array}{ll}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{array}\right)\left(\begin{array}{l}I_{m}\\0\end{array}\right)=\Sigma_{aa}$
- Hence $x_{a}\sim N(\mu_{a},\Sigma_{aa})$
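A simulation sketch of the marginal result, with made-up values $\mu=(1,-1)^{T}$ and $\Sigma=\begin{pmatrix}2&1\\1&2\end{pmatrix}$ (samples drawn via a hand-computed Cholesky factor): the first coordinate on its own should behave like $N(\mu_{a},\Sigma_{aa})=N(1,2)$.

```python
import math
import random

random.seed(1)
# Sigma = [[2, 1], [1, 2]]; its lower Cholesky factor, computed by hand
l11, l21, l22 = math.sqrt(2.0), 1.0 / math.sqrt(2.0), math.sqrt(1.5)

n = 50000
xa_samples = []
for _ in range(n):
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    xa = 1.0 + l11 * e1              # x_a component of the joint sample
    xb = -1.0 + l21 * e1 + l22 * e2  # x_b component (drawn, then ignored)
    xa_samples.append(xa)

mean_a = sum(xa_samples) / n
var_a = sum((x - mean_a) ** 2 for x in xa_samples) / n
print(mean_a, var_a)  # close to mu_a = 1 and Sigma_aa = 2
```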
- Find $P(x_{b}|x_{a})$ ($P(x_{a}|x_{b})$ follows by symmetry):
- Construct $x_{b\cdot a}=x_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}$, with $\mu_{b\cdot a}=\mu_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_{a}$ and $\Sigma_{bb\cdot a}=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}$. By the theorem above, $x_{b\cdot a}\sim N(\mu_{b\cdot a},\Sigma_{bb\cdot a})$, and $cov(x_{b\cdot a},x_{a})=\Sigma_{ba}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{aa}=0$, so $x_{b\cdot a}$ is independent of $x_{a}$
- Since $x_{b}=x_{b\cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}$, we get $E(x_{b}|x_{a})=\mu_{b\cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}=\mu_{b}+\Sigma_{ba}\Sigma_{aa}^{-1}(x_{a}-\mu_{a})$ and $var(x_{b}|x_{a})=\Sigma_{bb\cdot a}$, i.e. $x_{b}|x_{a}\sim N\left(\mu_{b}+\Sigma_{ba}\Sigma_{aa}^{-1}(x_{a}-\mu_{a}),\ \Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}\right)$
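The conditional-mean formula can be checked by regression on simulated samples. A sketch with made-up values $\mu=(1,-1)^{T}$, $\Sigma=\begin{pmatrix}2&1\\1&2\end{pmatrix}$: the formula predicts $E(x_{b}|x_{a})$ is a line in $x_{a}$ with slope $\Sigma_{ba}\Sigma_{aa}^{-1}=1/2$, and the empirical slope $\widehat{cov}(x_{a},x_{b})/\widehat{var}(x_{a})$ should agree.

```python
import math
import random

random.seed(2)
# Hand-computed Cholesky factor of Sigma = [[2, 1], [1, 2]]
l11, l21, l22 = math.sqrt(2.0), 1.0 / math.sqrt(2.0), math.sqrt(1.5)

n = 50000
pts = []
for _ in range(n):
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    pts.append((1.0 + l11 * e1, -1.0 + l21 * e1 + l22 * e2))

ma = sum(p[0] for p in pts) / n
mb = sum(p[1] for p in pts) / n
cov_ab = sum((p[0] - ma) * (p[1] - mb) for p in pts) / n
var_a = sum((p[0] - ma) ** 2 for p in pts) / n

slope = cov_ab / var_a
print(slope)  # close to Sigma_ba / Sigma_aa = 0.5
```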
Joint Distribution from a Marginal and a Conditional
- Given $p(x)=N(x|\mu,\Lambda^{-1})$ and $p(y|x)=N(y|Ax+B,L^{-1})$, where $\Lambda$ and $L$ are precision matrices. Find $p(y)$ and $p(x|y)$
- This is a linear Gaussian model: $y=Ax+B+\epsilon$, $\epsilon\sim N(0,L^{-1})$, with $\epsilon$ independent of $x$
- Solution: $E[y]=E[Ax+B+\epsilon]=A\mu+B$ and $var[y]=var[Ax+B+\epsilon]=var[Ax+B]+var[\epsilon]=A\Lambda^{-1}A^{T}+L^{-1}$, so $y\sim N(A\mu+B,A\Lambda^{-1}A^{T}+L^{-1})$
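A 1-D sanity check of $E[y]$ and $var[y]$ (all parameter values below are arbitrary toy choices): with $x\sim N(\mu,\lambda^{-1})$ and $y=ax+b+\epsilon$, $\epsilon\sim N(0,L^{-1})$, the sample mean and variance of $y$ should approach $a\mu+b$ and $a^{2}\lambda^{-1}+L^{-1}$.

```python
import random

random.seed(3)
mu, lam_inv = 2.0, 1.5          # x ~ N(mu, lam_inv)
a, b, L_inv = 3.0, 0.5, 0.25    # y = a*x + b + eps, eps ~ N(0, L_inv)

n = 100000
ys = []
for _ in range(n):
    x = random.gauss(mu, lam_inv ** 0.5)
    eps = random.gauss(0.0, L_inv ** 0.5)
    ys.append(a * x + b + eps)

mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
print(mean_y, var_y)  # close to a*mu + b = 6.5 and a^2*lam_inv + L_inv = 13.75
```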
- $z=\left(\begin{array}{l}x\\y\end{array}\right)\sim N\left(\left[\begin{array}{l}\mu\\A\mu+B\end{array}\right],\left[\begin{array}{ll}\Lambda^{-1}&\Delta\\\Delta^{T}&L^{-1}+A\Lambda^{-1}A^{T}\end{array}\right]\right)$, where $\Delta=cov(x,y)=E[(x-\mu)(y-E[y])^{T}]=E[(x-\mu)(Ax+B+\epsilon-A\mu-B)^{T}]=E[(x-\mu)(A(x-\mu)+\epsilon)^{T}]=E[(x-\mu)(x-\mu)^{T}]A^{T}+E[(x-\mu)\epsilon^{T}]=Var[x]A^{T}=\Lambda^{-1}A^{T}$ (the cross term vanishes because $\epsilon$ is independent of $x$ and has zero mean)
- Applying the conditional formula from the previous section to $z$: $x|y\sim N\left(\mu+\Lambda^{-1}A^{T}(L^{-1}+A\Lambda^{-1}A^{T})^{-1}(y-A\mu-B),\ \Lambda^{-1}-\Lambda^{-1}A^{T}(L^{-1}+A\Lambda^{-1}A^{T})^{-1}A\Lambda^{-1}\right)$
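The cross-covariance $\Delta=\Lambda^{-1}A^{T}$ can also be checked in a 1-D toy model (same arbitrary values as the previous sketch): $cov(x,y)$ should approach $a\lambda^{-1}$.

```python
import random

random.seed(4)
mu, lam_inv = 2.0, 1.5          # x ~ N(mu, lam_inv)
a, b, L_inv = 3.0, 0.5, 0.25    # y = a*x + b + eps, eps ~ N(0, L_inv)

n = 100000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(mu, lam_inv ** 0.5)
    y = a * x + b + random.gauss(0.0, L_inv ** 0.5)
    xs.append(x)
    ys.append(y)

mx, my = sum(xs) / n, sum(ys) / n
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(cov_xy)  # close to a * lam_inv = 4.5
```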