Machine Learning – Whiteboard Derivations P2_2
Multivariate Gaussian Distribution
$$p(x) = \frac{1}{(2 \pi)^{\frac{p}{2}} |\Sigma|^{\frac{1}{2}}} \exp \left( - \frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$$
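As a quick numerical sanity check (a NumPy sketch; the function name `gaussian_pdf` is my own), the density can be evaluated directly from the formula:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate Gaussian density p(x) for x in R^p."""
    p = len(x)
    diff = x - mu
    # normalization constant: (2*pi)^(p/2) * |Sigma|^(1/2)
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(sigma))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.inv(sigma) @ diff
    return np.exp(-0.5 * quad) / norm

# standard 2-D Gaussian evaluated at its mean: density is 1 / (2*pi)
val = gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```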
$x \in \mathbb{R}^p$ is a random variable.
$$x= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \qquad \mu= \begin{bmatrix} \mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{p} \end{bmatrix} \qquad \Sigma= \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}_{p \times p}$$

Usually $\Sigma$ is positive semi-definite (for $\Sigma^{-1}$ to exist, as in the density above, it must be positive definite).
$$(x-\mu)^T\Sigma^{-1}(x-\mu)$$

This quadratic form is the Mahalanobis distance (between $x$ and $\mu$).
If $\Sigma = I$, the Mahalanobis distance reduces to the Euclidean distance.
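A small NumPy check of this claim (the data is illustrative): with $\Sigma = I$ the quadratic form is the squared Euclidean norm of $x - \mu$, so the two distances coincide:

```python
import numpy as np

x = np.array([3.0, 4.0])
mu = np.zeros(2)
sigma = np.eye(2)  # Sigma = I

diff = x - mu
# Mahalanobis distance: sqrt of the quadratic form
mahalanobis = np.sqrt(diff @ np.linalg.inv(sigma) @ diff)
euclidean = np.linalg.norm(diff)
# with Sigma = I both equal 5.0 here
```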
Eigendecomposition of $\Sigma$:

$$\begin{aligned} &\Sigma=U \Lambda U^T\\ & UU^T=U^TU=I\\ & \Lambda=\mathrm{diag}(\lambda_i),\; i=1,\dots,p \\ & U=(u_1,u_2,\dots,u_p)_{p\times p} \end{aligned}$$
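The decomposition can be reproduced with `numpy.linalg.eigh`, which for a symmetric matrix returns real eigenvalues and orthonormal eigenvector columns (the matrix below is just an example):

```python
import numpy as np

sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])  # example symmetric positive-definite matrix

lam, U = np.linalg.eigh(sigma)  # eigenvalues lambda_i, eigenvector columns u_i
Lambda = np.diag(lam)

# U is orthogonal: U U^T = U^T U = I
ortho_err = np.max(np.abs(U @ U.T - np.eye(2)))
# Sigma = U Lambda U^T
recon_err = np.max(np.abs(U @ Lambda @ U.T - sigma))
```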
$$\begin{aligned} \Sigma &= \begin{bmatrix} u_1 & u_2 & \cdots & u_p \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & \lambda_p \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_p^T \end{bmatrix} \\ &= \sum_{i=1}^p u_i \lambda_i u_i^T \end{aligned}$$
$$\begin{aligned} &\Sigma^{-1}=(U \Lambda U^T)^{-1} = (U^T)^{-1} \Lambda^{-1} U^{-1} = U\Lambda^{-1} U^T = \sum_{i=1}^p u_i \frac{1}{\lambda_i} u_i^T\\ &\Lambda^{-1} = \mathrm{diag}\left(\frac{1}{\lambda_i}\right),\; i=1,\dots,p \end{aligned}$$
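The spectral form of the inverse can be checked numerically against `np.linalg.inv` (example covariance matrix of my own choosing):

```python
import numpy as np

sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])  # example positive-definite covariance

lam, U = np.linalg.eigh(sigma)

# Sigma^{-1} = U Lambda^{-1} U^T = sum_i u_i (1/lambda_i) u_i^T
inv_spectral = sum(np.outer(U[:, i], U[:, i]) / lam[i] for i in range(2))
inv_direct = np.linalg.inv(sigma)
err = np.max(np.abs(inv_spectral - inv_direct))
```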
Define

$$y = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{p} \end{bmatrix}, \qquad y_i = (x-\mu)^T u_i$$
$$\begin{aligned} \Delta &= (x-\mu)^T\Sigma^{-1}(x-\mu) = (x-\mu)^T \sum_{i=1}^p u_i \frac{1}{\lambda_i} u_i^T (x-\mu) \\ &= \sum_{i=1}^p (x-\mu)^T u_i \frac{1}{\lambda_i} u_i^T (x-\mu) \\ &= \sum_{i=1}^p y_i \frac{1}{\lambda_i} y_i \\ &= \sum_{i=1}^p \frac{y_i^2}{\lambda_i} \end{aligned}$$

(Each $y_i$ is a scalar, so $y_i^T = y_i$.)
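The last line of the derivation can be verified numerically: computing $\Delta$ directly as a quadratic form and via $\sum_i y_i^2/\lambda_i$ with $y_i = (x-\mu)^T u_i$ gives the same value (example data below is my own):

```python
import numpy as np

sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
mu = np.array([1.0, -1.0])
x = np.array([2.0, 1.0])

lam, U = np.linalg.eigh(sigma)
diff = x - mu

# direct quadratic form (x - mu)^T Sigma^{-1} (x - mu)
delta_direct = diff @ np.linalg.inv(sigma) @ diff
# via projections y_i = (x - mu)^T u_i
y = U.T @ diff
delta_spectral = np.sum(y ** 2 / lam)
# both equal 2.0 for this data
```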
Let $p=2$; then

$$\Delta=\frac{y_1^2}{\lambda_1}+\frac{y_2^2}{\lambda_2}$$
Suppose

$$\Delta=\frac{y_1^2}{\lambda_1}+\frac{y_2^2}{\lambda_2}=1$$

Then the level set is an ellipse, with semi-axes $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$.
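One way to see the ellipse: parametrize $y_1 = \sqrt{\lambda_1}\cos t$, $y_2 = \sqrt{\lambda_2}\sin t$; every such point satisfies $\Delta = 1$, so the curve is an ellipse with semi-axes $\sqrt{\lambda_1}, \sqrt{\lambda_2}$. A quick check with arbitrary eigenvalues:

```python
import numpy as np

lam1, lam2 = 3.0, 0.5        # example eigenvalues
t = np.linspace(0, 2 * np.pi, 100)

# points on the curve y1 = sqrt(lam1) cos t, y2 = sqrt(lam2) sin t
y1 = np.sqrt(lam1) * np.cos(t)
y2 = np.sqrt(lam2) * np.sin(t)

delta = y1 ** 2 / lam1 + y2 ** 2 / lam2  # equals 1 everywhere on the curve
max_dev = np.max(np.abs(delta - 1.0))
```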
Physical interpretation of $y_i = (x - \mu)^T u_i$: it is the projection of $x - \mu$ onto the direction $u_i$.