Bivariate normal distribution
If (X, Y) follows a bivariate normal distribution with parameters μ1, μ2, σ1, σ2, ρ, we write (X, Y) ~ N(μ1, μ2, σ1, σ2, ρ). Its density function is:
$$
\begin{aligned}
f(x, y) &= \frac{1}{2 \pi \sigma_{1} \sigma_{2} \sqrt{1-\rho^{2}}} \exp \left(-\frac{1}{2\left(1-\rho^{2}\right)}\left[\frac{\left(x-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}-2 \rho \frac{\left(x-\mu_{1}\right)\left(y-\mu_{2}\right)}{\sigma_{1} \sigma_{2}}+\frac{\left(y-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}\right]\right) \\
&= \frac{1}{(\sqrt{2 \pi})^{2} \sigma_{1} \sigma_{2} \sqrt{1-\rho^{2}}} \exp \left(-\frac{1}{2\left(1-\rho^{2}\right)}\left[\frac{\left(x-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}-2 \rho \frac{\left(x-\mu_{1}\right)\left(y-\mu_{2}\right)}{\sigma_{1} \sigma_{2}}+\frac{\left(y-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}\right]\right)
\end{aligned}
$$
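Here μ1 is the mean of the first dimension and σ1 is its standard deviation (so σ1² is its variance); μ2 and σ2 play the same roles for the second dimension. ρ is the statistic that normalizes the covariance between the two dimensions to the range −1 to +1. It is called the correlation coefficient and is defined as:
$$
\rho=\frac{\operatorname{Cov}(X, Y)}{\sigma_{1} \sigma_{2}}, \quad|\rho|<1
$$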
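For a bivariate normal random variable (X, Y), X and Y are independent if and only if their covariance is 0, that is, ρ = 0. A one-dimensional random variable has no second component to be correlated with, so no ρ appears in the one-dimensional normal density.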
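The independence claim can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `bivariate_normal_pdf` and `normal_pdf` and the sample values are illustrative assumptions. It uses the standard cross term $2\rho\,(x-\mu_1)(y-\mu_2)/(\sigma_1\sigma_2)$ and shows that with ρ = 0 the joint density equals the product of the two one-dimensional densities.

```python
import math

def bivariate_normal_pdf(x, y, mu1, mu2, sigma1, sigma2, rho):
    """Bivariate normal density; sigma1, sigma2 are standard deviations (hypothetical helper)."""
    if not (sigma1 > 0 and sigma2 > 0 and abs(rho) < 1):
        raise ValueError("need sigma1, sigma2 > 0 and |rho| < 1")
    u = (x - mu1) / sigma1          # standardized first coordinate
    v = (y - mu2) / sigma2          # standardized second coordinate
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma1 * sigma2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def normal_pdf(x, mu, sigma):
    """One-dimensional normal density (hypothetical helper)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

# With rho = 0 the joint density should factor into the two marginals.
joint = bivariate_normal_pdf(0.3, -1.2, 0.0, 1.0, 2.0, 0.5, 0.0)
product = normal_pdf(0.3, 0.0, 2.0) * normal_pdf(-1.2, 1.0, 0.5)
```

Comparing `joint` with `product` illustrates the factorization; with a nonzero ρ the two values generally differ.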
Multivariate normal distribution
Suppose the components of the n-dimensional random variable $x=\left[x_{1}, x_{2}, \cdots, x_{n}\right]^{\mathrm{T}}$ are mutually uncorrelated and each follows a normal distribution (a multivariate normal with uncorrelated dimensions; for jointly normal variables, uncorrelated components are also independent). The per-dimension means are $E(x)=\left[\mu_{1}, \mu_{2}, \cdots, \mu_{n}\right]^{\mathrm{T}}$ and the per-dimension standard deviations are $\sigma(x)=\left[\sigma_{1}, \sigma_{2}, \cdots, \sigma_{n}\right]^{\mathrm{T}}$.
Writing the random variable and the parameters as column vectors, for the n-dimensional random variable we have:
$$
x=\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right], \quad \mu=\left[\begin{array}{c} \mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{n} \end{array}\right], \quad \sigma=\left[\begin{array}{c} \sigma_{1} \\ \sigma_{2} \\ \vdots \\ \sigma_{n} \end{array}\right]
$$
By the joint density formula, independence lets the joint density factor into the product of the one-dimensional densities:
$$
\begin{aligned}
f(x) &= p\left(x_{1}, x_{2}, \ldots, x_{n}\right)=p\left(x_{1}\right) p\left(x_{2}\right) \cdots p\left(x_{n}\right) \\
&= \frac{1}{\sqrt{2 \pi} \sigma_{1}} \exp \left(-\frac{1}{2}\left(\frac{x_{1}-\mu_{1}}{\sigma_{1}}\right)^{2}\right) \frac{1}{\sqrt{2 \pi} \sigma_{2}} \exp \left(-\frac{1}{2}\left(\frac{x_{2}-\mu_{2}}{\sigma_{2}}\right)^{2}\right) \cdots \frac{1}{\sqrt{2 \pi} \sigma_{n}} \exp \left(-\frac{1}{2}\left(\frac{x_{n}-\mu_{n}}{\sigma_{n}}\right)^{2}\right) \\
&= \frac{1}{(\sqrt{2 \pi})^{n} \sigma_{1} \sigma_{2} \cdots \sigma_{n}} \exp \left(-\frac{1}{2}\left[\left(\frac{x_{1}-\mu_{1}}{\sigma_{1}}\right)^{2}+\left(\frac{x_{2}-\mu_{2}}{\sigma_{2}}\right)^{2}+\cdots+\left(\frac{x_{n}-\mu_{n}}{\sigma_{n}}\right)^{2}\right]\right)
\end{aligned}
$$
Let
$$
z^{2}=\frac{\left(x_{1}-\mu_{1}\right)^{2}}{\sigma_{1}^{2}}+\frac{\left(x_{2}-\mu_{2}\right)^{2}}{\sigma_{2}^{2}}+\cdots+\frac{\left(x_{n}-\mu_{n}\right)^{2}}{\sigma_{n}^{2}}, \quad \sigma_{z}=\sigma_{1} \sigma_{2} \cdots \sigma_{n}
$$
Then $f(x)$ can be written as:
$$
f(z)=\frac{1}{(\sqrt{2 \pi})^{n} \sigma_{z}} e^{-\frac{z^{2}}{2}} \qquad ①
$$
The multivariate normal distribution has a strong geometric interpretation, and the distribution of z is hard to see from the scalar algebra alone, so we convert to matrix form:
$$
\begin{aligned}
z^{2} &= z^{\mathrm{T}} z \\
&= \left[\begin{array}{cccc} x_{1}-\mu_{1} & x_{2}-\mu_{2} & \cdots & x_{n}-\mu_{n} \end{array}\right]\left[\begin{array}{cccc} \frac{1}{\sigma_{1}^{2}} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_{2}^{2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{n}^{2}} \end{array}\right]\left[\begin{array}{c} x_{1}-\mu_{1} \\ x_{2}-\mu_{2} \\ \vdots \\ x_{n}-\mu_{n} \end{array}\right]
\end{aligned} \qquad ②
$$
The expression above is long, so we introduce a substitution. Write
$$
x-\mu=\left[x_{1}-\mu_{1}, x_{2}-\mu_{2}, \cdots, x_{n}-\mu_{n}\right]^{\mathrm{T}}
$$
and define the symbol
$$
\Sigma=\left[\begin{array}{cccc} \sigma_{1}^{2} & 0 & \cdots & 0 \\ 0 & \sigma_{2}^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{n}^{2} \end{array}\right]
$$
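$\Sigma$ is the covariance matrix of the variable $x$: the entry in row $i$, column $j$ is the covariance of $x_{i}$ and $x_{j}$. Because the variables are mutually independent here, only the diagonal entries $(i=j)$ are nonzero and every other entry is 0. The covariance of $x_{i}$ with itself is its variance $\sigma_{i}^{2}$.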
$\Sigma$ is a diagonal matrix, so by the properties of diagonal matrices its inverse is obtained by inverting each diagonal entry:
$$
\Sigma^{-1}=\left[\begin{array}{cccc} \frac{1}{\sigma_{1}^{2}} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_{2}^{2}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_{n}^{2}} \end{array}\right]
$$
Because the determinant of a diagonal matrix is the product of its diagonal entries:
$$
|\Sigma|=\sigma_{1}^{2} \sigma_{2}^{2} \cdots \sigma_{n}^{2}
$$
$$
\sigma_{z}=|\Sigma|^{\frac{1}{2}}=\sigma_{1} \sigma_{2} \cdots \sigma_{n}
$$
Substituting into ② gives:
$$
z^{\mathrm{T}} z=(x-\mu)^{\mathrm{T}} \Sigma^{-1}(x-\mu) \qquad ③
$$
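A quick numerical check of ③ and of $\sigma_z = |\Sigma|^{1/2}$ for the diagonal case. This is a sketch with NumPy; the vectors `mu`, `sigma`, `x` are made-up sample values, not data from the text.

```python
import numpy as np

# Illustrative values (assumptions, not from the original text).
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 2.0, 1.5])     # per-dimension standard deviations
x = np.array([1.4, 0.3, -1.0])

Sigma = np.diag(sigma ** 2)            # diagonal covariance matrix

# Scalar form: sum of squared standardized deviations.
z_squared = np.sum(((x - mu) / sigma) ** 2)

# Matrix form: (x - mu)^T Sigma^{-1} (x - mu).
quad_form = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# sigma_z computed from the determinant of Sigma.
sigma_z = np.sqrt(np.linalg.det(Sigma))
```

Both `z_squared` and `quad_form` compute the same quantity, and `sigma_z` matches the plain product of the standard deviations.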
Substituting into ① gives:
$$
f(z)=\frac{1}{(\sqrt{2 \pi})^{n} \sigma_{z}} e^{-\frac{z^{2}}{2}}=\frac{1}{(\sqrt{2 \pi})^{n}|\Sigma|^{\frac{1}{2}}} e^{-\frac{(x-\mu)^{\mathrm{T}} \Sigma^{-1}(x-\mu)}{2}}
$$
We therefore obtain:
$$
\begin{aligned}
f(x) &= \frac{1}{(\sqrt{2 \pi})^{n} \sigma_{1} \sigma_{2} \cdots \sigma_{n}} \exp \left(-\frac{1}{2}\left[\left(\frac{x_{1}-\mu_{1}}{\sigma_{1}}\right)^{2}+\left(\frac{x_{2}-\mu_{2}}{\sigma_{2}}\right)^{2}+\cdots+\left(\frac{x_{n}-\mu_{n}}{\sigma_{n}}\right)^{2}\right]\right) \\
&= \frac{1}{(\sqrt{2 \pi})^{n} \sqrt{|\Sigma|}} \exp \left(-\frac{1}{2}(x-\mu)^{\mathrm{T}} \Sigma^{-1}(x-\mu)\right) \\
&= (2 \pi)^{-\frac{n}{2}}|\Sigma|^{-\frac{1}{2}} \exp \left(-\frac{1}{2}(x-\mu)^{\mathrm{T}} \Sigma^{-1}(x-\mu)\right) \\
&= f(x ; \mu, \Sigma)
\end{aligned}
$$
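The matrix form of the density can be sketched in NumPy as follows. The function name `mvn_pdf` and the sample values are illustrative assumptions; the code evaluates $(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp(-\tfrac12 (x-\mu)^{\mathrm T}\Sigma^{-1}(x-\mu))$ in log space and, for a diagonal $\Sigma$, compares it with the product of one-dimensional densities.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density f(x; mu, Sigma) (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    n = mu.shape[0]
    diff = x - mu
    sign, logdet = np.linalg.slogdet(Sigma)      # log|Sigma| without overflow
    if sign <= 0:
        raise ValueError("Sigma must have a positive determinant")
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * (n * np.log(2.0 * np.pi) + logdet + quad))

# Diagonal case: the joint density should equal the product of 1-D densities.
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 2.0])
x = np.array([1.2, -1.0])
joint = mvn_pdf(x, mu, np.diag(sigma ** 2))
product = np.prod(np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma))
```

Using `solve` and `slogdet` instead of an explicit inverse and determinant is a numerical-stability choice made for this sketch; the formula being evaluated is the same.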
Maximum likelihood estimators
The n-dimensional random variable $x$ with mutually independent components follows a normal distribution:
$$
x \sim N\left(\mu, \sigma^{2}\right), \quad \sigma_{i}>0
$$
The final form of the multivariate normal density is:
$$
f(x)=f(x ; \mu, \Sigma)
$$
Suppose there are m independent observed samples $x^{(1)}, \ldots, x^{(m)}$. The likelihood function is:
$$
\begin{aligned}
L(\mu, \Sigma) &= \prod_{i=1}^{m} f\left(x^{(i)} ; \mu, \Sigma\right) \\
&= \prod_{i=1}^{m}(2 \pi)^{-\frac{n}{2}}|\Sigma|^{-\frac{1}{2}} \exp \left(-\frac{1}{2}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= (2 \pi)^{-\frac{m n}{2}}|\Sigma|^{-\frac{m}{2}} \exp \left(-\frac{1}{2} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)\right)
\end{aligned}
$$
Its log-likelihood is:
$$
\begin{aligned}
\ln L(\mu, \Sigma) &= \ln (2 \pi)^{-\frac{m n}{2}}+\ln |\Sigma|^{-\frac{m}{2}}+\ln \exp \left(-\frac{1}{2} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\frac{m n}{2} \ln 2 \pi-\frac{m}{2} \ln |\Sigma|-\frac{1}{2} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right) \\
&= C-\frac{m}{2} \ln |\Sigma|-\frac{1}{2} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)
\end{aligned}
$$
Here m and n are known: m is the number of observed samples and n is the feature dimension of a single sample. C is a constant, $C=-\frac{m n}{2} \ln 2 \pi$.
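The three-term log-likelihood can be sketched directly in NumPy. The function name `log_likelihood` and the row-per-sample layout of `X` (shape m × n) are assumptions made for this illustration.

```python
import numpy as np

def log_likelihood(X, mu, Sigma):
    """ln L(mu, Sigma) = C - (m/2) ln|Sigma| - (1/2) sum_i d_i^T Sigma^{-1} d_i.

    X holds one sample per row (shape m x n); this layout is an assumption.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    diffs = X - mu                                   # row i is x^(i) - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must have a positive determinant")
    P = np.linalg.inv(Sigma)
    quad = np.sum((diffs @ P) * diffs)               # sum of the m quadratic forms
    C = -0.5 * m * n * np.log(2.0 * np.pi)           # constant term
    return C - 0.5 * m * logdet - 0.5 * quad

# Small illustrative data set (made-up values).
X = np.array([[0.2, -1.0], [1.5, 0.3], [-0.7, 0.8]])
mu = np.array([0.1, 0.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
total = log_likelihood(X, mu, Sigma)
per_sample = sum(log_likelihood(X[i:i + 1], mu, Sigma) for i in range(X.shape[0]))
```

Because the samples are independent, the log-likelihood of the whole data set equals the sum of the per-sample log-likelihoods, which is what `total` and `per_sample` compare.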
To find the extremum, take the partial derivatives with respect to μ and Σ and set them to zero:
$$
\left\{\begin{array}{l} \frac{\partial \ln L}{\partial \mu}=0 \\ \frac{\partial \ln L}{\partial \Sigma}=0 \end{array}\right.
$$
μ is a vector and Σ is a matrix, so the rules of matrix calculus are needed. Start with the derivative with respect to μ. $\ln L$ is a sum of three terms, only one of which contains μ, so:
$$
\frac{\partial \ln L}{\partial \mu}=\frac{\partial}{\partial \mu}\left(-\frac{1}{2} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)\right)
$$
where:
$$
\begin{aligned}
(x-\mu)^{\mathrm{T}} \Sigma^{-1}(x-\mu) &= \left(x^{\mathrm{T}}-\mu^{\mathrm{T}}\right) \Sigma^{-1}(x-\mu) \\
&= \left(x^{\mathrm{T}} \Sigma^{-1}-\mu^{\mathrm{T}} \Sigma^{-1}\right)(x-\mu) \\
&= x^{\mathrm{T}} \Sigma^{-1} x-x^{\mathrm{T}} \Sigma^{-1} \mu-\mu^{\mathrm{T}} \Sigma^{-1} x+\mu^{\mathrm{T}} \Sigma^{-1} \mu
\end{aligned}
$$
In the expression above, the two cross terms are equal. Each is a scalar, a scalar equals its own transpose, and $\Sigma^{-1}$ is symmetric, so:
$$
x^{\mathrm{T}} \Sigma^{-1} \mu=\left[\begin{array}{cccc} x_{1} & x_{2} & \cdots & x_{n} \end{array}\right] \Sigma^{-1}\left[\begin{array}{c} \mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{n} \end{array}\right]=\left[\begin{array}{cccc} \mu_{1} & \mu_{2} & \cdots & \mu_{n} \end{array}\right] \Sigma^{-1}\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right]=\mu^{\mathrm{T}} \Sigma^{-1} x
$$
Therefore:
$$
\begin{aligned}
(x-\mu)^{\mathrm{T}} \Sigma^{-1}(x-\mu) &= x^{\mathrm{T}} \Sigma^{-1} x-x^{\mathrm{T}} \Sigma^{-1} \mu-\mu^{\mathrm{T}} \Sigma^{-1} x+\mu^{\mathrm{T}} \Sigma^{-1} \mu \\
&= x^{\mathrm{T}} \Sigma^{-1} x-2 x^{\mathrm{T}} \Sigma^{-1} \mu+\mu^{\mathrm{T}} \Sigma^{-1} \mu \\
\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right) &= \sum_{i=1}^{m}\left(x^{(i)\mathrm{T}} \Sigma^{-1} x^{(i)}-2 x^{(i)\mathrm{T}} \Sigma^{-1} \mu+\mu^{\mathrm{T}} \Sigma^{-1} \mu\right) \\
&= \sum_{i=1}^{m} x^{(i)\mathrm{T}} \Sigma^{-1} x^{(i)}-2 \sum_{i=1}^{m} x^{(i)\mathrm{T}} \Sigma^{-1} \mu+m \mu^{\mathrm{T}} \Sigma^{-1} \mu
\end{aligned}
$$
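The expansion of the summed quadratic form can be verified numerically. This is a sketch on random data; the seed, sizes and variable names are arbitrary choices for illustration, and the covariance is built as $AA^{\mathrm T} + nI$ only to guarantee a symmetric positive definite matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5

# Build a symmetric positive definite covariance matrix (illustrative construction).
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)
P = np.linalg.inv(Sigma)               # Sigma^{-1}, symmetric

X = rng.normal(size=(m, n))            # one sample per row
mu = rng.normal(size=n)

# Left-hand side: sum of (x - mu)^T Sigma^{-1} (x - mu).
lhs = sum((x - mu) @ P @ (x - mu) for x in X)

# Right-hand side: the three expanded terms.
rhs = (sum(x @ P @ x for x in X)
       - 2.0 * sum(x @ P @ mu for x in X)
       + m * (mu @ P @ mu))

# The two cross terms agree because each is a scalar and P is symmetric.
cross_a = X[0] @ P @ mu
cross_b = mu @ P @ X[0]
```

The comparison of `cross_a` and `cross_b` depends on the symmetry of `P`; with a non-symmetric matrix the two cross terms would not be interchangeable.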
Substituting this result into $\frac{\partial \ln L}{\partial \mu}$, and noting that the first term does not contain μ so its derivative vanishes:
$$
\begin{aligned}
\frac{\partial \ln L}{\partial \mu} &= \frac{\partial}{\partial \mu}\left(-\frac{1}{2}\left(\sum_{i=1}^{m} x^{(i)\mathrm{T}} \Sigma^{-1} x^{(i)}-2 \sum_{i=1}^{m} x^{(i)\mathrm{T}} \Sigma^{-1} \mu+m \mu^{\mathrm{T}} \Sigma^{-1} \mu\right)\right) \\
&= \frac{\partial}{\partial \mu}\left(-\frac{1}{2} \sum_{i=1}^{m} x^{(i)\mathrm{T}} \Sigma^{-1} x^{(i)}\right)+\frac{\partial}{\partial \mu}\left(\sum_{i=1}^{m} x^{(i)\mathrm{T}} \Sigma^{-1} \mu\right)-\frac{1}{2} \frac{\partial}{\partial \mu}\left(m \mu^{\mathrm{T}} \Sigma^{-1} \mu\right) \\
&= \underbrace{\frac{\partial}{\partial \mu}\left(\sum_{i=1}^{m} x^{(i)\mathrm{T}} \Sigma^{-1} \mu\right)}_{a_{1}}-\underbrace{\frac{1}{2} \frac{\partial}{\partial \mu}\left(m \mu^{\mathrm{T}} \Sigma^{-1} \mu\right)}_{a_{2}}
\end{aligned}
$$
By the matrix calculus rule for a linear form:
$$
\text{if } f(X)=A^{\mathrm{T}} X, \quad \text{then } \frac{\mathrm{d} f}{\mathrm{d} X}=A
$$
$$
\Rightarrow a_{1}=\sum_{i=1}^{m}\left(x^{(i)\mathrm{T}} \Sigma^{-1}\right)^{\mathrm{T}}=\sum_{i=1}^{m}\left(\Sigma^{-1}\right)^{\mathrm{T}} x^{(i)}
$$
Because $\Sigma^{-1}$ is a symmetric matrix:
$$
\left(\Sigma^{-1}\right)^{\mathrm{T}}=\Sigma^{-1}, \quad a_{1}=\sum_{i=1}^{m}\left(\Sigma^{-1}\right)^{\mathrm{T}} x^{(i)}=\sum_{i=1}^{m} \Sigma^{-1} x^{(i)}=\Sigma^{-1} \sum_{i=1}^{m} x^{(i)}
$$
By the matrix calculus rule for a quadratic form:
$$
\text{if } f(X)=X^{\mathrm{T}} A X, \quad \text{then } \frac{\mathrm{d} f}{\mathrm{d} X}=A X+A^{\mathrm{T}} X
$$
$$
\text{when } A=A^{\mathrm{T}}, \quad \frac{\mathrm{d} f}{\mathrm{d} X}=A X+A^{\mathrm{T}} X=2 A X
$$
$$
\Rightarrow a_{2}=\frac{1}{2} \frac{\partial\left(m \mu^{\mathrm{T}} \Sigma^{-1} \mu\right)}{\partial \mu}=m \Sigma^{-1} \mu
$$
Substituting $a_{1}$ and $a_{2}$ into $\frac{\partial \ln L}{\partial \mu}$ and setting the result to zero:
$$
\frac{\partial \ln L}{\partial \mu}=a_{1}-a_{2}=\sum_{i=1}^{m} \Sigma^{-1} x^{(i)}-m \Sigma^{-1} \mu=0
$$
Left-multiplying both sides by $\Sigma$ gives $\sum_{i=1}^{m} x^{(i)}-m \mu=0$, so:
$$
\hat{\mu}=\frac{1}{m} \sum_{i=1}^{m} x^{(i)}=\bar{x}
$$
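The closed-form gradient $\Sigma^{-1}\left(\sum_i x^{(i)} - m\mu\right)$ can be compared against a central finite difference. This is a sketch on random data; the helper names `quad_term`, `grad_mu` and `numeric_grad_mu` are hypothetical and exist only for this check.

```python
import numpy as np

def quad_term(X, mu, Sigma):
    """The only mu-dependent term of ln L: -1/2 sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)."""
    diffs = X - mu
    return -0.5 * np.sum((diffs @ np.linalg.inv(Sigma)) * diffs)

def grad_mu(X, mu, Sigma):
    """Closed form: Sigma^{-1} (sum_i x_i - m * mu)."""
    m = X.shape[0]
    return np.linalg.solve(Sigma, X.sum(axis=0) - m * mu)

def numeric_grad_mu(X, mu, Sigma, eps=1e-6):
    """Central finite-difference approximation of the gradient with respect to mu."""
    g = np.zeros_like(mu)
    for k in range(mu.size):
        step = np.zeros_like(mu)
        step[k] = eps
        g[k] = (quad_term(X, mu + step, Sigma) - quad_term(X, mu - step, Sigma)) / (2.0 * eps)
    return g

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                 # 6 samples, 3 dimensions
B = rng.normal(size=(3, 3))
Sigma = B @ B.T + 3.0 * np.eye(3)           # symmetric positive definite
mu = rng.normal(size=3)

analytic = grad_mu(X, mu, Sigma)
numeric = numeric_grad_mu(X, mu, Sigma)
at_sample_mean = grad_mu(X, X.mean(axis=0), Sigma)
```

`at_sample_mean` evaluates the gradient at $\bar{x}$, where it should vanish regardless of which positive definite $\Sigma$ is used.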
Next, take the partial derivative with respect to $\Sigma$:
$$
\begin{aligned}
\frac{\partial \ln L}{\partial \Sigma} &= \frac{\partial}{\partial \Sigma}\left(C-\frac{m}{2} \ln |\Sigma|-\frac{1}{2} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\frac{m}{2} \underbrace{\frac{\partial}{\partial \Sigma} \ln |\Sigma|}_{b_{1}}-\frac{1}{2} \underbrace{\frac{\partial}{\partial \Sigma} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)}_{b_{2}}
\end{aligned}
$$
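$\Sigma$ and $\Sigma^{-1}$ are both real symmetric matrices. By the matrix calculus rule for the log-determinant (treating the entries as independent variables, $\frac{\partial \ln |A|}{\partial A}=\left(A^{-1}\right)^{\mathrm{T}}$, which equals $A^{-1}$ when $A$ is real symmetric):
$$
\frac{\partial \ln |A|}{\partial A}=A^{-1} \Rightarrow b_{1}=\frac{\partial \ln |\Sigma|}{\partial \Sigma}=\Sigma^{-1}
$$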
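The log-determinant rule can be checked with finite differences. This is a sketch under the assumption that each entry of the matrix is perturbed independently (the symmetry constraint is not imposed during differentiation); `numeric_grad_logdet` is a hypothetical helper and the test matrix is random.

```python
import numpy as np

def logdet(S):
    """ln|S| for a matrix with positive determinant."""
    sign, value = np.linalg.slogdet(S)
    if sign <= 0:
        raise ValueError("determinant must be positive")
    return value

def numeric_grad_logdet(S, eps=1e-6):
    """Entry-by-entry central finite difference of ln|S|."""
    n = S.shape[0]
    G = np.zeros_like(S)
    for p in range(n):
        for q in range(n):
            E = np.zeros_like(S)
            E[p, q] = 1.0                      # perturb only entry (p, q)
            G[p, q] = (logdet(S + eps * E) - logdet(S - eps * E)) / (2.0 * eps)
    return G

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
S = B @ B.T + 3.0 * np.eye(3)                  # symmetric positive definite test matrix

numeric = numeric_grad_logdet(S)
analytic = np.linalg.inv(S)                    # claimed derivative for symmetric S
```

For a non-symmetric matrix the same finite difference would match the transpose of the inverse instead, which is why the symmetry of $\Sigma$ matters for the simplified formula.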
Now consider $b_{2}$. Let $\omega_{p q}$ be the entry of $\Sigma$ in row $p$, column $q$, and let $E_{p q}$ be the matrix whose entry in row $p$, column $q$ is 1 and whose other entries are all 0; $E_{p q}$ has the same order as $\Sigma^{-1}$. By the formula for the derivative of a matrix inverse with respect to a scalar:
$$
\begin{aligned}
\frac{\partial X^{-1}}{\partial x} &= -X^{-1} \frac{\partial X}{\partial x} X^{-1} \\
\Rightarrow \frac{\partial \Sigma^{-1}}{\partial \omega_{p q}} &= -\Sigma^{-1} \frac{\partial \Sigma}{\partial \omega_{p q}} \Sigma^{-1}=-\Sigma^{-1} E_{p q} \Sigma^{-1} \\
\Rightarrow \frac{\partial\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)}{\partial \omega_{p q}} &= \left(x^{(i)}-\mu\right)^{\mathrm{T}} \frac{\partial \Sigma^{-1}}{\partial \omega_{p q}}\left(x^{(i)}-\mu\right) \\
&= \left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(-\Sigma^{-1} E_{p q} \Sigma^{-1}\right)\left(x^{(i)}-\mu\right) \\
&= -\left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(\Sigma^{-1} E_{p q} \Sigma^{-1}\right)\left(x^{(i)}-\mu\right)
\end{aligned}
$$
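The derivative of the inverse with respect to a single entry can be checked numerically. This is a sketch with a random symmetric positive definite matrix; the indices `p`, `q` and the step size `eps` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)        # symmetric positive definite
P = np.linalg.inv(Sigma)

p, q = 0, 2                             # entry omega_pq to perturb (arbitrary choice)
E = np.zeros((n, n))
E[p, q] = 1.0                           # E_pq: single 1 at row p, column q

eps = 1e-6
# Central finite difference of Sigma^{-1} with respect to omega_pq.
numeric = (np.linalg.inv(Sigma + eps * E) - np.linalg.inv(Sigma - eps * E)) / (2.0 * eps)

# Closed form from the formula: -Sigma^{-1} E_pq Sigma^{-1}.
analytic = -P @ E @ P
```

The result is a full n × n matrix, since changing one entry of $\Sigma$ changes every entry of $\Sigma^{-1}$.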
We already know that $\Sigma^{-1}$ is a symmetric matrix. Matrix multiplication is associative, so parentheses can be placed freely as long as the order of the factors is unchanged:
$$
\begin{aligned}
\frac{\partial\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)}{\partial \omega_{p q}} &= -\left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(\Sigma^{-1} E_{p q} \Sigma^{-1}\right)\left(x^{(i)}-\mu\right) \\
&= -\left(\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\right) E_{p q}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\left(\left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(\Sigma^{-1}\right)^{\mathrm{T}}\right) E_{p q}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\underbrace{\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)^{\mathrm{T}}}_{B^{\mathrm{T}} A^{\mathrm{T}}=(A B)^{\mathrm{T}}} E_{p q}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{p}^{\mathrm{T}}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{q}
\end{aligned}
$$
Here $\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)^{\mathrm{T}}$ is a 1 × n matrix and $\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{p}^{\mathrm{T}}$ denotes its p-th element; $\Sigma^{-1}\left(x^{(i)}-\mu\right)$ is an n × 1 matrix and $\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{q}$ denotes its q-th element. We now extend this result to the derivative with respect to a matrix. The derivative of a scalar function $F$ with respect to an r × s matrix $X$ is the matrix of partial derivatives:
$$
\frac{\partial F}{\partial X}=\left[\begin{array}{cccc} \frac{\partial F}{\partial x_{11}} & \frac{\partial F}{\partial x_{12}} & \cdots & \frac{\partial F}{\partial x_{1 s}} \\ \frac{\partial F}{\partial x_{21}} & \frac{\partial F}{\partial x_{22}} & \cdots & \frac{\partial F}{\partial x_{2 s}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F}{\partial x_{r 1}} & \frac{\partial F}{\partial x_{r 2}} & \cdots & \frac{\partial F}{\partial x_{r s}} \end{array}\right]
$$
Applied to the quadratic form, with $F_{i}=\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)$:
$$
\frac{\partial F_{i}}{\partial \Sigma}=\left[\begin{array}{cccc} \frac{\partial F_{i}}{\partial \omega_{11}} & \frac{\partial F_{i}}{\partial \omega_{12}} & \cdots & \frac{\partial F_{i}}{\partial \omega_{1 n}} \\ \frac{\partial F_{i}}{\partial \omega_{21}} & \frac{\partial F_{i}}{\partial \omega_{22}} & \cdots & \frac{\partial F_{i}}{\partial \omega_{2 n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F_{i}}{\partial \omega_{n 1}} & \frac{\partial F_{i}}{\partial \omega_{n 2}} & \cdots & \frac{\partial F_{i}}{\partial \omega_{n n}} \end{array}\right]
$$
Each entry in row $p$, column $q$ of this matrix equals $-\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{p}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{q}$, so the matrix is the negative outer product $-A_{1} A_{2}$, where $A_{1}=\Sigma^{-1}\left(x^{(i)}-\mu\right)$ is the n × 1 column vector and $A_{2}$ is the corresponding row vector:
$$
A_{2}=\left[\begin{array}{cccc} \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{1} & \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{2} & \cdots & \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{n} \end{array}\right]=\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)^{\mathrm{T}}
$$
$\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)^{\mathrm{T}}$ is a 1 × n matrix whose k-th element $\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{k}^{\mathrm{T}}$ is a scalar; $\Sigma^{-1}\left(x^{(i)}-\mu\right)$ is an n × 1 matrix whose k-th element $\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{k}$ is also a scalar, and the two are equal. Therefore:
$$
\begin{aligned}
\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{k}^{\mathrm{T}} &= \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{k} \\
\frac{\partial}{\partial \Sigma}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right) &= -A_{1} A_{2} \\
&= -\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)^{\mathrm{T}} \\
&= -\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(\Sigma^{-1}\right)^{\mathrm{T}} \\
&= -\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}
\end{aligned}
$$
We can finally obtain $b_{2}$:
$$
b_{2}=\frac{\partial}{\partial \Sigma} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)=\sum_{i=1}^{m}\left(-\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\right)
$$
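The expression for $b_2$ can be checked against an entry-by-entry finite difference. This is a sketch on random data, again perturbing each entry of $\Sigma$ independently; the helper names `quad_sum`, `b2_closed_form` and `b2_numeric` are hypothetical.

```python
import numpy as np

def quad_sum(X, mu, Sigma):
    """sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)."""
    D = X - mu
    return np.sum((D @ np.linalg.inv(Sigma)) * D)

def b2_closed_form(X, mu, Sigma):
    """- Sigma^{-1} [sum_i (x_i - mu)(x_i - mu)^T] Sigma^{-1}."""
    D = X - mu
    P = np.linalg.inv(Sigma)
    return -P @ (D.T @ D) @ P          # D.T @ D is the sum of outer products

def b2_numeric(X, mu, Sigma, eps=1e-6):
    """Central finite difference with respect to each entry of Sigma."""
    n = Sigma.shape[0]
    G = np.zeros_like(Sigma)
    for p in range(n):
        for q in range(n):
            E = np.zeros_like(Sigma)
            E[p, q] = 1.0
            G[p, q] = (quad_sum(X, mu, Sigma + eps * E)
                       - quad_sum(X, mu, Sigma - eps * E)) / (2.0 * eps)
    return G

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))
mu = rng.normal(size=3)
B = rng.normal(size=(3, 3))
Sigma = B @ B.T + 3.0 * np.eye(3)      # symmetric positive definite

analytic = b2_closed_form(X, mu, Sigma)
numeric = b2_numeric(X, mu, Sigma)
```

The closed form is symmetric, which is consistent with $\Sigma^{-1}$ and each outer product $(x^{(i)}-\mu)(x^{(i)}-\mu)^{\mathrm T}$ being symmetric.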
Now we can assemble the derivative of the log-likelihood with respect to $\Sigma$:
$$
\begin{aligned}
\frac{\partial \ln L}{\partial \Sigma} &= -\frac{m}{2} \underbrace{\frac{\partial}{\partial \Sigma} \ln |\Sigma|}_{b_{1}}-\frac{1}{2} \underbrace{\frac{\partial}{\partial \Sigma} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\left(x^{(i)}-\mu\right)}_{b_{2}} \\
&= -\frac{m}{2} \Sigma^{-1}-\frac{1}{2} \sum_{i=1}^{m}\left(-\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\right)
\end{aligned}
$$
Let $I$ be the identity matrix, so that $\Sigma^{-1} I=\Sigma^{-1}$. Then:
$$
\begin{aligned}
\frac{\partial \ln L}{\partial \Sigma} &= -\frac{m}{2} \Sigma^{-1} I-\frac{1}{2} \sum_{i=1}^{m}\left(-\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\right) \\
&= -\frac{m}{2} \Sigma^{-1} I+\frac{1}{2} \sum_{i=1}^{m}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\right) \\
&= -\frac{1}{2} \Sigma^{-1}\left(m I-\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\right) \\
&= -\frac{1}{2} \Sigma^{-1}\left(m \Sigma \Sigma^{-1}-\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} \Sigma^{-1}\right) \\
&= -\frac{1}{2} \Sigma^{-1}\left(m \Sigma-\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right) \Sigma^{-1} \\
&= 0
\end{aligned}
$$
Left-multiply both sides of the equation by $\Sigma$:
$$
\begin{aligned}
\Sigma\left(-\frac{1}{2} \Sigma^{-1}\right)\left(m \Sigma-\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right) \Sigma^{-1} &= \Sigma \cdot 0 \\
-\frac{1}{2} I\left(m \Sigma-\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right) \Sigma^{-1} &= 0 \\
\left(m \Sigma-\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right) \Sigma^{-1} &= 0
\end{aligned}
$$
Right-multiply both sides by $\Sigma$:
$$
\begin{aligned}
\left(m \Sigma-\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right) \Sigma^{-1} \Sigma &= 0 \cdot \Sigma \\
m \Sigma-\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} &= 0
\end{aligned}
$$
Solving for $\Sigma$ gives the estimator below, where μ takes its estimated value $\hat{\mu}=\bar{x}$ because both stationarity conditions must hold simultaneously:
$$
\hat{\Sigma}=\frac{1}{m} \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}
$$
In summary, the maximum likelihood estimators of the multivariate normal distribution are:
$$
\hat{\mu}=\frac{1}{m} \sum_{i=1}^{m} x^{(i)}=\bar{x}
$$
$$
\hat{\Sigma}=\frac{1}{m} \sum_{i=1}^{m}\left(x^{(i)}-\hat{\mu}\right)\left(x^{(i)}-\hat{\mu}\right)^{\mathrm{T}}
$$
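The two estimators can be computed from data in a few lines. This is a sketch on synthetic samples: the "true" parameters, the seed and the sample size are made-up values, and `log_likelihood` is a hypothetical helper that evaluates $C - \frac{m}{2}\ln|\Sigma| - \frac12\sum_i (x^{(i)}-\mu)^{\mathrm T}\Sigma^{-1}(x^{(i)}-\mu)$.

```python
import numpy as np

def log_likelihood(X, mu, Sigma):
    """Log-likelihood of m samples (rows of X) under N(mu, Sigma)."""
    m, n = X.shape
    D = X - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    quad = np.sum((D @ np.linalg.inv(Sigma)) * D)
    return -0.5 * m * n * np.log(2.0 * np.pi) - 0.5 * m * logdet - 0.5 * quad

# Synthetic data from assumed "true" parameters (illustrative values only).
rng = np.random.default_rng(42)
true_mu = np.array([1.0, -2.0, 0.5])
true_Sigma = np.array([[2.0, 0.6, 0.0],
                       [0.6, 1.0, 0.3],
                       [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=2000)
m = X.shape[0]

# Maximum likelihood estimators: sample mean and covariance with divisor m.
mu_hat = X.mean(axis=0)
D = X - mu_hat
Sigma_hat = D.T @ D / m

ll_mle = log_likelihood(X, mu_hat, Sigma_hat)
ll_true = log_likelihood(X, true_mu, true_Sigma)
```

Note the divisor m rather than m − 1: the maximum likelihood estimate of the covariance corresponds to `np.cov(X, rowvar=False, bias=True)`, not to NumPy's default unbiased estimate. On any data set the log-likelihood at the estimates is at least as large as at any other parameter values, including the ones used to generate the data.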