First, let's try to prove a few small results (the notation may not be fully rigorous).
Let $A$ be a $p\times p$ square matrix:
$$A= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p}\\ a_{21} & a_{22} & \cdots & a_{2p}\\ \vdots & \vdots & & \vdots\\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{pmatrix}$$
Derivative of the determinant with respect to the matrix (denominator layout)
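Expand the determinant along row $i$ (Laplace expansion), where $A_{ij}=(-1)^{i+j}M_{ij}$ is the cofactor of $a_{ij}$ and $M_{ij}$ is the minor obtained by deleting row $i$ and column $j$: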
$$|A|=a_{i1}A_{i1}+\cdots+a_{ij}A_{ij}+\cdots+a_{ip}A_{ip}$$
None of the cofactors $A_{i1},\dots,A_{ip}$ involves an entry of row $i$, so
$$\frac{\partial |A|}{\partial a_{ij}}=A_{ij}$$
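In denominator layout, $\frac{\partial |A|}{\partial A}$ has the same shape as $A$, with $\frac{\partial |A|}{\partial a_{ij}}$ as its $(i,j)$ entry. That is the matrix of cofactors, which is the transpose of the adjugate $A^*$: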
$$\frac{\partial |A|}{\partial A}={(A^{*})}^T$$
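A quick numerical sanity check, written as a sketch in Python with NumPy (the post itself has no code, so the language, the random test matrix and the step size `h` are all assumptions). The hypothetical helper `cofactor_matrix` builds the cofactors $A_{ij}$ directly, and `numeric_grad` (also hypothetical) approximates each $\partial |A|/\partial a_{ij}$ by a central difference. Because $|A|$ is linear in any single entry, the central difference matches the cofactor up to rounding.

```python
import numpy as np

def cofactor_matrix(A):
    """Hypothetical helper: C[i, j] = (-1)^(i+j) * det(minor_ij), i.e. C = (A^*)^T."""
    p = A.shape[0]
    C = np.empty_like(A)
    for i in range(p):
        for j in range(p):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

def numeric_grad(f, A, h=1e-6):
    """Hypothetical helper: entry (i, j) approximates df/da_ij (denominator layout)."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h
            G[i, j] = (f(A + E) - f(A - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
grad_fd = numeric_grad(np.linalg.det, A)
grad_formula = cofactor_matrix(A)          # (A^*)^T

print(np.allclose(grad_fd, grad_formula, atol=1e-6))
# For invertible A, A^* = |A| A^{-1}, so the same matrix is |A| (A^{-1})^T
print(np.allclose(grad_formula, np.linalg.det(A) * np.linalg.inv(A).T))
```

The last line shows the form used later for $\ln|\Sigma|$: dividing by $|A|$ gives $\partial \ln|A|/\partial A=(A^{-1})^T$.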
Derivative of the inverse with respect to the matrix (denominator layout)
Start from the definition of the inverse:
$$AA^{-1}=I$$
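Differentiate both sides with respect to a single entry $a_{ij}$ using the product rule, keeping the factors in their original order because matrix multiplication does not commute. Write $E_{ij}$ for the matrix with a 1 in position $(i,j)$ and zeros elsewhere, so that $\frac{\partial A}{\partial a_{ij}}=E_{ij}$:
$$E_{ij}A^{-1}+A\frac{\partial A^{-1}}{\partial a_{ij}}=O$$
Multiplying on the left by $A^{-1}$ gives
$$\frac{\partial A^{-1}}{\partial a_{ij}}=-A^{-1}E_{ij}A^{-1}$$
or, in differential form, $d(A^{-1})=-A^{-1}(dA)A^{-1}$. The scalar-style expression $-(A^{-1})^2$ agrees with this only in the $1\times 1$ case.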
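The entrywise rule $\partial A^{-1}/\partial a_{ij}=-A^{-1}E_{ij}A^{-1}$ can be checked numerically. This is a minimal NumPy sketch; the test matrix (a random matrix shifted by $pI$ so it stays well conditioned), the seed and the step `h` are arbitrary choices, not from the post. For each $(i,j)$ it perturbs $a_{ij}$, differences the whole inverse, and records the largest deviation from the formula.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
A = rng.normal(size=(p, p)) + p * np.eye(p)   # shift the diagonal to keep A well conditioned
A_inv = np.linalg.inv(A)
h = 1e-6

max_err = 0.0
for i in range(p):
    for j in range(p):
        E = np.zeros((p, p))
        E[i, j] = 1.0                          # E_ij: a single 1 at position (i, j)
        # central finite difference of the whole matrix A^{-1} with respect to a_ij
        fd = (np.linalg.inv(A + h * E) - np.linalg.inv(A - h * E)) / (2 * h)
        formula = -A_inv @ E @ A_inv
        max_err = max(max_err, np.abs(fd - formula).max())

print(max_err)   # expected to be tiny, at the level of finite-difference error
```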
Some other common derivative formulas (stated without proof)
Here $A$ is a $p\times p$ matrix and $x$ is a $p$-dimensional column vector.
$$\frac{\partial x^TAx}{\partial x} = (A^T+A)x$$
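Although the formula is stated without proof, it is easy to check numerically. A small NumPy sketch under assumed random data: $A$ is deliberately non-symmetric so that both $A^T$ and $A$ matter, and since $x^TAx$ is quadratic in $x$, a central difference reproduces the gradient up to rounding.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
A = rng.normal(size=(p, p))      # deliberately non-symmetric
x = rng.normal(size=p)
h = 1e-6

def quad(v):
    return v @ A @ v             # the scalar v^T A v

# central finite differences, one coordinate of x at a time
grad_fd = np.array([(quad(x + h * e) - quad(x - h * e)) / (2 * h) for e in np.eye(p)])
grad_formula = (A.T + A) @ x
print(np.allclose(grad_fd, grad_formula, atol=1e-6))
```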
$$\frac{\partial x^TAx}{\partial A} = xx^T$$
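The same kind of check works for the derivative with respect to $A$. In this NumPy sketch (random data assumed), entry $(i,j)$ of the finite-difference gradient perturbs only $a_{ij}$. Because $x^TAx$ is linear in $A$, the result should equal the outer product $xx^T$ up to rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
A = rng.normal(size=(p, p))
x = rng.normal(size=p)
h = 1e-6

grad_fd = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        E = np.zeros((p, p))
        E[i, j] = h
        # x^T A x is linear in A, so the central difference is exact up to rounding
        grad_fd[i, j] = (x @ (A + E) @ x - x @ (A - E) @ x) / (2 * h)

print(np.allclose(grad_fd, np.outer(x, x), atol=1e-6))
```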
MLE of the parameters of the multivariate normal distribution
First, the density of the $p$-dimensional normal distribution $N(\mu,\Sigma)$ is
$$f(x;\mu,\Sigma)=\frac{1}{{(2\pi)}^{p/2}}\frac{1}{{|\Sigma|}^{1/2}}\exp\left(-\frac{1}{2}{(x-\mu)}^T\Sigma^{-1}{(x-\mu)}\right)$$
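Here is a minimal NumPy sketch of the log of this density. `mvn_logpdf` and `normal_logpdf_1d` are hypothetical helper names introduced only for illustration. Two implementation choices are assumptions rather than anything from the post: `np.linalg.slogdet` supplies $\ln|\Sigma|$, and a linear solve replaces the explicit $\Sigma^{-1}$. As a sanity check, with a diagonal $\Sigma$ the joint density should factor into a product of univariate normal densities.

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Hypothetical helper: ln f(x; mu, Sigma) for a single p-vector x."""
    p = mu.shape[0]
    diff = x - mu
    sign, logdet = np.linalg.slogdet(Sigma)        # ln|Sigma|, numerically stable
    assert sign > 0, "Sigma must be positive definite"
    quad = diff @ np.linalg.solve(Sigma, diff)     # (x-mu)^T Sigma^{-1} (x-mu) without forming the inverse
    return -0.5 * (p * np.log(2 * np.pi) + logdet + quad)

def normal_logpdf_1d(x, m, s2):
    """Hypothetical helper: univariate normal log-density with mean m and variance s2."""
    return -0.5 * (np.log(2 * np.pi * s2) + (x - m) ** 2 / s2)

# With a diagonal Sigma the joint log-density is a sum of univariate log-densities
mu = np.array([1.0, -2.0, 0.5])
variances = np.array([0.5, 2.0, 1.5])
x = np.array([0.3, -1.0, 2.0])
lhs = mvn_logpdf(x, mu, np.diag(variances))
rhs = sum(normal_logpdf_1d(xi, mi, vi) for xi, mi, vi in zip(x, mu, variances))
print(np.isclose(lhs, rhs))
```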
Given $n$ independent samples $x_1,x_2,\dots,x_n$, the likelihood function is
$$L(\mu,\Sigma)=\prod_{i=1}^n{\frac{1}{{(2\pi)}^{p/2}}\frac{1}{{|\Sigma|}^{1/2}}\exp\left(-\frac{1}{2}{(x_i-\mu)}^T\Sigma^{-1}{(x_i-\mu)}\right)}$$
The log-likelihood is
$$\ln L(\mu,\Sigma)=-\frac{np}{2}\ln{(2\pi)}-\frac{n}{2}\ln{|\Sigma|}-\frac{1}{2}\sum_{i=1}^n{{(x_i-\mu)}^T\Sigma^{-1}{(x_i-\mu)}}$$
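To see that the log-likelihood formula matches the product form term by term, here is a NumPy sketch. The helper name `log_likelihood`, the simulated data and the chosen $\Sigma$ are assumptions made for illustration. The helper evaluates the closed form above, and the reference value evaluates each density directly and sums the logs.

```python
import numpy as np

def log_likelihood(X, mu, Sigma):
    """Hypothetical helper: ln L(mu, Sigma) for an (n, p) array X whose rows are x_1, ..., x_n."""
    n, p = X.shape
    D = X - mu                                          # row i is (x_i - mu)^T
    _, logdet = np.linalg.slogdet(Sigma)                # ln|Sigma|
    quad_sum = np.sum((D @ np.linalg.inv(Sigma)) * D)   # sum_i (x_i-mu)^T Sigma^{-1} (x_i-mu)
    return -n * p / 2 * np.log(2 * np.pi) - n / 2 * logdet - 0.5 * quad_sum

rng = np.random.default_rng(4)
n, p = 50, 3
X = rng.normal(size=(n, p))
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])

# Reference value: evaluate each density directly from the formula, then sum the logs
Sigma_inv = np.linalg.inv(Sigma)
norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
dens = np.array([np.exp(-0.5 * (x - mu) @ Sigma_inv @ (x - mu)) / norm_const for x in X])
print(np.isclose(log_likelihood(X, mu, Sigma), np.sum(np.log(dens))))
```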
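Differentiate $\ln L$ with respect to $\mu$ and $\Sigma$ and set both derivatives to zero. For $\mu$, apply $\frac{\partial x^TAx}{\partial x}=(A^T+A)x$ with the symmetric matrix $A=\Sigma^{-1}$. Each quadratic term then contributes $-2\Sigma^{-1}(x_i-\mu)$, where the minus sign comes from the chain rule through $x_i-\mu$: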
$$\frac{\partial \ln L}{\partial \mu}=\sum_{i=1}^n{\Sigma^{-1}(x_i-\mu)}=0$$
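A finite-difference check of this gradient, sketched in NumPy. The data, $\Sigma$ and the evaluation point $\mu$ are arbitrary assumptions, and `log_likelihood` is the same hypothetical helper, repeated so the block runs on its own. Since $\ln L$ is quadratic in $\mu$, the central difference should match $\sum_i\Sigma^{-1}(x_i-\mu)$ up to rounding, at any $\mu$ and not only at the optimum.

```python
import numpy as np

def log_likelihood(X, mu, Sigma):
    """Hypothetical helper: ln L(mu, Sigma) for an (n, p) array X of samples."""
    n, p = X.shape
    D = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -n * p / 2 * np.log(2 * np.pi) - n / 2 * logdet - 0.5 * np.sum((D @ np.linalg.inv(Sigma)) * D)

rng = np.random.default_rng(5)
n, p = 40, 3
X = rng.normal(size=(n, p)) + np.array([1.0, 0.0, -1.0])
Sigma = np.array([[1.5, 0.4, 0.1], [0.4, 1.0, 0.0], [0.1, 0.0, 0.8]])
mu = np.array([0.2, 0.3, -0.4])       # an arbitrary point, not the optimum
h = 1e-5

grad_fd = np.array([(log_likelihood(X, mu + h * e, Sigma) - log_likelihood(X, mu - h * e, Sigma)) / (2 * h)
                    for e in np.eye(p)])
grad_formula = np.linalg.inv(Sigma) @ (X - mu).sum(axis=0)   # sum_i Sigma^{-1}(x_i - mu)
print(np.allclose(grad_fd, grad_formula, atol=1e-5))
```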
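For $\Sigma$, the determinant result gives $\frac{\partial \ln|\Sigma|}{\partial \Sigma}=\frac{(\Sigma^*)^T}{|\Sigma|}=\frac{\Sigma^*}{|\Sigma|}$ because $\Sigma$ is symmetric. The differential $d(\Sigma^{-1})=-\Sigma^{-1}(d\Sigma)\Sigma^{-1}$ gives $\frac{\partial}{\partial \Sigma}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)=-\Sigma^{-1}(x_i-\mu)(x_i-\mu)^T\Sigma^{-1}$. Hence
$$\frac{\partial \ln L}{\partial \Sigma}=-\frac{n\Sigma^*}{2|\Sigma|}+\frac{1}{2}\Sigma^{-1}\left(\sum_{i=1}^n{(x_i-\mu){(x_i-\mu)}^T}\right)\Sigma^{-1}=0$$
Multiplying the $\mu$ equation on the left by $\Sigma$ gives $\sum_{i=1}^n(x_i-\mu)=0$, so directly
$$\hat{\mu}=\frac{1}{n}\sum_{i=1}^n{x_i}$$
For the $\Sigma$ equation, first use $\Sigma^*/|\Sigma|=\Sigma^{-1}$ to simplify it to
$$\frac{n}{2}\Sigma^{-1}=\frac{1}{2}\Sigma^{-1}\left(\sum_{i=1}^n{(x_i-\mu){(x_i-\mu)}^T}\right)\Sigma^{-1}$$
Multiplying both sides on the left and on the right by $\Sigma$, then by 2, gives $n\Sigma=\sum_{i=1}^n(x_i-\mu)(x_i-\mu)^T$.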
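The matrix derivative $\partial \ln L/\partial \Sigma=-\frac{n}{2}\Sigma^{-1}+\frac{1}{2}\Sigma^{-1}S\Sigma^{-1}$, with $S=\sum_i(x_i-\mu)(x_i-\mu)^T$ and $\Sigma^*/|\Sigma|=\Sigma^{-1}$, can be checked by finite differences. In this NumPy sketch each entry $\sigma_{jk}$ is perturbed on its own, treating the entries as independent variables as the derivation does. The data, $\mu$, $\Sigma$ and `h` are arbitrary assumptions, and `log_likelihood` is the hypothetical helper repeated for self-containment.

```python
import numpy as np

def log_likelihood(X, mu, Sigma):
    """Hypothetical helper: ln L(mu, Sigma) for an (n, p) array X of samples."""
    n, p = X.shape
    D = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -n * p / 2 * np.log(2 * np.pi) - n / 2 * logdet - 0.5 * np.sum((D @ np.linalg.inv(Sigma)) * D)

rng = np.random.default_rng(6)
n, p = 40, 3
X = rng.normal(size=(n, p))
mu = X.mean(axis=0) + 0.1
Sigma = np.array([[1.5, 0.4, 0.1], [0.4, 1.0, 0.0], [0.1, 0.0, 0.8]])
h = 1e-5

# Perturb one entry sigma_jk at a time (entries treated as independent variables)
grad_fd = np.zeros((p, p))
for j in range(p):
    for k in range(p):
        E = np.zeros((p, p))
        E[j, k] = h
        grad_fd[j, k] = (log_likelihood(X, mu, Sigma + E) - log_likelihood(X, mu, Sigma - E)) / (2 * h)

S = (X - mu).T @ (X - mu)              # sum_i (x_i - mu)(x_i - mu)^T
Sinv = np.linalg.inv(Sigma)
grad_formula = -n / 2 * Sinv + 0.5 * Sinv @ S @ Sinv
print(np.allclose(grad_fd, grad_formula, atol=1e-4))
```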
Solving for $\Sigma$ and substituting $\hat{\mu}$ for the unknown $\mu$ gives
$$\hat{\Sigma}=\frac{1}{n}\sum_{i=1}^n{(x_i-\hat{\mu}){(x_i-\hat{\mu})}^T}$$
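To put both estimators together, here is a NumPy sketch on simulated data. The true parameters, sample size and seed are arbitrary assumptions. It computes $\hat{\mu}$ and $\hat{\Sigma}$ as above and confirms that $\hat{\Sigma}$ divides by $n$, which matches `np.cov` with `bias=True` (`np.cov` divides by $n-1$ by default). It also checks that both stationarity conditions hold at $(\hat{\mu},\hat{\Sigma})$.

```python
import numpy as np

rng = np.random.default_rng(8)
true_mu = np.array([1.0, -2.0, 0.5])
true_Sigma = np.array([[2.0, 0.6, 0.0], [0.6, 1.0, 0.3], [0.0, 0.3, 0.5]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=500)   # rows are x_1, ..., x_n
n, p = X.shape

mu_hat = X.sum(axis=0) / n         # (1/n) sum_i x_i
D = X - mu_hat
Sigma_hat = D.T @ D / n            # (1/n) sum_i (x_i - mu_hat)(x_i - mu_hat)^T

# The MLE divides by n, i.e. np.cov with bias=True
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True)))

# Both derivatives vanish at (mu_hat, Sigma_hat)
Sinv = np.linalg.inv(Sigma_hat)
grad_mu = Sinv @ D.sum(axis=0)
grad_Sigma = -n / 2 * Sinv + 0.5 * Sinv @ (D.T @ D) @ Sinv
print(np.allclose(grad_mu, 0.0, atol=1e-8), np.allclose(grad_Sigma, 0.0, atol=1e-8))
print(mu_hat, Sigma_hat, sep="\n")
```

Because $\hat{\Sigma}$ divides by $n$ rather than $n-1$, it is a biased estimator of $\Sigma$. The printed estimates should be close to, but not equal to, the simulated `true_mu` and `true_Sigma`.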