Maximum Likelihood Estimation for the Multivariate Normal Distribution

Bivariate Normal Distribution

The pair $(X, Y)$ follows a bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1, \sigma_2, \rho$, written $(X, Y) \sim N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. Its density function is:
$$
\begin{aligned}
f(x, y) &= \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - 2\rho\frac{(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right) \\
&= \frac{1}{(\sqrt{2\pi})^2\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - 2\rho\frac{(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right)
\end{aligned}
$$
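Here $\mu_1$ is the mean and $\sigma_1$ the standard deviation of the first component (so $\sigma_1^2$ is its variance), and likewise $\mu_2$ and $\sigma_2$ for the second component. $\rho$ normalizes the covariance between the two components to a value between $-1$ and $+1$. It is called the correlation coefficient and is defined as:
$$
\rho = \frac{\operatorname{COV}(X, Y)}{\sigma_1\sigma_2}, \quad |\rho| < 1
$$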
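For a bivariate normal random vector $(X, Y)$, $X$ and $Y$ are independent if and only if their covariance is 0, that is, $\rho = 0$. A one-dimensional random variable has no second component to be correlated with, so $\rho$ does not appear in the univariate normal density.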
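The independence statement can be checked numerically. The sketch below is illustrative only: the function names and test values are assumptions, and it uses the standard bivariate normal density (cross term divided by $\sigma_1\sigma_2$). With $\rho = 0$ the joint density should equal the product of the two univariate densities.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def bivariate_normal_pdf(x, y, mu1, mu2, sigma1, sigma2, rho):
    """Bivariate normal density; assumes sigma1, sigma2 > 0 and |rho| < 1."""
    dx = (x - mu1) / sigma1
    dy = (y - mu2) / sigma2
    q = (dx * dx - 2.0 * rho * dx * dy + dy * dy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma1 * sigma2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm


# With rho = 0 the joint density factors into the two marginal densities.
joint = bivariate_normal_pdf(0.3, -1.2, 0.0, 1.0, 2.0, 0.5, 0.0)
product = normal_pdf(0.3, 0.0, 2.0) * normal_pdf(-1.2, 1.0, 0.5)
print(joint, product)

# With rho != 0 the factorization no longer holds.
correlated = bivariate_normal_pdf(0.3, -1.2, 0.0, 1.0, 2.0, 0.5, 0.5)
```

For any nonzero `rho` the value generally differs from the product of the marginals, which is the numerical counterpart of the "if and only if" above.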

Multivariate Normal Distribution

Suppose the components of the $n$-dimensional random vector $x = [x_1, x_2, \cdots, x_n]^{\mathrm{T}}$ are mutually independent and each is normally distributed (a multivariate normal with uncorrelated dimensions). The component means are $E(x) = [\mu_1, \mu_2, \cdots, \mu_n]^{\mathrm{T}}$ and the component standard deviations are $\sigma(x) = [\sigma_1, \sigma_2, \cdots, \sigma_n]^{\mathrm{T}}$.
Writing the random vector and the parameters as column vectors, for the $n$-dimensional case we have:
$$
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}, \quad
\sigma = \begin{bmatrix} \sigma_1 \\ \sigma_2 \\ \vdots \\ \sigma_n \end{bmatrix}
$$
Because the components are independent, the joint density is the product of the marginal densities:
$$
\begin{aligned}
f(x) &= p(x_1, x_2, \ldots, x_n) = p(x_1)\,p(x_2) \cdots p(x_n) \\
&= \frac{1}{\sqrt{2\pi}\sigma_1}\exp\left(-\frac{1}{2}\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2\right) \frac{1}{\sqrt{2\pi}\sigma_2}\exp\left(-\frac{1}{2}\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right) \cdots \frac{1}{\sqrt{2\pi}\sigma_n}\exp\left(-\frac{1}{2}\left(\frac{x_n-\mu_n}{\sigma_n}\right)^2\right) \\
&= \frac{1}{(\sqrt{2\pi})^n\sigma_1\sigma_2\cdots\sigma_n}\exp\left(-\frac{1}{2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 + \cdots + \left(\frac{x_n-\mu_n}{\sigma_n}\right)^2\right]\right)
\end{aligned}
$$

Let
$$
z^2 = \frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} + \cdots + \frac{(x_n-\mu_n)^2}{\sigma_n^2}, \quad \sigma_z = \sigma_1\sigma_2\cdots\sigma_n
$$

Then $f(x)$ can be written as:
$$
f(z) = \frac{1}{(\sqrt{2\pi})^n\sigma_z} e^{-\frac{z^2}{2}} \quad\quad ①
$$

The multivariate normal distribution has a strongly geometric character, and the distribution of $z$ is hard to see from the purely algebraic form, so we rewrite $z^2$ in matrix form:
$$
\begin{aligned}
z^2 &= z^{\mathrm{T}} z \\
&= \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 & \cdots & x_n-\mu_n \end{bmatrix}
\begin{bmatrix}
\frac{1}{\sigma_1^2} & 0 & \cdots & 0 \\
0 & \frac{1}{\sigma_2^2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\sigma_n^2}
\end{bmatrix}
\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \\ \vdots \\ x_n-\mu_n \end{bmatrix}
\end{aligned}
\quad\quad ②
$$
This expression is long, so we introduce the shorthand

$$
x-\mu = [x_1-\mu_1, x_2-\mu_2, \cdots, x_n-\mu_n]^{\mathrm{T}}
$$

and define the matrix
$$
\Sigma = \begin{bmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^2
\end{bmatrix}
$$

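$\Sigma$ is the covariance matrix of $x$: the entry in row $i$, column $j$ is the covariance of $x_i$ and $x_j$.
Because the components are independent here, only the diagonal entries ($i = j$) are nonzero and all other entries are 0. The covariance of $x_i$ with itself is its variance $\sigma_i^2$.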
$\Sigma$ is a diagonal matrix, so its inverse is obtained by inverting each diagonal entry:
$$
\Sigma^{-1} = \begin{bmatrix}
\frac{1}{\sigma_1^2} & 0 & \cdots & 0 \\
0 & \frac{1}{\sigma_2^2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\sigma_n^2}
\end{bmatrix}
$$
The determinant of a diagonal matrix is the product of its diagonal entries, so

$$
|\Sigma| = \sigma_1^2\sigma_2^2\cdots\sigma_n^2
$$

and therefore

$$
\sigma_z = |\Sigma|^{\frac{1}{2}} = \sigma_1\sigma_2\cdots\sigma_n
$$

Substituting into ② gives:
$$
z^{\mathrm{T}} z = (x-\mu)^{\mathrm{T}}\Sigma^{-1}(x-\mu) \quad\quad ③
$$
Substituting into ① gives:
$$
f(z) = \frac{1}{(\sqrt{2\pi})^n\sigma_z} e^{-\frac{z^2}{2}} = \frac{1}{(\sqrt{2\pi})^n|\Sigma|^{\frac{1}{2}}} e^{-\frac{(x-\mu)^{\mathrm{T}}\Sigma^{-1}(x-\mu)}{2}}
$$

We therefore obtain:
$$
\begin{aligned}
f(x) &= \frac{1}{(\sqrt{2\pi})^n\sigma_1\sigma_2\cdots\sigma_n}\exp\left(-\frac{1}{2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 + \cdots + \left(\frac{x_n-\mu_n}{\sigma_n}\right)^2\right]\right) \\
&= \frac{1}{(\sqrt{2\pi})^n\sqrt{|\Sigma|}}\exp\left(-\frac{1}{2}(x-\mu)^{\mathrm{T}}\Sigma^{-1}(x-\mu)\right) \\
&= (2\pi)^{-\frac{n}{2}}|\Sigma|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(x-\mu)^{\mathrm{T}}\Sigma^{-1}(x-\mu)\right) \\
&= f(x; \mu, \Sigma)
\end{aligned}
$$
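As a sanity check on this derivation, the following NumPy sketch evaluates both forms of the density for a diagonal covariance matrix and compares them. The function names and the sample numbers are assumptions chosen for illustration.

```python
import numpy as np


def product_of_marginals(x, mu, sigma):
    """Product of n univariate normal densities (independent components)."""
    z = (x - mu) / sigma
    return np.prod(np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma))


def mvn_pdf(x, mu, cov):
    """Matrix form: (2*pi)^(-n/2) |cov|^(-1/2) exp(-(x-mu)^T cov^{-1} (x-mu) / 2)."""
    n = x.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(cov, diff)  # (x-mu)^T cov^{-1} (x-mu)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))


x = np.array([0.5, -1.0, 2.0])
mu = np.array([0.0, 0.0, 1.0])
sigma = np.array([1.0, 2.0, 0.5])   # standard deviations
cov = np.diag(sigma**2)             # diagonal covariance matrix

p_product = product_of_marginals(x, mu, sigma)
p_matrix = mvn_pdf(x, mu, cov)
print(p_product, p_matrix)
```

The two values should agree up to floating-point error. `mvn_pdf` does not rely on `cov` being diagonal, so the same function can be reused for a general symmetric positive definite covariance matrix.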

Maximum Likelihood Estimators

Let the $n$-dimensional random vector $x$ with independent components follow a normal distribution:
$$
x \sim N(\mu, \sigma^2), \quad \sigma_i > 0
$$
The final form of the multivariate normal density is:
$$
f(x) = f(x; \mu, \Sigma)
$$
Given $m$ independent observed samples $x^{(1)}, \ldots, x^{(m)}$, the likelihood function is:

$$
\begin{aligned}
L(\mu, \Sigma) &= \prod_{i=1}^{m} f\left(x^{(i)}; \mu, \Sigma\right) \\
&= \prod_{i=1}^{m}(2\pi)^{-\frac{n}{2}}|\Sigma|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= (2\pi)^{-\frac{mn}{2}}|\Sigma|^{-\frac{m}{2}}\exp\left(-\frac{1}{2}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)
\end{aligned}
$$
The log-likelihood is:
$$
\begin{aligned}
\ln L(\mu, \Sigma) &= \ln(2\pi)^{-\frac{mn}{2}} + \ln|\Sigma|^{-\frac{m}{2}} + \ln\exp\left(-\frac{1}{2}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\frac{mn}{2}\ln 2\pi - \frac{m}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) \\
&= C - \frac{m}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)
\end{aligned}
$$
Here $m$ and $n$ are known: $m$ is the number of observed samples and $n$ is the feature dimension of a single sample. $C$ is a constant, $C = -\frac{mn}{2}\ln 2\pi$.
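The log-likelihood translates almost line by line into NumPy. The sketch below is one possible implementation under the assumption that the samples are stored as the rows of an $m \times n$ array; the function name `log_likelihood` and the toy data are illustrative, not part of the original text.

```python
import numpy as np


def log_likelihood(X, mu, cov):
    """ln L(mu, cov) for the m rows of X, each an n-dimensional sample."""
    m, n = X.shape
    diff = X - mu                                  # row i holds x^(i) - mu
    _, logdet = np.linalg.slogdet(cov)             # ln|cov|, computed stably
    # sum_i (x^(i) - mu)^T cov^{-1} (x^(i) - mu)
    quad = np.sum(diff * np.linalg.solve(cov, diff.T).T)
    return -0.5 * m * n * np.log(2.0 * np.pi) - 0.5 * m * logdet - 0.5 * quad


X = np.array([[0.0, 0.0],
              [1.0, 2.0],
              [2.0, 1.0]])
mu = np.array([1.0, 1.0])
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])

value = log_likelihood(X, mu, cov)
print(value)
```

Using `np.linalg.slogdet` and `np.linalg.solve` avoids forming the determinant and the explicit inverse, which tends to be more robust numerically than evaluating the formula literally.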

To find the maximum, set the partial derivatives with respect to $\mu$ and $\Sigma$ to zero:

$$
\left\{\begin{array}{l}
\frac{\partial \ln L}{\partial \mu} = 0 \\
\frac{\partial \ln L}{\partial \Sigma} = 0
\end{array}\right.
$$

$\mu$ is a vector and $\Sigma$ is a matrix, so matrix differentiation rules are needed. Start with the derivative with respect to $\mu$. $\ln L$ is a sum of three terms, only one of which contains $\mu$, so:

$$
\frac{\partial \ln L}{\partial \mu} = \frac{\partial}{\partial \mu}\left(-\frac{1}{2}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)
$$

where:
$$
\begin{aligned}
(x-\mu)^{\mathrm{T}}\Sigma^{-1}(x-\mu) &= \left(x^{\mathrm{T}}-\mu^{\mathrm{T}}\right)\Sigma^{-1}(x-\mu) \\
&= \left(x^{\mathrm{T}}\Sigma^{-1}-\mu^{\mathrm{T}}\Sigma^{-1}\right)(x-\mu) \\
&= x^{\mathrm{T}}\Sigma^{-1}x - x^{\mathrm{T}}\Sigma^{-1}\mu - \mu^{\mathrm{T}}\Sigma^{-1}x + \mu^{\mathrm{T}}\Sigma^{-1}\mu
\end{aligned}
$$

In this expression, $x^{\mathrm{T}}\Sigma^{-1}\mu$ is a scalar and $\Sigma^{-1}$ is symmetric, so the term equals its own transpose:
$$
x^{\mathrm{T}}\Sigma^{-1}\mu
= \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}\Sigma^{-1}\begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}
= \begin{bmatrix} \mu_1 & \mu_2 & \cdots & \mu_n \end{bmatrix}\Sigma^{-1}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \mu^{\mathrm{T}}\Sigma^{-1}x
$$

Therefore:
$$
\begin{aligned}
(x-\mu)^{\mathrm{T}}\Sigma^{-1}(x-\mu) &= x^{\mathrm{T}}\Sigma^{-1}x - x^{\mathrm{T}}\Sigma^{-1}\mu - \mu^{\mathrm{T}}\Sigma^{-1}x + \mu^{\mathrm{T}}\Sigma^{-1}\mu \\
&= x^{\mathrm{T}}\Sigma^{-1}x - 2x^{\mathrm{T}}\Sigma^{-1}\mu + \mu^{\mathrm{T}}\Sigma^{-1}\mu \\
\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) &= \sum_{i=1}^{m}\left(x^{(i)\mathrm{T}}\Sigma^{-1}x^{(i)} - 2x^{(i)\mathrm{T}}\Sigma^{-1}\mu + \mu^{\mathrm{T}}\Sigma^{-1}\mu\right) \\
&= \sum_{i=1}^{m}x^{(i)\mathrm{T}}\Sigma^{-1}x^{(i)} - 2\sum_{i=1}^{m}x^{(i)\mathrm{T}}\Sigma^{-1}\mu + m\mu^{\mathrm{T}}\Sigma^{-1}\mu
\end{aligned}
$$

Substituting this result into $\frac{\partial \ln L}{\partial \mu}$:
$$
\begin{aligned}
\frac{\partial \ln L}{\partial \mu} &= \frac{\partial}{\partial \mu}\left(-\frac{1}{2}\left(\sum_{i=1}^{m}x^{(i)\mathrm{T}}\Sigma^{-1}x^{(i)} - 2\sum_{i=1}^{m}x^{(i)\mathrm{T}}\Sigma^{-1}\mu + m\mu^{\mathrm{T}}\Sigma^{-1}\mu\right)\right) \\
&= \frac{\partial}{\partial \mu}\left(-\frac{1}{2}\sum_{i=1}^{m}x^{(i)\mathrm{T}}\Sigma^{-1}x^{(i)}\right) + \frac{\partial}{\partial \mu}\left(\sum_{i=1}^{m}x^{(i)\mathrm{T}}\Sigma^{-1}\mu\right) - \frac{1}{2}\frac{\partial}{\partial \mu}m\mu^{\mathrm{T}}\Sigma^{-1}\mu \\
&= \underbrace{\frac{\partial}{\partial \mu}\left(\sum_{i=1}^{m}x^{(i)\mathrm{T}}\Sigma^{-1}\mu\right)}_{a_1} - \underbrace{\frac{1}{2}\frac{\partial}{\partial \mu}m\mu^{\mathrm{T}}\Sigma^{-1}\mu}_{a_2}
\end{aligned}
$$
The first term does not depend on $\mu$, so its derivative vanishes in the last step.
By the matrix differentiation rule

$$
\text{if} \quad f(X) = A^{\mathrm{T}}X, \quad \text{then} \quad \frac{\mathrm{d}f}{\mathrm{d}X} = A
$$

we get

$$
a_1 = \sum_{i=1}^{m}\left(x^{(i)\mathrm{T}}\Sigma^{-1}\right)^{\mathrm{T}} = \sum_{i=1}^{m}\left(\Sigma^{-1}\right)^{\mathrm{T}}x^{(i)}
$$
Because $\Sigma^{-1}$ is a symmetric matrix:
$$
\left(\Sigma^{-1}\right)^{\mathrm{T}} = \Sigma^{-1}, \quad a_1 = \sum_{i=1}^{m}\left(\Sigma^{-1}\right)^{\mathrm{T}}x^{(i)} = \sum_{i=1}^{m}\Sigma^{-1}x^{(i)} = \Sigma^{-1}\sum_{i=1}^{m}x^{(i)}
$$
By the matrix differentiation rule
$$
\text{if} \quad f(X) = X^{\mathrm{T}}AX, \quad \text{then} \quad \frac{\mathrm{d}f}{\mathrm{d}X} = AX + A^{\mathrm{T}}X
$$
$$
\text{when} \quad A = A^{\mathrm{T}}, \quad \frac{\mathrm{d}f}{\mathrm{d}X} = AX + A^{\mathrm{T}}X = 2AX
$$
we get
$$
a_2 = \frac{1}{2}\frac{\partial\left(m\mu^{\mathrm{T}}\Sigma^{-1}\mu\right)}{\partial \mu} = m\Sigma^{-1}\mu
$$
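Both differentiation rules used for $a_1$ and $a_2$ can be checked with finite differences. The sketch below is a numerical illustration, not a proof; `numerical_gradient` and the random test data are assumptions introduced for the example.

```python
import numpy as np


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at the vector x."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        grad[k] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
a = rng.normal(size=4)
A = rng.normal(size=(4, 4))        # deliberately not symmetric
A_sym = 0.5 * (A + A.T)            # symmetric case
x0 = rng.normal(size=4)

# Rule 1: d(a^T x)/dx = a
grad_linear = numerical_gradient(lambda x: a @ x, x0)

# Rule 2: d(x^T A x)/dx = A x + A^T x, which is 2 A x when A is symmetric
grad_quadratic = numerical_gradient(lambda x: x @ A @ x, x0)
grad_quadratic_sym = numerical_gradient(lambda x: x @ A_sym @ x, x0)

print(np.max(np.abs(grad_linear - a)))
print(np.max(np.abs(grad_quadratic - (A + A.T) @ x0)))
print(np.max(np.abs(grad_quadratic_sym - 2.0 * A_sym @ x0)))
```

All three printed differences should be close to zero. In the derivation, $A$ plays the role of $\Sigma^{-1}$, which is symmetric, so the $2AX$ form applies.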

Substituting $a_1$ and $a_2$ into $\frac{\partial \ln L}{\partial \mu}$:

$$
\frac{\partial \ln L}{\partial \mu} = a_1 - a_2 = \sum_{i=1}^{m}\Sigma^{-1}x^{(i)} - m\Sigma^{-1}\mu = 0
$$
Left-multiplying both sides by $\Sigma$ gives $\sum_{i=1}^{m}x^{(i)} - m\mu = 0$, so
$$
\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m}x^{(i)} = \bar{x}
$$
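A quick numerical check of this result: for any fixed symmetric positive definite covariance matrix, the gradient $\Sigma^{-1}\sum_i x^{(i)} - m\Sigma^{-1}\mu$ should vanish at the sample mean. The data and the covariance matrix below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))                  # m = 50 samples, n = 3
cov = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 0.5]])             # an arbitrary SPD matrix
cov_inv = np.linalg.inv(cov)


def grad_mu(mu):
    """Gradient of ln L with respect to mu: cov^{-1} sum_i x^(i) - m cov^{-1} mu."""
    m = X.shape[0]
    return cov_inv @ X.sum(axis=0) - m * cov_inv @ mu


mu_hat = X.mean(axis=0)                       # the estimator derived above
print(grad_mu(mu_hat))                        # numerically zero
print(grad_mu(mu_hat + 0.1))                  # clearly nonzero away from mu_hat
```

The gradient vanishes at `mu_hat` whatever `cov` is used, which reflects the fact that $\hat{\mu}$ does not depend on $\Sigma$.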

Now take the partial derivative with respect to $\Sigma$:

$$
\begin{aligned}
\frac{\partial \ln L}{\partial \Sigma} &= \frac{\partial}{\partial \Sigma}\left(C - \frac{m}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\frac{m}{2}\underbrace{\frac{\partial}{\partial \Sigma}\ln|\Sigma|}_{b_1} - \frac{1}{2}\underbrace{\frac{\partial}{\partial \Sigma}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)}_{b_2}
\end{aligned}
$$

$\Sigma$ and $\Sigma^{-1}$ are both real symmetric matrices. By the matrix differentiation rule, when $A$ is a real symmetric matrix:
$$
\frac{\partial \ln|A|}{\partial A} = A^{-1} \Rightarrow b_1 = \frac{\partial \ln|\Sigma|}{\partial \Sigma} = \Sigma^{-1}
$$
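The log-determinant rule can also be checked by perturbing each entry of the matrix independently. This is a sketch under that convention (entries treated as independent variables); the helper name and the test matrix are assumptions.

```python
import numpy as np


def numerical_matrix_gradient(f, S, eps=1e-6):
    """Central differences of a scalar function f with respect to each entry of S."""
    grad = np.zeros_like(S)
    for p in range(S.shape[0]):
        for q in range(S.shape[1]):
            E = np.zeros_like(S)
            E[p, q] = eps                      # eps times the matrix E_pq
            grad[p, q] = (f(S + E) - f(S - E)) / (2.0 * eps)
    return grad


def log_det(M):
    """ln|M| for a matrix with positive determinant."""
    return np.linalg.slogdet(M)[1]


S = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])                # symmetric positive definite

grad_numeric = numerical_matrix_gradient(log_det, S)
grad_analytic = np.linalg.inv(S).T             # equals inv(S) since S is symmetric
print(np.max(np.abs(grad_numeric - grad_analytic)))
```

For a general invertible matrix the entrywise derivative is the transpose of the inverse; symmetry of $\Sigma$ is what allows the transpose to be dropped.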

Now consider $b_2$. Let $\omega_{pq}$ be the entry in row $p$, column $q$ of $\Sigma$, and let $E_{pq}$ be the matrix, of the same size as $\Sigma^{-1}$, whose entry in row $p$, column $q$ is 1 and whose other entries are all 0. By the derivative formula for a matrix inverse:
$$
\begin{aligned}
\frac{\partial X^{-1}}{\partial x} &= -X^{-1}\frac{\partial X}{\partial x}X^{-1} \\
\Rightarrow \frac{\partial \Sigma^{-1}}{\partial \omega_{pq}} &= -\Sigma^{-1}\frac{\partial \Sigma}{\partial \omega_{pq}}\Sigma^{-1} = -\Sigma^{-1}E_{pq}\Sigma^{-1} \\
\Rightarrow \frac{\partial\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)}{\partial \omega_{pq}} &= \left(x^{(i)}-\mu\right)^{\mathrm{T}}\frac{\partial \Sigma^{-1}}{\partial \omega_{pq}}\left(x^{(i)}-\mu\right) \\
&= \left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(-\Sigma^{-1}E_{pq}\Sigma^{-1}\right)\left(x^{(i)}-\mu\right) \\
&= -\left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(\Sigma^{-1}E_{pq}\Sigma^{-1}\right)\left(x^{(i)}-\mu\right)
\end{aligned}
$$

We already know that $\Sigma^{-1}$ is symmetric. Matrix multiplication is associative, so parentheses can be placed freely as long as the order of the factors is unchanged:
$$
\begin{aligned}
\frac{\partial\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)}{\partial \omega_{pq}} &= -\left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(\Sigma^{-1}E_{pq}\Sigma^{-1}\right)\left(x^{(i)}-\mu\right) \\
&= -\left(\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\right)E_{pq}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\left(\left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(\Sigma^{-1}\right)^{\mathrm{T}}\right)E_{pq}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\underbrace{\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)^{\mathrm{T}}}_{B^{\mathrm{T}}A^{\mathrm{T}}=(AB)^{\mathrm{T}}}E_{pq}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right) \\
&= -\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{p}^{\mathrm{T}}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{q}
\end{aligned}
$$
Here $\left(\Sigma^{-1}(x^{(i)}-\mu)\right)^{\mathrm{T}}$ is a $1 \times n$ matrix and $\left(\Sigma^{-1}(x^{(i)}-\mu)\right)_{p}^{\mathrm{T}}$ denotes its $p$-th entry; $\Sigma^{-1}(x^{(i)}-\mu)$ is an $n \times 1$ matrix and $\left(\Sigma^{-1}(x^{(i)}-\mu)\right)_{q}$ denotes its $q$-th entry. This entrywise result extends to the derivative with respect to the whole matrix, which is assembled entry by entry according to the formula:

$$
\frac{\partial F}{\partial X} = \begin{bmatrix}
\frac{\partial F}{\partial x_{11}} & \frac{\partial F}{\partial x_{12}} & \cdots & \frac{\partial F}{\partial x_{1s}} \\
\frac{\partial F}{\partial x_{21}} & \frac{\partial F}{\partial x_{22}} & \cdots & \frac{\partial F}{\partial x_{2s}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial F}{\partial x_{r1}} & \frac{\partial F}{\partial x_{r2}} & \cdots & \frac{\partial F}{\partial x_{rs}}
\end{bmatrix}
$$

$$
\begin{aligned}
&\frac{\partial}{\partial \Sigma}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) \\
&= \begin{bmatrix}
\frac{\partial}{\partial \omega_{11}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) & \frac{\partial}{\partial \omega_{12}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) & \cdots & \frac{\partial}{\partial \omega_{1n}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) \\
\frac{\partial}{\partial \omega_{21}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) & \frac{\partial}{\partial \omega_{22}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) & \cdots & \frac{\partial}{\partial \omega_{2n}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial}{\partial \omega_{n1}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) & \frac{\partial}{\partial \omega_{n2}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) & \cdots & \frac{\partial}{\partial \omega_{nn}}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)
\end{bmatrix} \\
&= -\left[\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{p}^{\mathrm{T}}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{q}\right]_{p,q=1}^{n} \\
&= -A_1 A_2
\end{aligned}
$$

where $A_1$ is the column vector $\Sigma^{-1}\left(x^{(i)}-\mu\right)$ and $A_2$ is the corresponding row vector:
$$
A_1 = \Sigma^{-1}\left(x^{(i)}-\mu\right), \quad
A_2 = \begin{bmatrix} \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_1 & \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_2 & \cdots & \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_n \end{bmatrix} = \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)^{\mathrm{T}}
$$

In $A_1$ and $A_2$, $\left(\Sigma^{-1}(x^{(i)}-\mu)\right)^{\mathrm{T}}$ is a $1 \times n$ matrix whose $p$-th entry $\left(\Sigma^{-1}(x^{(i)}-\mu)\right)_{p}^{\mathrm{T}}$ is a scalar, and $\Sigma^{-1}(x^{(i)}-\mu)$ is an $n \times 1$ matrix whose $p$-th entry $\left(\Sigma^{-1}(x^{(i)}-\mu)\right)_{p}$ is the same scalar. Therefore:

$$
\begin{aligned}
\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{p}^{\mathrm{T}} &= \left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)_{p} \\
\frac{\partial}{\partial \Sigma}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) &= -A_1 A_2 \\
&= -\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\right)^{\mathrm{T}} \\
&= -\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\left(\Sigma^{-1}\right)^{\mathrm{T}} \\
&= -\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}
\end{aligned}
$$

We can finally compute $b_2$:
$$
b_2 = \frac{\partial}{\partial \Sigma}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right) = \sum_{i=1}^{m}\left(-\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\right)
$$
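The single-sample building block of $b_2$, namely $\frac{\partial}{\partial \Sigma}\, d^{\mathrm{T}}\Sigma^{-1}d = -\Sigma^{-1}dd^{\mathrm{T}}\Sigma^{-1}$ with $d = x^{(i)}-\mu$, can be verified entry by entry with finite differences. The vector `d`, the matrix `S` and the helper names are illustrative assumptions.

```python
import numpy as np


def numerical_matrix_gradient(f, S, eps=1e-6):
    """Central differences of a scalar function f with respect to each entry of S."""
    grad = np.zeros_like(S)
    for p in range(S.shape[0]):
        for q in range(S.shape[1]):
            E = np.zeros_like(S)
            E[p, q] = eps                      # perturb only omega_pq
            grad[p, q] = (f(S + E) - f(S - E)) / (2.0 * eps)
    return grad


S = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])                # symmetric positive definite
d = np.array([0.7, -1.1, 0.4])                 # stands in for x^(i) - mu


def quad_form(M):
    """d^T M^{-1} d"""
    return d @ np.linalg.solve(M, d)


grad_numeric = numerical_matrix_gradient(quad_form, S)
u = np.linalg.solve(S, d)                      # Sigma^{-1} (x^(i) - mu)
grad_analytic = -np.outer(u, u)                # -Sigma^{-1} d d^T Sigma^{-1}
print(np.max(np.abs(grad_numeric - grad_analytic)))
```

The outer product `np.outer(u, u)` is exactly the product of the column vector $\Sigma^{-1}d$ with its transpose, so the entry in row $p$, column $q$ is $u_p u_q$.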

Now return to the derivative of the log-likelihood with respect to $\Sigma$:

$$
\begin{aligned}
\frac{\partial \ln L}{\partial \Sigma} &= -\frac{m}{2}\underbrace{\frac{\partial}{\partial \Sigma}\ln|\Sigma|}_{b_1} - \frac{1}{2}\underbrace{\frac{\partial}{\partial \Sigma}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\left(x^{(i)}-\mu\right)}_{b_2} \\
&= -\frac{m}{2}\Sigma^{-1} - \frac{1}{2}\sum_{i=1}^{m}\left(-\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\right)
\end{aligned}
$$

With $I$ the identity matrix, $\Sigma^{-1}I = \Sigma^{-1}$, so:
$$
\begin{aligned}
\frac{\partial \ln L}{\partial \Sigma} &= -\frac{m}{2}\Sigma^{-1}I - \frac{1}{2}\sum_{i=1}^{m}\left(-\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\right) \\
&= -\frac{m}{2}\Sigma^{-1}I + \frac{1}{2}\sum_{i=1}^{m}\left(\Sigma^{-1}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\right) \\
&= -\frac{1}{2}\Sigma^{-1}\left(mI - \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\right) \\
&= -\frac{1}{2}\Sigma^{-1}\left(m\Sigma\Sigma^{-1} - \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\Sigma^{-1}\right) \\
&= -\frac{1}{2}\Sigma^{-1}\left(m\Sigma - \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right)\Sigma^{-1} \\
&= 0
\end{aligned}
$$
Left-multiply both sides by $\Sigma$:
$$
\begin{aligned}
\Sigma\left(-\frac{1}{2}\Sigma^{-1}\right)\left(m\Sigma - \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right)\Sigma^{-1} &= \Sigma\, 0 \\
-\frac{1}{2}I\left(m\Sigma - \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right)\Sigma^{-1} &= 0 \\
\left(m\Sigma - \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right)\Sigma^{-1} &= 0
\end{aligned}
$$

Right-multiply both sides by $\Sigma$:
$$
\begin{aligned}
\left(m\Sigma - \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}}\right)\Sigma^{-1}\Sigma &= 0\,\Sigma \\
m\Sigma - \sum_{i=1}^{m}\left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^{\mathrm{T}} &= 0
\end{aligned}
$$

Solving for $\Sigma$, with $\mu$ taken at its estimate $\hat{\mu} = \bar{x}$, gives:
$$
\hat{\Sigma} = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\hat{\mu}\right)\left(x^{(i)}-\hat{\mu}\right)^{\mathrm{T}}
$$
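To see that this solution really zeroes the derivative, the sketch below evaluates $-\frac{m}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}\left(\sum_i (x^{(i)}-\mu)(x^{(i)}-\mu)^{\mathrm{T}}\right)\Sigma^{-1}$ at the estimate and at a different matrix. The synthetic data, the mixing matrix and the name `score_sigma` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.8]])
X = rng.normal(size=(200, 3)) @ mixing         # correlated synthetic samples
m = X.shape[0]

mu_hat = X.mean(axis=0)
diff = X - mu_hat
scatter = diff.T @ diff                        # sum_i (x^(i)-mu)(x^(i)-mu)^T


def score_sigma(cov):
    """d ln L / d Sigma evaluated at cov, with mu fixed at mu_hat."""
    cov_inv = np.linalg.inv(cov)
    return -0.5 * m * cov_inv + 0.5 * cov_inv @ scatter @ cov_inv


sigma_hat = scatter / m
print(np.max(np.abs(score_sigma(sigma_hat))))        # numerically zero
print(np.max(np.abs(score_sigma(2.0 * sigma_hat))))  # nonzero elsewhere
```

This only confirms that the estimate is a stationary point of the log-likelihood; it does not by itself show that the point is a maximum.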

In summary, the maximum likelihood estimators for the multivariate normal distribution are:
$$
\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m}x^{(i)} = \bar{x}
$$
$$
\hat{\Sigma} = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}-\hat{\mu}\right)\left(x^{(i)}-\hat{\mu}\right)^{\mathrm{T}}
$$
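In practice the two estimators take only a few lines of NumPy. The sketch below assumes the samples are stored as rows of an array; `mvn_mle`, the true parameters and the sample size are made up for the example.

```python
import numpy as np


def mvn_mle(X):
    """Return (mu_hat, sigma_hat) for the m rows of X."""
    m = X.shape[0]
    mu_hat = X.mean(axis=0)                    # sample mean
    diff = X - mu_hat
    sigma_hat = diff.T @ diff / m              # divides by m, not m - 1
    return mu_hat, sigma_hat


rng = np.random.default_rng(42)
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.6],
                     [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=5000)

mu_hat, sigma_hat = mvn_mle(X)
print(mu_hat)
print(sigma_hat)

# The same covariance estimate via NumPy's built-in, with the 1/m normalization.
sigma_np = np.cov(X, rowvar=False, bias=True)
```

Note the $\frac{1}{m}$ factor: the maximum likelihood estimate of the covariance is the biased one. `np.cov` divides by $m - 1$ by default, so `bias=True` is needed to reproduce it.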

