Gaussian discriminant analysis (GDA) is a classic generative learning model and a supervised classification algorithm.
Suppose we have a sample set $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{0,1\}$. As a generative learning algorithm, GDA models the joint probability $P(x,y)$. The GDA model starts from the following assumptions:
$$
\begin{aligned}
y &\sim \mathrm{Bernoulli}(\phi) \\
x \mid y=0 &\sim N(\mu_0,\Sigma) \\
x \mid y=1 &\sim N(\mu_1,\Sigma)
\end{aligned}
$$
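The generative assumptions can be read as a recipe for producing data: draw the label first, then draw the features from the Gaussian of that class. The following is a minimal sketch assuming NumPy; the function name `sample_gda` and the parameter values are illustrative and not part of the original text.

```python
import numpy as np

def sample_gda(m, phi, mu0, mu1, Sigma, seed=0):
    """Draw m pairs (x_i, y_i) from the GDA generative model."""
    rng = np.random.default_rng(seed)
    # y ~ Bernoulli(phi)
    y = (rng.random(m) < phi).astype(int)
    # x | y ~ N(mu_y, Sigma): both classes share the same covariance,
    # so we add the same kind of noise to the class mean of each row.
    means = np.where(y[:, None] == 1, mu1, mu0)
    noise = rng.multivariate_normal(np.zeros(len(mu0)), Sigma, size=m)
    return means + noise, y

# Illustrative parameters (assumed values)
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 2.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
X, y = sample_gda(500, 0.4, mu0, mu1, Sigma)
```

Roughly a fraction $\phi$ of the sampled labels should equal 1, and the two point clouds have the same shape but different centers.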
The corresponding distributions are:

$$
\begin{aligned}
p(y) &= \phi^y(1-\phi)^{1-y} \\
p(x \mid y=0) &= \frac{1}{(2\pi)^{\frac d2}|\Sigma|^{\frac12}} \exp\left(-\frac12 (x-\mu_0)^T\Sigma^{-1}(x-\mu_0)\right) \\
p(x \mid y=1) &= \frac{1}{(2\pi)^{\frac d2}|\Sigma|^{\frac12}} \exp\left(-\frac12 (x-\mu_1)^T\Sigma^{-1}(x-\mu_1)\right)
\end{aligned}
$$
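These densities translate almost directly into code. The sketch below assumes NumPy; the helper names `bernoulli_pmf` and `gaussian_pdf` are hypothetical. It solves a linear system instead of forming $\Sigma^{-1}$ explicitly, which is a numerical choice and not something the formulas require.

```python
import numpy as np

def bernoulli_pmf(y, phi):
    """p(y) = phi^y * (1 - phi)^(1 - y) for y in {0, 1}."""
    return phi ** y * (1 - phi) ** (1 - y)

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density N(mu, Sigma) evaluated at x."""
    d = len(mu)
    diff = x - mu
    # (x - mu)^T Sigma^{-1} (x - mu), computed via a linear solve
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Density of a 2-D standard normal at its mean is 1 / (2 * pi)
p_at_mean = gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```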
The log-likelihood on the sample set $D$ is:

$$
\begin{aligned}
l(\phi,\mu_0,\mu_1,\Sigma) &= \log \prod_{i=1}^m P(x_i,y_i;\phi,\mu_0,\mu_1,\Sigma) \\
&= \log \prod_{i=1}^m P(x_i \mid y_i;\mu_0,\mu_1,\Sigma)\,P(y_i;\phi) \\
&= \sum_{i=1}^m \left[ \log P(x_i \mid y_i;\mu_0,\mu_1,\Sigma) + \log P(y_i;\phi) \right] \\
&= \sum_{i=1}^m \left[ \log \left( P(x_i \mid y_i=0;\mu_0,\Sigma)^{1-y_i}\, P(x_i \mid y_i=1;\mu_1,\Sigma)^{y_i} \right) + \log P(y_i;\phi) \right] \\
&= \sum_{i=1}^m \left[ (1-y_i)\log P(x_i \mid y_i=0;\mu_0,\Sigma) + y_i \log P(x_i \mid y_i=1;\mu_1,\Sigma) + \log P(y_i;\phi) \right] \\
&= \sum_{i=1}^m \Big\{ (1-y_i)\left[-\frac d2 \log 2\pi - \frac12 \log|\Sigma| - \frac12 (x_i-\mu_0)^T\Sigma^{-1}(x_i-\mu_0)\right] \\
&\qquad + y_i\left[-\frac d2 \log 2\pi - \frac12 \log|\Sigma| - \frac12 (x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right] + \log \phi^{y_i}(1-\phi)^{1-y_i} \Big\}
\end{aligned}
$$
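The final line of the log-likelihood can be evaluated term by term. Here is a minimal sketch assuming NumPy, with a hypothetical function name `gda_log_likelihood`; it uses `np.linalg.slogdet` for $\log|\Sigma|$, which is a stability choice and not part of the derivation.

```python
import numpy as np

def gda_log_likelihood(X, y, phi, mu0, mu1, Sigma):
    """Sum over samples of log P(x_i | y_i) + log P(y_i)."""
    m, d = X.shape
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    # Row i uses mu_1 when y_i = 1 and mu_0 when y_i = 0
    mu = np.where(y[:, None] == 1, mu1, mu0)
    diff = X - mu
    # Quadratic form (x_i - mu)^T Sigma^{-1} (x_i - mu) for every row
    quad = np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff)
    log_px = -0.5 * d * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad
    log_py = y * np.log(phi) + (1 - y) * np.log(1 - phi)
    return float(np.sum(log_px + log_py))

# Two points sitting exactly on their class means, identity covariance
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1])
ll = gda_log_likelihood(X, y, 0.5, np.zeros(2), np.ones(2), np.eye(2))
```

In this toy case each sample contributes $-\log 2\pi + \log 0.5$, so the total is $-2\log 4\pi$.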
Before maximizing the likelihood, we recall a few identities:

$$
\begin{aligned}
& \operatorname{tr}(ABC) = \operatorname{tr}(CAB) = \operatorname{tr}(BCA) \\
& \frac{\partial \operatorname{tr}(AX)}{\partial X} = \frac{\partial \operatorname{tr}(XA)}{\partial X} = A^T \\
& \frac{\partial u^T v}{\partial x} = \frac{\partial u}{\partial x} v + \frac{\partial v}{\partial x} u \\
& \frac{\partial |X|}{\partial X} = |X|\,(X^{-1})^T \\
& \frac{\partial \log|X|}{\partial X} = \frac{1}{|X|}|X|\,(X^{-1})^T = (X^{-1})^T \\
& \frac{\partial \operatorname{tr}(X^{-1}A)}{\partial X} = -(X^{-1})^T A^T (X^{-1})^T
\end{aligned}
$$
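Matrix-derivative identities like these are easy to get wrong, so a numerical check can help. The sketch below assumes NumPy and compares the log-determinant and inverse-trace identities against central finite differences, treating every entry of $X$ as an independent variable. The helper `numerical_grad` and the test matrices are illustrative.

```python
import numpy as np

def numerical_grad(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f at matrix X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
X = B @ B.T + 3 * np.eye(3)   # positive definite, so log|X| is defined
A = rng.normal(size=(3, 3))
X_inv = np.linalg.inv(X)

# d log|X| / dX  should equal (X^{-1})^T
g_logdet = numerical_grad(lambda M: np.log(np.linalg.det(M)), X)
err_logdet = np.max(np.abs(g_logdet - X_inv.T))

# d tr(X^{-1} A) / dX  should equal -(X^{-1})^T A^T (X^{-1})^T
g_trinv = numerical_grad(lambda M: np.trace(np.linalg.inv(M) @ A), X)
err_trinv = np.max(np.abs(g_trinv + X_inv.T @ A.T @ X_inv.T))
```

Both errors should be tiny compared with the size of the gradient entries.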
We estimate the parameters by maximum likelihood. For $\phi$, only the Bernoulli term depends on it:

$$
\begin{aligned}
\frac{\partial l(\phi,\mu_0,\mu_1,\Sigma)}{\partial\phi} &= \frac{\partial}{\partial\phi} \sum_{i=1}^m \log \phi^{y_i}(1-\phi)^{1-y_i} \\
&= \frac{\partial}{\partial\phi} \sum_{i=1}^m \left[ y_i \log\phi + (1-y_i)\log(1-\phi) \right] \\
&= \sum_{i=1}^m \left( \frac{y_i}{\phi} - \frac{1-y_i}{1-\phi} \right) = \sum_{i=1}^m \frac{y_i-\phi}{\phi(1-\phi)} = 0 \\
&\Rightarrow \sum_{i=1}^m (y_i-\phi) = 0 \Rightarrow \sum_{i=1}^m y_i = \sum_{i=1}^m \phi = m\phi \\
\phi &= \frac{\sum_{i=1}^m I(y_i=1)}{m}
\end{aligned}
$$
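The estimate of $\phi$ is the fraction of positive labels. As a rough check, the sketch below (assuming NumPy, with made-up labels and a hypothetical helper `bernoulli_log_lik`) compares that closed form with a brute-force search over a grid of candidate values.

```python
import numpy as np

def bernoulli_log_lik(phi, y):
    """Sum of y_i * log(phi) + (1 - y_i) * log(1 - phi)."""
    return float(np.sum(y * np.log(phi) + (1 - y) * np.log(1 - phi)))

y = np.array([1, 0, 0, 1, 1, 0, 1, 1])   # illustrative labels
phi_hat = y.mean()                        # closed-form estimate: 5 / 8

# Brute-force check: the best grid point should sit next to phi_hat
grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmax([bernoulli_log_lik(p, y) for p in grid])]
```

Because the Bernoulli log-likelihood is concave in $\phi$, the best grid point lies within one grid step of the closed-form value.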
For $\mu_0$, only the class-0 quadratic term depends on it:

$$
\begin{aligned}
\frac{\partial l(\phi,\mu_0,\mu_1,\Sigma)}{\partial\mu_0} &= \frac{\partial}{\partial\mu_0} \sum_{i=1}^m (1-y_i)\left[-\frac12 (x_i-\mu_0)^T\Sigma^{-1}(x_i-\mu_0)\right] \\
&= \sum_{i=1}^m -\frac12 (1-y_i)\left[ \frac{\partial (x_i-\mu_0)}{\partial\mu_0}\Sigma^{-1}(x_i-\mu_0) + \frac{\partial\, \Sigma^{-1}(x_i-\mu_0)}{\partial\mu_0}(x_i-\mu_0) \right] \\
&= \sum_{i=1}^m -\frac12 (1-y_i)\left[ -\Sigma^{-1}(x_i-\mu_0) - (\Sigma^{-1})^T(x_i-\mu_0) \right] \\
&= \sum_{i=1}^m (1-y_i)\,\Sigma^{-1}(x_i-\mu_0) = 0 \\
&\Rightarrow \sum_{i=1}^m (1-y_i)\,\Sigma\Sigma^{-1}(x_i-\mu_0) = 0 \Rightarrow \sum_{i=1}^m (1-y_i)(x_i-\mu_0) = 0 \\
\mu_0 &= \frac{\sum_{i=1}^m I(y_i=0)\,x_i}{\sum_{i=1}^m I(y_i=0)}
\end{aligned}
$$

The fourth line uses the symmetry $(\Sigma^{-1})^T = \Sigma^{-1}$. Since $\sum_{i=1}^m (1-y_i) = \sum_{i=1}^m I(y_i=0)$, the estimate $\mu_0$ is the mean of the class-0 samples.
For $\mu_1$, the derivation is symmetric:

$$
\begin{aligned}
\frac{\partial l(\phi,\mu_0,\mu_1,\Sigma)}{\partial\mu_1} &= \frac{\partial}{\partial\mu_1} \sum_{i=1}^m y_i\left[-\frac12 (x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right] \\
&= \sum_{i=1}^m -\frac12 y_i\left[ \frac{\partial (x_i-\mu_1)}{\partial\mu_1}\Sigma^{-1}(x_i-\mu_1) + \frac{\partial\, \Sigma^{-1}(x_i-\mu_1)}{\partial\mu_1}(x_i-\mu_1) \right] \\
&= \sum_{i=1}^m -\frac12 y_i\left[ -\Sigma^{-1}(x_i-\mu_1) - (\Sigma^{-1})^T(x_i-\mu_1) \right] \\
&= \sum_{i=1}^m y_i\,\Sigma^{-1}(x_i-\mu_1) = 0 \\
&\Rightarrow \sum_{i=1}^m y_i\,\Sigma\Sigma^{-1}(x_i-\mu_1) = 0 \Rightarrow \sum_{i=1}^m y_i(x_i-\mu_1) = 0 \\
\mu_1 &= \frac{\sum_{i=1}^m I(y_i=1)\,x_i}{\sum_{i=1}^m I(y_i=1)}
\end{aligned}
$$

Since $\sum_{i=1}^m y_i = \sum_{i=1}^m I(y_i=1)$, the estimate $\mu_1$ is the mean of the class-1 samples.
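The first-order conditions $\sum_i (1-y_i)(x_i-\mu_0)=0$ and $\sum_i y_i(x_i-\mu_1)=0$ say that each mean is the average of the samples in its own class, so the divisor is the class count. A minimal NumPy sketch with made-up data follows; `class_means` is a hypothetical helper.

```python
import numpy as np

def class_means(X, y):
    """Per-class sample means: each sum is divided by its class count."""
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    return mu0, mu1

# Illustrative data: two class-0 points, three class-1 points
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0], [8.0, 4.0]])
y = np.array([0, 0, 1, 1, 1])
mu0, mu1 = class_means(X, y)

# The gradient conditions from the derivation should vanish at these values
residual0 = ((1 - y)[:, None] * (X - mu0)).sum(axis=0)
residual1 = (y[:, None] * (X - mu1)).sum(axis=0)
```

Both residuals come out as zero vectors, which would not be the case if the sums were divided by the total sample count $m$.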
For $\Sigma$, both quadratic terms and the log-determinant term contribute. Writing each scalar quadratic form as a trace and applying the identities above:

$$
\begin{aligned}
\frac{\partial l(\phi,\mu_0,\mu_1,\Sigma)}{\partial\Sigma}
&= \frac{\partial}{\partial\Sigma}\sum_{i=1}^m y_i\left[-\frac12 (x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right] + \frac{\partial}{\partial\Sigma}\sum_{i=1}^m (1-y_i)\left[-\frac12 (x_i-\mu_0)^T\Sigma^{-1}(x_i-\mu_0)\right] + \frac{\partial}{\partial\Sigma}\sum_{i=1}^m \left(-\frac12 \log|\Sigma|\right) \\
&= \sum_{i=1}^m \left\{ y_i \frac{\partial \operatorname{tr}\left[-\frac12 (x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right]}{\partial\Sigma} + (1-y_i) \frac{\partial \operatorname{tr}\left[-\frac12 (x_i-\mu_0)^T\Sigma^{-1}(x_i-\mu_0)\right]}{\partial\Sigma} \right\} + \sum_{i=1}^m \frac{\partial \left(-\frac12 \log|\Sigma|\right)}{\partial\Sigma} \\
&= \sum_{i=1}^m \left\{ -\frac12 y_i \frac{\partial \operatorname{tr}\left[\Sigma^{-1}(x_i-\mu_1)(x_i-\mu_1)^T\right]}{\partial\Sigma} - \frac12 (1-y_i) \frac{\partial \operatorname{tr}\left[\Sigma^{-1}(x_i-\mu_0)(x_i-\mu_0)^T\right]}{\partial\Sigma} \right\} + \sum_{i=1}^m \left(-\frac12\right) \frac{1}{|\Sigma|}|\Sigma|\,(\Sigma^{-1})^T \\
&= \sum_{i=1}^m \left\{ -\frac12 y_i \left[-(\Sigma^{-1})^T\left((x_i-\mu_1)(x_i-\mu_1)^T\right)^T(\Sigma^{-1})^T\right] - \frac12 (1-y_i) \left[-(\Sigma^{-1})^T\left((x_i-\mu_0)(x_i-\mu_0)^T\right)^T(\Sigma^{-1})^T\right] \right\} - \frac12 m\,(\Sigma^{-1})^T \\
&= \sum_{i=1}^m \left\{ -\frac12 y_i \left[-\Sigma^{-1}(x_i-\mu_1)(x_i-\mu_1)^T\Sigma^{-1}\right] - \frac12 (1-y_i) \left[-\Sigma^{-1}(x_i-\mu_0)(x_i-\mu_0)^T\Sigma^{-1}\right] \right\} - \frac12 m\,\Sigma^{-1} = 0 \\
&\Rightarrow \sum_{i=1}^m \left\{ -\frac12 y_i \left[-\Sigma\Sigma^{-1}(x_i-\mu_1)(x_i-\mu_1)^T\Sigma^{-1}\Sigma\right] - \frac12 (1-y_i) \left[-\Sigma\Sigma^{-1}(x_i-\mu_0)(x_i-\mu_0)^T\Sigma^{-1}\Sigma\right] \right\} - \frac12 m\,\Sigma\Sigma^{-1}\Sigma = 0 \\
&\Rightarrow \sum_{i=1}^m \left\{ -\frac12 y_i \left[-(x_i-\mu_1)(x_i-\mu_1)^T\right] - \frac12 (1-y_i) \left[-(x_i-\mu_0)(x_i-\mu_0)^T\right] \right\} - \frac12 m\,\Sigma = 0 \\
&\Rightarrow \sum_{i=1}^m \left\{ y_i \left[(x_i-\mu_1)(x_i-\mu_1)^T\right] + (1-y_i) \left[(x_i-\mu_0)(x_i-\mu_0)^T\right] \right\} - m\,\Sigma = 0 \\
\Sigma &= \frac{\sum_{i=1}^m \left\{ y_i \left[(x_i-\mu_1)(x_i-\mu_1)^T\right] + (1-y_i) \left[(x_i-\mu_0)(x_i-\mu_0)^T\right] \right\}}{m}
\end{aligned}
$$

The fifth line uses the symmetry of $\Sigma^{-1}$ and of each outer product, and the sixth multiplies the equation by $\Sigma$ on both the left and the right.
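The estimate of $\Sigma$ averages the outer products of each sample's deviation from its own class mean. The sketch below assumes NumPy and made-up data, and the function names are hypothetical. It computes the estimate twice: once as a literal loop over the formula, and once in vectorized form.

```python
import numpy as np

def shared_covariance_loop(X, y, mu0, mu1):
    """Literal transcription of the formula: sum of outer products over m."""
    m, d = X.shape
    S = np.zeros((d, d))
    for x_i, y_i in zip(X, y):
        r1 = x_i - mu1
        r0 = x_i - mu0
        S += y_i * np.outer(r1, r1) + (1 - y_i) * np.outer(r0, r0)
    return S / m

def shared_covariance(X, y, mu0, mu1):
    """Vectorized equivalent: center each row by its class mean first."""
    diff = X - np.where(y[:, None] == 1, mu1, mu0)
    return diff.T @ diff / len(X)

X = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 5.0], [6.0, 3.0], [8.0, 4.0]])
y = np.array([0, 0, 1, 1, 1])
mu0 = X[y == 0].mean(axis=0)   # [1, 1]
mu1 = X[y == 1].mean(axis=0)   # [6, 4]
Sigma_loop = shared_covariance_loop(X, y, mu0, mu1)
Sigma_vec = shared_covariance(X, y, mu0, mu1)
```

Note that the divisor is the total sample count $m$, because every sample from either class contributes one outer product to the shared covariance.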
In summary, we have:

$$
\begin{aligned}
\phi &= \frac{\sum_{i=1}^m I(y_i=1)}{m} \\
\mu_0 &= \frac{\sum_{i=1}^m I(y_i=0)\,x_i}{\sum_{i=1}^m I(y_i=0)} \\
\mu_1 &= \frac{\sum_{i=1}^m I(y_i=1)\,x_i}{\sum_{i=1}^m I(y_i=1)} \\
\Sigma &= \frac{\sum_{i=1}^m \left\{ y_i \left[(x_i-\mu_1)(x_i-\mu_1)^T\right] + (1-y_i) \left[(x_i-\mu_0)(x_i-\mu_0)^T\right] \right\}}{m}
\end{aligned}
$$
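Putting the four closed-form estimates together gives a complete fitting routine. The following is a minimal sketch assuming NumPy; `fit_gda` and the synthetic "true" parameters are illustrative choices. It draws data from known parameters and checks that the estimates land near them.

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum-likelihood estimates for the GDA parameters."""
    m = len(y)
    phi = np.mean(y == 1)                      # fraction of positive labels
    mu0 = X[y == 0].mean(axis=0)               # mean of class-0 samples
    mu1 = X[y == 1].mean(axis=0)               # mean of class-1 samples
    diff = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = diff.T @ diff / m                  # shared covariance
    return phi, mu0, mu1, Sigma

# Synthetic data from assumed ground-truth parameters
rng = np.random.default_rng(0)
m = 5000
true_phi = 0.3
true_mu0 = np.array([0.0, 0.0])
true_mu1 = np.array([2.0, 1.0])
true_Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])

y = (rng.random(m) < true_phi).astype(int)
noise = rng.multivariate_normal(np.zeros(2), true_Sigma, size=m)
X = np.where(y[:, None] == 1, true_mu1, true_mu0) + noise

phi, mu0, mu1, Sigma = fit_gda(X, y)
```

With a few thousand samples the estimates should be close to the generating values, and they tighten as $m$ grows.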