Gaussian discriminant analysis (GDA) is a probabilistic generative model. Instead of modeling $p(y|x)$ directly, it applies Bayes' rule and compares $p(y=1|x)$ with $p(y=0|x)$ to decide the class.

Bayes' rule:
$$p(y|x)=\frac{p(x|y)\,p(y)}{p(x)}$$
The denominator $p(x)$ does not depend on $y$, so it can be dropped when maximizing over $y$. The decision rule then amounts to modeling the joint probability:

$$\arg\max_y\ p(y|x)=\arg\max_y\ p(x|y)\,p(y)=\arg\max_y\ p(x,y)$$
Here $p(y)$ is the prior probability, $p(y|x)$ is the posterior probability, and $p(x|y)$ is the likelihood.
Assumptions: the label is Bernoulli distributed with $p(y=1)=\rho$, and each class-conditional density is Gaussian with a shared covariance $\Sigma$ (a scalar variance in the one-dimensional case):

$$y \sim B(1,\rho)$$

$$x \mid y=1 \ \sim\ N(\mu_1,\Sigma)$$

$$x \mid y=0 \ \sim\ N(\mu_2,\Sigma)$$
Since $y\in\{0,1\}$, both cases can be written in one compact expression:

$$p(y)=\rho^{y}(1-\rho)^{1-y}$$

$$p(x|y)=N(\mu_1,\Sigma)^{y}\,N(\mu_2,\Sigma)^{1-y}$$
The log-likelihood over the $N$ training samples $(x_i,y_i)$ is:

$$\begin{aligned}
L(\theta) &= \log\prod_{i=1}^{N} p(x_i|y_i)\,p(y_i)\\
&=\sum_{i=1}^{N}\log\big(p(x_i|y_i)\,p(y_i)\big)\\
&=\sum_{i=1}^{N}\log\big(N(\mu_1,\Sigma)^{y_i}N(\mu_2,\Sigma)^{1-y_i}\rho^{y_i}(1-\rho)^{1-y_i}\big)\\
&=\sum_{i=1}^{N}\Big[\log\big(N(\mu_1,\Sigma)^{y_i}N(\mu_2,\Sigma)^{1-y_i}\big)+\log\big(\rho^{y_i}(1-\rho)^{1-y_i}\big)\Big]\\
&=\sum_{i=1}^{N}\Big[\log\big(N(\mu_1,\Sigma)^{y_i}\big)+\log\big(N(\mu_2,\Sigma)^{1-y_i}\big)+\log\big(\rho^{y_i}(1-\rho)^{1-y_i}\big)\Big]
\end{aligned}$$
So the parameters are

$$\theta=(\mu_1,\mu_2,\Sigma,\rho)$$

and the final step is to solve

$$\hat\theta=\arg\max_\theta L(\theta)$$
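To make the objective concrete, here is a minimal NumPy sketch that evaluates the log-likelihood $L(\theta)$ for given parameters. The function name `gda_log_likelihood` and the argument layout (one sample per row of `X`) are assumptions made for illustration, and `rho` is assumed to lie strictly between 0 and 1.

```python
import numpy as np

def gda_log_likelihood(X, y, mu1, mu2, Sigma, rho):
    """Sum over samples of log p(x_i | y_i) + log p(y_i)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[1]                                  # feature dimension
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)            # log |Sigma|
    log_norm = -0.5 * (n * np.log(2 * np.pi) + logdet)

    def log_gauss(mu):
        d = X - mu
        # row-wise quadratic form (x_i - mu)^T Sigma^{-1} (x_i - mu)
        quad = np.einsum("ij,jk,ik->i", d, Sigma_inv, d)
        return log_norm - 0.5 * quad

    log_px_given_y = y * log_gauss(mu1) + (1 - y) * log_gauss(mu2)
    log_py = y * np.log(rho) + (1 - y) * np.log(1 - rho)
    return float(np.sum(log_px_given_y + log_py))
```

Maximizing this function over $(\mu_1,\mu_2,\Sigma,\rho)$ is what the following derivations do in closed form.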
1: Solving for $\rho$

Only the last term of $L(\theta)$ depends on $\rho$. Setting the derivative to zero:

$$\begin{aligned}
\frac{\partial L(\theta)}{\partial \rho}&=\frac{\partial}{\partial \rho}\sum_{i=1}^{N}\big(\log \rho^{y_i}+\log(1-\rho)^{1-y_i}\big)\\
&=\sum_{i=1}^{N}\Big(y_i \frac{1}{\rho}+(1-y_i) \frac{1}{1-\rho}(-1)\Big) =0
\end{aligned}$$

Multiplying through by $\rho(1-\rho)$:

$$\begin{aligned}
&\sum_{i=1}^{N}\big(y_i(1-\rho)-(1-y_i)\rho\big) =0\\
&\sum_{i=1}^{N}\big(y_i-y_i\rho-\rho+y_i\rho\big) =0\\
&\sum_{i=1}^{N}(y_i-\rho) =0
\end{aligned}$$
Hence

$$\sum_{i=1}^{N} y_i=\sum_{i=1}^{N} \rho=N\rho$$

because the sum has $N$ terms. The number of samples with $y=1$ is $\sum_i y_i=N_1$, and the number with $y=0$ is $\sum_i (1-y_i)=N_2$, with

$$N_1+N_2=N$$

Therefore $N_1=N\rho$, and

$$\rho = \frac{N_1}{N}$$
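The estimate of $\rho$ is just the fraction of samples labeled 1. A small sketch of this, where the helper name `estimate_rho` and the toy labels are made up for illustration:

```python
import numpy as np

def estimate_rho(y):
    """Maximum likelihood estimate of the Bernoulli parameter: rho = N1 / N."""
    y = np.asarray(y)
    n1 = int(np.sum(y == 1))   # N1: number of samples with y = 1
    return n1 / y.size         # N: total number of samples

labels = [1, 0, 1, 1, 0]       # toy labels, not from the article
rho_hat = estimate_rho(labels)
```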
2: Solving for $\mu_1$

Only the class-1 Gaussian term of $L(\theta)$ depends on $\mu_1$:

$$\frac{\partial L(\theta)}{\partial \mu_1}=\frac{\partial}{\partial \mu_1}\sum_{i=1}^{N} \log \big( N(\mu_1,\Sigma)^{y_i}\big)$$
Define:

$$\Sigma = \left[ \begin{matrix} \sigma_{1}^2&0&\cdots&0\\ 0&\sigma_{2}^2&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\sigma_{n}^2 \end{matrix}\right]$$

$\Sigma$ is the covariance matrix; the entry in row $i$, column $j$ is the covariance between features $i$ and $j$.

Because the features are assumed here to be mutually independent, only the diagonal entries ($i = j$) are nonzero, and the covariance of a feature with itself is its variance.

$\Sigma$ is therefore a diagonal matrix, and its inverse is:

$$\Sigma^{-1} = \left[ \begin{matrix} \frac{1}{\sigma_{1}^2}&0&\cdots&0\\ 0&\frac{1}{\sigma_{2}^2}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\frac{1}{\sigma_{n}^2} \end{matrix}\right]$$

The determinant of a diagonal matrix is the product of its diagonal entries, so

$$\sigma_{z}= \left|\Sigma\right|^{\frac{1}{2}} =\sigma_{1}\sigma_{2}\cdots\sigma_{n}$$
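The two properties of a diagonal covariance matrix can be checked numerically. This is only a sanity check on made-up variances, not part of the derivation:

```python
import numpy as np

variances = np.array([0.5, 2.0, 4.0])      # hypothetical sigma_i^2 values
Sigma = np.diag(variances)

Sigma_inv = np.linalg.inv(Sigma)
sqrt_det = np.sqrt(np.linalg.det(Sigma))   # |Sigma|^(1/2)

# The inverse of a diagonal matrix has the reciprocals on its diagonal
inverse_matches = np.allclose(Sigma_inv, np.diag(1.0 / variances))
# |Sigma|^(1/2) equals the product of the standard deviations
det_matches = np.isclose(sqrt_det, np.prod(np.sqrt(variances)))
```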
Expanding the Gaussian density, with $n$ the dimension of $x$:

$$\frac{\partial L(\theta)}{\partial \mu_1}=\frac{\partial}{\partial \mu_1}\sum_{i=1}^{N} y_i\log \left(\frac{1}{(\sqrt{2\pi})^n|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\Big)\right)$$

$$\frac{\partial L(\theta)}{\partial \mu_1}=\frac{\partial}{\partial \mu_1}\sum_{i=1}^{N} \left[y_i\log \frac{1}{(\sqrt{2\pi})^n|\Sigma|^{\frac{1}{2}}}-\frac{1}{2}y_i(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right]$$

Here $\sum_i$ is the summation over samples, while $\Sigma$ is the covariance matrix.
The first term does not depend on $\mu_1$, so the condition becomes

$$- \frac{1}{2}\frac{\partial}{\partial \mu_1}\sum_{i} y_i(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1) =0$$

$$- \frac{1}{2}\frac{\partial}{\partial \mu_1}\sum_{i} y_i\big(x_i^T\Sigma^{-1}-\mu_1^T\Sigma^{-1}\big)(x_i-\mu_1) =0$$

$$- \frac{1}{2}\frac{\partial}{\partial \mu_1}\sum_{i} y_i\big(x_i^T\Sigma^{-1}x_i-x_i^T\Sigma^{-1}\mu_1-\mu_1^T\Sigma^{-1}x_i+\mu_1^T\Sigma^{-1}\mu_1\big)=0$$
Since $\Sigma^{-1}$ is symmetric, the two cross terms are equal, and differentiating with respect to $\mu_1$ gives

$$- \frac{1}{2}\sum_{i} y_i\big(-2\Sigma^{-1}x_i+2\Sigma^{-1}\mu_1\big)=0$$

$$\sum_{i} y_i\,\Sigma^{-1}(x_i-\mu_1)=0$$

Multiplying on the left by $\Sigma$:

$$\sum_{i} y_i(x_i-\mu_1)=0$$

$$\sum_{i} y_ix_i=\sum_{i} y_i\mu_1$$

$$\mu_1 =\frac{\sum_i y_ix_i}{\sum_i y_i} =\frac{\sum_i y_ix_i}{N_1}$$

That is, $\mu_1$ is the mean of the class-1 samples. By the same argument, $\mu_2=\frac{\sum_i (1-y_i)x_i}{N_2}$.
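The closed form for the class means translates directly into a label-weighted sum. A minimal sketch, where `estimate_means` and the toy data are illustrative names and values rather than anything from the article:

```python
import numpy as np

def estimate_means(X, y):
    """Class means: mu1 = sum_i y_i x_i / N1, mu2 = sum_i (1 - y_i) x_i / N2."""
    X = np.asarray(X, dtype=float)   # one sample per row
    y = np.asarray(y, dtype=float)   # labels in {0, 1}
    mu1 = (y @ X) / y.sum()
    mu2 = ((1 - y) @ X) / (1 - y).sum()
    return mu1, mu2

X_toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [7.0, 2.0]]
y_toy = [1, 1, 0, 0]
mu1_hat, mu2_hat = estimate_means(X_toy, y_toy)
```

Each result is the ordinary sample mean of the rows belonging to that class.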
3: Solving for $\Sigma$

Useful facts about the trace and the determinant:

$$\mathrm{tr}(A)=\sum_i A_{ii}$$

$$\mathrm{tr}(AB)=\mathrm{tr}(BA)$$

$$\mathrm{tr}(ABC)=\mathrm{tr}(CAB)=\mathrm{tr}(BCA)$$

$$\frac{\partial\, \mathrm{tr}(AB)}{\partial A}=B^T$$

$|A|$ denotes the determinant of the matrix $A$:

$$\frac{\partial |A|}{\partial A}=|A|\,(A^{-1})^T$$

which reduces to $|A|\,A^{-1}$ when $A$ is symmetric.

If $a$ is a real number, then $\mathrm{tr}(a)=a$.
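These trace identities are easy to verify numerically on random matrices. The sketch below checks the swap and cyclic properties, and compares a finite-difference gradient of $\mathrm{tr}(AB)$ against $B^T$. The matrices, the seed, and the step size `eps` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))

swap_ok = np.isclose(np.trace(A @ B), np.trace(B @ A))
cyclic_ok = np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))

# Finite-difference gradient of tr(AB) with respect to each entry of A
eps = 1e-6
grad = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad[i, j] = (np.trace((A + E) @ B) - np.trace(A @ B)) / eps
grad_ok = np.allclose(grad, B.T, atol=1e-4)
```

Note that the cyclic property only allows rotating the factors; reversing their order, as in $\mathrm{tr}(CBA)$, does not preserve the trace in general.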
Let:

$$C_1=\{x_i \mid y_i=1,\ i \in \{1,\dots,N\}\}$$

$$C_2=\{x_i \mid y_i=0,\ i \in \{1,\dots,N\}\}$$

$$|C_1|=N_1,\qquad |C_2|=N_2,\qquad N_1+N_2=N$$
Taking the partial derivative of the log-likelihood with respect to $\Sigma$ (the prior term does not depend on $\Sigma$) and setting it to zero:

$$\frac{\partial L(\theta)}{\partial \Sigma}=\frac{\partial}{\partial \Sigma}\left(\sum_{x_i \in C_1}\log N(\mu_1,\Sigma) +\sum_{x_i \in C_2}\log N(\mu_2,\Sigma)\right) =0$$
Let:

$$f(\mu_1) =\sum_{x_i \in C_1}\log N(\mu_1,\Sigma)$$

Expanding the density:

$$\begin{aligned}
f(\mu_1) &= \sum_{x_i \in C_1}\log \left( \frac{1}{(\sqrt{2\pi})^n|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i-\mu_1)^T \Sigma^{-1}(x_i-\mu_1)\Big)\right)\\
&= \sum_{x_i \in C_1}\left[\log \frac{1}{(\sqrt{2\pi})^n|\Sigma|^{\frac{1}{2}}}-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right]\\
&=\sum_{x_i \in C_1}\left[\log \frac{1}{(\sqrt{2\pi})^n}-\frac{1}{2}\log |\Sigma|-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right]
\end{aligned}$$
Distributing the sum over the three terms:

$$f(\mu_1) =\sum_{x_i \in C_1}\log \frac{1}{(\sqrt{2\pi})^n}-\frac{1}{2} \sum_{x_i \in C_1}\log |\Sigma|-\frac{1}{2}\sum_{x_i \in C_1}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)$$

The first term, $\sum_{x_i \in C_1}\log \frac{1}{(\sqrt{2\pi})^n}$, does not depend on $\Sigma$; denote it by the constant $C_3$.

The summand of the second term is the same for every sample, so

$$-\frac{1}{2}\sum_{x_i \in C_1}\log |\Sigma|=-\frac{1}{2}N_1\log |\Sigma|$$
Since $(x_i-\mu_1)^T$ is $1\times n$, $\Sigma^{-1}$ is $n\times n$, and $(x_i-\mu_1)$ is $n\times 1$, the product $(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)$ is a real number.

It therefore equals its own trace, and the cyclic property of the trace gives

$$\begin{aligned}
(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)&=\mathrm{tr}\big((x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\big)\\
&=\mathrm{tr}\big((x_i-\mu_1)(x_i-\mu_1)^T\Sigma^{-1}\big)
\end{aligned}$$

Summing over class 1 and using the linearity of the trace:

$$\begin{aligned}
\sum_{x_i \in C_1}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)&= \sum_{x_i \in C_1}\mathrm{tr}\big((x_i-\mu_1)(x_i-\mu_1)^T\Sigma^{-1}\big)\\
&=\mathrm{tr}\Big( \sum_{x_i \in C_1}(x_i-\mu_1)(x_i-\mu_1)^T\,\Sigma^{-1}\Big)
\end{aligned}$$

With the class-1 sample covariance matrix

$$S_1=\frac{1}{N_1}\sum_{x_i \in C_1}(x_i-\mu_1)(x_i-\mu_1)^T$$

this becomes

$$\sum_{x_i \in C_1}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)=N_1\,\mathrm{tr}(S_1\Sigma^{-1})$$
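The trace rewrite can be confirmed numerically: the sum of the per-sample quadratic forms should match $N_1\,\mathrm{tr}(S_1\Sigma^{-1})$ for any symmetric positive definite $\Sigma$. The random samples and the particular $\Sigma$ below are assumptions chosen only to exercise the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(size=(6, 3))           # hypothetical class-1 samples, one per row
mu1 = X1.mean(axis=0)
D = X1 - mu1                           # rows are (x_i - mu1)^T

M = rng.normal(size=(3, 3))
Sigma = M @ M.T + 3.0 * np.eye(3)      # an arbitrary symmetric positive definite matrix
Sigma_inv = np.linalg.inv(Sigma)

N1 = X1.shape[0]
S1 = D.T @ D / N1                      # (1/N1) * sum_i (x_i - mu1)(x_i - mu1)^T

lhs = sum(d @ Sigma_inv @ d for d in D)    # sum of quadratic forms
rhs = N1 * np.trace(S1 @ Sigma_inv)
identity_holds = np.isclose(lhs, rhs)
```

Note that $S_1$ is built from outer products $(x_i-\mu_1)(x_i-\mu_1)^T$, which is what `D.T @ D` computes when samples are stored as rows.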
Putting the pieces together:

$$f(\mu_1) =C_3-\frac{1}{2}\big(N_1\log |\Sigma|+N_1\,\mathrm{tr}(S_1\Sigma^{-1})\big)$$

Similarly, for class 2, with $S_2$ and the constant $C_4$ defined in the same way:

$$f(\mu_2) =C_4-\frac{1}{2}\big(N_2\log |\Sigma|+N_2\,\mathrm{tr}(S_2\Sigma^{-1})\big)$$
The derivative of the objective can then be written as

$$\frac{\partial L(\theta)}{\partial \Sigma}=\frac{\partial}{\partial \Sigma}\big(f(\mu_1)+f(\mu_2)\big) =0$$

The constants $C_3$ and $C_4$ vanish under differentiation:

$$\frac{\partial L(\theta)}{\partial \Sigma}=\frac{\partial}{\partial \Sigma}\Big(-\frac{1}{2}\big(N_1\log |\Sigma|+N_1\,\mathrm{tr}(S_1\Sigma^{-1})\big)-\frac{1}{2}\big(N_2\log |\Sigma|+N_2\,\mathrm{tr}(S_2\Sigma^{-1})\big)\Big) =0$$

Using $N_1+N_2=N$:

$$\frac{\partial L(\theta)}{\partial \Sigma}=\frac{\partial}{\partial \Sigma}\Big(-\frac{1}{2}\big(N\log |\Sigma|+N_1\,\mathrm{tr}(S_1\Sigma^{-1})+N_2\,\mathrm{tr}(S_2\Sigma^{-1})\big)\Big) =0$$
For the log-determinant term, $\frac{\partial \log|\Sigma|}{\partial \Sigma}=\frac{1}{|\Sigma|}|\Sigma|\,\Sigma^{-1}=\Sigma^{-1}$, using the symmetry of $\Sigma$. For the trace terms, $\frac{\partial\, \mathrm{tr}(S\Sigma^{-1})}{\partial \Sigma}=-(\Sigma^{-1}S\Sigma^{-1})^T=-\Sigma^{-1}S^T\Sigma^{-1}$. Hence

$$\frac{\partial L(\theta)}{\partial \Sigma}=-\frac{1}{2}\Big(N\frac{1}{|\Sigma|} |\Sigma|\,\Sigma^{-1}-N_1\Sigma^{-1}S_1^T\Sigma^{-1}-N_2\Sigma^{-1}S_2^T\Sigma^{-1}\Big) =0$$

$$\frac{\partial L(\theta)}{\partial \Sigma}=-\frac{1}{2}\Big(N\Sigma^{-1}-N_1\Sigma^{-1}S_1^T\Sigma^{-1}-N_2\Sigma^{-1}S_2^T\Sigma^{-1}\Big)=0$$

Multiplying both sides on the left and on the right by $\Sigma$:

$$N\Sigma =N_1S_1^T+N_2S_2^T$$

$$\Sigma =\frac{N_1S_1^T+N_2S_2^T}{N}$$

Because covariance matrices are symmetric, this can be written as

$$\Sigma =\frac{N_1S_1+N_2S_2}{N}$$
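Putting the closed-form estimates together, the whole fitting procedure can be sketched in a few lines of NumPy, followed by classification via the comparison of $p(x|y)\,p(y)$ for the two classes. The function names `fit_gda` and `predict_gda` and the synthetic data are assumptions for illustration, not the article's own code.

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form estimates: rho = N1/N, class means, pooled covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X1, X2 = X[y == 1], X[y == 0]
    N, N1, N2 = len(X), len(X1), len(X2)
    rho = N1 / N
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1) / N1
    S2 = (X2 - mu2).T @ (X2 - mu2) / N2
    Sigma = (N1 * S1 + N2 * S2) / N
    return rho, mu1, mu2, Sigma

def predict_gda(X, rho, mu1, mu2, Sigma):
    """Pick the class with the larger log p(x|y) + log p(y)."""
    X = np.asarray(X, dtype=float)
    Sigma_inv = np.linalg.inv(Sigma)

    def log_joint(mu, prior):
        d = X - mu
        # The Gaussian normalizing constant is shared by both classes, so it is omitted
        return -0.5 * np.einsum("ij,jk,ik->i", d, Sigma_inv, d) + np.log(prior)

    return (log_joint(mu1, rho) > log_joint(mu2, 1 - rho)).astype(int)

# Synthetic two-class data, made up for this example
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal([2.0, 2.0], 1.0, size=(50, 2)),
                    rng.normal([-2.0, -2.0], 1.0, size=(50, 2))])
y_demo = np.array([1] * 50 + [0] * 50)

rho, mu1, mu2, Sigma = fit_gda(X_demo, y_demo)
pred = predict_gda(X_demo, rho, mu1, mu2, Sigma)
```

Because both classes share $\Sigma$, the quadratic terms in $x$ cancel in the comparison, so the resulting decision boundary is linear in $x$.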