Gaussian Discriminant Analysis: Model Definition
Gaussian discriminant analysis is abbreviated as GDA below.
Suppose we have a sample matrix $X_{N\times p}$ in which each row is one sample $x_i\in\mathbb{R}^p$:
$$
X=\left( x_{1}\ x_{2}\ \cdots\ x_{N}\right)^{T}
=\begin{pmatrix} x^T_1 \\ x^T_2 \\ \vdots \\ x^T_N \end{pmatrix}_{N \times p}
=\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{pmatrix}_{N\times p}
$$
and a label vector $Y_{N\times 1}$:
$$
Y=\begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{pmatrix}_{N \times 1}
$$
Together, $X$ and $Y$ form the set of sample points $\left\{ \left( x_i,y_i\right) \right\}_{i=1}^{N}$.
Gaussian discriminant analysis is used for classification. With two classes, the label follows a Bernoulli distribution. Assuming each $y_i$ is Bernoulli with parameter $\phi$, we have:

| $y_i$ | 1 | 0 |
|---|---|---|
| $P$ | $\phi$ | $1-\phi$ |

$$
P(y_i)=\begin{cases} \phi^{y_i}, & y_i=1 \\ (1-\phi)^{1-y_i}, & y_i=0 \end{cases}
\ \Rightarrow\ P(y_i)=\phi^{y_i}(1-\phi)^{1-y_i}
$$
Within each class, the samples are assumed to be Gaussian, and the two classes share the same covariance matrix $\Sigma$:
$$
\left.\begin{matrix} x_i\mid y_i=1\sim N(\mu_1,\Sigma)\\ x_i\mid y_i=0 \sim N(\mu_2,\Sigma) \end{matrix}\right\}
\ \Rightarrow\ P(x_i\mid y_i)=N(\mu_1,\Sigma)^{y_i}\cdot N(\mu_2,\Sigma)^{1-y_i}
$$
Here $N(\mu,\Sigma)$ is shorthand for the Gaussian density evaluated at $x_i$.
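As a minimal sketch of the class-conditional density above, the following NumPy code evaluates $\log P(x_i\mid y_i)$ for one sample. The function names `gaussian_logpdf` and `class_conditional_logpdf` are illustrative and do not come from the original post.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at a single point x of shape (p,)."""
    p = mu.shape[0]
    diff = x - mu
    # Solve sigma @ z = diff instead of forming the explicit inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    # slogdet returns (sign, log|sigma|); the sign is +1 for a valid covariance.
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

def class_conditional_logpdf(x, y, mu1, mu2, sigma):
    """log P(x | y) = y * log N(mu1, sigma) + (1 - y) * log N(mu2, sigma)."""
    return y * gaussian_logpdf(x, mu1, sigma) + (1 - y) * gaussian_logpdf(x, mu2, sigma)

# A 2-D standard normal evaluated at its mean has log-density -log(2*pi).
value = gaussian_logpdf(np.zeros(2), np.zeros(2), np.eye(2))
```

Working in log space avoids underflow when $p$ is large, and the exponents $y_i$ and $1-y_i$ simply become multipliers.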
Suppose further that $N_1$ samples have $y_i=1$ and $N_2$ samples have $y_i=0$, so that $N_1+N_2=N$.
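To make the generative assumptions concrete, here is a small sketch that draws synthetic data from the model: a Bernoulli label first, then a Gaussian sample around the mean of that class. The function name `sample_gda` and all parameter values are made up for illustration.

```python
import numpy as np

def sample_gda(n, phi, mu1, mu2, sigma, seed=0):
    """Draw n pairs (x_i, y_i) from the GDA generative model."""
    rng = np.random.default_rng(seed)
    y = rng.binomial(1, phi, size=n)              # y_i ~ Bernoulli(phi)
    means = np.where(y[:, None] == 1, mu1, mu2)   # row i is mu1 if y_i = 1, else mu2
    noise = rng.multivariate_normal(np.zeros(len(mu1)), sigma, size=n)
    return means + noise, y                       # x_i | y_i ~ N(mu_{y_i}, sigma)

X, Y = sample_gda(200, 0.3, np.array([2.0, 2.0]), np.array([0.0, 0.0]), np.eye(2))
N1, N2 = int(Y.sum()), int((1 - Y).sum())         # class counts, N1 + N2 = N
```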
Once the prior probability and the Gaussian means and covariance matrix have been estimated from the training samples, Bayes' rule gives the probability that a new sample belongs to each of the two classes, which is enough to classify it:
$$
P(y\mid x)=\frac{P(x\mid y)P(y)}{P(x)} \propto P(x\mid y)P(y)
$$
For a new sample $x$, we compute $P(y=1\mid x)$ and $P(y=0\mid x)$, compare them, and assign $x$ to the class with the larger probability:
$$
\hat{y} = \underset{y\in \left \{ 0,1\right\}}{\arg\max}\,P(y\mid x)
= \underset{y\in \left \{ 0,1\right\}}{\arg\max}\,P(x\mid y)P(y)
= \underset{y\in \left \{ 0,1\right\}}{\arg\max}\,P(x,y)
$$
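The decision rule above can be sketched as follows, assuming the parameters are already known. The names `log_joint` and `predict` are hypothetical, and the comparison is done on $\log P(x,y)$, which has the same arg max as $P(x,y)$.

```python
import numpy as np

def log_joint(x, y, phi, mu1, mu2, sigma):
    """log P(x, y) = log P(x | y) + log P(y) for one sample x and a label y in {0, 1}."""
    mu = mu1 if y == 1 else mu2
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    log_lik = -0.5 * (len(x) * np.log(2 * np.pi) + logdet
                      + diff @ np.linalg.solve(sigma, diff))
    log_prior = np.log(phi) if y == 1 else np.log(1 - phi)
    return log_lik + log_prior

def predict(x, phi, mu1, mu2, sigma):
    """Return the label in {0, 1} with the larger joint probability."""
    return int(log_joint(x, 1, phi, mu1, mu2, sigma) > log_joint(x, 0, phi, mu1, mu2, sigma))

mu1, mu2 = np.array([2.0, 2.0]), np.array([0.0, 0.0])
print(predict(np.array([1.9, 2.1]), 0.5, mu1, mu2, np.eye(2)))    # prints 1 (close to mu1)
print(predict(np.array([0.1, -0.2]), 0.5, mu1, mu2, np.eye(2)))   # prints 0 (close to mu2)
```

The evidence $P(x)$ never has to be computed, because it is the same for both labels.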
The core task of Gaussian discriminant analysis is to estimate the unknowns $\mu_1,\mu_2,\Sigma,\phi$. We estimate them by maximizing the log-likelihood $L(\theta)$, where $\theta=(\mu_1,\mu_2,\Sigma,\phi)$:
$$
L(\theta) = \log \prod_{i=1}^N P(x_i,y_i)
=\log \prod_{i=1}^N P(x_i\mid y_i)P(y_i)
=\sum_{i=1}^N \log P(x_i\mid y_i)+\sum_{i=1}^N \log P(y_i)
$$
Substituting the distributions gives:
$$
L(\theta) = \sum_{i=1}^N \left [ \log N(\mu_1,\Sigma)^{y_i} +\log N(\mu_2,\Sigma)^{1-y_i} +\log \phi^{y_i}(1-\phi)^{1-y_i} \right ]
$$
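For reference, the log-likelihood above can be evaluated numerically as in the sketch below. The function name `log_likelihood` is an illustrative choice, `X` is assumed to be an `(N, p)` array and `Y` a 0/1 integer array of length `N`.

```python
import numpy as np

def log_likelihood(X, Y, phi, mu1, mu2, sigma):
    """L(theta) = sum_i [log P(x_i | y_i) + log P(y_i)]."""
    N, p = X.shape
    # Center each row on the mean of its own class.
    diff = X - np.where(Y[:, None] == 1, mu1, mu2)
    # Row-wise quadratic forms (x_i - mu)^T Sigma^{-1} (x_i - mu).
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    log_gauss = -0.5 * (p * np.log(2 * np.pi) + logdet + maha)
    log_bern = Y * np.log(phi) + (1 - Y) * np.log(1 - phi)
    return float(np.sum(log_gauss + log_bern))

# Tiny check: two samples sitting exactly on their class means.
X_demo = np.array([[0.0, 0.0], [1.0, 1.0]])
Y_demo = np.array([0, 1])
L_demo = log_likelihood(X_demo, Y_demo, 0.5, np.ones(2), np.zeros(2), np.eye(2))
```

In this check each sample contributes $-\log 2\pi + \log 0.5$, so `L_demo` equals $-2\log 4\pi$.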
Gaussian Discriminant Analysis: Solving for $\phi$
Only the third term of $L(\theta)$ depends on $\phi$, so let:
$$
\Delta =\sum_{i=1}^N \log \phi^{y_i}(1-\phi)^{1-y_i}
=\sum_{i=1}^N y_i\log \phi+\sum_{i=1}^N (1-y_i)\log (1-\phi)
$$
Differentiating $\Delta$ with respect to $\phi$ and setting the derivative to zero:
$$
\begin{aligned}
\frac{\partial{\Delta}}{\partial{\phi}} =\sum_{i=1}^N \frac{y_i}{\phi}-\sum_{i=1}^N \frac{1-y_i}{1-\phi} &= 0\\
\sum_{i=1}^N \left [ y_i(1-\phi)-\phi(1-y_i) \right ] &= 0\\
\sum_{i=1}^N \left [y_i-\phi \right ] &= 0\\
\sum_{i=1}^N y_i &= N\phi
\end{aligned}
$$
Therefore:
$$
\hat{\phi} = \frac{\sum_{i=1}^N y_i}{N}=\frac{N_1}{N}
$$
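A quick numerical sanity check of this result, on made-up labels: the closed form $N_1/N$ should coincide with the value of $\phi$ that maximizes $\Delta$ over a grid. The variable names here are illustrative.

```python
import numpy as np

Y = np.array([1, 0, 0, 1, 0, 0, 0, 0])    # toy labels: N1 = 2, N = 8
phi_hat = Y.mean()                         # closed form N1 / N

def delta(phi):
    """Bernoulli part of the log-likelihood as a function of phi."""
    return np.sum(Y * np.log(phi) + (1 - Y) * np.log(1 - phi))

grid = np.linspace(0.01, 0.99, 99)         # candidates 0.01, 0.02, ..., 0.99
best = grid[np.argmax([delta(g) for g in grid])]
print(phi_hat, best)                       # both are 0.25 up to floating-point rounding
```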
Gaussian Discriminant Analysis: Solving for $\mu_1,\mu_2$
Recall the log-likelihood derived earlier:
$$
L(\theta) = \sum_{i=1}^N \left [ \log N(\mu_1,\Sigma)^{y_i} +\log N(\mu_2,\Sigma)^{1-y_i} +\log \phi^{y_i}(1-\phi)^{1-y_i} \right ]
$$
Only the first and second terms depend on $\mu_1,\mu_2$, and $\mu_1$ appears only in the first term, so let:
$$
\begin{aligned}
\Delta &=\sum_{i=1}^N \log N(\mu_1,\Sigma)^{y_i}\\
&=\sum_{i=1}^N y_i\log \left \{ \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} \exp\left [ -\frac{1}{2} (x_i-\mu_1)^T\Sigma^{-1} (x_i-\mu_1) \right ]\right\}\\
&=\sum_{i=1}^N y_i\log \left \{ \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\right\} -\frac{1}{2} \sum_{i=1}^N y_i (x_i-\mu_1)^T\Sigma^{-1} (x_i-\mu_1)
\end{aligned}
$$
Differentiating $\Delta$ with respect to $\mu_1$ (using the symmetry of $\Sigma^{-1}$) and setting the derivative to zero:
$$
\begin{aligned}
\frac{\partial{\Delta}}{\partial{\mu_1}}
&=\frac{\partial}{\partial{\mu_1}}\left[-\frac{1}{2} \sum_{i=1}^N y_i (x_i-\mu_1)^T\Sigma^{-1} (x_i-\mu_1)\right]\\
&=\sum_{i=1}^N y_i\, \Sigma^{-1}(x_i-\mu_1)=0
\end{aligned}
$$
Since $\Sigma^{-1}$ is invertible, it can be dropped from the equation:
$$
\begin{aligned}
\sum_{i=1}^N y_i (x_i-\mu_1)&=0\\
\sum_{i=1}^N y_i x_i&=\sum_{i=1}^N y_i \mu_1=N_1\mu_1
\end{aligned}
$$
Therefore:
$$
\hat{\mu}_1=\frac{1}{N_1}\sum_{i=1}^N y_i x_i
$$
Similarly, for $\hat{\mu}_2$:
$$
\hat{\mu}_2=\frac{1}{N_2}\sum_{i=1}^N (1-y_i) x_i
$$
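The two estimators are simply the per-class sample means. The sketch below computes them with the indicator weights $y_i$ and $1-y_i$ on toy data and confirms that the stationarity condition $\sum_i y_i(x_i-\hat{\mu}_1)=0$ holds. The data and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                  # toy samples, one per row
Y = (np.arange(50) % 3 == 0).astype(int)      # toy labels: every third sample is class 1
N1, N2 = Y.sum(), (1 - Y).sum()

mu1_hat = (Y[:, None] * X).sum(axis=0) / N1         # (1/N1) * sum_i y_i x_i
mu2_hat = ((1 - Y)[:, None] * X).sum(axis=0) / N2   # (1/N2) * sum_i (1 - y_i) x_i

# The closed forms agree with the plain per-class means.
same_mu1 = np.allclose(mu1_hat, X[Y == 1].mean(axis=0))
same_mu2 = np.allclose(mu2_hat, X[Y == 0].mean(axis=0))

# Stationarity: sum_i y_i (x_i - mu1_hat) should vanish up to rounding.
residual = (Y[:, None] * (X - mu1_hat)).sum(axis=0)
```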
Gaussian Discriminant Analysis: Solving for $\Sigma$
Recall the log-likelihood derived earlier:
$$
L(\theta) = \sum_{i=1}^N \left [ \log N(\mu_1,\Sigma)^{y_i} +\log N(\mu_2,\Sigma)^{1-y_i} +\log \phi^{y_i}(1-\phi)^{1-y_i} \right ]
$$
Only the first two terms depend on $\Sigma$, so now let:
$$
\Delta = \sum_{i=1}^N \left [ y_i\log N(\mu_1,\Sigma) +(1-y_i)\log N(\mu_2,\Sigma) \right]
$$
To simplify the differentiation, rewrite $\Delta$ as one sum per class, where $c_1=\{x_i : y_i=1\}$ and $c_2=\{x_i : y_i=0\}$:
$$
\Delta = \Delta_1+\Delta_2= \sum_{x_i\in c_1}\log N(\mu_1,\Sigma)+ \sum_{x_i\in c_2}\log N(\mu_2,\Sigma)
$$
Simplifying $\Delta_1$ (the sums run over the $N_1$ samples of class $c_1$):
$$
\begin{aligned}
\Delta_1&= \sum_{x_i\in c_1} \log \left \{ \frac{1}{(2\pi)^{\frac{p}{2}}| \Sigma|^{\frac{1}{2}}} \exp\left [ -\frac{1}{2} (x_i-\mu_1)^T\Sigma^{-1} (x_i-\mu_1) \right ]\right\}\\
&=\sum_{x_i\in c_1} \log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} -\frac{1}{2}\sum_{x_i\in c_1} (x_i-\mu_1)^T\Sigma^{-1} (x_i-\mu_1)
\end{aligned}
$$
To simplify the computation, we use the following matrix identities:
$$
\begin{aligned}
\frac{\partial\, \mathrm{tr}(AB)}{\partial A} &= B^T\\
\frac{\partial |A|}{\partial A} &= |A|\,(A^{-1})^T\\
\mathrm{tr}(AB) &= \mathrm{tr}(BA)\\
\mathrm{tr}(ABC) &= \mathrm{tr}(CAB)=\mathrm{tr}(BCA)
\end{aligned}
$$
For a symmetric matrix such as $\Sigma$, the second identity reduces to $\partial |A| / \partial A = |A| A^{-1}$.
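These identities are easy to check numerically. The sketch below compares finite-difference gradients with the closed forms on small made-up matrices. The helper `numerical_grad` is written for this illustration and is not part of NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])            # a fixed non-singular, non-symmetric matrix
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))

def numerical_grad(f, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function f with respect to A."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

grad_trace = numerical_grad(lambda M: np.trace(M @ B), A)
grad_det = numerical_grad(np.linalg.det, A)

ok_trace = np.allclose(grad_trace, B.T, atol=1e-5)
ok_det = np.allclose(grad_det, np.linalg.det(A) * np.linalg.inv(A).T, atol=1e-5)
ok_cyclic = np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
print(ok_trace, ok_det, ok_cyclic)         # prints True True True
```

Using a non-symmetric `A` makes the transpose in the determinant identity visible. For a symmetric matrix the transpose has no effect.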
Because $(x_i-\mu_1)^T\Sigma^{-1} (x_i-\mu_1)$ is a scalar, wrapping it in a trace, $\mathrm{tr}[(x_i-\mu_1)^T\Sigma^{-1} (x_i-\mu_1)]$, does not change its value. By the cyclic property of the trace, $\Delta_1$ becomes:
$$
\begin{aligned}
\Delta_1&=\sum_{x_i\in c_1} \log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} -\frac{1}{2}\sum_{x_i\in c_1} \mathrm{tr}\left[(x_i-\mu_1)^T\Sigma^{-1} (x_i-\mu_1)\right]\\
&=\sum_{x_i\in c_1} \log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} -\frac{1}{2}\sum_{x_i\in c_1} \mathrm{tr}\left[(x_i-\mu_1)(x_i-\mu_1)^T\Sigma^{-1} \right]
\end{aligned}
$$
Define the sample covariance matrix of class 1 as $S_1=\frac{1}{N_1}\sum_{x_i\in c_1} (x_i-\mu_1)(x_i-\mu_1)^T$. The expression above then becomes:
$$
\begin{aligned}
\Delta_1&=\sum_{x_i\in c_1}\log (2\pi)^{-\frac{p}{2}}-\frac{1}{2}\sum_{x_i\in c_1}\log|\Sigma|-\frac{1}{2}N_1\,\mathrm{tr}\left [\frac{1}{N_1}\sum_{x_i\in c_1} (x_i-\mu_1)(x_i-\mu_1)^T\Sigma^{-1}\right]\\
&=C-\frac{1}{2}N_1\log|\Sigma|-\frac{1}{2}N_1\,\mathrm{tr}(S_1\Sigma^{-1})
\end{aligned}
$$
where $C$ is a constant that does not depend on $\Sigma$.
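The key step here, replacing the sum of quadratic forms by $N_1\,\mathrm{tr}(S_1\Sigma^{-1})$, can be checked numerically. The sketch below uses made-up data and an arbitrary symmetric positive-definite `sigma`.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(size=(20, 3))                  # stand-in for the N1 samples of class 1
mu1 = X1.mean(axis=0)
sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])            # arbitrary symmetric positive-definite matrix
sigma_inv = np.linalg.inv(sigma)

diff = X1 - mu1
N1 = len(X1)
S1 = diff.T @ diff / N1                        # (1/N1) * sum (x_i - mu1)(x_i - mu1)^T

quad_sum = sum(d @ sigma_inv @ d for d in diff)    # sum of the quadratic forms
trace_form = N1 * np.trace(S1 @ sigma_inv)         # N1 * tr(S1 Sigma^{-1})
```

The two quantities `quad_sum` and `trace_form` agree up to floating-point rounding.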
$\Delta_2$ has the same form with $N_2$ and $S_2$, where $S_2$ is the sample covariance matrix of class 2. Collecting all terms that do not depend on $\Sigma$ into a constant $C'$:
$$
\begin{aligned}
\Delta &= \Delta_1+\Delta_2\\
&=C'-\frac{1}{2}N_1\log|\Sigma|-\frac{1}{2}N_1\,\mathrm{tr}(S_1\Sigma^{-1})-\frac{1}{2}N_2\log|\Sigma|-\frac{1}{2}N_2\,\mathrm{tr}(S_2\Sigma^{-1})\\
&=C'-\frac{1}{2} \left [ N\log|\Sigma|+ N_1\,\mathrm{tr}(S_1\Sigma^{-1})+ N_2\,\mathrm{tr}(S_2\Sigma^{-1}) \right ]
\end{aligned}
$$
Now differentiate $\Delta$ with respect to $\Sigma$, using the determinant and trace derivative identities given earlier. For symmetric $\Sigma$ and $S$, the trace identity gives $\partial\,\mathrm{tr}(S\Sigma^{-1})/\partial\Sigma=-\Sigma^{-1}S\Sigma^{-1}$, so:
$$
\begin{aligned}
\frac{\partial{\Delta}}{\partial{\Sigma}}
&=-\frac{1}{2}\left ( N\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}-N_1\Sigma^{-1}S_1\Sigma^{-1}-N_2\Sigma^{-1}S_2\Sigma^{-1}\right )\\
&=-\frac{1}{2}\,\Sigma^{-1}\left [ N\Sigma -(N_1S_1+N_2S_2) \right]\Sigma^{-1}=0
\end{aligned}
$$
Multiplying by $\Sigma$ on both the left and the right leaves $N\Sigma-(N_1S_1+N_2S_2)=0$.
Therefore:
$$
\hat{\Sigma}=\frac{1}{N}(N_1S_1+N_2S_2)
$$
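A minimal sketch of this estimator on synthetic data follows. It also checks that $\Delta$, evaluated up to its additive constant, is larger at $\hat{\Sigma}$ than at nearby rescaled matrices. The function names `pooled_covariance` and `delta` and the toy data are assumptions made for illustration.

```python
import numpy as np

def pooled_covariance(X, Y):
    """Sigma_hat = (N1 * S1 + N2 * S2) / N, with S_k the per-class covariance (1/N_k scaling)."""
    N = len(Y)
    scatter = np.zeros((X.shape[1], X.shape[1]))
    for label in (1, 0):
        Xk = X[Y == label]
        diff = Xk - Xk.mean(axis=0)
        scatter += diff.T @ diff               # equals N_k * S_k
    return scatter / N

def delta(sigma, X, Y):
    """Sigma-dependent part of the log-likelihood, up to an additive constant."""
    total = 0.0
    _, logdet = np.linalg.slogdet(sigma)
    sigma_inv = np.linalg.inv(sigma)
    for label in (1, 0):
        Xk = X[Y == label]
        diff = Xk - Xk.mean(axis=0)
        Sk = diff.T @ diff / len(Xk)
        total += -0.5 * len(Xk) * (logdet + np.trace(Sk @ sigma_inv))
    return total

rng = np.random.default_rng(2)
Y = (np.arange(60) % 2).astype(int)                # alternating toy labels
X = rng.normal(size=(60, 2)) + 2.0 * Y[:, None]    # class 1 shifted by (2, 2)
sigma_hat = pooled_covariance(X, Y)
at_hat = delta(sigma_hat, X, Y)
nearby = [delta(0.9 * sigma_hat, X, Y), delta(1.1 * sigma_hat, X, Y)]
```

Note that $\hat{\Sigma}$ divides by $N$, not $N-2$, so it is the maximum-likelihood estimate and not the unbiased pooled estimate.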
This completes the derivation.
Postscript
At this point, all components of $\theta=(\mu_1,\mu_2,\Sigma,\phi)$ have been solved for.
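Putting the four closed-form estimates together, a compact end-to-end sketch might look like the following. The names `fit_gda` and `predict_gda` and the synthetic data are illustrative only. This is one possible arrangement of the formulas derived above, not a reference implementation.

```python
import numpy as np

def fit_gda(X, Y):
    """Closed-form maximum-likelihood estimates (phi, mu1, mu2, Sigma)."""
    phi = Y.mean()                                   # N1 / N
    mu1 = X[Y == 1].mean(axis=0)
    mu2 = X[Y == 0].mean(axis=0)
    diff = X - np.where(Y[:, None] == 1, mu1, mu2)   # center each row on its class mean
    sigma = diff.T @ diff / len(Y)                   # (N1 * S1 + N2 * S2) / N
    return phi, mu1, mu2, sigma

def predict_gda(X, phi, mu1, mu2, sigma):
    """Label each row of X by comparing log P(x | y) + log P(y) for y = 1 and y = 0."""
    def score(mu, prior):
        diff = X - mu
        maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
        # Terms shared by both classes (2*pi and log|Sigma|) cancel in the comparison.
        return -0.5 * maha + np.log(prior)
    return (score(mu1, phi) > score(mu2, 1 - phi)).astype(int)

rng = np.random.default_rng(3)
Y = (np.arange(400) % 2).astype(int)                 # balanced toy labels
X = rng.normal(size=(400, 2)) + 3.0 * Y[:, None]     # class 1 shifted by (3, 3)
params = fit_gda(X, Y)
accuracy = (predict_gda(X, *params) == Y).mean()     # training accuracy on the toy data
```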
More to be added later.