Gaussian Discriminant Analysis
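Gaussian discriminant analysis (GDA) is a fairly intuitive model. It is a generative model and takes a soft-classification approach. Soft classification means that the class of a sample is decided with a probability model, rather than by a function that maps the sample directly to a class. A generative model obtains $P(y|x)$ by modeling the joint probability $P(x,y)$. GDA assumes

$$
\begin{aligned}
&y \sim \mathrm{Bernoulli}(\phi) \\
&x|y=1 \sim N(\mu_1,\Sigma) \\
&x|y=0 \sim N(\mu_2,\Sigma)
\end{aligned}
$$

Note that the two classes share the same covariance matrix $\Sigma$.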
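The following sketch makes the generative story concrete. It draws labeled samples in the order the assumptions describe: first $y$ from a Bernoulli distribution, then $x$ from the Gaussian of that class. It assumes $p=2$ so that the Cholesky factor of $\Sigma$ can be written in closed form. The helper names `cholesky_2x2` and `sample_gda` and the parameter values are illustrative only and do not come from the original text.

```python
import math
import random

# Illustrative sketch: p = 2 is assumed; helper names are hypothetical.


def cholesky_2x2(sigma):
    """Lower-triangular L with L L^T = sigma for a 2x2 symmetric positive-definite sigma."""
    (a, b), (_, d) = sigma
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def sample_gda(n, phi, mu1, mu2, sigma, seed=0):
    """Draw n pairs (x, y): y ~ Bernoulli(phi), then x | y ~ N(mu1 if y == 1 else mu2, sigma)."""
    rng = random.Random(seed)
    chol = cholesky_2x2(sigma)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < phi else 0
        mu = mu1 if y == 1 else mu2
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # independent standard normals
        # x = mu + L z has covariance L L^T = sigma
        x = [mu[0] + chol[0][0] * z0,
             mu[1] + chol[1][0] * z0 + chol[1][1] * z1]
        data.append((x, y))
    return data


data = sample_gda(5000, phi=0.3, mu1=[2.0, 2.0], mu2=[-1.0, 0.0],
                  sigma=[[1.0, 0.5], [0.5, 2.0]])
print(sum(y for _, y in data) / len(data))  # fraction of y = 1, close to phi = 0.3
```

Because both classes are drawn with the same `sigma`, the two point clouds have the same shape and orientation. They differ only in location.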
It follows that

$$
\begin{aligned}
&P(y)=\phi^y(1-\phi)^{1-y} \\
&P(x|y)=N(\mu_1,\Sigma)^y\cdot N(\mu_2,\Sigma)^{1-y}
\end{aligned}
$$

where $N(\mu,\Sigma)$ denotes the Gaussian density evaluated at $x$.
The model parameters are

$$
\theta=(\mu_1,\mu_2,\Sigma,\phi)
$$
For a generative model, the prediction is made by

$$
\hat y=\arg\max_{y\in\{0,1\}}p(y|x)=\arg\max_y p(y)p(x|y)
$$

The second equality holds because, by Bayes' rule, $p(y|x)=p(y)p(x|y)/p(x)$, and $p(x)$ does not depend on $y$.
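Below is a minimal sketch of this decision rule. It compares $\log p(y)+\log p(x|y)$ for the two classes and drops $p(x)$. It assumes $p=2$ so that $\Sigma^{-1}$ and $|\Sigma|$ have closed forms. The function names are hypothetical, and the parameters are treated as already known.

```python
import math

# Illustrative sketch: p = 2 is assumed; function names are hypothetical.


def gaussian_logpdf_2d(x, mu, sigma):
    """log N(x; mu, sigma) for 2-dimensional x via the closed-form 2x2 inverse and determinant."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    r0, r1 = x[0] - mu[0], x[1] - mu[1]
    # quadratic form r^T sigma^{-1} r, with sigma^{-1} = [[d, -b], [-b, a]] / det
    quad = (d * r0 * r0 - 2 * b * r0 * r1 + a * r1 * r1) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def predict(x, phi, mu1, mu2, sigma):
    """y_hat = argmax_y p(y) p(x|y); p(x) is omitted because it is the same for both classes."""
    score1 = math.log(phi) + gaussian_logpdf_2d(x, mu1, sigma)        # y = 1
    score0 = math.log(1.0 - phi) + gaussian_logpdf_2d(x, mu2, sigma)  # y = 0
    return 1 if score1 > score0 else 0


identity = [[1.0, 0.0], [0.0, 1.0]]
print(predict([1.5, 2.5], 0.5, [2.0, 2.0], [-2.0, -2.0], identity))  # 1
print(predict([0.0, 0.0], 0.9, [2.0, 2.0], [-2.0, -2.0], identity))  # 1
print(predict([0.0, 0.0], 0.1, [2.0, 2.0], [-2.0, -2.0], identity))  # 0
```

The point $(0,0)$ is equally far from both means, so its two Gaussian terms are equal and the prior $\phi$ alone decides the class. This is the "soft" part of the model at work.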
The parameters are estimated by maximum likelihood. Writing $l(\theta)$ for the log-likelihood,

$$
\begin{aligned}
\hat \theta &=\arg \max_\theta l(\theta) \\
&=\arg \max_\theta \log \prod_{i=1}^Np(x_i,y_i) \\
&=\arg \max_\theta \log \prod_{i=1}^Np(y_i)p(x_i|y_i) \\
&=\arg \max_\theta \sum_{i=1}^N\big(\log N(\mu_1,\Sigma)^{y_i} \\
&\qquad+\log N(\mu_2,\Sigma)^{1-y_i}+\log \phi^{y_i}(1-\phi)^{1-y_i}\big)
\end{aligned}
$$

Here $N_1=\sum_{i=1}^N y_i$ and $N_2=\sum_{i=1}^N(1-y_i)$ are the numbers of samples with $y=1$ and $y=0$, so $N_1+N_2=N$. Each parameter appears in only some of the terms, so each can be found separately by setting its partial derivative to zero.
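The last line of the derivation can be evaluated directly. Below is a minimal sketch of $l(\theta)$ for labeled data. It assumes $p=2$, a list of `(x, y)` pairs as the data layout, and hypothetical function names. The usage compares sensible parameters against parameters with the class means swapped.

```python
import math

# Illustrative sketch: p = 2 and the (x, y) list layout are assumptions; names are hypothetical.


def gaussian_logpdf_2d(x, mu, sigma):
    """log N(x; mu, sigma) for 2-dimensional x via the closed-form 2x2 inverse and determinant."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    r0, r1 = x[0] - mu[0], x[1] - mu[1]
    quad = (d * r0 * r0 - 2 * b * r0 * r1 + a * r1 * r1) / det  # r^T sigma^{-1} r
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def log_likelihood(data, phi, mu1, mu2, sigma):
    """l(theta) = sum_i [ y_i log N(mu1, sigma) + (1 - y_i) log N(mu2, sigma)
                          + y_i log phi + (1 - y_i) log(1 - phi) ]."""
    total = 0.0
    for x, y in data:
        total += y * gaussian_logpdf_2d(x, mu1, sigma)
        total += (1 - y) * gaussian_logpdf_2d(x, mu2, sigma)
        total += y * math.log(phi) + (1 - y) * math.log(1 - phi)
    return total


data = [([2.0, 2.0], 1), ([1.5, 2.5], 1), ([-2.0, -2.0], 0), ([-1.5, -2.5], 0)]
identity = [[1.0, 0.0], [0.0, 1.0]]
good = log_likelihood(data, 0.5, [1.75, 2.25], [-1.75, -2.25], identity)
bad = log_likelihood(data, 0.5, [-1.75, -2.25], [1.75, 2.25], identity)  # means swapped
print(good > bad)  # True
```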
- Solve for $\phi$:

$$
\begin{aligned}
&\frac{\partial l(\theta)}{\partial \phi}=\sum_{i=1}^N\Big(y_i\frac{1}{\phi}-(1-y_i)\frac{1}{1-\phi}\Big) = 0 \\
&\iff \sum_{i=1}^N\big(y_i(1-\phi)-(1-y_i)\phi\big)=0 \\
&\iff \sum_{i=1}^N(y_i-\phi)=0 \\
&\iff \sum_{i=1}^Ny_i-N\phi=0 \\
&\iff \hat \phi =\frac{1}{N}\sum_{i=1}^Ny_i =\frac{N_1}{N}
\end{aligned}
$$

- Solve for $\mu_1,\mu_2$:
The two derivations are identical, so we solve for $\mu_1$ only. Keeping only the terms of the log-likelihood that involve $\mu_1$ ($p$ is the dimension of $x$), the objective becomes

$$
\begin{aligned}
&\sum_{i=1}^Ny_i\log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\Big) \\
&=\sum_{i=1}^Ny_i\log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i^T\Sigma^{-1}-\mu_1^T\Sigma^{-1})(x_i-\mu_1)\Big)\\
&=\sum_{i=1}^Ny_i\log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i^T\Sigma^{-1}x_i-2\mu_1^T\Sigma^{-1}x_i+\mu_1^T\Sigma^{-1}\mu_1)\Big)
\end{aligned}
$$

The last step uses $x_i^T\Sigma^{-1}\mu_1=\mu_1^T\Sigma^{-1}x_i$, which holds because $\Sigma^{-1}$ is symmetric. Taking the derivative with respect to $\mu_1$ and setting it to zero gives

$$
\begin{aligned}
&-\frac{1}{2}\sum_{i=1}^Ny_i(-2\Sigma^{-1}x_i+2\Sigma^{-1}\mu_1)=0 \\
&\iff \sum_{i=1}^Ny_i(\Sigma^{-1}\mu_1-\Sigma^{-1}x_i)=0 \\
&\iff \sum_{i=1}^Ny_i(\mu_1-x_i)=0 \\
&\iff \sum_{i=1}^Ny_i\mu_1=\sum_{i=1}^Ny_ix_i \\
&\iff \hat \mu_1=\frac{\sum\limits_{i=1}^Ny_ix_i}{\sum\limits_{i=1}^Ny_i}=\frac{\sum\limits_{i=1}^Ny_ix_i}{N_1}
\end{aligned}
$$
Similarly,

$$
\hat \mu_2=\frac{\sum\limits_{i=1}^N(1-y_i)x_i}{\sum\limits_{i=1}^N(1-y_i)}=\frac{\sum\limits_{i=1}^N(1-y_i)x_i}{N_2}
$$

- Solve for $\Sigma$:
First simplify the generic term $\sum_{i=1}^N\log N(\mu,\Sigma)$ for $N$ samples $x_i\in\mathbb{R}^p$ drawn from a single Gaussian:

$$
\begin{aligned}
\sum_{i=1}^N\log N(\mu,\Sigma) &=\sum_{i=1}^N \log \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big(-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\Big) \\
&=\sum_{i=1}^N\Big(\log \frac{1}{(2\pi)^{\frac{p}{2}}}+\log|\Sigma|^{-\frac{1}{2}}-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\Big) \\
&=C-\frac{1}{2}N\log |\Sigma|-\frac{1}{2}\sum_{i=1}^N\mathrm{tr}\big((x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\big)\\
&=C-\frac{1}{2}N\log |\Sigma|-\frac{1}{2}\mathrm{tr}\Big(\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T\Sigma^{-1}\Big)\\
&=-\frac{1}{2}N\log |\Sigma|-\frac{1}{2}N\,\mathrm{tr}(S\Sigma^{-1})+C
\end{aligned}
$$

Here $C=-\frac{Np}{2}\log(2\pi)$ does not depend on $\Sigma$, and $S=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T$ is the sample covariance. The third step uses the fact that a scalar equals its own trace. The fourth uses $\mathrm{tr}(AB)=\mathrm{tr}(BA)$ and the linearity of the trace.
Since we only need to solve for $\Sigma$, the log-likelihood reduces, up to terms that do not depend on $\Sigma$, to

$$
\begin{aligned}
&\sum_{i=1}^N\big(y_i\log N(\mu_1,\Sigma) +(1-y_i)\log N(\mu_2,\Sigma)\big) \\
&=\sum_{x_i \in c_1}\log N(\mu_1,\Sigma)+\sum_{x_i \in c_2}\log N(\mu_2,\Sigma) \\
&=-\frac{1}{2}N_1\log |\Sigma|-\frac{1}{2}N_1\mathrm{tr}(S_1\Sigma^{-1})-\frac{1}{2}N_2\log |\Sigma|-\frac{1}{2}N_2\mathrm{tr}(S_2\Sigma^{-1})+C \\
&=-\frac{1}{2}\big(N\log |\Sigma|+N_1\mathrm{tr}(S_1\Sigma^{-1})+N_2\mathrm{tr}(S_2\Sigma^{-1})\big)+C
\end{aligned}
$$

Here $c_1$ and $c_2$ are the sets of samples with $y=1$ and $y=0$, $S_k=\frac{1}{N_k}\sum_{x_i\in c_k}(x_i-\mu_k)(x_i-\mu_k)^T$ is the sample covariance of class $c_k$, and $N=N_1+N_2$.
Using the following trace and determinant derivative identities (for an invertible matrix $A$):

$$
\begin{aligned}
&\frac{\partial\,\mathrm{tr}(AB)}{\partial A}=B^T\\
&\frac{\partial |A|}{\partial A}=|A|\,(A^{-1})^T \\
&\frac{\partial\,\mathrm{tr}(BA^{-1})}{\partial A}=-(A^{-1}BA^{-1})^T \\
&\mathrm{tr}(AB)=\mathrm{tr}(BA)
\end{aligned}
$$
Differentiate the simplified expression with respect to $\Sigma$ and set the result to zero. Because $\Sigma$, $S_1$ and $S_2$ are symmetric, the transposes can be dropped:

$$
\begin{aligned}
&-\frac{1}{2}\Big(N\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}-N_1\Sigma^{-1}S_1\Sigma^{-1}-N_2\Sigma^{-1}S_2\Sigma^{-1}\Big)=0 \\
&\iff N\Sigma^{-1}-N_1\Sigma^{-1}S_1\Sigma^{-1}-N_2\Sigma^{-1}S_2\Sigma^{-1}=0\\
&\iff N\Sigma-N_1S_1-N_2S_2=0 \\
&\iff \hat \Sigma =\frac{1}{N}(N_1S_1+N_2S_2)
\end{aligned}
$$

The third step multiplies both sides by $\Sigma$ on the left and on the right.
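The closed-form estimates $\hat\phi$, $\hat\mu_1$, $\hat\mu_2$ and $\hat\Sigma$ derived above can be combined into a single fitting routine. Below is a minimal pure-Python sketch for general $p$, written without NumPy to avoid any dependency. The name `fit_gda` and the list-of-`(x, y)` data layout are assumptions made for illustration. The sketch accumulates $N_1S_1+N_2S_2$ directly as the scatter of each sample around its own class mean, then divides by $N$.

```python
# Illustrative sketch: fit_gda and the (x, y) data layout are hypothetical.


def fit_gda(data):
    """Closed-form MLE for GDA from (x, y) pairs, x a list of p floats, y in {0, 1}.

    Assumes both classes occur in the data (N_1 > 0 and N_2 > 0).
    """
    n = len(data)
    p = len(data[0][0])
    n1 = sum(y for _, y in data)  # N_1
    n2 = n - n1                   # N_2
    phi = n1 / n                  # phi_hat = N_1 / N
    mu1 = [sum(x[j] for x, y in data if y == 1) / n1 for j in range(p)]
    mu2 = [sum(x[j] for x, y in data if y == 0) / n2 for j in range(p)]
    # N_1 S_1 + N_2 S_2: scatter of every sample around the mean of its own class
    scatter = [[0.0] * p for _ in range(p)]
    for x, y in data:
        mu = mu1 if y == 1 else mu2
        r = [x[j] - mu[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                scatter[a][b] += r[a] * r[b]
    sigma = [[scatter[a][b] / n for b in range(p)] for a in range(p)]  # divide by N
    return phi, mu1, mu2, sigma


data = [([0.0, 0.0], 1), ([2.0, 0.0], 1),
        ([0.0, 2.0], 0), ([0.0, 4.0], 0), ([3.0, 3.0], 0)]
phi, mu1, mu2, sigma = fit_gda(data)
print(phi, mu1, mu2)  # 0.4 [1.0, 0.0] [1.0, 3.0]
print(sigma)          # approximately [[1.6, 0.0], [0.0, 0.4]]
```

In this example, class $c_1$ contributes $N_1S_1=\begin{pmatrix}2&0\\0&0\end{pmatrix}$ and class $c_2$ contributes $N_2S_2=\begin{pmatrix}6&0\\0&2\end{pmatrix}$. Dividing their sum by $N=5$ gives the pooled $\hat\Sigma$.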