Gaussian Discriminant Analysis
Building the Gaussian model
【Assumptions】:
$$y\sim Bernoulli(\Phi)$$
$$x|y=0\sim N(\mu_0,\Sigma)$$
$$x|y=1\sim N(\mu_1,\Sigma)$$
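The three assumptions define a generative process: first draw the label, then draw $x$ from the Gaussian of that class.

The sketch below illustrates this under assumptions of our own:
- 2-D features;
- illustrative parameter values;
- hypothetical helper names `cholesky_2x2` and `sample_gda`.

It uses only Python's standard library. Correlated noise comes from a Cholesky factor of $\Sigma$, and the same factor is used for both classes, matching the shared-covariance assumption.

```python
import math
import random


def cholesky_2x2(sigma):
    """Lower-triangular L with L L^T = sigma, for a 2x2 symmetric positive-definite matrix."""
    (s11, s12), (_, s22) = sigma
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    return ((l11, 0.0), (l21, l22))


def sample_gda(m, phi, mu0, mu1, sigma, rng):
    """Draw m pairs (x, y) with y ~ Bernoulli(phi) and x | y ~ N(mu_y, sigma)."""
    L = cholesky_2x2(sigma)  # one factor: both classes share the same covariance
    data = []
    for _ in range(m):
        y = 1 if rng.random() < phi else 0
        mu = mu1 if y == 1 else mu0
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # x = mu + L z has mean mu and covariance L L^T = sigma
        x = (mu[0] + L[0][0] * z0,
             mu[1] + L[1][0] * z0 + L[1][1] * z1)
        data.append((x, y))
    return data


# Illustrative parameters (assumptions, not taken from the text)
rng = random.Random(0)
data = sample_gda(5000, 0.3, (0.0, 0.0), (2.0, 1.0), ((1.0, 0.5), (0.5, 2.0)), rng)
```

With enough samples, the fraction of `y == 1` approaches $\Phi$ and each class's sample mean approaches its $\mu$.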
By Bayes' rule:
$$p(y|x)=\frac{p(x|y)p(y)}{p(x)}$$
Hence the prediction is
$$\widehat{y}=\arg\max_{y}p(y|x)=\arg\max_{y}\frac{p(x|y)p(y)}{p(x)}=\arg\max_{y}p(x|y)p(y)$$
The last step holds because $p(x)$ does not depend on $y$.
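A minimal sketch of this decision rule, assuming 2-D features. The functions `gaussian_pdf_2d` and `predict` are hypothetical, not from any library.

The sketch computes the joint scores $p(x|y)p(y)$ for both classes. It divides by $p(x)$ only to report posteriors, which shows that the argmax is the same with or without that normalization.

```python
import math


def gaussian_pdf_2d(x, mu, sigma):
    """Density of N(mu, sigma) at x for a 2x2 covariance sigma (p = 2)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    # Mahalanobis term (x - mu)^T sigma^{-1} (x - mu)
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / math.sqrt((2 * math.pi) ** 2 * det)


def predict(x, phi, mu0, mu1, sigma):
    """Return (y_hat, posterior) using argmax_y p(x|y) p(y)."""
    joint = {
        0: gaussian_pdf_2d(x, mu0, sigma) * (1 - phi),
        1: gaussian_pdf_2d(x, mu1, sigma) * phi,
    }
    evidence = joint[0] + joint[1]  # p(x): the same value for both classes
    posterior = {k: v / evidence for k, v in joint.items()}
    y_hat = max(joint, key=joint.get)  # same argmax as over the posterior
    return y_hat, posterior
```

The prior matters: with a small $\Phi$, a point sitting exactly at $\mu_1$ can still be assigned to class 0, because the rule weighs each likelihood by its prior.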
【Parameter estimation】:
Construct the log-likelihood:
$$\begin{aligned}L(\Phi,\mu_0,\mu_1,\Sigma)&=\log\prod_{i=1}^{m}P(x^{(i)},y^{(i)})=\log\prod_{i=1}^{m}P(x^{(i)}|y^{(i)})P(y^{(i)})\\&=\sum_{i=1}^{m}\left(\log P(x^{(i)}|y^{(i)})+\log P(y^{(i)})\right)\\&=\sum_{i=1}^{m}\left[\log\left(P(x^{(i)}|y^{(i)}=0)^{1-y^{(i)}}P(x^{(i)}|y^{(i)}=1)^{y^{(i)}}\right)+\log P(y^{(i)})\right]\\&=\sum_{i=1}^{m}\left[(1-y^{(i)})\log P(x^{(i)}|y^{(i)}=0)+y^{(i)}\log P(x^{(i)}|y^{(i)}=1)+\log P(y^{(i)})\right]\end{aligned}$$
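Here the first term depends only on $\mu_0,\Sigma$, the second only on $\mu_1,\Sigma$, and the third only on $\Phi$. Each parameter can therefore be found by differentiating only the terms that contain it.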
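To sanity-check the decomposition, the sketch below evaluates the log-likelihood in two ways:
- directly, as $\sum_i\log P(x^{(i)},y^{(i)})$;
- as the three separate sums.

The two results should agree up to floating-point error. The sketch assumes 2-D features and arbitrary parameter values, and its function names are hypothetical.

```python
import math


def log_gaussian_2d(x, mu, sigma):
    """log N(x; mu, sigma) for a 2-D point x and a 2x2 covariance sigma (p = 2)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * q


def log_likelihood_direct(xs, ys, phi, mu0, mu1, sigma):
    """sum_i log P(x_i, y_i) = sum_i [log P(x_i | y_i) + log P(y_i)]."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu, prior = (mu1, phi) if y == 1 else (mu0, 1 - phi)
        total += log_gaussian_2d(x, mu, sigma) + math.log(prior)
    return total


def log_likelihood_split(xs, ys, phi, mu0, mu1, sigma):
    """Same quantity as three sums: the (mu0, Sigma) term, the (mu1, Sigma) term, the phi term."""
    term0 = sum((1 - y) * log_gaussian_2d(x, mu0, sigma) for x, y in zip(xs, ys))
    term1 = sum(y * log_gaussian_2d(x, mu1, sigma) for x, y in zip(xs, ys))
    term_phi = sum(y * math.log(phi) + (1 - y) * math.log(1 - phi) for y in ys)
    return term0 + term1 + term_phi, (term0, term1, term_phi)


# Toy data and parameter values, chosen only for illustration
xs = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (4.0, 4.0), (6.0, 4.0), (5.0, 7.0)]
ys = [0, 0, 0, 1, 1, 1]
params = (0.4, (1.0, 1.0), (5.0, 5.0), ((1.0, 0.3), (0.3, 1.5)))
direct = log_likelihood_direct(xs, ys, *params)
total, parts = log_likelihood_split(xs, ys, *params)
```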
【Solving for $\Phi$】:
$$\begin{aligned}\frac{\partial L}{\partial\Phi}&=\frac{\partial\sum_{i=1}^{m}\log P(y^{(i)})}{\partial\Phi}=\frac{\partial\sum_{i=1}^{m}\log\left(\Phi^{y^{(i)}}(1-\Phi)^{1-y^{(i)}}\right)}{\partial\Phi}=\frac{\partial\sum_{i=1}^{m}\left(y^{(i)}\log\Phi+(1-y^{(i)})\log(1-\Phi)\right)}{\partial\Phi}\\&=\sum_{i=1}^{m}\left(y^{(i)}\frac{1}{\Phi}-(1-y^{(i)})\frac{1}{1-\Phi}\right)\\&=\sum_{i=1}^{m}\left(I(y^{(i)}=1)\frac{1}{\Phi}-I(y^{(i)}=0)\frac{1}{1-\Phi}\right)=0\end{aligned}$$
Multiplying through by $\Phi(1-\Phi)$ gives $(1-\Phi)\sum_{i=1}^{m}I(y^{(i)}=1)=\Phi\sum_{i=1}^{m}I(y^{(i)}=0)$. Solving yields:
$$\widehat{\Phi}=\frac{\sum_{i=1}^{m}I(y^{(i)}=1)}{\sum_{i=1}^{m}I(y^{(i)}=0)+\sum_{i=1}^{m}I(y^{(i)}=1)}=\frac{\sum_{i=1}^{m}I(y^{(i)}=1)}{m}$$
That is, $\widehat{\Phi}$ is the fraction of training samples labeled 1.
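The closed form can be checked numerically. This sketch assumes labels are the integers 0 and 1, and `phi_mle` and `bernoulli_log_likelihood` are illustrative names. The grid search is only a sanity check, not a way to fit the model.

```python
import math


def phi_mle(ys):
    """Closed-form MLE: the fraction of labels equal to 1."""
    return sum(1 for y in ys if y == 1) / len(ys)


def bernoulli_log_likelihood(ys, phi):
    """sum_i [y_i log(phi) + (1 - y_i) log(1 - phi)], the third term of L."""
    return sum(y * math.log(phi) + (1 - y) * math.log(1 - phi) for y in ys)


ys = [1, 0, 0, 1, 0]
phi_hat = phi_mle(ys)  # 2 / 5
# Coarse grid search: the best grid value should coincide with the closed form
grid = [k / 100 for k in range(1, 100)]
best = max(grid, key=lambda p: bernoulli_log_likelihood(ys, p))
```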
【Solving for $\mu_0,\mu_1$】:
$$\begin{aligned}\frac{\partial L}{\partial\mu_0}&=\frac{\partial\sum_{i=1}^{m}(1-y^{(i)})\log P(x^{(i)}|y^{(i)}=0)}{\partial\mu_0}=\frac{\partial\sum_{i=1}^{m}(1-y^{(i)})\left[\log\frac{1}{\sqrt{(2\pi)^p|\Sigma|}}-\frac{1}{2}(x^{(i)}-\mu_0)^T\Sigma^{-1}(x^{(i)}-\mu_0)\right]}{\partial\mu_0}\\&=\sum_{i=1}^{m}(1-y^{(i)})\Sigma^{-1}(x^{(i)}-\mu_0)=0\end{aligned}$$
Here $p$ is the dimension of $x$. Multiplying on the left by $\Sigma$ gives $\sum_{i=1}^{m}I(y^{(i)}=0)(x^{(i)}-\mu_0)=0$.
Solving gives:
$$\widehat{\mu_0}=\frac{\sum_{i=1}^{m}I(y^{(i)}=0)x^{(i)}}{\sum_{i=1}^{m}I(y^{(i)}=0)}$$
Similarly:
$$\widehat{\mu_1}=\frac{\sum_{i=1}^{m}I(y^{(i)}=1)x^{(i)}}{\sum_{i=1}^{m}I(y^{(i)}=1)}$$
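In code, each mean is simply the average of the samples carrying that label. A minimal sketch, assuming samples are tuples of floats and labels are 0/1. `class_mean` is a hypothetical helper, and the toy data are made up for illustration.

```python
def class_mean(xs, ys, label):
    """mu_k = sum_i I(y_i = k) x_i / sum_i I(y_i = k), computed per coordinate."""
    members = [x for x, y in zip(xs, ys) if y == label]
    if not members:
        raise ValueError(f"no samples with label {label}")
    dim = len(members[0])
    return tuple(sum(x[j] for x in members) / len(members) for j in range(dim))


xs = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (4.0, 4.0), (6.0, 4.0), (5.0, 7.0)]
ys = [0, 0, 0, 1, 1, 1]
mu0 = class_mean(xs, ys, 0)  # (1.0, 1.0)
mu1 = class_mean(xs, ys, 1)  # (5.0, 5.0)
```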
【Solving for $\Sigma$】:
Let $a=\log\frac{1}{\sqrt{(2\pi)^p|\Sigma|}}=-\frac{p}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma|$.
$\Sigma$ appears only in the first two terms, so write them together as:
$$\begin{aligned}&\sum_{i=1}^{m}(1-y^{(i)})a+\sum_{i=1}^{m}y^{(i)}a-\frac{1}{2}\sum_{i=1}^{m}(1-y^{(i)})(x^{(i)}-\mu_0)^T\Sigma^{-1}(x^{(i)}-\mu_0)-\frac{1}{2}\sum_{i=1}^{m}y^{(i)}(x^{(i)}-\mu_1)^T\Sigma^{-1}(x^{(i)}-\mu_1)\\=\ &ma-\frac{1}{2}\sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})\end{aligned}$$
Here $\mu_{y^{(i)}}$ is the mean of the class that sample $i$ belongs to.
For symmetric $\Sigma$, we use the identities $\frac{\partial|\Sigma|}{\partial\Sigma}=|\Sigma|\Sigma^{-1}$ and $\frac{\partial}{\partial\Sigma}\left(z^T\Sigma^{-1}z\right)=-\Sigma^{-1}zz^T\Sigma^{-1}$. Then:
$$\frac{\partial L}{\partial\Sigma}=m\left(-\frac{1}{2}\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}\right)+\frac{1}{2}\sum_{i=1}^{m}\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}=0$$
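The matrix-calculus step is the easiest place to slip. The sketch below works with residuals $r_i=x^{(i)}-\mu_{y^{(i)}}$ directly and assumes $p=2$. It does two things:
- compares the analytic gradient $-\frac{m}{2}\Sigma^{-1}+\frac{1}{2}\Sigma^{-1}S\Sigma^{-1}$, with $S=\sum_i r_ir_i^T$, against central finite differences;
- checks that this gradient vanishes at $\Sigma=S/m$.

All function names and the residual values are hypothetical.

```python
import math


def inv2(s):
    """Inverse of a general 2x2 matrix (no symmetry assumed)."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def sigma_objective(sigma, residuals):
    """m*a - 1/2 sum_i r_i^T Sigma^{-1} r_i, where a = -log(2 pi) - 1/2 log|Sigma| for p = 2."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    inv = inv2(sigma)
    quad = sum(r[i] * inv[i][j] * r[j]
               for r in residuals for i in range(2) for j in range(2))
    m = len(residuals)
    return m * (-math.log(2 * math.pi) - 0.5 * math.log(det)) - 0.5 * quad


def sigma_gradient(sigma, residuals):
    """Analytic gradient -m/2 Sigma^{-1} + 1/2 Sigma^{-1} S Sigma^{-1}, with S = sum_i r_i r_i^T."""
    m = len(residuals)
    S = tuple(tuple(sum(r[i] * r[j] for r in residuals) for j in range(2)) for i in range(2))
    inv = inv2(sigma)
    middle = matmul2(matmul2(inv, S), inv)
    return tuple(tuple(-0.5 * m * inv[i][j] + 0.5 * middle[i][j] for j in range(2))
                 for i in range(2))


def numeric_gradient(sigma, residuals, h=1e-6):
    """Central finite differences, perturbing one entry of Sigma at a time."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            plus = [list(row) for row in sigma]
            minus = [list(row) for row in sigma]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (sigma_objective(plus, residuals)
                          - sigma_objective(minus, residuals)) / (2 * h)
    return grad


# Residuals x^(i) - mu_{y^(i)} from a made-up 6-sample data set
residuals = [(-1.0, -1.0), (1.0, -1.0), (0.0, 2.0)] * 2
```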
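Set $\frac{\partial L}{\partial\Sigma}=0$ and multiply on the left and right by $\Sigma$. This gives $-\frac{m}{2}\Sigma+\frac{1}{2}\sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T=0$, so:
$$\widehat{\Sigma}=\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T$$
That is, $\widehat{\Sigma}$ is the $p\times p$ covariance pooled over both classes, with $\mu_0,\mu_1$ replaced by their estimates.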
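Putting the three estimates together gives the full maximum-likelihood fit. Below is a minimal sketch for 2-D features; the name `fit_gda` and the toy data are illustrative assumptions.

Note the outer product $(x-\mu)(x-\mu)^T$: it yields a $p\times p$ matrix, whereas $(x-\mu)^T(x-\mu)$ would be a scalar.

```python
def fit_gda(xs, ys):
    """Closed-form MLE (phi, mu0, mu1, sigma) for 2-D GDA with one shared covariance."""
    m = len(xs)
    phi = sum(ys) / m  # fraction of labels equal to 1 (labels must be 0/1 ints)
    mu = {}
    for label in (0, 1):
        pts = [x for x, y in zip(xs, ys) if y == label]
        mu[label] = (sum(p[0] for p in pts) / len(pts),
                     sum(p[1] for p in pts) / len(pts))
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        r = (x[0] - mu[y][0], x[1] - mu[y][1])  # x^(i) - mu_{y^(i)}
        for i in range(2):
            for j in range(2):
                s[i][j] += r[i] * r[j]  # outer product r r^T, not r^T r
    sigma = [[s[i][j] / m for j in range(2)] for i in range(2)]  # divide by m, not by class size
    return phi, mu[0], mu[1], sigma


xs = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (4.0, 4.0), (6.0, 4.0), (5.0, 7.0)]
ys = [0, 0, 0, 1, 1, 1]
phi, mu0, mu1, sigma = fit_gda(xs, ys)
```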
【Classification】:
After estimating these parameters, plug a sample $x$ into the model to compute the posteriors $p(y=1|x)$ and $p(y=0|x)$, and assign $x$ to the class with the larger posterior. The GDA separating hyperplane is therefore the set of points where the two posteriors are equal:
$$p(y=1|x)=p(y=0|x)$$
$$p(x|y=0)p(y=0)=p(x|y=1)p(y=1)$$
$$(1-\Phi)\exp\left\{-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)\right\}=\Phi\exp\left\{-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)\right\}$$
The normalizing constant $\frac{1}{\sqrt{(2\pi)^p|\Sigma|}}$ is the same on both sides because the covariance is shared, so it cancels.

Take the logarithm of both sides, multiply by $-2$, and expand the quadratic forms (the $x^T\Sigma^{-1}x$ terms cancel). This gives:
$$x^T\Sigma^{-1}(\mu_1-\mu_0)+(\mu_1-\mu_0)^T\Sigma^{-1}x=\mu_1^T\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0+2\log(1-\Phi)-2\log\Phi$$
Both terms on the left are scalars. Since $\Sigma^{-1}$ is symmetric, each is the transpose of the other, so they are equal. This simplifies to:
$$2x^T\Sigma^{-1}(\mu_1-\mu_0)=\mu_1^T\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0+2\log(1-\Phi)-2\log\Phi$$
Let
$$A=2\Sigma^{-1}(\mu_1-\mu_0)=\begin{pmatrix}a_1&a_2&\dots&a_p\end{pmatrix}^T,\quad b=\mu_1^T\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0+2\log(1-\Phi)-2\log\Phi$$
The hyperplane then simplifies to $x^TA=b\Rightarrow a_1x_1+a_2x_2+\dots+a_px_p=b$, which is linear in $x$.
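As a check on the linear boundary, the sketch below computes $A$ and $b$. It then compares $x^TA-b$ with the log posterior odds $\log\frac{p(y=1|x)}{p(y=0|x)}$, computed directly from Bayes' rule. The sketch assumes 2-D features and illustrative parameters, and its function names are hypothetical.

Here $b$ is derived from $\log\Phi-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)=\log(1-\Phi)-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)$. It therefore carries the prior term $2\log\frac{1-\Phi}{\Phi}$, and $x^TA-b$ comes out as exactly twice the log-odds.

```python
import math


def inv2(s):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def matvec(mat, v):
    return tuple(sum(mat[i][k] * v[k] for k in range(2)) for i in range(2))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def boundary(phi, mu0, mu1, sigma):
    """A = 2 Sigma^{-1}(mu1 - mu0); b = mu1^T Sigma^{-1} mu1 - mu0^T Sigma^{-1} mu0 + 2 log((1 - phi) / phi)."""
    inv = inv2(sigma)
    diff = tuple(m1 - m0 for m1, m0 in zip(mu1, mu0))
    A = tuple(2 * v for v in matvec(inv, diff))
    b = (dot(mu1, matvec(inv, mu1)) - dot(mu0, matvec(inv, mu0))
         + 2 * math.log((1 - phi) / phi))
    return A, b


def log_odds(x, phi, mu0, mu1, sigma):
    """log p(y=1|x) - log p(y=0|x); the shared Gaussian normalizer cancels."""
    inv = inv2(sigma)

    def quad(mu):
        d = tuple(xi - mi for xi, mi in zip(x, mu))
        return dot(d, matvec(inv, d))

    return math.log(phi / (1 - phi)) - 0.5 * quad(mu1) + 0.5 * quad(mu0)


def posterior_y1(x, phi, mu0, mu1, sigma):
    """p(y=1|x) as the logistic function of the log-odds."""
    return 1 / (1 + math.exp(-log_odds(x, phi, mu0, mu1, sigma)))


params = (0.3, (0.0, 0.0), (2.0, 1.0), ((1.0, 0.5), (0.5, 2.0)))  # illustrative values
A, b = boundary(*params)
```

The sign of $x^TA-b$ therefore reproduces the posterior comparison, so classifying a new point only needs one dot product and one comparison.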