Machine Learning: Whiteboard Derivation P4_5
Gaussian Discriminant Analysis
Definition
$$\hat{y} = \arg\max_{y \in \lbrace 0,1 \rbrace} p(y|x)$$
By Bayes' theorem: $p(y|x)=\frac{p(x|y)p(y)}{p(x)}$
A generative model does not need the actual values of the two posteriors; it only needs to determine which of the two is larger.
Because $p(x)$ does not depend on $y$, it is the same constant for both classes,
so $p(y|x) \propto p(x|y)p(y)$.
Therefore: $\hat{y} = \arg\max_{y \in \lbrace 0,1 \rbrace} p(x|y)p(y)$
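Assume $y$ follows a Bernoulli distribution:
$$y \sim \mathrm{Bernoulli}(\phi)$$
Gaussian discriminant analysis assumes that the class-conditional distributions are Gaussian, with different means and a shared covariance matrix:
$$x|y=1 \sim N(\mu_1,\Sigma)$$
$$x|y=0 \sim N(\mu_2,\Sigma)$$
Since $y \in \lbrace 0,1 \rbrace$, both assumptions can be written compactly as $p(y)=\phi^{y}(1-\phi)^{1-y}$ and $p(x|y)=N(\mu_1,\Sigma)^{y}N(\mu_2,\Sigma)^{1-y}$, where $N(\mu_k,\Sigma)$ denotes the Gaussian density evaluated at $x$.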
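As a rough illustration of the decision rule under these assumptions, the sketch below handles the one-dimensional case, where $\Sigma$ reduces to a single variance. The function names (`gaussian_pdf`, `predict`, `posterior`) and the restriction to one dimension are assumptions made for this example, not part of the original notes.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def predict(x, phi, mu1, mu2, var):
    """Return the label in {0, 1} that maximizes p(x|y) p(y).

    mu1 is the mean for y = 1, mu2 the mean for y = 0 (as in the notes).
    """
    score1 = gaussian_pdf(x, mu1, var) * phi          # p(x|y=1) p(y=1)
    score0 = gaussian_pdf(x, mu2, var) * (1.0 - phi)  # p(x|y=0) p(y=0)
    return 1 if score1 > score0 else 0

def posterior(x, phi, mu1, mu2, var):
    """p(y=1|x): the same scores, normalized by p(x) = score1 + score0."""
    score1 = gaussian_pdf(x, mu1, var) * phi
    score0 = gaussian_pdf(x, mu2, var) * (1.0 - phi)
    return score1 / (score1 + score0)
```

`predict` never computes $p(x)$; dividing both scores by the same positive number, as `posterior` does, cannot change which one is larger.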
Formula
Log-likelihood:
$$\begin{aligned} L(\theta) &= \sum_{i=1}^N \log\left(p(x_i|y_i)p(y_i)\right) \\ &=\sum_{i=1}^N\left(\log p(x_i|y_i) + \log p(y_i)\right) \\ &=\sum_{i=1}^N\left(\log\left(N(\mu_1,\Sigma)^{y_i}N(\mu_2,\Sigma)^{1-y_i}\right) + \log\left(\phi^{y_i}(1-\phi)^{1-y_i}\right)\right) \\ &=\sum_{i=1}^N \left(\log N(\mu_1,\Sigma)^{y_i}+\log N(\mu_2,\Sigma)^{1-y_i} + \log \phi^{y_i}(1-\phi)^{1-y_i}\right) \end{aligned}$$
Here $N(\mu_k,\Sigma)$ is shorthand for the Gaussian density evaluated at $x_i$.
$\theta=(\mu_1,\mu_2,\Sigma,\phi)$
$\hat{\theta}=\arg\max_{\theta}L(\theta)$
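The three terms of the log-likelihood can be evaluated directly. The sketch below is a minimal one-dimensional version (so $\Sigma$ is a scalar variance `var`); the names `log_gaussian` and `log_likelihood` are hypothetical helpers for this illustration.

```python
import math

def log_gaussian(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def log_likelihood(xs, ys, phi, mu1, mu2, var):
    """Sum over samples of the three terms in the last line of L(theta)."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += y * log_gaussian(x, mu1, var)            # log N(mu1, Sigma)^{y_i}
        total += (1 - y) * log_gaussian(x, mu2, var)      # log N(mu2, Sigma)^{1-y_i}
        total += y * math.log(phi) + (1 - y) * math.log(1.0 - phi)  # Bernoulli term
    return total
```

Each term involves a disjoint subset of the parameters ($\mu_1$, $\mu_2$, $\phi$) apart from the shared $\Sigma$, which is why the maximization can be carried out one parameter at a time.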
Solving for the parameters
Split the samples into two classes:
$y=1$: $N_1$ samples
$y=0$: $N_2$ samples
$N=N_1+N_2$
Solving for $\phi$:
Define:
$$L(\theta)_3= \sum_{i=1}^N \log \phi^{y_i}(1-\phi)^{1-y_i} = \sum_{i=1}^N \left(y_i\log\phi + (1-y_i)\log(1-\phi)\right)$$
Set the derivative with respect to $\phi$ to zero:
$$\frac{\partial L(\theta)_3}{\partial \phi}=\sum_{i=1}^N\left(y_i \frac{1}{\phi} +(1-y_i)\frac{1}{1- \phi}(-1)\right)=0$$
Multiplying through by $\phi(1-\phi)$:
$$\sum_{i=1}^N\left(y_i (1- \phi) - (1-y_i)\phi\right)=0$$
$$\sum_{i=1}^N(y_i -\phi)=0$$
$$\sum_{i=1}^N y_i -N\phi=0$$
Therefore:
$$\hat{\phi}= \frac{1}{N} \sum_{i=1}^N y_i=\frac{N_1}{N}$$
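The closed form for $\hat{\phi}$ is just the fraction of positive labels. The sketch below computes it and also evaluates the derivative from the derivation, so the stationarity condition can be checked numerically; `estimate_phi` and `score` are names assumed for this example.

```python
def estimate_phi(ys):
    """MLE of the Bernoulli parameter: N_1 / N, the fraction of labels equal to 1."""
    if not ys:
        raise ValueError("need at least one label")
    return sum(ys) / len(ys)

def score(ys, phi):
    """Derivative of L(theta)_3 with respect to phi, as in the derivation."""
    return sum(y / phi - (1 - y) / (1.0 - phi) for y in ys)

labels = [1, 0, 1, 1]
phi_hat = estimate_phi(labels)          # 3 of the 4 labels are 1
derivative_at_mle = score(labels, phi_hat)
```

With these labels, `derivative_at_mle` should vanish up to floating-point rounding.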
Solving for $\mu_1$:
Define:
$$L(\theta)_1= \sum_{i=1}^N \log N(\mu_1,\Sigma)^{y_i}=\sum_{i=1}^N y_i\log\left(\frac{1}{(2 \pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right)\right)$$
Dropping the terms that do not depend on $\mu_1$:
$$\hat{\mu}_1=\arg\max_{\mu_1}L(\theta)_1=\arg\max_{\mu_1}\sum_{i=1}^N y_i\left(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right)$$
$$\begin{aligned} \Delta &= \sum_{i=1}^N y_i\left(-\frac{1}{2}(x_i-\mu_1)^T\Sigma^{-1}(x_i-\mu_1)\right) \\ &=-\frac{1}{2} \sum_{i=1}^N y_i\left((x_i^T \Sigma^{-1} -\mu_1^T \Sigma^{-1})(x_i-\mu_1)\right) \\ &=-\frac{1}{2} \sum_{i=1}^N y_i\left(x_i^T \Sigma^{-1} x_i-x_i^T \Sigma^{-1}\mu_1-\mu_1^T \Sigma^{-1}x_i +\mu_1^T \Sigma^{-1}\mu_1 \right)\\ &=-\frac{1}{2} \sum_{i=1}^N y_i\left(x_i^T \Sigma^{-1} x_i-2\mu_1^T \Sigma^{-1}x_i +\mu_1^T \Sigma^{-1}\mu_1 \right) \end{aligned}$$
The last step uses the fact that $x_i^T\Sigma^{-1}\mu_1$ is a scalar and $\Sigma^{-1}$ is symmetric, so $x_i^T\Sigma^{-1}\mu_1=\mu_1^T\Sigma^{-1}x_i$.
$$\frac{\partial \Delta}{\partial \mu_1} = -\frac{1}{2} \sum_{i=1}^N y_i\left(-2\Sigma^{-1}x_i+2\Sigma^{-1}\mu_1\right)=0$$
$$\sum_{i=1}^N y_i\left(\Sigma^{-1}\mu_1-\Sigma^{-1}x_i\right)=0$$
Left-multiplying by $\Sigma$:
$$\sum_{i=1}^N y_i(\mu_1-x_i)=0$$
$$\sum_{i=1}^N y_i\mu_1=\sum_{i=1}^N y_i x_i$$
$$\hat{\mu}_1=\frac{\sum_{i=1}^N y_ix_i}{\sum_{i=1}^N y_i}=\frac{\sum_{i=1}^N y_ix_i}{N_1}$$
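In code, $\hat{\mu}_1$ is the average of the samples labeled 1. The sketch below also returns $\hat{\mu}_2$, obtained by the symmetric argument with $1-y_i$ in place of $y_i$; that second estimate is not derived in the notes and is included here as an assumption-by-symmetry. The function name `estimate_means` is hypothetical.

```python
def estimate_means(xs, ys):
    """Return (mu1, mu2): the sample means of the y = 1 and y = 0 classes.

    xs is a list of equal-length feature vectors (lists of floats),
    ys the matching list of 0/1 labels.
    """
    dim = len(xs[0])
    sums = {1: [0.0] * dim, 0: [0.0] * dim}
    counts = {1: 0, 0: 0}
    for x, y in zip(xs, ys):
        counts[y] += 1                    # N_1 or N_2
        for j in range(dim):
            sums[y][j] += x[j]            # sum_i y_i x_i (or (1 - y_i) x_i)
    if counts[1] == 0 or counts[0] == 0:
        raise ValueError("each class needs at least one sample")
    mu1 = [s / counts[1] for s in sums[1]]
    mu2 = [s / counts[0] for s in sums[0]]
    return mu1, mu2
```

Note that $\Sigma$ never enters the computation, matching the derivation where $\Sigma^{-1}$ cancels.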
Solving for $\Sigma$:
Supplementary formulas:
$$\frac{\partial\, tr(AB)}{\partial A}=B^T$$
$$\frac{\partial|A|}{\partial A}=|A|A^{-1}$$
(for symmetric $A$; in general the right-hand side is $|A|(A^{-1})^T$)
$$tr(AB)=tr(BA)$$
$$tr(ABC)=tr(CAB)=tr(BCA)$$
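These identities are easy to spot-check numerically. The sketch below works with 2x2 matrices in plain Python: it compares $tr(AB)$ with $tr(BA)$ and uses central finite differences to approximate $\partial \log|A| / \partial A$, which by the chain rule and the determinant formula should equal $(A^{-1})^T$. All helper names and the test matrices are assumptions for this example.

```python
import math

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse_2x2(m):
    det = det_2x2(m)
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def trace_of_product(a, b):
    """tr(a b) for 2x2 matrices, without forming the product."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def numeric_grad_logdet(m, eps=1e-6):
    """Central-difference estimate of d log|m| / d m[i][j] for each entry."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            plus = [row[:] for row in m]
            minus = [row[:] for row in m]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (math.log(det_2x2(plus)) - math.log(det_2x2(minus))) / (2 * eps)
    return grad

a = [[2.0, 0.5], [0.5, 1.0]]   # symmetric, positive definite
b = [[1.0, 2.0], [3.0, 4.0]]
grad = numeric_grad_logdet(a)   # should be close to inverse_2x2(a) transposed
a_inv = inverse_2x2(a)
```

Because `a` is symmetric here, the transpose makes no visible difference, which is the situation that applies to a covariance matrix.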
$$L(\theta)_2 = \sum_{i=1}^N \left(\log N(\mu_1,\Sigma)^{y_i}+\log N(\mu_2,\Sigma)^{1-y_i} \right)$$
$$\begin{aligned} \log N(\mu,\Sigma) &=\log \left(\frac{1}{(2 \pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)\right) \\ &=\log\frac{1}{(2 \pi)^{\frac{p}{2}}}+\log |\Sigma|^{-\frac{1}{2}}-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) \\ &=C- \frac{1}{2}\log|\Sigma|-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu) \end{aligned}$$
Summing over the samples (constants that do not depend on $\Sigma$ are again absorbed into $C$):
$$\begin{aligned} \sum_{i=1}^N \log N(\mu,\Sigma) &= \sum_{i=1}^N \left( C- \frac{1}{2}\log|\Sigma|-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu) \right) \\ &=C-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}\sum_{i=1}^N (x_i-\mu)^T\Sigma^{-1}(x_i-\mu) \end{aligned}$$
$(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)$ is a scalar (a $1 \times 1$ quantity), so it equals its own trace:
$$(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)=tr\left((x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\right)$$
Sample covariance:
$$S=\frac{1}{N}\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T$$
$$\begin{aligned} \sum_{i=1}^N tr \left((x_i-\mu)^T\Sigma^{-1}(x_i-\mu) \right) &= \sum_{i=1}^N tr \left((x_i-\mu)(x_i-\mu)^T\Sigma^{-1} \right) \\ &=tr\left( \sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T\Sigma^{-1} \right) \\ &=N\,tr(S\Sigma^{-1}) \end{aligned}$$
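The trace rewriting can be checked numerically: the sum of quadratic forms on the left should equal $N\,tr(S\Sigma^{-1})$ on the right. The sketch below does this for three 2-D points in plain Python; the data, the matrix `sigma`, and all helper names are made up for the illustration.

```python
def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def quadratic_form(v, m):
    """v^T m v for a 2-vector v and a 2x2 matrix m."""
    return sum(v[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def scatter_average(xs, mu):
    """S = (1/N) sum_i (x_i - mu)(x_i - mu)^T."""
    n = len(xs)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        d = [x[0] - mu[0], x[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j] / n
    return s

def trace_of_product(a, b):
    """tr(a b) for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

xs = [[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]
mu = [2.0, 7.0 / 3.0]                      # the sample mean of xs
sigma = [[2.0, 0.5], [0.5, 1.0]]           # an arbitrary symmetric positive-definite matrix
sigma_inv = inverse_2x2(sigma)

# Left-hand side: sum of the scalar quadratic forms.
lhs = sum(quadratic_form([x[0] - mu[0], x[1] - mu[1]], sigma_inv) for x in xs)
# Right-hand side: N tr(S Sigma^{-1}).
rhs = len(xs) * trace_of_product(scatter_average(xs, mu), sigma_inv)
```

The two values agree up to rounding for any choice of `mu`, since the identity relies only on the cyclic property of the trace.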
Therefore:
$$\sum_{i=1}^N \log N(\mu,\Sigma) =C-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}N\,tr(S\Sigma^{-1})$$
Applying this result to each class separately, where $S_1$ and $S_2$ are the sample covariances of the $y=1$ and $y=0$ samples (computed around $\mu_1$ and $\mu_2$ respectively):
$$\begin{aligned} L(\theta)_2 & = \sum_{i=1}^N \left(\log N(\mu_1,\Sigma)^{y_i}+\log N(\mu_2,\Sigma)^{1-y_i} \right) \\ &=-\frac{1}{2}N_1\log|\Sigma|-\frac{1}{2}N_1tr(S_1\Sigma^{-1})-\frac{1}{2}N_2\log|\Sigma|-\frac{1}{2}N_2tr(S_2\Sigma^{-1})+C \\ &=-\frac{1}{2}N\log|\Sigma|-\frac{1}{2}N_1tr(S_1\Sigma^{-1})-\frac{1}{2}N_2tr(S_2\Sigma^{-1}) +C\\ &=-\frac{1}{2} \left( N\log|\Sigma|+N_1tr(S_1\Sigma^{-1})+N_2tr(S_2\Sigma^{-1}) \right) +C \end{aligned}$$
Differentiate with respect to $\Sigma$, using $\frac{\partial\, tr(S\Sigma^{-1})}{\partial \Sigma}=-\Sigma^{-1}S\Sigma^{-1}$ for symmetric $S$ and $\Sigma$:
$$\begin{aligned} \frac{\partial L(\theta)_2 }{\partial \Sigma} &= -\frac{1}{2} \left( N\frac{1}{|\Sigma|} |\Sigma| \Sigma^{-1} -N_1\Sigma^{-1}S_1\Sigma^{-1} -N_2\Sigma^{-1}S_2\Sigma^{-1} \right) \\ & = -\frac{1}{2} \left( N \Sigma^{-1} -N_1\Sigma^{-1}S_1\Sigma^{-1} -N_2\Sigma^{-1}S_2\Sigma^{-1} \right) \\ &=0 \end{aligned}$$
$$N \Sigma^{-1} = \Sigma^{-1}\left(N_1S_1 + N_2S_2\right)\Sigma^{-1}$$
Multiplying by $\Sigma$ on the left and on the right:
$$N \Sigma = N_1S_1 + N_2S_2$$
$$\hat{\Sigma}=\frac{1}{N}(N_1S_1 + N_2S_2)$$
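Putting the closed-form estimates together gives a complete fitting routine. The sketch below is a dependency-free illustration, not a reference implementation: `fit_gda` is a hypothetical name, and the estimate of $\mu_2$ is taken by symmetry with $\mu_1$. The pooled covariance is accumulated as the sum of within-class outer products divided by $N$, which is the same quantity as $(N_1S_1+N_2S_2)/N$.

```python
def fit_gda(xs, ys):
    """Closed-form MLE for GDA with a shared covariance.

    xs: list of feature vectors (lists of floats); ys: list of 0/1 labels.
    Returns (phi, mu1, mu2, sigma) with sigma as a nested list.
    """
    n, dim = len(xs), len(xs[0])
    n1 = sum(ys)
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("each class needs at least one sample")

    phi = n1 / n                                              # N_1 / N
    mu1 = [sum(x[j] for x, y in zip(xs, ys) if y == 1) / n1 for j in range(dim)]
    mu2 = [sum(x[j] for x, y in zip(xs, ys) if y == 0) / n2 for j in range(dim)]

    # Pooled covariance: each sample is centered on its own class mean.
    sigma = [[0.0] * dim for _ in range(dim)]
    for x, y in zip(xs, ys):
        mu = mu1 if y == 1 else mu2
        d = [x[j] - mu[j] for j in range(dim)]
        for i in range(dim):
            for j in range(dim):
                sigma[i][j] += d[i] * d[j] / n
    return phi, mu1, mu2, sigma
```

For example, `fit_gda([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [-2.0, 2.0]], [1, 1, 0, 0])` yields class means `[2.0, 3.0]` and `[-1.0, 1.0]`; the within-class correlations of the two classes have opposite sign, so they cancel in the pooled estimate and `sigma` comes out as the identity matrix.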