PCA is unsupervised; LDA is supervised.
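In the PCA article, we briefly noted that for sample points in a two-dimensional space, PCA finds a line such that the variance of the samples projected onto it is maximized. Viewed from a regression perspective, this amounts to finding a linear function that fits the set of sample points.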
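We can therefore study linear classifiers from the viewpoint of dimensionality reduction. Consider the two-class case: given a $D$-dimensional input vector $x$, we project it down to one dimension using $y=w^{T}x$. Placing a threshold on $y$, the $N_{1}$ points above it are assigned to class $C_{1}$ and the $N_{2}$ points below it to class $C_{2}$, which gives a standard linear classifier. The mean vectors of the two classes are

$$m_{1}=\frac{1}{N_{1}}\sum_{n\in C_{1}}x_{n}, \qquad m_{2}=\frac{1}{N_{2}}\sum_{n\in C_{2}}x_{n}$$

The simplest measure of class separation after projection is the distance between the projected class means, so we look for a $w$ that maximizes it:

$$\underset{w}{\arg\max}\ w^{T}(m_{2}-m_{1})$$

with $w$ constrained to unit length, since otherwise the objective can be made arbitrarily large just by scaling $w$.

This criterion looks only at the means and ignores the variances (the two usually have to be traded off against each other). Because projection discards information, two classes that are well separated in the original high-dimensional space can end up crowded together after projection. The problem is especially pronounced when the class covariance matrices are far from diagonal.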
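The mean-difference projection described above can be sketched in NumPy. This is only an illustration: the synthetic data, the covariance matrix `cov`, and the names `mean_gap` and `spread` are assumptions made for this example and do not come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: two classes sharing a strongly non-diagonal covariance.
cov = np.array([[4.0, 3.6], [3.6, 4.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([3.0, 0.0], cov, size=200)

# Class mean vectors m_1 and m_2.
m1 = X1.mean(axis=0)
m2 = X2.mean(axis=0)

# Unit-length w along m_2 - m_1, which maximizes w^T (m_2 - m_1).
w = (m2 - m1) / np.linalg.norm(m2 - m1)

# One-dimensional projections y = w^T x for each class.
y1 = X1 @ w
y2 = X2 @ w

# Distance between projected means versus the projected spread of the classes.
mean_gap = y2.mean() - y1.mean()
spread = y1.std() + y2.std()
print("gap between projected means:", mean_gap)
print("sum of projected standard deviations:", spread)
```

Comparing `mean_gap` with `spread` gives a rough sense of how much the projected classes overlap; the criterion itself only ever looks at `mean_gap`.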
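Fisher's idea is to maximize a function that pushes the projected class means as far apart as possible while keeping the variance within each class small, thereby minimizing the overlap between classes.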
After projection, the within-class variance of class $C_{k}$ can be written as

$$s_{k}^{2}=\sum_{n\in C_{k}}(w^{T}x_{n}-w^{T}m_{k})^{2}$$

The Fisher criterion is defined as the ratio of the between-class variance to the within-class variance:

$$J(w)=\frac{(w^{T}(m_{2}-m_{1}))^{2}}{s_{1}^{2}+s_{2}^{2}}=\frac{w^{T}S_{B}w}{w^{T}S_{W}w}$$

where

$$S_{B}=(m_{2}-m_{1})(m_{2}-m_{1})^{T}$$

$$S_{W}=\sum_{n\in C_{1}}(x_{n}-m_{1})(x_{n}-m_{1})^{T}+\sum_{n\in C_{2}}(x_{n}-m_{2})(x_{n}-m_{2})^{T}$$

Differentiating $J$ with respect to $w$ and setting $\frac{\partial J(w)}{\partial w}=0$ gives

$$(w^{T}S_{B}w)S_{W}w=(w^{T}S_{W}w)S_{B}w$$

The scalar factors $(w^{T}S_{B}w)$ and $(w^{T}S_{W}w)$ can be dropped because only the direction of $w$ matters, and $S_{B}w=(m_{2}-m_{1})(m_{2}-m_{1})^{T}w$ always points along $(m_{2}-m_{1})$. Assuming $S_{W}$ is invertible, we therefore obtain **$w\propto S_{W}^{-1}(m_{2}-m_{1})$**, which is Fisher's linear discriminant.
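As a rough numerical check of the result $w\propto S_{W}^{-1}(m_{2}-m_{1})$, the sketch below computes the Fisher direction and compares $J(w)$ against the plain mean-difference direction. The helper names `class_scatter`, `fisher_direction`, and `fisher_criterion` and the synthetic data are hypothetical, introduced only for this example, and the code assumes $S_{W}$ is invertible.

```python
import numpy as np

def class_scatter(X):
    """Scatter of one class: sum_n (x_n - m)(x_n - m)^T."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered

def fisher_direction(X1, X2):
    """Unit vector proportional to S_W^{-1} (m_2 - m_1)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = class_scatter(X1) + class_scatter(X2)
    w = np.linalg.solve(S_W, m2 - m1)  # assumes S_W is invertible
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w) = (w^T (m_2 - m_1))^2 / (w^T S_W w)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = class_scatter(X1) + class_scatter(X2)
    return float((w @ (m2 - m1)) ** 2 / (w @ S_W @ w))

# Assumed synthetic data with a strongly non-diagonal shared covariance.
rng = np.random.default_rng(0)
cov = np.array([[4.0, 3.6], [3.6, 4.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([3.0, 0.0], cov, size=200)

w_fisher = fisher_direction(X1, X2)

# Baseline: unit vector along the difference of class means.
w_means = X2.mean(axis=0) - X1.mean(axis=0)
w_means /= np.linalg.norm(w_means)

J_fisher = fisher_criterion(w_fisher, X1, X2)
J_means = fisher_criterion(w_means, X1, X2)
print("J along Fisher direction:", J_fisher)
print("J along mean difference: ", J_means)
```

Since the Fisher direction maximizes $J$, `J_fisher` should never fall below `J_means`, whatever data is used.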
For the two-class problem, the Fisher criterion can be viewed as a special case of least squares.
Suppose the target value for class $C_{1}$ is $\frac{N}{N_{1}}$ and the target value for class $C_{2}$ is $-\frac{N}{N_{2}}$, where $N=N_{1}+N_{2}$. The sum-of-squares error function is
$$E=\frac{1}{2}\sum_{n=1}^{N}(w^{T}x_{n}+w_{0}-t_{n})^{2}$$

Setting the partial derivatives with respect to $w$ and $w_{0}$ to zero gives
$$\frac{\partial E}{\partial w}=\sum_{n=1}^{N}(w^{T}x_{n}+w_{0}-t_{n})x_{n}=0$$

$$\frac{\partial E}{\partial w_{0}}=\sum_{n=1}^{N}(w^{T}x_{n}+w_{0}-t_{n})=0$$

Simplifying the second equation first:
$$Nw_{0}+\sum_{n=1}^{N}w^{T}x_{n}=\sum_{n=1}^{N}t_{n}=N_{1}\frac{N}{N_{1}}-N_{2}\frac{N}{N_{2}}=0$$
Hence

$$w_{0}=-w^{T}m, \qquad m=\frac{1}{N}\sum_{n=1}^{N}x_{n}=\frac{1}{N}(N_{1}m_{1}+N_{2}m_{2})$$

Substituting $w_{0}$ into $\frac{\partial E}{\partial w}=0$ gives
$$\sum_{n=1}^{N}(w^{T}x_{n}-w^{T}m-t_{n})x_{n}=0$$
Splitting the sum by class and using $(w^{T}x_{n}-w^{T}m)x_{n}=(x_{n}x_{n}^{T}-x_{n}m^{T})w$:

$$\sum_{n\in C_{1}}(x_{n}x_{n}^{T}-x_{n}m^{T})w-\sum_{n\in C_{1}}t_{n}x_{n}+\sum_{n\in C_{2}}(x_{n}x_{n}^{T}-x_{n}m^{T})w-\sum_{n\in C_{2}}t_{n}x_{n}=0$$
$$\Big(\sum_{n\in C_{1}}x_{n}x_{n}^{T}-N_{1}m_{1}m^{T}\Big)w-\frac{N}{N_{1}}N_{1}m_{1}+\Big(\sum_{n\in C_{2}}x_{n}x_{n}^{T}-N_{2}m_{2}m^{T}\Big)w+\frac{N}{N_{2}}N_{2}m_{2}=0$$
$$\left\{\Big(\sum_{n\in C_{1}}x_{n}x_{n}^{T}+\sum_{n\in C_{2}}x_{n}x_{n}^{T}\Big)-(N_{1}m_{1}+N_{2}m_{2})m^{T}\right\}w=N(m_{1}-m_{2})$$

Recall the within-class scatter matrix derived earlier:
$$S_{W}=\sum_{n\in C_{1}}(x_{n}-m_{1})(x_{n}-m_{1})^{T}+\sum_{n\in C_{2}}(x_{n}-m_{2})(x_{n}-m_{2})^{T}$$
$$S_{W}=\Big(\sum_{n\in C_{1}}x_{n}x_{n}^{T}+\sum_{n\in C_{2}}x_{n}x_{n}^{T}\Big)-\sum_{n\in C_{1}}\big(m_{1}x_{n}^{T}+x_{n}m_{1}^{T}\big)-\sum_{n\in C_{2}}\big(m_{2}x_{n}^{T}+x_{n}m_{2}^{T}\big)+\Big(\sum_{n\in C_{1}}m_{1}m_{1}^{T}+\sum_{n\in C_{2}}m_{2}m_{2}^{T}\Big)$$
$$S_{W}=\Big(\sum_{n\in C_{1}}x_{n}x_{n}^{T}+\sum_{n\in C_{2}}x_{n}x_{n}^{T}\Big)-(N_{1}m_{1}m_{1}^{T}+N_{1}m_{1}m_{1}^{T})-(N_{2}m_{2}m_{2}^{T}+N_{2}m_{2}m_{2}^{T})+(N_{1}m_{1}m_{1}^{T}+N_{2}m_{2}m_{2}^{T})$$
$$\Big(\sum_{n\in C_{1}}x_{n}x_{n}^{T}+\sum_{n\in C_{2}}x_{n}x_{n}^{T}\Big)=S_{W}+N_{1}m_{1}m_{1}^{T}+N_{2}m_{2}m_{2}^{T}$$
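The identity just obtained is easy to sanity-check numerically. The following sketch uses random data; the sample sizes, the dimension, and the variable names `raw_moment`, `rebuilt`, and `identity_holds` are arbitrary assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed synthetic data: two classes in three dimensions.
X1 = rng.normal(size=(30, 3))
X2 = rng.normal(loc=2.0, size=(50, 3))
N1, N2 = len(X1), len(X2)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter S_W from its definition.
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Left-hand side: sum of x_n x_n^T over both classes.
raw_moment = X1.T @ X1 + X2.T @ X2

# Right-hand side: S_W + N_1 m_1 m_1^T + N_2 m_2 m_2^T.
rebuilt = S_W + N1 * np.outer(m1, m1) + N2 * np.outer(m2, m2)

identity_holds = np.allclose(raw_moment, rebuilt)
print("identity holds:", identity_holds)
```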
Substituting this into

$$\left\{\Big(\sum_{n\in C_{1}}x_{n}x_{n}^{T}+\sum_{n\in C_{2}}x_{n}x_{n}^{T}\Big)-(N_{1}m_{1}+N_{2}m_{2})m^{T}\right\}w=N(m_{1}-m_{2})$$

gives
$$\left\{S_{W}+N_{1}m_{1}m_{1}^{T}+N_{2}m_{2}m_{2}^{T}-(N_{1}m_{1}+N_{2}m_{2})\frac{1}{N}(N_{1}m_{1}+N_{2}m_{2})^{T}\right\}w=N(m_{1}-m_{2})$$
$$\left\{S_{W}+\frac{N_{1}N_{2}}{N}\left[\frac{N}{N_{2}}m_{1}m_{1}^{T}+\frac{N}{N_{1}}m_{2}m_{2}^{T}-\frac{N_{1}}{N_{2}}m_{1}m_{1}^{T}-m_{1}m_{2}^{T}-m_{2}m_{1}^{T}-\frac{N_{2}}{N_{1}}m_{2}m_{2}^{T}\right]\right\}w=N(m_{1}-m_{2})$$
$$\left\{S_{W}+\frac{N_{1}N_{2}}{N}\left[m_{1}m_{1}^{T}+m_{2}m_{2}^{T}-m_{1}m_{2}^{T}-m_{2}m_{1}^{T}\right]\right\}w=N(m_{1}-m_{2})$$
$$\left\{S_{W}+\frac{N_{1}N_{2}}{N}S_{B}\right\}w=N(m_{1}-m_{2})$$

From this equation we can again see that
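$$w\propto S_{W}^{-1}(m_{2}-m_{1})$$

because $S_{B}w$ always points along $(m_{2}-m_{1})$, so $S_{W}w$ must be a scalar multiple of $(m_{2}-m_{1})$ as well. With the targets chosen here, that scalar may be negative.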
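The equivalence derived above can be checked numerically. The sketch below fits least squares with targets $N/N_{1}$ and $-N/N_{2}$ using `np.linalg.lstsq` and compares the result with the Fisher direction. The synthetic data and the variable names (`w_ls`, `w_fisher`, `cosine`, `lhs`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed synthetic data: two classes in three dimensions.
X1 = rng.normal(size=(40, 3))
X2 = rng.normal(loc=1.5, size=(60, 3))
X = np.vstack([X1, X2])
N1, N2 = len(X1), len(X2)
N = N1 + N2

# Targets: N/N_1 for class C_1 and -N/N_2 for class C_2.
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])

# Least squares for y = w^T x + w_0; the column of ones carries the bias w_0.
A = np.hstack([X, np.ones((N, 1))])
solution, *_ = np.linalg.lstsq(A, t, rcond=None)
w_ls, w0 = solution[:-1], solution[-1]

# Fisher direction S_W^{-1} (m_2 - m_1), assuming S_W is invertible.
m1, m2, m = X1.mean(axis=0), X2.mean(axis=0), X.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
S_B = np.outer(m2 - m1, m2 - m1)
w_fisher = np.linalg.solve(S_W, m2 - m1)

# Cosine of the angle between the two directions; +1 or -1 means parallel.
cosine = w_ls @ w_fisher / (np.linalg.norm(w_ls) * np.linalg.norm(w_fisher))

# Left-hand side of {S_W + (N_1 N_2 / N) S_B} w = N (m_1 - m_2).
lhs = (S_W + (N1 * N2 / N) * S_B) @ w_ls

print("cosine between w_ls and w_fisher:", cosine)
print("w_0 + w^T m:", w0 + w_ls @ m)
```

The cosine should have magnitude 1 up to rounding. Its sign comes out negative here because class $C_{1}$ was given the positive target, so the least-squares $w$ points from $m_{2}$ toward $m_{1}$.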