主要思想
LDA将 D D D维特征 X = [ x 1 , x 2 , ⋯ , x N ] ∈ R D × N \mathbf{X}=[\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N]\in\mathbb{R}^{D\times N} X=[x1,x2,⋯,xN]∈RD×N( x i ∈ R D \mathbf{x}_i\in\mathbb{R}^{D} xi∈RD)映射到 d ( d ≪ D ) d(d\ll D) d(d≪D)维空间中( Z = [ z 1 , z 2 , ⋯ , z N ] = W T X ∈ R d × N \mathbf{Z}=[\mathbf{z}_1, \mathbf{z}_2, \cdots, \mathbf{z}_N]=\mathbf{W}^T\mathbf{X}\in\mathbb{R}^{d\times N} Z=[z1,z2,⋯,zN]=WTX∈Rd×N, W = [ w 1 , w 2 , ⋯ , w d ] ∈ R D × d \mathbf{W}=[\mathbf{w}_1,\mathbf{w}_2,\cdots,\mathbf{w}_d]\in\mathbb{R}^{D\times d} W=[w1,w2,⋯,wd]∈RD×d, w i ∈ R D \mathbf{w}_i\in\mathbb{R}^D wi∈RD),使得在降维后的空间中,尽量使同一类的数据聚集,不同类的数据尽可能分散,因此对于每个特征 x i \mathbf{x}_i xi还需要知道其对应的标签 y i ∈ { 1 , 2 , ⋯ , C } y_i\in\{1,2,\cdots,\mathcal{C}\} yi∈{1,2,⋯,C}。
推导方法
此处我们不考虑二分类的情况,直接考虑多分类的情况。根据降维思想,降维后的
Z
\mathbf{Z}
Z满足同类更紧凑,而异类更分散,那么衡量“紧凑”和“分散”可以定义类内散度矩阵和类间散度矩阵:
S
w
=
∑
c
=
1
C
N
c
N
S
w
(
c
)
=
∑
c
=
1
C
N
c
N
1
N
c
∑
i
=
1
N
1
[
y
i
=
c
]
(
x
i
−
μ
c
)
(
x
i
−
μ
c
)
T
S
b
=
1
2
∑
c
1
,
c
2
=
1
C
N
c
1
N
N
c
2
N
S
b
(
c
1
,
c
2
)
=
1
2
∑
c
1
,
c
2
=
1
C
N
c
1
N
N
c
2
N
(
μ
c
1
−
μ
c
2
)
(
μ
c
1
−
μ
c
2
)
T
=
1
2
∑
c
1
=
1
C
N
c
1
N
∑
c
2
=
1
C
N
c
2
N
(
μ
c
1
−
μ
c
2
)
(
μ
c
1
−
μ
c
2
)
T
=
1
2
(
∑
c
1
=
1
C
N
c
1
N
μ
c
1
μ
c
1
T
−
μ
μ
T
−
μ
μ
T
+
∑
c
2
=
1
C
N
c
2
N
μ
c
2
μ
c
2
T
)
=
∑
c
1
=
1
C
N
c
1
N
μ
c
1
μ
c
1
T
−
μ
μ
T
=
∑
c
1
=
1
C
N
c
1
N
(
μ
c
1
μ
c
1
T
−
μ
μ
T
+
2
μ
μ
T
−
μ
c
1
μ
T
−
μ
μ
c
1
T
)
=
∑
c
1
=
1
C
N
c
1
N
(
μ
c
1
μ
c
1
T
+
μ
μ
T
−
μ
c
1
μ
T
−
μ
μ
c
1
T
)
=
∑
c
=
1
C
N
c
N
(
μ
c
−
μ
)
(
μ
c
−
μ
)
T
\begin{aligned} \mathbf{S}_w&=\sum_{c=1}^{\mathcal{C}}{\frac{N_c}{N}}\mathbf{S}_{w}^{(c)}\\ &=\sum_{c=1}^{\mathcal{C}}{\frac{N_c}{N}}\frac{1}{N_c}\sum_{i=1}^N{1}[y_i=c](\mathbf{x}_i-\mu _c)(\mathbf{x}_i-\mu _c)^T\\ \mathbf{S}_b&=\frac{1}{2}\sum_{c_1,c_2=1}^{\mathcal{C}}{\frac{N_{c_1}}{N}\frac{N_{c_2}}{N}\mathbf{S}_{b}^{(c_1,c_2)}}\\ &=\frac{1}{2}\sum_{c_1,c_2=1}^{\mathcal{C}}{\frac{N_{c_1}}{N}\frac{N_{c_2}}{N}(\mu _{c_1}-\mu _{c_2})(\mu _{c_1}-\mu _{c_2})^T}\\ &=\frac{1}{2}\sum_{c_1=1}^{\mathcal{C}}{\frac{N_{c_1}}{N}\sum_{c_2=1}^{\mathcal{C}}{\frac{N_{c_2}}{N}}(\mu _{c_1}-\mu _{c_2})(\mu _{c_1}-\mu _{c_2})^T}\\ &=\frac{1}{2}\left( \sum_{c_1=1}^{\mathcal{C}}{\frac{N_{c_1}}{N}\mu _{c_1}\mu _{c_1}^{T}-\mu \mu ^T-\mu \mu ^T+\sum_{c_2=1}^{\mathcal{C}}{\frac{N_{c_2}}{N}}\mu _{c_2}\mu _{c_2}^{T}} \right)\\ &=\sum_{c_1=1}^{\mathcal{C}}{\frac{N_{c_1}}{N}\mu _{c_1}\mu _{c_1}^{T}-\mu \mu ^T}\\ &=\sum_{c_1=1}^{\mathcal{C}}{\frac{N_{c_1}}{N}(\mu _{c_1}\mu _{c_1}^{T}-\mu \mu ^T+{\color{red} 2\mu \mu ^T-\mu _{c_1}\mu ^T-\mu \mu _{c_1}^{T}})}\\ &=\sum_{c_1=1}^{\mathcal{C}}{\frac{N_{c_1}}{N}(\mu _{c_1}\mu _{c_1}^{T}+\mu \mu ^T-\mu _{c_1}\mu ^T-\mu \mu _{c_1}^{T})}\\ &=\sum_{c=1}^{\mathcal{C}}{\frac{N_{c}}{N}\left( \mu _{c}-\mu \right) \left( \mu _{c}-\mu \right) ^T}\\ \end{aligned}
SwSb=c=1∑CNNcSw(c)=c=1∑CNNcNc1i=1∑N1[yi=c](xi−μc)(xi−μc)T=21c1,c2=1∑CNNc1NNc2Sb(c1,c2)=21c1,c2=1∑CNNc1NNc2(μc1−μc2)(μc1−μc2)T=21c1=1∑CNNc1c2=1∑CNNc2(μc1−μc2)(μc1−μc2)T=21(c1=1∑CNNc1μc1μc1T−μμT−μμT+c2=1∑CNNc2μc2μc2T)=c1=1∑CNNc1μc1μc1T−μμT=c1=1∑CNNc1(μc1μc1T−μμT+2μμT−μc1μT−μμc1T)=c1=1∑CNNc1(μc1μc1T+μμT−μc1μT−μμc1T)=c=1∑CNNc(μc−μ)(μc−μ)T
其中
N
c
=
∑
i
=
1
N
1
[
y
i
=
c
]
μ
c
=
1
N
c
∑
i
=
1
N
1
[
y
i
=
c
]
x
i
μ
=
1
N
∑
i
=
1
N
x
i
=
∑
c
=
1
C
N
c
N
μ
c
\begin{aligned} N_c&=\sum_{i=1}^N1[y_i=c]\\ \mu_c&=\frac1{N_c}\sum_{i=1}^N{1[y_i=c]\mathbf{x}_i}\\ \mu&=\frac1{N}\sum_{i=1}^N{\mathbf{x}_i}\\ &=\sum_{c=1}^\mathcal{C}\frac{N_c}{N}\mu_c \end{aligned}
Ncμcμ=i=1∑N1[yi=c]=Nc1i=1∑N1[yi=c]xi=N1i=1∑Nxi=c=1∑CNNcμc
以上的
S
b
\mathbf{S}_b
Sb和
S
w
\mathbf{S}_w
Sw计算的是
X
\mathbf{X}
X,而目标是找到
W
\mathbf{W}
W使得
Z
=
W
T
X
\mathbf{Z}=\mathbf{W}^T\mathbf{X}
Z=WTX在同类中更加紧凑,在异类中更分散。优化目标有很多种表示方法:
求和优化目标
a
r
g
max
W
t
r
a
c
e
(
W
T
S
b
W
)
t
r
a
c
e
(
W
T
S
w
W
)
⇔
{
a
r
g
max
W
t
r
a
c
e
(
W
T
S
b
W
)
s
.
t
.
W
T
S
w
W
=
I
⇔
{
a
r
g
max
W
t
r
a
c
e
(
V
T
S
w
−
1
/
2
S
b
S
w
−
1
/
2
V
)
s
.
t
.
V
T
V
=
I
V
=
S
w
1
/
2
W
\underset{\mathbf{W}}{\mathrm{arg}\max}\frac{trace\left( \mathbf{W}^T\mathbf{S}_b\mathbf{W} \right)}{trace\left( \mathbf{W}^T\mathbf{S}_w\mathbf{W} \right)} \\ \Leftrightarrow \left\{ \begin{array}{c} \underset{\mathbf{W}}{\mathrm{arg}\max}trace\left( \mathbf{W}^T\mathbf{S}_b\mathbf{W} \right)\\ s.t.\mathbf{W}^T\mathbf{S}_w\mathbf{W}=\mathbb{I}\\ \end{array} \right. \\ \Leftrightarrow \left\{ \begin{array}{c} \underset{\mathbf{W}}{\mathrm{arg}\max}trace\left( \mathbf{V}^T\mathbf{S}_{w}^{-1/2}\mathbf{S}_b\mathbf{S}_{w}^{-1/2}\mathbf{V} \right)\\ s.t.\mathbf{V}^T\mathbf{V}=\mathbb{I}\\ \mathbf{V}=\mathbf{S}_{w}^{1/2}\mathbf{W}\\ \end{array} \right.
Wargmaxtrace(WTSwW)trace(WTSbW)⇔{Wargmaxtrace(WTSbW)s.t.WTSwW=I⇔⎩
⎨
⎧Wargmaxtrace(VTSw−1/2SbSw−1/2V)s.t.VTV=IV=Sw1/2W
那么以上的求解与PCA所用的方法一致了,然后再
W
=
S
w
−
1
/
2
V
\mathbf{W}=\mathbf{S}_{w}^{-1/2}\mathbf{V}
W=Sw−1/2V就OK啦!需要说明的是,由于
r
a
n
k
(
S
b
)
≤
C
−
1
rank(\mathbf{S}_b)\leq \mathcal{C}-1
rank(Sb)≤C−1,所以最终求得的
V
\mathbf{V}
V最多就
C
−
1
\mathcal{C}-1
C−1个特征向量。
求积优化目标
a
r
g
max
W
∏
i
=
1
d
w
i
T
S
b
w
i
w
i
T
S
w
w
i
⇔
{
a
r
g
max
W
∏
i
=
1
d
v
i
T
S
w
−
1
/
2
S
b
S
w
−
1
/
2
v
i
s
.
t
.
V
T
V
=
I
V
=
S
w
1
/
2
W
\underset{\mathbf{W}}{\mathrm{arg}\max}\prod_{i=1}^d{\frac{\mathbf{w}_{i}^{T}\mathbf{S}_b\mathbf{w}_i}{\mathbf{w}_{i}^{T}\mathbf{S}_w\mathbf{w}_i}} \\ \Leftrightarrow \left\{ \begin{array}{c} \underset{\mathbf{W}}{\mathrm{arg}\max}\prod_{i=1}^d{\mathbf{v}_{i}^{T}\mathbf{S}_{w}^{-1/2}\mathbf{S}_b\mathbf{S}_{w}^{-1/2}\mathbf{v}_i}\\ s.t. \mathbf{V}^T\mathbf{V}=\mathbb{I}\\ \mathbf{V}=\mathbf{S}_{w}^{1/2}\mathbf{W}\\ \end{array} \right.
Wargmaxi=1∏dwiTSwwiwiTSbwi⇔⎩
⎨
⎧Wargmax∏i=1dviTSw−1/2SbSw−1/2vis.t.VTV=IV=Sw1/2W
更简单的一种是
S
w
−
1
/
2
S
b
S
w
−
1
/
2
\mathbf{S}_{w}^{-1/2}\mathbf{S}_b\mathbf{S}_{w}^{-1/2}
Sw−1/2SbSw−1/2求出的特征向量
v
\mathbf{v}
v不用左乘
S
w
−
1
/
2
\mathbf{S}_{w}^{-1/2}
Sw−1/2得到
w
\mathbf{w}
w,可以直接通过求解
S
w
−
1
S
b
\mathbf{S}_{w}^{-1}\mathbf{S}_b
Sw−1Sb得到
w
\mathbf{w}
w。
from stat import S_IFBLK
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
def plot_2d(X_r, y, target_names, name):
if 'PCA' in name:
plt.subplot(1,3,1)
else:
if 'Sklearn' in name:
plt.subplot(1,3,2)
else:
plt.subplot(1,3,3)
colors = ["navy", "turquoise", "darkorange"]
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
plt.scatter(
X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=0.8, lw=lw, label=target_name
)
plt.legend(loc="best", shadow=False, scatterpoints=1)
plt.xlabel(f"{name} of IRIS dataset")
def sklearn_lda(X, y, target_names):
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)
plot_2d(X_r2, y, target_names, 'Sklearn LDA')
return X_r2
def my_lda(X, y, target_names):
Sw = 0.0
Sb = 0.0
mu = np.mean(X, axis=0)
for i in range(3):
X_c = X[y==i,:]
mu_c = np.mean(X_c, axis=0)
N_c = X_c.shape[0]
Sw = Sw + (X_c-mu_c).T @ (X_c-mu_c)
Sb = Sb + N_c * np.reshape(mu_c - mu, [-1, 1]) @ np.reshape(mu_c - mu, [1, -1])
S = np.linalg.inv(Sw) @ Sb
values, vectors = np.linalg.eig(S)
idxs = np.argsort(values)
W = vectors[:,[idxs[-i] for i in range(1,3)]]
X_r = np.matmul(X-mu, W)
X_r[:,1] = - X_r[:,1] #确保与sklearn得到的结果一致
X_r[:,0] = - X_r[:,0] #确保与sklearn得到的结果一致
plot_2d(X_r, y, target_names, 'My Imple. LDA')
return X_r
def my_pca(X, y, target_names):
n_components=2
# 去中心化
N = X.shape[0]
X_mean = np.mean(X, axis=0)
X_ = X - X_mean
# 构建协方差矩阵
XX = 1. / N * np.matmul(X_.T, X_)
# 求特征向量
values, vectors = np.linalg.eig(XX)
idxs = np.argsort(values)
W = vectors[:,[idxs[-i] for i in range(1,n_components+1)]]
X_r = np.matmul(X_, W)
X_r[:,1] = - X_r[:,1] #确保与sklearn得到的结果一致
plot_2d(X_r, y, target_names, 'My Imple. PCA')
return X_r
if __name__ == '__main__':
iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
plt.figure()
X_r2 = my_pca(X, y, target_names)
X_r1 = my_lda(X, y, target_names)
X_r1 = sklearn_lda(X, y, target_names)
plt.show()
核技巧
之前的问题实质上是求解
S
b
w
k
=
λ
k
S
w
w
k
\mathbf{S}_b\mathbf{w}_k=\lambda _k\mathbf{S}_w\mathbf{w}_k
Sbwk=λkSwwk
的广义特征向量。首先将核化后的
S
b
\mathbf{S}_b
Sb和
S
w
\mathbf{S}_w
Sw写出
S
w
ϕ
=
1
N
∑
c
=
1
C
∑
i
=
1
N
1
[
y
i
=
c
]
(
ϕ
(
x
i
)
−
1
N
c
X
ϕ
1
N
×
1
(
c
)
)
(
ϕ
(
x
i
)
−
1
N
c
X
ϕ
1
N
×
1
(
c
)
)
T
=
1
N
∑
c
=
1
C
∑
i
=
1
N
1
[
y
i
=
c
]
(
ϕ
(
x
i
)
ϕ
(
x
i
)
T
−
X
ϕ
1
N
×
1
(
c
)
N
c
ϕ
(
x
i
)
T
−
ϕ
(
x
i
)
1
1
×
N
(
c
)
N
c
X
ϕ
T
+
X
ϕ
1
N
×
1
(
c
)
1
1
×
N
(
c
)
N
c
2
X
ϕ
T
)
=
1
N
(
X
ϕ
X
ϕ
T
−
X
ϕ
(
∑
c
=
1
C
1
N
×
1
(
c
)
1
1
×
N
(
c
)
N
c
)
X
ϕ
T
−
X
ϕ
(
∑
c
=
1
C
1
N
×
1
(
c
)
1
1
×
N
(
c
)
N
c
)
X
ϕ
T
+
X
ϕ
(
∑
c
=
1
C
1
N
×
1
(
c
)
1
1
×
N
(
c
)
N
c
)
X
ϕ
T
)
=
1
N
X
ϕ
(
I
−
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
)
X
ϕ
T
\begin{aligned} \mathbf{S}_{w}^{\phi}&=\frac{1}{N}\sum_{c=1}^{\mathcal{C}}{\sum_{i=1}^N{1\left[ y_i=c \right] \left( \phi \left( \mathbf{x}_i \right) -\frac{1}{N_c}\mathbf{X}_{\phi}1_{N\times 1}^{\left( c \right)} \right)}}\left( \phi \left( \mathbf{x}_i \right) -\frac{1}{N_c}\mathbf{X}_{\phi}1_{N\times 1}^{\left( c \right)} \right) ^T\\ &=\frac{1}{N}\sum_{c=1}^{\mathcal{C}}{\sum_{i=1}^N{1\left[ y_i=c \right] \left( \phi \left( \mathbf{x}_i \right) \phi \left( \mathbf{x}_i \right) ^T-\mathbf{X}_{\phi}\frac{1_{N\times 1}^{\left( c \right)}}{N_c}\phi \left( \mathbf{x}_i \right) ^T-\phi \left( \mathbf{x}_i \right) \frac{1_{1\times N}^{\left( c \right)}}{N_c}\mathbf{X}_{\phi}^{T}+\mathbf{X}_{\phi}\frac{1_{N\times 1}^{\left( c \right)}1_{1\times N}^{\left( c \right)}}{N_{c}^{2}}\mathbf{X}_{\phi}^{T} \right)}}\\ &=\frac{1}{N}\left( \mathbf{X}_{\phi}\mathbf{X}_{\phi}^{T}-\mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{\frac{{\color{red} 1_{N\times 1}^{\left( c \right)}1_{1\times N}^{\left( c \right)}}}{N_c}} \right) \mathbf{X}_{\phi}^{T}-\mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times 1}^{\left( c \right)}1_{1\times N}^{\left( c \right)}}{N_c}} \right) \mathbf{X}_{\phi}^{T}+\mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times 1}^{\left( c \right)}1_{1\times N}^{\left( c \right)}}{N_c}} \right) \mathbf{X}_{\phi}^{T} \right)\\ &=\frac{1}{N}\mathbf{X}_{\phi}\left( \mathbb{I} -\sum_{c=1}^{\mathcal{C}}{\frac{{\color{red} 1_{N\times N}^{\left( c \right)}}}{N_c}} \right) \mathbf{X}_{\phi}^{T}\\ \end{aligned}
Swϕ=N1c=1∑Ci=1∑N1[yi=c](ϕ(xi)−Nc1Xϕ1N×1(c))(ϕ(xi)−Nc1Xϕ1N×1(c))T=N1c=1∑Ci=1∑N1[yi=c](ϕ(xi)ϕ(xi)T−XϕNc1N×1(c)ϕ(xi)T−ϕ(xi)Nc11×N(c)XϕT+XϕNc21N×1(c)11×N(c)XϕT)=N1(XϕXϕT−Xϕ(c=1∑CNc1N×1(c)11×N(c))XϕT−Xϕ(c=1∑CNc1N×1(c)11×N(c))XϕT+Xϕ(c=1∑CNc1N×1(c)11×N(c))XϕT)=N1Xϕ(I−c=1∑CNc1N×N(c))XϕT
S
b
ϕ
=
1
N
∑
c
=
1
C
N
c
(
1
N
c
X
ϕ
1
N
×
1
(
c
)
−
1
N
X
ϕ
1
N
×
1
)
(
1
N
c
X
ϕ
1
N
×
1
(
c
)
−
1
N
X
ϕ
1
N
×
1
)
T
=
1
N
(
X
ϕ
(
∑
c
=
1
C
1
N
×
1
(
c
)
1
1
×
N
(
c
)
N
c
)
X
ϕ
T
−
X
ϕ
(
1
N
×
1
N
∑
c
=
1
C
1
1
×
N
(
c
)
)
X
ϕ
T
−
X
ϕ
(
∑
c
=
1
C
1
N
×
1
(
c
)
1
1
×
N
N
)
X
ϕ
T
+
X
ϕ
(
1
N
×
1
1
1
×
N
∑
c
=
1
C
N
c
N
2
)
X
ϕ
T
)
=
1
N
X
ϕ
(
∑
c
=
1
C
1
N
×
1
(
c
)
1
1
×
N
(
c
)
N
c
−
1
N
×
1
1
1
×
N
N
−
1
N
×
1
1
1
×
N
N
+
1
N
×
1
1
1
×
N
N
)
X
ϕ
T
=
1
N
X
ϕ
(
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
−
1
N
×
N
N
)
X
ϕ
T
\begin{aligned} \mathbf{S}_{b}^{\phi}&=\frac{1}{N}\sum_{c=1}^{\mathcal{C}}{N_c\left( \frac{1}{N_c}\mathbf{X}_{\phi}1_{N\times 1}^{\left( c \right)}-\frac{1}{N}\mathbf{X}_{\phi}1_{N\times 1} \right)}\left( \frac{1}{N_c}\mathbf{X}_{\phi}1_{N\times 1}^{\left( c \right)}-\frac{1}{N}\mathbf{X}_{\phi}1_{N\times 1} \right) ^T\\ &=\frac{1}{N}\left( \mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times 1}^{\left( c \right)}1_{1\times N}^{\left( c \right)}}{N_c}} \right) \mathbf{X}_{\phi}^{T}-\mathbf{X}_{\phi}\left( \frac{1_{N\times 1}}{N}\sum_{c=1}^{\mathcal{C}}{1_{1\times N}^{\left( c \right)}} \right) \mathbf{X}_{\phi}^{T}-\mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{1_{N\times 1}^{\left( c \right)}}\frac{1_{1\times N}}{N} \right) \mathbf{X}_{\phi}^{T}+\mathbf{X}_{\phi}\left( 1_{N\times 1}1_{1\times N}\sum_{c=1}^{\mathcal{C}}{\frac{N_c}{N^2}} \right) \mathbf{X}_{\phi}^{T} \right)\\ &=\frac{1}{N}\mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times 1}^{\left( c \right)}1_{1\times N}^{\left( c \right)}}{N_c}}-\frac{1_{N\times 1}1_{1\times N}}{N}-\frac{1_{N\times 1}1_{1\times N}}{N}+\frac{1_{N\times 1}1_{1\times N}}{N} \right) \mathbf{X}_{\phi}^{T}\\ &=\frac{1}{N}\mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times N}^{\left( c \right)}}{N_c}}-\frac{1_{N\times N}}{N} \right) \mathbf{X}_{\phi}^{T}\\ \end{aligned}
Sbϕ=N1c=1∑CNc(Nc1Xϕ1N×1(c)−N1Xϕ1N×1)(Nc1Xϕ1N×1(c)−N1Xϕ1N×1)T=N1(Xϕ(c=1∑CNc1N×1(c)11×N(c))XϕT−Xϕ(N1N×1c=1∑C11×N(c))XϕT−Xϕ(c=1∑C1N×1(c)N11×N)XϕT+Xϕ(1N×111×Nc=1∑CN2Nc)XϕT)=N1Xϕ(c=1∑CNc1N×1(c)11×N(c)−N1N×111×N−N1N×111×N+N1N×111×N)XϕT=N1Xϕ(c=1∑CNc1N×N(c)−N1N×N)XϕT
而根据核技巧有
w
k
=
X
ϕ
α
k
=
[
ϕ
(
x
1
)
,
ϕ
(
x
2
)
,
⋯
,
ϕ
(
x
N
)
]
[
α
1
k
α
2
k
⋮
α
N
k
]
\begin{aligned} \mathbf{w}_k&=\mathbf{X}_{\phi}\alpha _k \\ &=\left[ \phi \left( \mathbf{x}_1 \right) ,\phi \left( \mathbf{x}_2 \right) ,\cdots ,\phi \left( \mathbf{x}_N \right) \right] \left[ \begin{array}{c} \alpha _{1k}\\ \alpha _{2k}\\ \vdots\\ \alpha _{Nk}\\ \end{array} \right] \end{aligned}
wk=Xϕαk=[ϕ(x1),ϕ(x2),⋯,ϕ(xN)]⎣
⎡α1kα2k⋮αNk⎦
⎤
那么
1
N
X
ϕ
(
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
−
1
N
×
N
N
)
X
ϕ
T
X
ϕ
α
k
=
λ
k
1
N
X
ϕ
(
I
−
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
)
X
ϕ
T
X
ϕ
α
k
X
ϕ
T
X
ϕ
(
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
−
1
N
×
N
N
)
X
ϕ
T
X
ϕ
α
k
=
λ
k
X
ϕ
T
X
ϕ
(
I
−
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
)
X
ϕ
T
X
ϕ
α
k
K
(
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
−
1
N
×
N
N
)
K
α
k
=
λ
k
K
(
I
−
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
)
K
α
k
(
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
−
1
N
×
N
N
)
K
α
k
=
λ
k
(
I
−
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
)
K
α
k
\frac{1}{N}\mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times N}^{\left( c \right)}}{N_c}}-\frac{1_{N\times N}}{N} \right) \mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\alpha _k=\lambda _k\frac{1}{N}\mathbf{X}_{\phi}\left( \mathbb{I} -\sum_{c=1}^{\mathcal{C}}{\frac{1{\color{black} _{N\times N}^{\left( c \right)}}}{N_c}} \right) \mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\alpha _k \\ \mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times N}^{\left( c \right)}}{N_c}}-\frac{1_{N\times N}}{N} \right) \mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\alpha _k=\lambda _k\mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\left( \mathbb{I} -\sum_{c=1}^{\mathcal{C}}{\frac{1{\color{black} _{N\times N}^{\left( c \right)}}}{N_c}} \right) \mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\alpha _k \\ \mathbf{K}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times N}^{\left( c \right)}}{N_c}}-\frac{1_{N\times N}}{N} \right) \mathbf{K}\alpha _k=\lambda _k\mathbf{K}\left( \mathbb{I} -\sum_{c=1}^{\mathcal{C}}{\frac{1{\color{black} _{N\times N}^{\left( c \right)}}}{N_c}} \right) \mathbf{K}\alpha _k \\ \left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times N}^{\left( c \right)}}{N_c}}-\frac{1_{N\times N}}{N} \right) \mathbf{K}\alpha _k=\lambda _k\left( \mathbb{I} -\sum_{c=1}^{\mathcal{C}}{\frac{1{\color{black} _{N\times N}^{\left( c \right)}}}{N_c}} \right) \mathbf{K}\alpha _k
N1Xϕ(c=1∑CNc1N×N(c)−N1N×N)XϕTXϕαk=λkN1Xϕ(I−c=1∑CNc1N×N(c))XϕTXϕαkXϕTXϕ(c=1∑CNc1N×N(c)−N1N×N)XϕTXϕαk=λkXϕTXϕ(I−c=1∑CNc1N×N(c))XϕTXϕαkK(c=1∑CNc1N×N(c)−N1N×N)Kαk=λkK(I−c=1∑CNc1N×N(c))Kαk(c=1∑CNc1N×N(c)−N1N×N)Kαk=λk(I−c=1∑CNc1N×N(c))Kαk
所以
α
k
\alpha_k
αk就是
[
(
I
−
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
)
K
]
−
1
(
∑
c
=
1
C
1
N
×
N
(
c
)
N
c
−
1
N
×
N
N
)
K
\left[ \left( \mathbb{I} -\sum_{c=1}^{\mathcal{C}}{\frac{1{ _{N\times N}^{\left( c \right)}}}{N_c}} \right) \mathbf{K} \right] ^{-1}\left( \sum_{c=1}^{\mathcal{C}}{\frac{1_{N\times N}^{\left( c \right)}}{N_c}}-\frac{1_{N\times N}}{N} \right) \mathbf{K}
[(I−∑c=1CNc1N×N(c))K]−1(∑c=1CNc1N×N(c)−N1N×N)K的特征向量。对一个新的数据
x
n
e
w
\mathbf{x}_{new}
xnew,降维到
w
k
\mathbf{w}_k
wk的维度
w
k
T
(
ϕ
(
x
n
e
w
)
−
1
N
X
ϕ
1
N
×
1
)
=
α
k
T
X
ϕ
T
ϕ
(
x
n
e
w
)
−
1
N
α
k
T
X
ϕ
T
X
ϕ
1
N
×
1
=
α
k
T
K
(
⋅
,
x
n
e
w
)
−
1
N
α
k
T
K
1
N
×
1
\begin{aligned} {\color{red} \mathbf{w}_{k}^{T}}\left( \phi \left( \mathbf{x}_{new} \right) -\frac{1}{N}\mathbf{X}_{\phi}1_{N\times 1} \right) &={\color{red} \alpha _{k}^{T}\mathbf{X}_{\phi}^{T}}\phi \left( \mathbf{x}_{new} \right) -\frac{1}{N}{\color{red} \alpha _{k}^{T}\mathbf{X}_{\phi}^{T}}\mathbf{X}_{\phi}1_{N\times 1} \\ &=\alpha _{k}^{T}\mathbf{K}\left( \cdot ,\mathbf{x}_{new} \right) -\frac{1}{N}\alpha _{k}^{T}\mathbf{K}1_{N\times 1} \end{aligned}
wkT(ϕ(xnew)−N1Xϕ1N×1)=αkTXϕTϕ(xnew)−N1αkTXϕTXϕ1N×1=αkTK(⋅,xnew)−N1αkTK1N×1
from stat import S_IFBLK
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
def plot_2d(X_r, y, target_names, name):
if 'PCA' in name:
plt.subplot(1,3,1)
else:
if 'KLDA' in name:
plt.subplot(1,3,3)
else:
plt.subplot(1,3,2)
colors = ["navy", "turquoise", "darkorange"]
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
plt.scatter(
X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=0.8, lw=lw, label=target_name
)
plt.legend(loc="best", shadow=False, scatterpoints=1)
plt.xlabel(f"{name} of IRIS dataset")
def my_klda(X, y, target_names):
gamma = 10.0
# exp(-||x_i-x_j||^2*gamma)
N = X.shape[0]
K0 = np.sum(X**2, axis=1, keepdims=True)
K0 = np.exp(-(K0 + K0.T - 2 * np.matmul(X, X.T))*gamma)
Ic = 0
for i in range(3):
X_c = X[y==i,:]
N_c = X_c.shape[0]
lc = np.zeros([N, 1])
lc[y==i,:] = 1.0
Ic = Ic + lc @ lc.T / N_c
Sb_ = (Ic - np.ones([N, N]) / N) @ K0
Sw_ = (np.eye(N) - Ic) @ K0
S = np.linalg.inv(Sw_) @ Sb_
values, vectors = np.linalg.eig(S)
idxs = np.argsort(values)
alphas = vectors[:,[idxs[-i] for i in range(1,3)]]
X_r = alphas.T @ K0 - 1. / N * alphas.T @ K0 @ np.ones([N, 1])
X_r = X_r.T
plot_2d(X_r, y, target_names, 'My Imple. KLDA')
return X_r
def my_lda(X, y, target_names):
Sw = 0.0
Sb = 0.0
mu = np.mean(X, axis=0)
for i in range(3):
X_c = X[y==i,:]
mu_c = np.mean(X_c, axis=0)
N_c = X_c.shape[0]
Sw = Sw + (X_c-mu_c).T @ (X_c-mu_c)
Sb = Sb + N_c * np.reshape(mu_c - mu, [-1, 1]) @ np.reshape(mu_c - mu, [1, -1])
S = np.linalg.inv(Sw) @ Sb
values, vectors = np.linalg.eig(S)
idxs = np.argsort(values)
W = vectors[:,[idxs[-i] for i in range(1,3)]]
X_r = np.matmul(X-mu, W)
X_r[:,1] = - X_r[:,1] #确保与sklearn得到的结果一致
X_r[:,0] = - X_r[:,0] #确保与sklearn得到的结果一致
plot_2d(X_r, y, target_names, 'My Imple. LDA')
return X_r
def my_pca(X, y, target_names):
n_components=2
# 去中心化
N = X.shape[0]
X_mean = np.mean(X, axis=0)
X_ = X - X_mean
# 构建协方差矩阵
XX = 1. / N * np.matmul(X_.T, X_)
# 求特征向量
values, vectors = np.linalg.eig(XX)
idxs = np.argsort(values)
W = vectors[:,[idxs[-i] for i in range(1,n_components+1)]]
X_r = np.matmul(X_, W)
X_r[:,1] = - X_r[:,1] #确保与sklearn得到的结果一致
plot_2d(X_r, y, target_names, 'My Imple. PCA')
return X_r
if __name__ == '__main__':
iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
plt.figure()
X_r2 = my_pca(X, y, target_names)
X_r1 = my_lda(X, y, target_names)
X_r1 = my_klda(X, y, target_names)
plt.show()