Linear Discriminant Analysis (LDA)

Main Idea

LDA maps $D$-dimensional features $\mathbf{X}=[\mathbf{x}_1,\mathbf{x}_2,\cdots,\mathbf{x}_N]\in\mathbb{R}^{D\times N}$ (with $\mathbf{x}_i\in\mathbb{R}^{D}$) into a $d$-dimensional space with $d\ll D$, via $\mathbf{Z}=[\mathbf{z}_1,\mathbf{z}_2,\cdots,\mathbf{z}_N]=\mathbf{W}^T\mathbf{X}\in\mathbb{R}^{d\times N}$, where $\mathbf{W}=[\mathbf{w}_1,\mathbf{w}_2,\cdots,\mathbf{w}_d]\in\mathbb{R}^{D\times d}$ and $\mathbf{w}_i\in\mathbb{R}^{D}$. The projection is chosen so that, in the reduced space, samples of the same class cluster tightly while samples of different classes spread apart. LDA is therefore supervised: each feature $\mathbf{x}_i$ must come with its class label $y_i\in\{1,2,\cdots,\mathcal{C}\}$.

Derivation

Here we skip the two-class special case and treat the multi-class case directly. Following the idea above, the reduced representation $\mathbf{Z}$ should be compact within each class and spread out across classes. To quantify "compact" and "spread out", define the within-class and between-class scatter matrices:
$$
\begin{aligned}
\mathbf{S}_w&=\sum_{c=1}^{\mathcal{C}}\frac{N_c}{N}\mathbf{S}_{w}^{(c)}
=\sum_{c=1}^{\mathcal{C}}\frac{N_c}{N}\cdot\frac{1}{N_c}\sum_{i=1}^{N}1[y_i=c]\,(\mathbf{x}_i-\mu_c)(\mathbf{x}_i-\mu_c)^T\\
\mathbf{S}_b&=\frac{1}{2}\sum_{c_1,c_2=1}^{\mathcal{C}}\frac{N_{c_1}}{N}\frac{N_{c_2}}{N}\mathbf{S}_{b}^{(c_1,c_2)}
=\frac{1}{2}\sum_{c_1,c_2=1}^{\mathcal{C}}\frac{N_{c_1}}{N}\frac{N_{c_2}}{N}(\mu_{c_1}-\mu_{c_2})(\mu_{c_1}-\mu_{c_2})^T\\
&=\frac{1}{2}\sum_{c_1=1}^{\mathcal{C}}\frac{N_{c_1}}{N}\sum_{c_2=1}^{\mathcal{C}}\frac{N_{c_2}}{N}(\mu_{c_1}-\mu_{c_2})(\mu_{c_1}-\mu_{c_2})^T\\
&=\frac{1}{2}\left(\sum_{c_1=1}^{\mathcal{C}}\frac{N_{c_1}}{N}\mu_{c_1}\mu_{c_1}^{T}-\mu\mu^T-\mu\mu^T+\sum_{c_2=1}^{\mathcal{C}}\frac{N_{c_2}}{N}\mu_{c_2}\mu_{c_2}^{T}\right)\\
&=\sum_{c_1=1}^{\mathcal{C}}\frac{N_{c_1}}{N}\mu_{c_1}\mu_{c_1}^{T}-\mu\mu^T\\
&=\sum_{c_1=1}^{\mathcal{C}}\frac{N_{c_1}}{N}\left(\mu_{c_1}\mu_{c_1}^{T}-\mu\mu^T+{\color{red}2\mu\mu^T-\mu_{c_1}\mu^T-\mu\mu_{c_1}^{T}}\right)\\
&=\sum_{c_1=1}^{\mathcal{C}}\frac{N_{c_1}}{N}\left(\mu_{c_1}\mu_{c_1}^{T}+\mu\mu^T-\mu_{c_1}\mu^T-\mu\mu_{c_1}^{T}\right)\\
&=\sum_{c=1}^{\mathcal{C}}\frac{N_{c}}{N}\left(\mu_{c}-\mu\right)\left(\mu_{c}-\mu\right)^T
\end{aligned}
$$
where
$$
\begin{aligned}
N_c&=\sum_{i=1}^{N}1[y_i=c]\\
\mu_c&=\frac{1}{N_c}\sum_{i=1}^{N}1[y_i=c]\,\mathbf{x}_i\\
\mu&=\frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i=\sum_{c=1}^{\mathcal{C}}\frac{N_c}{N}\mu_c
\end{aligned}
$$
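To make these definitions concrete, here is a minimal NumPy sketch (the helper name `scatter_matrices` and the row-per-sample layout of `X` are my own convention, matching the scripts later in this post rather than the column-wise $\mathbf{X}\in\mathbb{R}^{D\times N}$ used in the math):

import numpy as np

def scatter_matrices(X, y):
    """Within/between-class scatter per the formulas above; rows of X are samples."""
    N, D = X.shape
    mu = X.mean(axis=0)                               # global mean
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in np.unique(y):
        X_c = X[y == c]                               # samples of class c
        N_c, mu_c = X_c.shape[0], X_c.mean(axis=0)    # class size and class mean
        Sw += (N_c / N) * (X_c - mu_c).T @ (X_c - mu_c) / N_c
        Sb += (N_c / N) * np.outer(mu_c - mu, mu_c - mu)
    return Sw, Sb

The `my_lda` implementation below drops the common $1/N$ factor from both matrices, which leaves the eigenvectors of $\mathbf{S}_w^{-1}\mathbf{S}_b$ unchanged.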
The $\mathbf{S}_b$ and $\mathbf{S}_w$ above are computed from $\mathbf{X}$ itself, whereas the goal is to find a $\mathbf{W}$ such that $\mathbf{Z}=\mathbf{W}^T\mathbf{X}$ is compact within classes and spread out across classes. The optimization objective can be stated in several ways:

Sum-form objective

$$
\underset{\mathbf{W}}{\arg\max}\ \frac{\mathrm{trace}\left(\mathbf{W}^T\mathbf{S}_b\mathbf{W}\right)}{\mathrm{trace}\left(\mathbf{W}^T\mathbf{S}_w\mathbf{W}\right)}
\Leftrightarrow
\left\{\begin{array}{l}
\underset{\mathbf{W}}{\arg\max}\ \mathrm{trace}\left(\mathbf{W}^T\mathbf{S}_b\mathbf{W}\right)\\
\mathrm{s.t.}\ \mathbf{W}^T\mathbf{S}_w\mathbf{W}=\mathbb{I}
\end{array}\right.
\Leftrightarrow
\left\{\begin{array}{l}
\underset{\mathbf{W}}{\arg\max}\ \mathrm{trace}\left(\mathbf{V}^T\mathbf{S}_w^{-1/2}\mathbf{S}_b\mathbf{S}_w^{-1/2}\mathbf{V}\right)\\
\mathrm{s.t.}\ \mathbf{V}^T\mathbf{V}=\mathbb{I},\quad \mathbf{V}=\mathbf{S}_w^{1/2}\mathbf{W}
\end{array}\right.
$$
This is now solvable with exactly the same machinery as PCA; afterwards, recover $\mathbf{W}=\mathbf{S}_w^{-1/2}\mathbf{V}$ and we are done. Note that since $\mathrm{rank}(\mathbf{S}_b)\le\mathcal{C}-1$, the resulting $\mathbf{V}$ contains at most $\mathcal{C}-1$ useful eigenvectors.
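As a rough sketch of this whitening route (assuming $\mathbf{S}_w$ is positive definite and realizing $\mathbf{S}_w^{-1/2}$ through a symmetric eigendecomposition; the function name `lda_via_whitening` is hypothetical):

import numpy as np

def lda_via_whitening(Sw, Sb, d):
    """Trace objective: whiten with Sw^{-1/2}, eigendecompose like PCA, map back."""
    evals, U = np.linalg.eigh(Sw)                    # Sw = U diag(evals) U^T
    Sw_inv_sqrt = U @ np.diag(1.0 / np.sqrt(evals)) @ U.T
    M = Sw_inv_sqrt @ Sb @ Sw_inv_sqrt               # symmetric, PCA-style problem
    vals, V = np.linalg.eigh(M)                      # eigenvalues in ascending order
    V = V[:, np.argsort(vals)[::-1][:d]]             # top-d eigenvectors (at most C-1 useful)
    return Sw_inv_sqrt @ V                           # W = Sw^{-1/2} V

Up to per-direction scaling and sign, this gives the same projection as the direct $\mathbf{S}_w^{-1}\mathbf{S}_b$ route used in the script below.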

Product-form objective

$$
\underset{\mathbf{W}}{\arg\max}\ \prod_{i=1}^{d}\frac{\mathbf{w}_i^T\mathbf{S}_b\mathbf{w}_i}{\mathbf{w}_i^T\mathbf{S}_w\mathbf{w}_i}
\Leftrightarrow
\left\{\begin{array}{l}
\underset{\mathbf{W}}{\arg\max}\ \prod_{i=1}^{d}\mathbf{v}_i^T\mathbf{S}_w^{-1/2}\mathbf{S}_b\mathbf{S}_w^{-1/2}\mathbf{v}_i\\
\mathrm{s.t.}\ \mathbf{V}^T\mathbf{V}=\mathbb{I},\quad \mathbf{V}=\mathbf{S}_w^{1/2}\mathbf{W}
\end{array}\right.
$$
A simpler route: instead of computing the eigenvectors $\mathbf{v}$ of $\mathbf{S}_w^{-1/2}\mathbf{S}_b\mathbf{S}_w^{-1/2}$ and left-multiplying by $\mathbf{S}_w^{-1/2}$ to obtain $\mathbf{w}$, one can obtain $\mathbf{w}$ directly as an eigenvector of $\mathbf{S}_w^{-1}\mathbf{S}_b$.
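To see why this works, note that each factor is a generalized Rayleigh quotient; setting its gradient to zero gives

$$
\frac{\partial}{\partial\mathbf{w}_i}\frac{\mathbf{w}_i^T\mathbf{S}_b\mathbf{w}_i}{\mathbf{w}_i^T\mathbf{S}_w\mathbf{w}_i}=0
\;\Rightarrow\;
\mathbf{S}_b\mathbf{w}_i=\underbrace{\frac{\mathbf{w}_i^T\mathbf{S}_b\mathbf{w}_i}{\mathbf{w}_i^T\mathbf{S}_w\mathbf{w}_i}}_{\lambda_i}\mathbf{S}_w\mathbf{w}_i
\;\Rightarrow\;
\mathbf{S}_w^{-1}\mathbf{S}_b\,\mathbf{w}_i=\lambda_i\mathbf{w}_i,
$$

so the stationary directions are exactly the eigenvectors of $\mathbf{S}_w^{-1}\mathbf{S}_b$, with the quotient value as the eigenvalue.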

import matplotlib.pyplot as plt
import numpy as np

from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def plot_2d(X_r, y, target_names, name):
    if 'PCA' in name:
        plt.subplot(1,3,1)
    elif 'Sklearn' in name:
        plt.subplot(1,3,2)
    else:
        plt.subplot(1,3,3)
    
    colors = ["navy", "turquoise", "darkorange"]
    lw = 2

    for color, i, target_name in zip(colors, [0, 1, 2], target_names):
        plt.scatter(
            X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=0.8, lw=lw, label=target_name
        )
    plt.legend(loc="best", shadow=False, scatterpoints=1)
    plt.xlabel(f"{name} of IRIS dataset")

def sklearn_lda(X, y, target_names):
    lda = LinearDiscriminantAnalysis(n_components=2)
    X_r2 = lda.fit(X, y).transform(X)
    plot_2d(X_r2, y, target_names, 'Sklearn LDA')
    return X_r2

def my_lda(X, y, target_names):
    Sw = 0.0
    Sb = 0.0
    mu = np.mean(X, axis=0)
    for i in range(3):
        X_c  = X[y==i,:]
        mu_c = np.mean(X_c, axis=0)
        N_c  = X_c.shape[0]

        Sw = Sw + (X_c-mu_c).T @ (X_c-mu_c)

        Sb = Sb + N_c * np.reshape(mu_c - mu, [-1, 1]) @ np.reshape(mu_c - mu, [1, -1])
    S = np.linalg.inv(Sw) @ Sb

    # S = inv(Sw) @ Sb is not symmetric, so eig may return complex values with
    # negligible imaginary parts; keep only the real parts before sorting
    values, vectors = np.linalg.eig(S)
    values, vectors = values.real, vectors.real
    idxs = np.argsort(values)
    
    W = vectors[:,[idxs[-i] for i in range(1,3)]]
    
    X_r = np.matmul(X-mu, W)
    X_r[:,1] = - X_r[:,1] # flip sign to match sklearn's result
    X_r[:,0] = - X_r[:,0] # flip sign to match sklearn's result

    plot_2d(X_r, y, target_names, 'My Imple. LDA')

    return X_r

def my_pca(X, y, target_names):
    n_components=2
    # center the data
    N = X.shape[0]
    X_mean = np.mean(X, axis=0)
    X_ = X - X_mean
    # build the covariance matrix
    XX = 1. / N * np.matmul(X_.T, X_)
    # compute the eigenvectors
    values, vectors = np.linalg.eig(XX)
    idxs = np.argsort(values)
    
    W = vectors[:,[idxs[-i] for i in range(1,n_components+1)]]
    
    X_r = np.matmul(X_, W)

    X_r[:,1] = - X_r[:,1] # flip sign to match sklearn's result

    plot_2d(X_r, y, target_names, 'My Imple. PCA')

    return X_r



if __name__ == '__main__':
    iris = datasets.load_iris()

    X = iris.data
    y = iris.target
    target_names = iris.target_names

    plt.figure()

    

    X_r2 = my_pca(X, y, target_names)
    X_r1 = my_lda(X, y, target_names)
    X_r1 = sklearn_lda(X, y, target_names)

    plt.show()


(Figure: the three subplots produced by the script above — my PCA (left), sklearn LDA (middle), and my LDA (right) projections of the Iris data.)

Kernel Trick

The problem above is essentially that of solving
$$
\mathbf{S}_b\mathbf{w}_k=\lambda_k\mathbf{S}_w\mathbf{w}_k
$$
for the generalized eigenvectors. First, write out the kernelized $\mathbf{S}_b$ and $\mathbf{S}_w$ (here $\mathbf{X}_{\phi}=[\phi(\mathbf{x}_1),\cdots,\phi(\mathbf{x}_N)]$ is the mapped data matrix):
$$
\begin{aligned}
\mathbf{S}_w^{\phi}&=\frac{1}{N}\sum_{c=1}^{\mathcal{C}}\sum_{i=1}^{N}1[y_i=c]\left(\phi(\mathbf{x}_i)-\frac{1}{N_c}\mathbf{X}_{\phi}1_{N\times 1}^{(c)}\right)\left(\phi(\mathbf{x}_i)-\frac{1}{N_c}\mathbf{X}_{\phi}1_{N\times 1}^{(c)}\right)^T\\
&=\frac{1}{N}\sum_{c=1}^{\mathcal{C}}\sum_{i=1}^{N}1[y_i=c]\left(\phi(\mathbf{x}_i)\phi(\mathbf{x}_i)^T-\mathbf{X}_{\phi}\frac{1_{N\times 1}^{(c)}}{N_c}\phi(\mathbf{x}_i)^T-\phi(\mathbf{x}_i)\frac{1_{1\times N}^{(c)}}{N_c}\mathbf{X}_{\phi}^{T}+\mathbf{X}_{\phi}\frac{1_{N\times 1}^{(c)}1_{1\times N}^{(c)}}{N_c^{2}}\mathbf{X}_{\phi}^{T}\right)\\
&=\frac{1}{N}\left(\mathbf{X}_{\phi}\mathbf{X}_{\phi}^{T}-\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}\frac{{\color{red}1_{N\times 1}^{(c)}1_{1\times N}^{(c)}}}{N_c}\right)\mathbf{X}_{\phi}^{T}-\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times 1}^{(c)}1_{1\times N}^{(c)}}{N_c}\right)\mathbf{X}_{\phi}^{T}+\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times 1}^{(c)}1_{1\times N}^{(c)}}{N_c}\right)\mathbf{X}_{\phi}^{T}\right)\\
&=\frac{1}{N}\mathbf{X}_{\phi}\left(\mathbb{I}-\sum_{c=1}^{\mathcal{C}}\frac{{\color{red}1_{N\times N}^{(c)}}}{N_c}\right)\mathbf{X}_{\phi}^{T}
\end{aligned}
$$
$$
\begin{aligned}
\mathbf{S}_b^{\phi}&=\frac{1}{N}\sum_{c=1}^{\mathcal{C}}N_c\left(\frac{1}{N_c}\mathbf{X}_{\phi}1_{N\times 1}^{(c)}-\frac{1}{N}\mathbf{X}_{\phi}1_{N\times 1}\right)\left(\frac{1}{N_c}\mathbf{X}_{\phi}1_{N\times 1}^{(c)}-\frac{1}{N}\mathbf{X}_{\phi}1_{N\times 1}\right)^T\\
&=\frac{1}{N}\left(\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times 1}^{(c)}1_{1\times N}^{(c)}}{N_c}\right)\mathbf{X}_{\phi}^{T}-\mathbf{X}_{\phi}\left(\frac{1_{N\times 1}}{N}\sum_{c=1}^{\mathcal{C}}1_{1\times N}^{(c)}\right)\mathbf{X}_{\phi}^{T}-\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}1_{N\times 1}^{(c)}\frac{1_{1\times N}}{N}\right)\mathbf{X}_{\phi}^{T}+\mathbf{X}_{\phi}\left(1_{N\times 1}1_{1\times N}\sum_{c=1}^{\mathcal{C}}\frac{N_c}{N^2}\right)\mathbf{X}_{\phi}^{T}\right)\\
&=\frac{1}{N}\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times 1}^{(c)}1_{1\times N}^{(c)}}{N_c}-\frac{1_{N\times 1}1_{1\times N}}{N}-\frac{1_{N\times 1}1_{1\times N}}{N}+\frac{1_{N\times 1}1_{1\times N}}{N}\right)\mathbf{X}_{\phi}^{T}\\
&=\frac{1}{N}\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}-\frac{1_{N\times N}}{N}\right)\mathbf{X}_{\phi}^{T}
\end{aligned}
$$
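Here $1_{N\times 1}^{(c)}$ is the 0/1 indicator column vector of class $c$ and $1_{N\times N}^{(c)}=1_{N\times 1}^{(c)}1_{1\times N}^{(c)}$. A minimal sketch of the matrix $\sum_c 1_{N\times N}^{(c)}/N_c$ (this is exactly the `Ic` accumulated in the KLDA script below; the helper name `class_block_matrix` is hypothetical):

import numpy as np

def class_block_matrix(y):
    """Return sum_c 1^{(c)}_{NxN} / N_c for integer labels y."""
    y = np.asarray(y)
    N = y.shape[0]
    Ic = np.zeros((N, N))
    for c in np.unique(y):
        lc = (y == c).astype(float).reshape(-1, 1)   # 1^{(c)}_{Nx1}
        Ic += lc @ lc.T / lc.sum()                   # 1^{(c)}_{NxN} / N_c
    return Ic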
By the kernel trick, each projection direction can be written as a linear combination of the mapped samples:
$$
\begin{aligned}
\mathbf{w}_k&=\mathbf{X}_{\phi}\alpha_k
=\left[\phi(\mathbf{x}_1),\phi(\mathbf{x}_2),\cdots,\phi(\mathbf{x}_N)\right]
\left[\begin{array}{c}\alpha_{1k}\\ \alpha_{2k}\\ \vdots\\ \alpha_{Nk}\end{array}\right]
\end{aligned}
$$
Substituting this into the generalized eigenvalue problem and left-multiplying by $\mathbf{X}_{\phi}^T$ gives, with kernel matrix $\mathbf{K}=\mathbf{X}_{\phi}^T\mathbf{X}_{\phi}$,
$$
\begin{aligned}
\frac{1}{N}\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}-\frac{1_{N\times N}}{N}\right)\mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\alpha_k&=\lambda_k\frac{1}{N}\mathbf{X}_{\phi}\left(\mathbb{I}-\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}\right)\mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\alpha_k\\
\mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}-\frac{1_{N\times N}}{N}\right)\mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\alpha_k&=\lambda_k\mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\left(\mathbb{I}-\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}\right)\mathbf{X}_{\phi}^{T}\mathbf{X}_{\phi}\alpha_k\\
\mathbf{K}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}-\frac{1_{N\times N}}{N}\right)\mathbf{K}\alpha_k&=\lambda_k\mathbf{K}\left(\mathbb{I}-\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}\right)\mathbf{K}\alpha_k\\
\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}-\frac{1_{N\times N}}{N}\right)\mathbf{K}\alpha_k&=\lambda_k\left(\mathbb{I}-\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}\right)\mathbf{K}\alpha_k
\end{aligned}
$$
Thus $\alpha_k$ is an eigenvector of $\left[\left(\mathbb{I}-\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}\right)\mathbf{K}\right]^{-1}\left(\sum_{c=1}^{\mathcal{C}}\frac{1_{N\times N}^{(c)}}{N_c}-\frac{1_{N\times N}}{N}\right)\mathbf{K}$. For a new sample $\mathbf{x}_{new}$, its coordinate along direction $\mathbf{w}_k$ (with the mapped data centered) is
$$
\begin{aligned}
{\color{red}\mathbf{w}_k^T}\left(\phi(\mathbf{x}_{new})-\frac{1}{N}\mathbf{X}_{\phi}1_{N\times 1}\right)
&={\color{red}\alpha_k^T\mathbf{X}_{\phi}^T}\phi(\mathbf{x}_{new})-\frac{1}{N}{\color{red}\alpha_k^T\mathbf{X}_{\phi}^T}\mathbf{X}_{\phi}1_{N\times 1}\\
&=\alpha_k^T\mathbf{K}(\cdot,\mathbf{x}_{new})-\frac{1}{N}\alpha_k^T\mathbf{K}1_{N\times 1}
\end{aligned}
$$
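A minimal sketch of this projection for a single new sample, assuming the same RBF kernel $k(\mathbf{x},\mathbf{x}')=\exp(-\gamma\lVert\mathbf{x}-\mathbf{x}'\rVert^2)$ and access to the `alphas` eigenvector matrix computed inside `my_klda` below (which, as written, only returns the projected training data); the helper name `klda_transform` is hypothetical:

import numpy as np

def klda_transform(x_new, X_train, alphas, gamma):
    """Project x_new onto the KLDA directions:
    alpha_k^T K(., x_new) - (1/N) alpha_k^T K 1_{Nx1}, with an RBF kernel."""
    N = X_train.shape[0]
    # k(., x_new): RBF similarity between x_new and every training sample
    k_new = np.exp(-gamma * np.sum((X_train - x_new) ** 2, axis=1))   # shape (N,)
    # training kernel matrix, needed only for the centering term
    sq = np.sum(X_train ** 2, axis=1, keepdims=True)
    K = np.exp(-gamma * (sq + sq.T - 2 * X_train @ X_train.T))
    return alphas.T @ k_new - alphas.T @ K @ np.ones(N) / N           # shape (d,)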

import matplotlib.pyplot as plt
import numpy as np

from sklearn import datasets

def plot_2d(X_r, y, target_names, name):
    if 'PCA' in name:
        plt.subplot(1,3,1)
    elif 'KLDA' in name:
        plt.subplot(1,3,3)
    else:
        plt.subplot(1,3,2)
    
    colors = ["navy", "turquoise", "darkorange"]
    lw = 2

    for color, i, target_name in zip(colors, [0, 1, 2], target_names):
        plt.scatter(
            X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=0.8, lw=lw, label=target_name
        )
    plt.legend(loc="best", shadow=False, scatterpoints=1)
    plt.xlabel(f"{name} of IRIS dataset")

def my_klda(X, y, target_names):
    gamma = 10.0
    # exp(-||x_i-x_j||^2*gamma)
    N = X.shape[0]
    K0 = np.sum(X**2, axis=1, keepdims=True)
    K0 = np.exp(-(K0 + K0.T - 2 * np.matmul(X, X.T))*gamma)

    # Ic accumulates sum_c 1^{(c)}_{NxN} / N_c from the derivation above
    Ic = 0
    for i in range(3):
        X_c  = X[y==i,:]
        N_c  = X_c.shape[0]
        lc = np.zeros([N, 1])
        lc[y==i,:] = 1.0
        Ic = Ic + lc @ lc.T / N_c
    
    Sb_ = (Ic - np.ones([N, N]) / N) @ K0   # kernelized between-class term
    Sw_ = (np.eye(N) - Ic) @ K0             # kernelized within-class term (can be near-singular)

    S = np.linalg.inv(Sw_) @ Sb_

    # S is not symmetric, so eig may return complex values with negligible
    # imaginary parts; keep only the real parts before sorting
    values, vectors = np.linalg.eig(S)
    values, vectors = values.real, vectors.real
    idxs = np.argsort(values)
    
    alphas = vectors[:,[idxs[-i] for i in range(1,3)]]
    
    X_r = alphas.T @ K0 - 1. / N * alphas.T @ K0 @ np.ones([N, 1])
    X_r = X_r.T

    plot_2d(X_r, y, target_names, 'My Imple. KLDA')

    return X_r
    

    


def my_lda(X, y, target_names):
    Sw = 0.0
    Sb = 0.0
    mu = np.mean(X, axis=0)
    for i in range(3):
        X_c  = X[y==i,:]
        mu_c = np.mean(X_c, axis=0)
        N_c  = X_c.shape[0]

        Sw = Sw + (X_c-mu_c).T @ (X_c-mu_c)

        Sb = Sb + N_c * np.reshape(mu_c - mu, [-1, 1]) @ np.reshape(mu_c - mu, [1, -1])
    S = np.linalg.inv(Sw) @ Sb

    # S = inv(Sw) @ Sb is not symmetric, so eig may return complex values with
    # negligible imaginary parts; keep only the real parts before sorting
    values, vectors = np.linalg.eig(S)
    values, vectors = values.real, vectors.real
    idxs = np.argsort(values)
    
    W = vectors[:,[idxs[-i] for i in range(1,3)]]
    
    X_r = np.matmul(X-mu, W)
    X_r[:,1] = - X_r[:,1] # flip sign to match sklearn's result
    X_r[:,0] = - X_r[:,0] # flip sign to match sklearn's result

    plot_2d(X_r, y, target_names, 'My Imple. LDA')

    return X_r

def my_pca(X, y, target_names):
    n_components=2
    # center the data
    N = X.shape[0]
    X_mean = np.mean(X, axis=0)
    X_ = X - X_mean
    # build the covariance matrix
    XX = 1. / N * np.matmul(X_.T, X_)
    # compute the eigenvectors
    values, vectors = np.linalg.eig(XX)
    idxs = np.argsort(values)
    
    W = vectors[:,[idxs[-i] for i in range(1,n_components+1)]]
    
    X_r = np.matmul(X_, W)

    X_r[:,1] = - X_r[:,1] # flip sign to match sklearn's result

    plot_2d(X_r, y, target_names, 'My Imple. PCA')

    return X_r



if __name__ == '__main__':
    iris = datasets.load_iris()

    X = iris.data
    y = iris.target
    target_names = iris.target_names

    plt.figure()

    

    X_r2 = my_pca(X, y, target_names)
    X_r1 = my_lda(X, y, target_names)
    X_r1 = my_klda(X, y, target_names)

    plt.show()

(Figure: the three subplots produced by the script above — my PCA (left), my LDA (middle), and my KLDA (right) projections of the Iris data.)
