Unsupervised Feature Selection via Regression Learning and Manifold Structure Preservation

Project repository

I. Problem Introduction

Feature selection:

  • In short, it reduces the dimensionality of high-dimensional data to cut computation; for example, keeping only a subset of pixels from an image as features for a classification task;
  • The goal is to select the features that are the most important, most representative, and least redundant.

II. Mathematical Modeling

  • Definitions
    $$
    x_i\in R^{d\times1}, \qquad
    X=\begin{bmatrix}x_1^T\\x_2^T\\\vdots\\x_m^T\end{bmatrix}\in R^{m\times d}, \qquad
    y_i\in R^{c\times 1}, \qquad
    Y=\begin{bmatrix}y_1^T\\y_2^T\\\vdots\\y_m^T\end{bmatrix}\in R^{m\times c}
    $$
    $$
    S=[S_{ij}]_{m\times m}, \qquad
    S_{ij}=\exp\left\{-\frac{\|x_i-x_j\|_2^2}{2\sigma^2}\right\}, \qquad
    D=\begin{bmatrix}sum(S_1) & 0 & \cdots & 0 \\ 0 & sum(S_2) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & sum(S_m)\end{bmatrix}
    $$
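As a quick sanity check on these definitions, the sketch below builds S, D, and the Laplacian L = D - S for a few random samples with NumPy (the variable names are mine, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, sigma = 5, 3, 1.0
X = rng.standard_normal((m, d))  # rows are the samples x_i^T

# pairwise squared distances, then the Gaussian kernel S_ij
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
S = np.exp(-sq_dists / (2 * sigma ** 2))

# diagonal degree matrix D with D_ii = sum(S_i), and the Laplacian L
D = np.diag(S.sum(axis=1))
L = D - S
```

Note that S is symmetric with unit diagonal, and each row of L sums to zero because D_ii is exactly the row sum of S.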

  • Basic identity
    $$
    \|XW-Y\|_F^2=tr\left\{(XW-Y)^T(XW-Y)\right\}=tr\left\{W^TX^TXW\right\}-2\,tr\left\{Y^TXW\right\}+tr\left\{Y^TY\right\}
    $$
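This trace expansion is easy to verify numerically on random matrices (a quick check, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, c = 6, 4, 3
X = rng.standard_normal((m, d))
W = rng.standard_normal((d, c))
Y = rng.standard_normal((m, c))

# left- and right-hand sides of the Frobenius-norm expansion
lhs = np.linalg.norm(X @ W - Y, 'fro') ** 2
rhs = (np.trace(W.T @ X.T @ X @ W)
       - 2 * np.trace(Y.T @ X @ W)
       + np.trace(Y.T @ Y))
```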

  • Error term
    $$
    \sum_{i=1}^m\|W^Tx_i-y_i\|_2^2=\|XW-Y\|_F^2=tr\left\{(XW-Y)^T(XW-Y)\right\}
    $$

  • Sparse regularization term (the non-convex $\ell_{2,0}$ pseudo-norm is relaxed to the convex $\ell_{2,1}$ norm)
    $$
    \|W\|_{2,0}\rightarrow\|W\|_{2,1}
    $$
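Concretely, $\|W\|_{2,0}$ counts the rows of W with nonzero $\ell_2$ norm, while $\|W\|_{2,1}$ sums the row-wise $\ell_2$ norms; the relaxation is convex and still drives whole rows of W toward zero. A small example (the matrix values are mine):

```python
import numpy as np

W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

row_norms = np.linalg.norm(W, axis=1)   # [5.0, 0.0, 1.0]
l20 = np.count_nonzero(row_norms)       # ||W||_{2,0}: number of nonzero rows
l21 = row_norms.sum()                   # ||W||_{2,1}: sum of row norms
```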

  • Manifold-structure-preserving constraint
    $$
    \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m\|W^Tx_i-W^Tx_j\|_2^2\,S_{ij}=tr\left\{W^TX^TLXW\right\}, \qquad L=D-S
    $$
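The identity between the pairwise weighted sum and the compact trace form $tr\{W^TX^TLXW\}$ can also be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, c = 6, 4, 2
X = rng.standard_normal((m, d))
W = rng.standard_normal((d, c))
S = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) / 2)
L = np.diag(S.sum(axis=1)) - S  # graph Laplacian L = D - S

# pairwise weighted sum of projected distances vs. the trace form
lhs = 0.5 * sum(S[i, j] * np.linalg.norm(W.T @ X[i] - W.T @ X[j]) ** 2
                for i in range(m) for j in range(m))
rhs = np.trace(W.T @ X.T @ L @ X @ W)
```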

  • Final objective
    $$
    \min_{W,Y}\ \|XW-Y\|_F^2+\alpha\,\|W\|_{2,1}+\beta\,tr\left\{W^TX^TLXW\right\}
    $$
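Putting the three terms together, the objective can be evaluated as a single function (a minimal sketch; the helper name and signature are mine, not from the paper):

```python
import numpy as np

def objective(X, W, Y, L, alpha, beta):
    """Regression error + l2,1 sparsity + manifold regularization."""
    fit = np.linalg.norm(X @ W - Y, 'fro') ** 2
    sparsity = np.linalg.norm(W, axis=1).sum()      # ||W||_{2,1}
    manifold = np.trace(W.T @ X.T @ L @ X @ W)
    return fit + alpha * sparsity + beta * manifold

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
W = rng.standard_normal((4, 2))
Y = rng.standard_normal((6, 2))
S = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) / 2)
L = np.diag(S.sum(axis=1)) - S
val = objective(X, W, Y, L, alpha=1.0, beta=1.0)
```

With positive α and β every term is non-negative (L is positive semi-definite), so the objective is bounded below by zero.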

  • Parameters to solve for
    $$
    W\in R^{d\times c}, \qquad Y\in R^{m\times c}
    $$

III. Model Optimization (Alternating Iteration)

  1. Fix $W$, solve for $Y$
    $$
    L_1=\min_Y\left(\|XW-Y\|_F^2\right), \qquad
    \frac{\partial L_1}{\partial Y}=-2XW+2Y=0
    \;\Longrightarrow\; Y=XW
    $$
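The closed-form Y-step can be sanity-checked: at Y = XW the residual vanishes, and any perturbation of Y strictly increases the objective:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
W = rng.standard_normal((4, 3))

Y_star = X @ W  # closed-form minimizer of ||XW - Y||_F^2 over Y
base = np.linalg.norm(X @ W - Y_star, 'fro') ** 2

# moving Y away from XW increases the objective
Y_pert = Y_star + 0.1 * rng.standard_normal(Y_star.shape)
perturbed = np.linalg.norm(X @ W - Y_pert, 'fro') ** 2
```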

  2. Fix $Y$, solve for $W$
    $$
    L=\|XW-Y\|_F^2+\alpha\|W\|_{2,1}+\beta\,tr\left\{W^TX^TLXW\right\}
    $$
    $$
    \frac{\partial\|XW-Y\|_F^2}{\partial W}=2X^TXW-2X^TY, \qquad
    \frac{\partial\,\beta\,tr\left\{W^TX^TLXW\right\}}{\partial W}=2\beta X^TLXW, \qquad
    \frac{\partial\|W\|_{2,1}}{\partial W}=2\Lambda_{d\times d}W
    $$
    $$
    \frac{\partial L}{\partial W}=2(X^TX+\beta X^TLX+\alpha\Lambda)W-2X^TY=0
    \;\Longrightarrow\; W=(X^TX+\beta X^TLX+\alpha\Lambda)^{-1}X^TY
    $$
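For a fixed Λ this subproblem is quadratic, so the update solves the stationarity condition exactly; the gradient at the closed-form W vanishes. A numerical check (using `np.linalg.solve` rather than an explicit inverse, a common numerical choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, c = 8, 5, 3
alpha, beta = 0.5, 0.5
X = rng.standard_normal((m, d))
Y = rng.standard_normal((m, c))
S = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) / 2)
L = np.diag(S.sum(axis=1)) - S
Lam = np.eye(d)  # Lambda held fixed for this subproblem

# W = (X^T X + beta X^T L X + alpha Lambda)^{-1} X^T Y
A = X.T @ X + beta * X.T @ L @ X + alpha * Lam
W = np.linalg.solve(A, X.T @ Y)

grad = 2 * A @ W - 2 * X.T @ Y  # dL/dW evaluated at the closed-form W
```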

  3. Iterative solution for $W$ (Alg 1):

    • Input: data matrix $X\in R^{m\times d}$, $S\in R^{m\times m}$; hyper-parameters $\alpha, \beta, \epsilon > 0$

    • Output: $W$

    • Initialize: $\Lambda=I_{d\times d}$

    • Calculate: $L=D-S$

    • Repeat:

      • Update $W\leftarrow(X^TX+\beta X^TLX+\alpha\Lambda)^{-1}X^TY$
      • Calculate $\Lambda=[\Lambda_{ii}]_{d\times d}$ with $\Lambda_{ii}=\frac{1}{\|W^i\|_2+\epsilon}$
    • Until Convergence.

  4. Unsupervised feature selection via regression learning and manifold structure preservation (Alg 2):

    • Input: $X\in R^{m\times d}$, $S\in R^{m\times m}$, $\alpha,\beta,\epsilon>0$, the number of selected features k.
    • Output: k selected features
    • Initialize: $\Lambda=I$
    • Calculate: $L=D-S$
    • Repeat:
      • Update W by Alg 1
      • Update Y by $Y\leftarrow XW$
    • Until Convergence
    • Sort all features according to $\|W^i\|_2$ in descending order and select the top k ranked features
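The final ranking step amounts to sorting the row norms of W; a small deterministic example (the matrix values are mine):

```python
import numpy as np

W = np.array([[0.1, 0.2],
              [2.0, 1.0],
              [0.0, 0.1],
              [1.5, 1.5]])

scores = np.linalg.norm(W, axis=1)       # ||W^i||_2 for each feature i
top_k = np.argsort(scores)[::-1][:2]     # indices of the top-2 features
```

Features 1 and 3 have the largest row norms here, so they are the ones kept.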

IV. Code Implementation

  1. code
    # coding=utf-8
    import numpy as np
    import numpy.matlib
    import math
    import heapq
    import scipy.io as scio
    from sklearn.cluster import KMeans
    from munkres import Munkres
    from sklearn.model_selection import train_test_split
    from sklearn import metrics
    
    
    def load_data(path):
        # load the dataset (.mat file with fields 'X' and 'Y')
        data = scio.loadmat(path)
        data_x = data['X']
        data_y = data['Y'][:, 0] - 1
        return data_x, data_y
    
    
    def split_set(X, Y):
        train_data, test_data, train_label, test_label = train_test_split(X, Y, test_size=0.25, random_state=10)
        return train_data, test_data, test_label
    
    
    def matrix_gaussianKernel_S(X, sigma, m):
        # build the similarity matrix S with a Gaussian kernel
        S = np.zeros((m, m))
        for i in range(m):
            for j in range(m):
                S[i][j] = math.exp(-math.pow(np.linalg.norm(X[i] - X[j], ord=2), 2) / (2 * sigma ** 2))
        return S
    
    
    def initialize_matrix_Lambda(d):
        # initialize Lambda as the identity matrix
        Lambda = numpy.matlib.identity(d)
        return Lambda
    
    
    def calculate_matrix_L(S, m):
        # compute the graph Laplacian L = D - S
        D = np.zeros((m, m))
        for i in range(m):
            D[i][i] = np.sum(S[i])
        return D - S
    
    
    def update_Lambda(epsilon, W, d):
        # update Lambda inside the loop: Lambda_ii = 1 / (||W^i||_2 + epsilon)
        Lambda = np.zeros((d, d))
        for i in range(d):
            Lambda[i][i] = 1 / (np.linalg.norm(W[i], ord=2) + epsilon)
        return Lambda
    
    
    def iteration_until_convergence(alpha, beta, epsilon, wcycles, L, Lambda, X, Y, d):
        # Alg 1: iteratively solve for W with Y fixed, reweighting Lambda on each pass
        W = None
        for i in range(wcycles):
            W = np.matmul(
                np.matmul(np.linalg.inv(np.matmul(X.T, X) + np.matmul(np.matmul(beta * X.T, L), X) + alpha * Lambda), X.T),
                Y)
            Lambda = update_Lambda(epsilon, W, d)
        return W
    
    
    def update_Y(X, W):
        # update Y inside the loop (closed form: Y = XW)
        return np.matmul(X, W)
    
    
    def rank_based_W(alpha, beta, epsilon, sigma, wcycles, ycycles, k, X, Y, m, d, c, name):
        # Alg 2: alternately update W and Y, then keep the top-k features
        S = matrix_gaussianKernel_S(X, sigma, m)
        Lambda = initialize_matrix_Lambda(d)
        L = calculate_matrix_L(S, m)
        temp_Y = np.zeros((m, c))
        itetimes = 0
        W = iteration_until_convergence(alpha, beta, epsilon, wcycles, L, Lambda, X, Y, d)
        x, y = [], []
        for i in range(ycycles):
            x.append(itetimes)
            y.append(np.sum(np.abs(temp_Y - Y)))
            temp_Y = Y
            itetimes += 1
            W = iteration_until_convergence(alpha, beta, epsilon, wcycles, L, Lambda, X, Y, d)
            Y = update_Y(X, W)
        if name != '0':
            np.savez(name, x, y)
        norm_W = np.linalg.norm(W, axis=1)
        index = heapq.nlargest(k, range(len(norm_W)), norm_W.take)
        feature_X = np.zeros((m, k))
        for i in range(m):
            feature_X[i] = X[i][index]  # keep only the selected feature columns for each sample
        return feature_X
    
    
    def best_map(L1, L2):
        # remap predicted labels to ground-truth labels via the Hungarian algorithm
        Label1 = np.unique(L1)  # distinct labels, sorted ascending
        nClass1 = len(Label1)  # number of distinct labels
        Label2 = np.unique(L2)
        nClass2 = len(Label2)
        nClass = np.maximum(nClass1, nClass2)
        G = np.zeros((nClass, nClass))
        for i in range(nClass1):
            ind_cla1 = L1 == Label1[i]
            ind_cla1 = ind_cla1.astype(float)
            for j in range(nClass2):
                ind_cla2 = L2 == Label2[j]
                ind_cla2 = ind_cla2.astype(float)
                G[i, j] = np.sum(ind_cla2 * ind_cla1)
        m = Munkres()
        index = m.compute(-G.T)
        index = np.array(index)
        c = index[:, 1]
        newL2 = np.zeros(L2.shape)
        for i in range(nClass2):
            newL2[L2 == Label2[i]] = Label1[c[i]]
        return newL2
    
    
    def acc_rate(gt_s, s):
        # clustering accuracy (ACC) after optimal label matching
        c_x = best_map(gt_s, s)
        n_correct = np.sum(gt_s[:] == c_x[:])  # number of correctly matched samples
        acc = n_correct.astype(float) / (gt_s.shape[0])
        return acc
    
    
    def comparative_test(path, epsilon, sigma, wcycles, ycycles, alpha, beta, c, k, name):
        # compare clustering with vs. without feature selection
        X, Y = load_data(path)  # load the dataset
        onehot_Y = np.eye(c)[np.array(Y)]
        m = X.shape[0]  # number of samples
        d = X.shape[1]  # number of features (pixels)
        print('--- with feature selection:')
        feature_X = rank_based_W(alpha, beta, epsilon, sigma, wcycles, ycycles, k, X, onehot_Y, m, d, c, name)
        train_data, test_data, test_label = split_set(feature_X, Y)
        k_means = KMeans(n_clusters=c, random_state=0)
        k_means.fit(train_data)
        pred = k_means.predict(test_data)
        print('ACC:', acc_rate(test_label, pred))
        NMI = metrics.normalized_mutual_info_score(pred, test_label)
        print('NMI: ', NMI)
        print()
        print('--- without feature selection:')
        train_data, test_data, test_label = split_set(X, Y)
        k_means = KMeans(n_clusters=c, random_state=0)
        k_means.fit(train_data)
        pred = k_means.predict(test_data)
        print('ACC:', acc_rate(test_label, pred))
        NMI = metrics.normalized_mutual_info_score(pred, test_label)
        print('NMI: ', NMI)
        print()
    
    
    def f_test(path, epsilon, sigma, wcycles, ycycles, alpha, beta, c, k, name):
        # Alg 2 without printed output; returns ACC and NMI
        X, Y = load_data(path)  # load the dataset
        onehot_Y = np.eye(c)[np.array(Y)]
        m = X.shape[0]  # number of samples
        d = X.shape[1]  # number of features (pixels)
        feature_X = rank_based_W(alpha, beta, epsilon, sigma, wcycles, ycycles, k, X, onehot_Y, m, d, c, name)
        train_data, test_data, test_label = split_set(feature_X, Y)
        k_means = KMeans(n_clusters=c, random_state=0)
        k_means.fit(train_data)
        pred = k_means.predict(test_data)
        ACC = acc_rate(test_label, pred)
        NMI = metrics.normalized_mutual_info_score(pred, test_label)
        return ACC, NMI
    
    
    def test_alpha_beta(path, epsilon, sigma, wcycles, ycycles, c, k, name):
        # study how different alpha and beta values affect the clustering results
        Alpha = [-3, -2, -1, 0, 1, 2, 3]
        Beta = Alpha
        ACC = []
        NMI = []
        for alpha in Alpha:
            for beta in Beta:
                try:
                    acc, nmi = f_test(path, epsilon, sigma, wcycles, ycycles, alpha, beta, c, k, name)
                except Exception:
                    acc, nmi = 0, 0
                ACC.append(acc)
                NMI.append(nmi)
        np.save('./npy/alpha_beta_ACC_' + path.split('/')[-1].split('.')[0] + '.npy', ACC)
        np.save('./npy/alpha_beta_NMI_' + path.split('/')[-1].split('.')[0] + '.npy', NMI)
    
    
    if __name__ == '__main__':
        """
        --------------comparative test--------------
        该部分用于做进行特征提取以及不进行特征提取的对比试验
        path: 数据集路径
        epsilon: Lambda矩阵防止除0错误的分母附加值
        sigma: 生成S时正态分布核的参数
        wcycles: 优化Y需要的循环次数
        ycycles: 优化Y需要的循环次数
        alpha: 超参数
        beta: 超参数
        c: 标签类别个数
        k: 特征提取的个数
        name: 保存Y迭代的值,用于画图
        """
        # # mnist
        # print('--------------mnist--------------')
        # comparative_test(path='datasets/MNIST.mat', epsilon=0.000001, sigma=1.0, wcycles=5, ycycles=50, alpha=3, beta=1,
        #                  c=10, k=500, name='./npz/mnist.npz')
        #
        # # lung
        # print('--------------lung--------------')
        # comparative_test(path='datasets/lung.mat', epsilon=0.000001, sigma=1.0, wcycles=5, ycycles=50, alpha=3, beta=2,
        #                  c=5, k=2400, name='./npz/lung.npz')
    
        """
        --------------alpha beta test--------------
        该部分用于研究不同alpha与beta取值对聚类结果的影响
        path: 数据集路径
        epsilon: Lambda矩阵防止除0错误的分母附加值
        sigma: 生成S时正态分布核的参数
        wcycles: 优化Y需要的循环次数
        ycycles: 优化Y需要的循环次数
        c: 标签类别个数
        k: 特征提取的个数
        name: 保存Y迭代的值,用于画图
        """
        # mnist
        test_alpha_beta(path='datasets/MNIST.mat', epsilon=0.000001, sigma=1.0, wcycles=5, ycycles=50, c=10, k=500,
                        name='0')
    
        # lung
        test_alpha_beta(path='datasets/lung.mat', epsilon=0.000001, sigma=1.0, wcycles=5, ycycles=50, c=5, k=2400,
                        name='0')
    
    
  2. plot
    import matplotlib.pyplot as plt
    import numpy as np
    
    
    def plot_2d(data, path, name):
        # plot the Y-iteration curve
        plt.figure()
        plt.title(name)
        plt.xlabel('iterations')
        plt.ylabel('absolute label difference')
        plt.plot(data['arr_0'], data['arr_1'], 'r--')
        plt.savefig(path)
    
    
    def plot_Y_loss():
        # entry point for the Y-iteration plots
        plt.rcParams["font.family"] = "FangSong"
        mnist_path = './npz/mnist.npz'
        lung_path = './npz/lung.npz'
    
        mnist_np = np.load(mnist_path)
        lung_np = np.load(lung_path)
    
        plot_2d(mnist_np, './img/mnist_Y.png', 'mnist')
        plot_2d(lung_np, './img/lung_Y.png', 'lung')
    
    
    def plot_3d(NMI, ACC, X, Y, width, depth, c, name):
        # plotting function for the alpha/beta grid search
        fig = plt.figure()
        ax = fig.add_subplot(projection='3d')
        height = np.zeros_like(NMI)
        ax.bar3d(X, Y, height, width, depth, NMI, color=c, shade=False)
        ax.set_xlabel('alpha')
        ax.set_ylabel('beta')
        ax.set_zlabel('NMI')
        ax.set_title(name)
        plt.savefig('./img/{}_NMI.png'.format(name))
    
        fig = plt.figure()
        ax = fig.add_subplot(projection='3d')
        height = np.zeros_like(ACC)
        ax.bar3d(X, Y, height, width, depth, ACC, color=c, shade=False)
        ax.set_xlabel('alpha')
        ax.set_ylabel('beta')
        ax.set_zlabel('ACC')
        ax.set_title(name)
        plt.savefig('./img/{}_ACC.png'.format(name))
    
    
    def plot_alpha_beta():
        # entry point for the alpha/beta grid plots
        mnist_NMI_path = './npy/alpha_beta_NMI_MNIST.npy'
        mnist_ACC_path = './npy/alpha_beta_ACC_MNIST.npy'
        lung_NMI_path = './npy/alpha_beta_NMI_lung.npy'
        lung_ACC_path = './npy/alpha_beta_ACC_lung.npy'
    
        mnist_NMI = np.load(mnist_NMI_path)
        mnist_ACC = np.load(mnist_ACC_path)
        lung_NMI = np.load(lung_NMI_path)
        lung_ACC = np.load(lung_ACC_path)
    
        X = Y = [-3, -2, -1, 0, 1, 2, 3]
        xx, yy = np.meshgrid(X, Y)
        X, Y = xx.ravel(), yy.ravel()
        width = depth = 0.5
        c = ['r', 'r', 'r', 'r', 'r', 'r', 'r', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'g', 'g', 'g', 'g', 'g', 'g', 'g', 'c',
             'c', 'c', 'c', 'c', 'c', 'c', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'm', 'm', 'm', 'm', 'm', 'm', 'm', 'k', 'k',
             'k', 'k', 'k', 'k', 'k']
    
        plot_3d(mnist_NMI, mnist_ACC, X, Y, width, depth, c, 'mnist')
        plot_3d(lung_NMI, lung_ACC, X, Y, width, depth, c, 'lung')
    
    
    if __name__ == '__main__':
        plot_Y_loss()
        plot_alpha_beta()  # do not run together with the previous call; the 3D axes will render incorrectly
    
    