Project repository
1. Problem Introduction
Feature selection:
- Put simply, it reduces the dimensionality of large data to cut down computation, e.g. keeping only a subset of pixels from an image as the features used in a classification task (a small sketch of this "keep a subset of columns" view follows this list);
- The aim is to pick out the features that are the most important, the most representative, and the least redundant.
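Not part of the original write-up: a minimal sketch of the "keep a subset of columns" view of feature selection. The data and the importance scores below are random placeholders; a real method (such as the one derived in the following sections) would learn the scores.

```python
import numpy as np

# Toy data: m = 100 images, each flattened into d = 784 pixel features.
rng = np.random.default_rng(0)
X = rng.random((100, 784))

# Placeholder importance scores; a real method would learn them (e.g. the row norms of W below).
scores = rng.random(784)

# Feature selection = keep only the k highest-scoring columns of X.
k = 50
selected = np.argsort(scores)[::-1][:k]
X_selected = X[:, selected]   # shape (100, 50): same samples, far fewer features
print(X_selected.shape)
```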
2. Mathematical Modeling
- Definitions
  $$
  x_i\in R^{d\times1}, \qquad
  X=\begin{bmatrix}x_1^T\\x_2^T\\\vdots\\x_m^T\end{bmatrix}\in R^{m\times d}, \qquad
  y_i\in R^{c\times 1}, \qquad
  Y=\begin{bmatrix}y_1^T\\y_2^T\\\vdots\\y_m^T\end{bmatrix}\in R^{m\times c}
  $$

  $$
  S=[S_{ij}]_{m\times m}, \qquad
  S_{ij}=exp\left\{-\frac{||x_i-x_j||_2^2}{2\sigma^2}\right\}, \qquad
  D=\begin{bmatrix}sum(S_1) & 0 & \cdots & 0 \\ 0 & sum(S_2) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & sum(S_m)\end{bmatrix}
  $$
- Basic identity
  $$
  ||XW-Y||_F^2
  = tr\left\{(XW-Y)^T(XW-Y)\right\}
  = tr\left\{W^TX^TXW\right\}-2\,tr\left\{Y^TXW\right\}+tr\left\{Y^TY\right\}
  $$
- Error term
  $$
  \sum_{i=1}^m||W^Tx_i-y_i||_2^2=||XW-Y||_F^2=tr\left\{(XW-Y)^T(XW-Y)\right\}
  $$
- Sparse regularization term (the non-convex $\ell_{2,0}$ row-sparsity norm is relaxed to its convex surrogate $\ell_{2,1}$)
  $$
  ||W||_{2,0}\rightarrow||W||_{2,1}
  $$
- Manifold-structure-preserving constraint
  $$
  \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m||W^Tx_i-W^Tx_j||_2^2\,S_{ij}=tr\left\{W^TX^TLXW\right\},
  \qquad L=D-S
  $$
- Final objective (a small numerical check of the three terms follows this list)
  $$
  \min\;||XW-Y||_F^2+\alpha\,||W||_{2,1}+\beta\,tr\left\{W^TX^TLXW\right\}
  $$
- Parameters to be solved
  $$
  W\in R^{d\times c}, \qquad Y\in R^{m\times c}
  $$
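As a quick numerical check of the definitions above (my own sketch, not from the original post), the snippet below builds $S$, $D$ and $L$ for a small random dataset and evaluates the three terms of the objective; all sizes and values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, c, sigma = 30, 10, 3, 1.0
X = rng.random((m, d))                      # data matrix, one sample per row
W = rng.random((d, c))                      # projection matrix to be learned
Y = rng.random((m, c))                      # pseudo-label matrix to be learned

# Gaussian-kernel similarity S and degree matrix D (row sums of S)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
S = np.exp(-sq_dists / (2 * sigma ** 2))
D = np.diag(S.sum(axis=1))
L = D - S                                   # graph Laplacian

fit = np.linalg.norm(X @ W - Y, 'fro') ** 2     # ||XW - Y||_F^2
l21 = np.linalg.norm(W, axis=1).sum()           # ||W||_{2,1}: sum of row 2-norms
manifold = np.trace(W.T @ X.T @ L @ X @ W)      # tr{W^T X^T L X W}

alpha, beta = 1.0, 1.0
objective = fit + alpha * l21 + beta * manifold
print(objective)
```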
3. Model Optimization (Alternating Iteration)
- Fix $W$, solve for $Y$
  $$
  L_1=\min_Y\left(||XW-Y||_F^2\right), \qquad
  \frac{\partial L_1}{\partial Y}=-2XW+2Y=0
  $$

  which gives $Y=XW$.
- Fix $Y$, solve for $W$
  $$
  L_2=||XW-Y||_F^2+\alpha||W||_{2,1}+\beta\,tr\left\{W^TX^TLXW\right\}
  $$

  $$
  \frac{\partial||XW-Y||_F^2}{\partial W}=2X^TXW-2X^TY, \qquad
  \frac{\partial\,\beta\,tr\left\{W^TX^TLXW\right\}}{\partial W}=2\beta X^TLXW, \qquad
  \frac{\partial||W||_{2,1}}{\partial W}=2\Lambda_{d\times d}W
  $$

  $$
  \frac{\partial L_2}{\partial W}=2(X^TX+\beta X^TLX+\alpha\Lambda)W-2X^TY=0
  $$

  which gives the iterative solution $W=(X^TX+\beta X^TLX+\alpha\Lambda)^{-1}X^TY$. Here $\Lambda$ is the diagonal matrix arising from the $\ell_{2,1}$-norm derivative; its entries depend on the current $W$ (see $\Lambda_{ii}$ in Alg 1 below), which is why $W$ must be solved iteratively rather than in a single closed-form step.
- Iterative solution of $W$ (Alg 1):
  - Input: data matrix $X\in R^{m\times d}$, $S\in R^{m\times m}$; hyper-parameters $\alpha,\beta,\epsilon>0$
  - Output: $W$
  - Initialize: $\Lambda=I_{d\times d}$
  - Calculate: $L=D-S$
  - Repeat:
    - Update $W\leftarrow(X^TX+\beta X^TLX+\alpha\Lambda)^{-1}X^TY$
    - Calculate: $\Lambda=[\Lambda_{ii}]_{d\times d}$, $\Lambda_{ii}=\frac{1}{||W^i||_2+\epsilon}$
  - Until convergence.
- Unsupervised feature selection based on regression learning and manifold structure preservation (Alg 2; a compact sketch of the full loop follows this list):
  - Input: $X\in R^{m\times d}$, $S\in R^{m\times m}$, $\alpha,\beta,\epsilon>0$, the number of selected features $k$
  - Output: $k$ selected features
  - Initialize: $\Lambda=I$
  - Calculate: $L=D-S$
  - Repeat:
    - Update $W$ by Alg 1
    - Update $Y$ by $Y\leftarrow XW$
  - Until convergence
  - Sort all features according to $||W^i||_2$ in descending order and select the top-$k$ ranked features
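For reference, here is a compact, self-contained sketch of Alg 1 and Alg 2 (my own condensation of the two loops, not the post's reference implementation; fixed iteration counts `n_w` and `n_y` stand in for the convergence tests, and the random initialization of `Y` is an assumption).

```python
import numpy as np

def select_features(X, S, c, alpha, beta, eps=1e-6, k=10, n_w=5, n_y=20, seed=0):
    """Compact sketch of Alg 1 (inner loop over W) and Alg 2 (outer loop over Y)."""
    m, d = X.shape
    L = np.diag(S.sum(axis=1)) - S                  # graph Laplacian L = D - S
    Y = np.random.default_rng(seed).random((m, c))  # pseudo-label matrix (random init: an assumption)
    Lam = np.eye(d)                                 # Lambda initialized to the identity
    for _ in range(n_y):                            # Alg 2: alternate between W and Y
        for _ in range(n_w):                        # Alg 1: re-solve W with the current Lambda
            A = X.T @ X + beta * (X.T @ L @ X) + alpha * Lam
            W = np.linalg.solve(A, X.T @ Y)         # W = (X^T X + beta X^T L X + alpha Lam)^{-1} X^T Y
            Lam = np.diag(1.0 / (np.linalg.norm(W, axis=1) + eps))
        Y = X @ W                                   # closed-form update of Y
    return np.argsort(np.linalg.norm(W, axis=1))[::-1][:k]  # indices of the top-k ranked features

# Example call on random placeholder data:
rng = np.random.default_rng(1)
X = rng.random((50, 20))
S = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) / 2.0)  # Gaussian kernel, sigma = 1
idx = select_features(X, S, c=3, alpha=1.0, beta=1.0, k=5)
X_reduced = X[:, idx]                               # reduced data matrix used for clustering
```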
4. Code Implementation
- code
```python
# coding=utf-8
import math
import heapq

import numpy as np
import scipy.io as scio
from munkres import Munkres
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split


def load_data(path):
    # load the dataset (.mat file with fields 'X' and 'Y'); shift labels to start at 0
    data = scio.loadmat(path)
    data_x = data['X']
    data_y = data['Y'][:, 0].astype(int) - 1
    return data_x, data_y


def split_set(X, Y):
    # hold out 25% of the samples for evaluating the clustering
    train_data, test_data, train_label, test_label = train_test_split(X, Y, test_size=0.25, random_state=10)
    return train_data, test_data, test_label


def matrix_gaussianKernel_S(X, sigma, m):
    # build the similarity matrix S with a Gaussian kernel
    S = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            S[i][j] = math.exp(-math.pow(np.linalg.norm(X[i] - X[j], ord=2), 2) / (2 * sigma ** 2))
    return S


def initialize_matrix_Lambda(d):
    # initialize Lambda as the identity matrix (np.identity instead of the deprecated numpy.matlib)
    return np.identity(d)


def calculate_matrix_L(S, m):
    # compute the Laplacian matrix L = D - S
    D = np.zeros((m, m))
    for i in range(m):
        D[i][i] = np.sum(S[i])
    return D - S


def update_Lambda(epsilon, W, d):
    # update Lambda inside the loop: Lambda_ii = 1 / (||W^i||_2 + epsilon)
    Lambda = np.zeros((d, d))
    for i in range(d):
        Lambda[i][i] = 1 / (np.linalg.norm(W[i], ord=2) + epsilon)
    return Lambda


def iteration_until_convergence(alpha, beta, epsilon, wcycles, L, Lambda, X, Y, d):
    # Alg 1: iteratively solve for W with Y fixed
    W = np.matmul(
        np.matmul(np.linalg.inv(np.matmul(X.T, X) + np.matmul(np.matmul(beta * X.T, L), X) + alpha * Lambda), X.T),
        Y)
    for i in range(wcycles):
        # while np.sum(np.abs(temp_W - W)) > threshold:
        W = np.matmul(
            np.matmul(np.linalg.inv(np.matmul(X.T, X) + np.matmul(np.matmul(beta * X.T, L), X) + alpha * Lambda),
                      X.T),
            Y)
        Lambda = update_Lambda(epsilon, W, d)
    return W


def update_Y(X, W):
    # update Y inside the loop
    return np.matmul(X, W)


def rank_based_W(alpha, beta, epsilon, sigma, wcycles, ycycles, k, X, Y, m, d, c, name):
    # Alg 2: rank the features by ||W^i||_2 and return the reduced data matrix
    S = matrix_gaussianKernel_S(X, sigma, m)
    Lambda = initialize_matrix_Lambda(d)
    L = calculate_matrix_L(S, m)
    temp_Y = np.zeros((m, c))
    itetimes = 0
    W = iteration_until_convergence(alpha, beta, epsilon, wcycles, L, Lambda, X, Y, d)
    x, y = [], []
    for i in range(ycycles):
        x.append(itetimes)
        y.append(np.sum(np.abs(temp_Y - Y)))
        temp_Y = Y
        itetimes += 1
        W = iteration_until_convergence(alpha, beta, epsilon, wcycles, L, Lambda, X, Y, d)
        Y = update_Y(X, W)
    if name != '0':
        np.savez(name, x, y)
    norm_W = np.linalg.norm(W, axis=1)
    index = heapq.nlargest(k, range(len(norm_W)), norm_W.take)
    feature_X = np.zeros((m, k))
    for i in range(m):
        feature_X[i] = X[i][index]  # keep only the selected features of each sample
    return feature_X


def best_map(L1, L2):
    # relabel the predicted clusters to best match the ground truth (Hungarian algorithm)
    Label1 = np.unique(L1)  # distinct labels, sorted ascending
    nClass1 = len(Label1)  # number of distinct labels
    Label2 = np.unique(L2)
    nClass2 = len(Label2)
    nClass = np.maximum(nClass1, nClass2)
    G = np.zeros((nClass, nClass))
    for i in range(nClass1):
        ind_cla1 = L1 == Label1[i]
        ind_cla1 = ind_cla1.astype(float)
        for j in range(nClass2):
            ind_cla2 = L2 == Label2[j]
            ind_cla2 = ind_cla2.astype(float)
            G[i, j] = np.sum(ind_cla2 * ind_cla1)
    m = Munkres()
    index = m.compute(-G.T)
    index = np.array(index)
    c = index[:, 1]
    newL2 = np.zeros(L2.shape)
    for i in range(nClass2):
        newL2[L2 == Label2[i]] = Label1[c[i]]
    return newL2


def acc_rate(gt_s, s):
    # compute the clustering accuracy (ACC)
    c_x = best_map(gt_s, s)
    err_x = np.sum(gt_s[:] == c_x[:])
    acc = err_x.astype(float) / (gt_s.shape[0])
    return acc


def comparative_test(path, epsilon, sigma, wcycles, ycycles, alpha, beta, c, k, name):
    # compare clustering with and without feature selection
    X, Y = load_data(path)  # load the dataset
    onehot_Y = np.eye(c)[np.array(Y)]
    m = X.shape[0]  # number of samples
    d = X.shape[1]  # number of features (flattened image length)

    print('--- with feature selection:')
    feature_X = rank_based_W(alpha, beta, epsilon, sigma, wcycles, ycycles, k, X, onehot_Y, m, d, c, name)
    train_data, test_data, test_label = split_set(feature_X, Y)
    k_means = KMeans(n_clusters=c, random_state=0)
    k_means.fit(train_data)
    pred = k_means.predict(test_data)
    print('ACC:', acc_rate(test_label, pred))
    NMI = metrics.normalized_mutual_info_score(pred, test_label)
    print('NMI: ', NMI)
    print()

    print('--- without feature selection:')
    train_data, test_data, test_label = split_set(X, Y)
    k_means = KMeans(n_clusters=c, random_state=0)
    k_means.fit(train_data)
    pred = k_means.predict(test_data)
    print('ACC:', acc_rate(test_label, pred))
    NMI = metrics.normalized_mutual_info_score(pred, test_label)
    print('NMI: ', NMI)
    print()


def f_test(path, epsilon, sigma, wcycles, ycycles, alpha, beta, c, k, name):
    # same evaluation as comparative_test, but silent and with feature selection only
    X, Y = load_data(path)  # load the dataset
    onehot_Y = np.eye(c)[np.array(Y)]
    m = X.shape[0]  # number of samples
    d = X.shape[1]  # number of features (flattened image length)
    feature_X = rank_based_W(alpha, beta, epsilon, sigma, wcycles, ycycles, k, X, onehot_Y, m, d, c, name)
    train_data, test_data, test_label = split_set(feature_X, Y)
    k_means = KMeans(n_clusters=c, random_state=0)
    k_means.fit(train_data)
    pred = k_means.predict(test_data)
    ACC = acc_rate(test_label, pred)
    NMI = metrics.normalized_mutual_info_score(pred, test_label)
    return ACC, NMI


def test_alpha_beta(path, epsilon, sigma, wcycles, ycycles, c, k, name):
    # test how different alpha and beta values affect the clustering result
    Alpha = [-3, -2, -1, 0, 1, 2, 3]
    Beta = Alpha
    ACC = []
    NMI = []
    for alpha in Alpha:
        for beta in Beta:
            try:
                acc, nmi = f_test(path, epsilon, sigma, wcycles, ycycles, alpha, beta, c, k, name)
            except Exception:
                acc, nmi = 0, 0
            ACC.append(acc)
            NMI.append(nmi)
    np.save('./npy/alpha_beta_ACC_' + path.split('/')[-1].split('.')[0] + '.npy', ACC)
    np.save('./npy/alpha_beta_NMI_' + path.split('/')[-1].split('.')[0] + '.npy', NMI)


if __name__ == '__main__':
    """
    --------------comparative test--------------
    Compares clustering with and without feature selection.
    path: path of the dataset
    epsilon: small value added to the denominator of Lambda to avoid division by zero
    sigma: bandwidth of the Gaussian kernel used to build S
    wcycles: number of iterations for optimizing W (Alg 1)
    ycycles: number of iterations for optimizing Y (Alg 2)
    alpha: hyper-parameter
    beta: hyper-parameter
    c: number of label classes
    k: number of selected features
    name: file in which the Y-iteration history is saved, used for plotting
    """
    # # mnist
    # print('--------------mnist--------------')
    # comparative_test(path='datasets/MNIST.mat', epsilon=0.000001, sigma=1.0, wcycles=5, ycycles=50, alpha=3, beta=1,
    #                  c=10, k=500, name='./npz/mnist.npz')
    #
    # # lung
    # print('--------------lung--------------')
    # comparative_test(path='datasets/lung.mat', epsilon=0.000001, sigma=1.0, wcycles=5, ycycles=50, alpha=3, beta=2,
    #                  c=5, k=2400, name='./npz/lung.npz')

    """
    --------------alpha beta test--------------
    Studies how different alpha and beta values affect the clustering result.
    path: path of the dataset
    epsilon: small value added to the denominator of Lambda to avoid division by zero
    sigma: bandwidth of the Gaussian kernel used to build S
    wcycles: number of iterations for optimizing W (Alg 1)
    ycycles: number of iterations for optimizing Y (Alg 2)
    c: number of label classes
    k: number of selected features
    name: file in which the Y-iteration history is saved, used for plotting
    """
    # mnist
    test_alpha_beta(path='datasets/MNIST.mat', epsilon=0.000001, sigma=1.0, wcycles=5, ycycles=50, c=10, k=500,
                    name='0')
    # lung
    test_alpha_beta(path='datasets/lung.mat', epsilon=0.000001, sigma=1.0, wcycles=5, ycycles=50, c=5, k=2400,
                    name='0')
```
- plot
```python
import matplotlib.pyplot as plt
import numpy as np


def plot_2d(data, path, name):
    # plot the Y-iteration history saved by rank_based_W
    plt.figure()
    plt.title(name)
    plt.xlabel('iteration')
    plt.ylabel('absolute change of Y')
    plt.plot(data['arr_0'], data['arr_1'], 'r--')
    plt.savefig(path)


def plot_Y_loss():
    # entry point for the Y-iteration plots
    plt.rcParams["font.family"] = "FangSong"  # Chinese-capable font (only needed for Chinese labels)
    mnist_path = './npz/mnist.npz'
    lung_path = './npz/lung.npz'
    mnist_np = np.load(mnist_path)
    lung_np = np.load(lung_path)
    plot_2d(mnist_np, './img/mnist_Y.png', 'mnist')
    plot_2d(lung_np, './img/lung_Y.png', 'lung')


def plot_3d(NMI, ACC, X, Y, width, depth, c, name):
    # 3D bar plots for the alpha/beta grid search
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is removed in recent matplotlib
    height = np.zeros_like(NMI)
    ax.bar3d(X, Y, height, width, depth, NMI, color=c, shade=False)
    ax.set_xlabel('alpha')
    ax.set_ylabel('beta')
    ax.set_zlabel('NMI')
    ax.set_title(name)
    plt.savefig('./img/{}_NMI.png'.format(name))

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    height = np.zeros_like(ACC)
    ax.bar3d(X, Y, height, width, depth, ACC, color=c, shade=False)
    ax.set_xlabel('alpha')
    ax.set_ylabel('beta')
    ax.set_zlabel('ACC')
    ax.set_title(name)
    plt.savefig('./img/{}_ACC.png'.format(name))


def plot_alpha_beta():
    # entry point for the alpha/beta grid-search plots
    mnist_NMI_path = './npy/alpha_beta_NMI_MNIST.npy'
    mnist_ACC_path = './npy/alpha_beta_ACC_MNIST.npy'
    lung_NMI_path = './npy/alpha_beta_NMI_lung.npy'
    lung_ACC_path = './npy/alpha_beta_ACC_lung.npy'
    mnist_NMI = np.load(mnist_NMI_path)
    mnist_ACC = np.load(mnist_ACC_path)
    lung_NMI = np.load(lung_NMI_path)
    lung_ACC = np.load(lung_ACC_path)
    X = Y = [-3, -2, -1, 0, 1, 2, 3]
    xx, yy = np.meshgrid(X, Y)
    X, Y = xx.ravel(), yy.ravel()
    width = depth = 0.5
    # one color per alpha value (7 beta values each)
    c = ['r'] * 7 + ['b'] * 7 + ['g'] * 7 + ['c'] * 7 + ['y'] * 7 + ['m'] * 7 + ['k'] * 7
    plot_3d(mnist_NMI, mnist_ACC, X, Y, width, depth, c, 'mnist')
    plot_3d(lung_NMI, lung_ACC, X, Y, width, depth, c, 'lung')


if __name__ == '__main__':
    plot_Y_loss()
    plot_alpha_beta()  # do not run together with the previous call, otherwise the 3D axes render incorrectly
```