光谱特征工程流程图
随机蛙跳算法步骤:
步骤一:初始化函数,在波段范围内随机选取一个初始变量子集V0,包含的变量个数为Q,设置所需要的迭代次数N。
步骤二:基于V0,选择包含有Q*个随机变量的候选子集V*作为V1,使用V1代替V0,循环次步骤直到N次迭代完成。
从均值为Q,方差为0.3Q的正态分布中随机选取一个整数Q*,之后通过三种情况产生一个包含Q*个变量的候选子集V*:(1)若Q*=Q,则令V*=V0;(2)若Q*<Q,首先对V0建立PLS模型,计算每个被选择变量的回归系数,将Q-Q*个最小回归系数的相关变量从V0去除,剩下的Q*个变量作为候选子集V*。(3)若Q*>Q,则从V-V0中随机抽取ω(Q*-Q)个变量,产生一个子集T,将V0和T组合,建立PLS模型,保留PLS回归系数中最大的Q*个变量,产生候选子集V*。若V*的表现好于V0,则用V*取代V0;否则,按一定的概率接受V*作为V0。
步骤三:统计每次迭代过程中光谱被选中的波段,将其带入PLS模型进行计算R2和RMSE,计算在整个迭代过程中每个光谱波段被选择的概率,概率越高则光谱变量越重要。
步骤四:将光谱变量按选中概率从高到低进行排序,使用PLS或MLR进行回归分析,每次增加一个变量,建立回归模型,将模型带入测试集找到RMSEP及R2最优时所选择的波段。
python代码:
import random from matplotlib import pyplot import numpy as np import matplotlib.pyplot as plt from sklearn.cross_decomposition import PLSRegression from sklearn.metrics import mean_squared_error, r2_score from sklearn.model_selection import cross_val_predict from pandas import read_csv from scipy import signal import warnings warnings.filterwarnings("ignore") from sys import stdout pyplot.rcParams['font.sans-serif'] = ['SimHei'] # 修改为中文字符 pyplot.rcParams['axes.unicode_minus'] = False data = read_csv(r'C:\Users\Lenovo\Desktop\niandu.csv', header=None) x = np.array(data.loc[1:, 1:]) y = data.loc[1:, 0] # 定义基本PLS回归模型 def base_pls(x,y,n_components): pls = PLSRegression(n_components=n_components) pls.fit(x, y) # 函数拟合 beta = pls.coef_ return beta # 交叉验证确定主成分数 def pls_optimise_components(x, y, npc): rmsecv = np.zeros(npc) for i in range(1, npc+1, 1): pls_simple = PLSRegression(n_components=i) pls_simple.fit(x, y) y_cv = cross_val_predict(pls_simple, x, y, cv=10) rmsecv[i-1] = np.sqrt(mean_squared_error(y, y_cv)) opt_comp = np.argmin(rmsecv) # 最佳主成分数 return opt_comp+1 def pls_optimise_components_1(x, y, npc): rmsecv = np.zeros(npc) for i in range(1, npc+1, 1): pls_simple = PLSRegression(n_components=i) pls_simple.fit(x, y) y_cv = cross_val_predict(pls_simple, x, y, cv=10) rmsecv[i-1] = np.sqrt(mean_squared_error(y, y_cv)) rmsecv_min = np.min(rmsecv) return rmsecv_min def index_sort(s, x): for j in range(x.shape[1]): # s[j] = s[j]/x.shape[1] s[j] = s[j] / 10000 # 根据N的值更改 return s def randomfrog_pls(x, y, Q, N): varindex = np.arange(x.shape[1]) # 构建索引 model = np.zeros((N, x.shape[1])) # 构建N行,列的0矩阵 a = np.arange(x.shape[1]) v0 = np.random.choice(a, Q, replace=False) # 选取前Q个元素作为初始V0 probability = np.zeros(x.shape[1]) # 创建1行,列的零矩阵作为接收器 for i in range(N): nvstar = min(x.shape[1], np.max(np.round(np.random.normal(Q, 0.3*Q, 2)))) # 随机生成一个数nVstar(n*) vstar = generate_newmodel(v0, nvstar, a, x, y) rmsecv_v0 = pls_optimise_components_1(x[:, v0], y, min(len(v0), 25)) rmsecv_vstar = pls_optimise_components_1(x[:, vstar], y, min(len(vstar), 25)) stdout.write("\r" + str(i)+' ') print("rmsecv_v0为%f, rmsecv_vstar为%f, v0的大小为%d" % (rmsecv_v0, rmsecv_vstar, len(v0))) if rmsecv_vstar < rmsecv_v0: prob_accept = 1 else: prob_accept = 0.1*rmsecv_v0/rmsecv_vstar rand_judge = random.random() # 产生一个0到1之间的随机数 if prob_accept > rand_judge: v0 = vstar else: v0 = v0 model[i, v0] = 1 print(v0) probability[v0] += 1 Q = len(v0) return probability def generate_newmodel(v0, nvstar, a, x, y): nv0 = len(v0) d = nvstar - nv0 if d > 0: w = 3 qs = w*d varindex_1 = np.delete(a, v0) j = len(varindex_1) # 减去v0剩下索引的长度 h = min(j, qs) # 比较J和QS的大小 np.random.shuffle(varindex_1) # 打乱索引的顺序 t = varindex_1[:int(h)] # 取前H个作为索引t index_combine = np.hstack((v0, t)) # 合并t和v0 x_top1 = x[:, index_combine].reshape(-1, len(index_combine)) opt_best1 = pls_optimise_components(x_top1, y, min(len(index_combine), 25)) l2 = base_pls(x_top1, y, n_components=opt_best1) coef_abs = np.abs(l2) coef_abs2 = np.argsort(-coef_abs, axis=0).reshape(1, -1)[0] index_v2 = coef_abs2[:int(nvstar)] # 取相关系数最大的n*个索引 vstar = index_combine[index_v2] elif d < 0: x_top = x[:, v0].reshape(-1, len(v0)) opt_best = pls_optimise_components(x_top, y, min(len(v0), 25)) l1 = base_pls(x_top, y, n_components=opt_best) coef_abs = np.abs(l1) coef_abs1 = np.argsort(-coef_abs, axis=0).reshape(1, -1)[0] index_v1 = coef_abs1[:int(nvstar)] # 选取回归系数大的相关变量,去除Q-q1个最小回归系数的变量 vstar = v0[index_v1] else: vstar = v0 return vstar s = randomfrog_pls(x, y, Q=10, N=10000) index = index_sort(s, x) index_max = np.argsort(index)[::-1] # 降序后排列 print(index_max[:200]) x1 = np.arange(x.shape[1]) y1 = index plt.figure(figsize=(12, 8), dpi=100) plt.scatter(x1, y1) # 绘制散点图 # plt.tick_params(labelsize=20) plt.xlabel("波段索引", fontsize=20) plt.ylabel("被选中概率", fontsize=20) plt.title("随机蛙跳算法", fontsize=20) plt.show()
通过使用随机蛙跳特征提取方法对数据集进行处理,使用原数据集67%的光谱特征量,将测试集的R2提高2.99%,将测试集的RMSEP提高7.95%。R2和RMSEP分别达到0.9193和0.1934。