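These notes are based on the radiomics tutorial video series by Bilibili uploader 有Li.
This section (5) covers feature selection with the variance threshold method.
For medical professionals who run into programming problems in radiomics research, Dr. Li suggests:
It is easier if you already have a foundation in one programming language.
Learn to speak first, then learn the grammar.
Learn in the order your own needs dictate, not the order a textbook lays out.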
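Variance threshold method:
Question to consider: what should the variance of a feature that is useful for classification look like?
Variance formula: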
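As a reference point, the population variance of one feature x measured on n samples can be written as below. This is the form that divides by n (not n - 1), which is what NumPy's `np.var` computes by default; it is assumed here to be the formula the notes refer to.

```latex
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

A feature whose variance is close to zero takes almost the same value in every sample, so it carries little information for telling the classes apart.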
Code for dimensionality reduction with the variance threshold method:
import pandas as pd
import numpy as np
from sklearn.utils import shuffle
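# Paths to the two feature tables, one Excel file per class
xlsx1_filePath = 'C:/Users/RONG/Desktop/PythonBasic/data_A.xlsx'
xlsx2_filePath = 'C:/Users/RONG/Desktop/PythonBasic/data_B.xlsx'
data_1 = pd.read_excel(xlsx1_filePath)
data_2 = pd.read_excel(xlsx2_filePath)
rows_1,__ = data_1.shape
rows_2,__ = data_2.shape
# Add the class label as the first column: 0 for group A, 1 for group B
data_1.insert(0,'label',[0]*rows_1)
data_2.insert(0,'label',[1]*rows_2)
# Stack both groups, shuffle the rows, and replace missing values with 0
data = pd.concat([data_1,data_2])
data = shuffle(data)
data = data.fillna(0)
# Note: columns[0:] keeps every column, including 'label' at index 0
X = data[data.columns[0:]]
X.head()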
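Because the slice `data.columns[0:]` leaves the `label` column inside `X`, it may help to see the label and the features separated explicitly. The following is a minimal sketch on made-up data: the frames `group_a` and `group_b`, the feature names, and the random values are hypothetical stand-ins for the two Excel files, not the author's data.

```python
import numpy as np
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical stand-ins for data_A.xlsx and data_B.xlsx (values are made up).
rng = np.random.default_rng(42)
feature_names = ["feature_a", "feature_b", "feature_c"]
group_a = pd.DataFrame(rng.normal(size=(5, 3)), columns=feature_names)
group_b = pd.DataFrame(rng.normal(size=(4, 3)), columns=feature_names)

# Label the two groups, stack them, shuffle the rows, and fill missing values.
group_a.insert(0, "label", 0)
group_b.insert(0, "label", 1)
demo = pd.concat([group_a, group_b], ignore_index=True)
demo = shuffle(demo, random_state=0).fillna(0)

# Separate the class label from the feature matrix.
y_demo = demo["label"]
X_demo = demo.drop(columns="label")
print(X_demo.shape, y_demo.value_counts().to_dict())
```

Dropping the label by name rather than by position keeps the feature matrix correct even if the column order changes. Note that removing the label shifts every feature's positional index down by one compared with a table that still contains it.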
Variance threshold method:
# VarianceSelection
from sklearn.feature_selection import VarianceThreshold
selector = VarianceThreshold(1e10)  # adjust this threshold to control how many features are kept
selector.fit_transform(X)
# print('EveryVaris:'+str(selector.variances_))
print('selectedFeatureIndex:'+str(selector.get_support(True)))
print('selectedFeatureNameis:'+str(X.columns[selector.get_support(True)]))
# print('excludedFeatureNameis:'+str(X.columns[~ selector.get_support()]))  # '~' negates the boolean mask
Output:
# selectedFeatureIndex:[17 30 34 92]
# selectedFeatureNameis:Index(['original_firstorder_Energy', 'original_firstorder_TotalEnergy',
#        'original_glcm_ClusterProminence',
#        'original_glszm_LargeAreaHighGrayLevelEmphasis'],
#       dtype='object')
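The threshold is compared against raw, unscaled variances, so which features survive depends heavily on each feature's numeric scale. The sketch below illustrates this on synthetic data; the column names and values are invented for illustration and are not radiomics features. It also shows one way to keep the selected columns as a labeled DataFrame, since `fit_transform` returns a plain array by default.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Synthetic stand-in for a feature table (all values are made up).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "constant_feature": np.full(100, 3.0),             # variance exactly 0
    "small_scale_feature": rng.normal(0.0, 0.1, 100),  # variance around 0.01
    "large_scale_feature": rng.normal(0.0, 1e6, 100),  # variance around 1e12
})

# Population variance (ddof=0) per column, the quantity the selector compares.
variances = X_demo.var(axis=0, ddof=0)

# Keep only features whose variance is strictly greater than the threshold.
selector = VarianceThreshold(threshold=1e10)
selector.fit(X_demo)
mask = selector.get_support()           # boolean mask, one entry per column
kept = list(X_demo.columns[mask])

# Index the original frame with the mask to preserve the column names.
X_selected = X_demo.loc[:, mask]
print(variances)
print("kept:", kept)
```

With a threshold as large as 1e10, only features measured on a very large numeric scale pass, regardless of how well they separate the classes. Lowering the threshold, or rescaling the features first, changes the selection, which is why the threshold has to be tuned for each dataset.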
Author: 北欧森林
Link: https://www.jianshu.com/p/7e44f3b53411
Source: Jianshu; reposted with permission. RadiomicsWorld.com "影像组学世界" (Radiomics World) forum:
影像组学世界/RadiomicsWorld