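minepy makes it easy to do feature engineering with the maximal information coefficient (MIC). I ran into the following problem while installing it, as shown in the screenshot:
The cause is most likely a problem with my Visual Studio installation: when I opened the corresponding location, I found that there was no platformSDK folder.
I did not want to reinstall VS, so I looked online for a solution and found that a prebuilt wheel can be downloaded from https://www.lfd.uci.edu/~gohlke/pythonlibs/, as shown below:
After downloading, switch to the directory that contains the file and install it with the following command: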
pip install minepy-1.2.4-cp37-cp37m-win_amd64.whl
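Before running the command, it can help to check that the wheel's tags match the local interpreter, since pip rejects a wheel built for a different Python version or platform. The following is only a minimal sketch: the helper names parse_wheel_tags and matches_cpython are made up for illustration, and the check only handles a single, simple Python tag such as cp37.

```python
import sys

def parse_wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform,
    # so the last three fields are always the tags.
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: " + filename)
    return tuple(parts[-3:])

def matches_cpython(filename, major, minor):
    """Return True if the wheel's Python tag targets CPython major.minor."""
    python_tag, _abi, _platform = parse_wheel_tags(filename)
    return python_tag == "cp{}{}".format(major, minor)

wheel = "minepy-1.2.4-cp37-cp37m-win_amd64.whl"
print(parse_wheel_tags(wheel))
# Compare against the interpreter that will run pip
print(matches_cpython(wheel, sys.version_info.major, sys.version_info.minor))
```

If the second line prints False, a wheel whose cpXY tag matches the installed Python version is probably the one to download instead.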
Finally, here is example code that uses the maximal information coefficient for feature engineering:
import numpy as np
from sklearn.feature_selection import SelectKBest
from minepy import MINE

label = "故障类别"  # name of the target column ("fault category")

def mic(x, y):
    # compute_score(x, y) computes the maximum normalized mutual
    # information scores between x and y.
    m = MINE(alpha=0.6, c=15)
    m.compute_score(x, y)
    # The second element is only a placeholder, not a real p-value
    return (m.mic(), 0.6)

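# data_all_copy is the pandas DataFrame that holds the features and the label
Y = data_all_copy[label].values
Y = Y[:500]  # use only the first 500 samples
x = data_all_copy[[col for col in data_all_copy.columns if col != label]].values
x = x[:500]
# Standardize every feature to zero mean and unit variance
mean_X = x.mean(axis=0)
std_X = x.std(axis=0)
x = (x - mean_X) / std_X
print(x)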
# Select the K best features and return the data restricted to those features
#print(SelectKBest(lambda X, Y: list(np.array([pearsonr(x, Y) for x in X.T]).T), k=5).fit_transform(x, Y))
X_Threshold=SelectKBest(lambda X, Y: np.array(list(map(lambda x:mic(x, Y), X.T))).T[0], k=10).fit_transform(x, Y)
print(x.shape, X_Threshold.shape)
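# Feature names, in the same order as the columns of x
columnsName = [col for col in data_all_copy.columns if col != label]
Selected_feature = []
# Match each selected column back to its original column to recover the name
for i in range(X_Threshold.shape[1]):
    for j in range(x.shape[1]):
        if abs(X_Threshold[:, i] - x[:, j]).sum() == 0:
            Selected_feature.append(columnsName[j])
f4 = Selected_feature
f4  # display the names of the 10 selected features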
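
The column-by-column comparison above works, but a fitted SelectKBest can also report which columns it kept through get_support(). The sketch below shows that approach. So that it runs without minepy, it swaps MIC for a stand-in score (the absolute Pearson correlation) and uses synthetic data. The names abs_corr_score, X_demo, y_demo and demo_names are made up for illustration and are not part of the code above.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest

def abs_corr_score(X, y):
    """Stand-in score: absolute Pearson correlation of each column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom

# Synthetic data: the target depends only on columns 1 and 4
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 6))
y_demo = 3.0 * X_demo[:, 1] - 2.0 * X_demo[:, 4] + 0.1 * rng.normal(size=200)
demo_names = ["f0", "f1", "f2", "f3", "f4", "f5"]

# Keep the fitted selector so that the mask can be read afterwards
selector = SelectKBest(abs_corr_score, k=2).fit(X_demo, y_demo)
mask = selector.get_support()  # boolean array, True for the kept columns
selected_names = [name for name, keep in zip(demo_names, mask) if keep]
print(selected_names)
```

With the real data, the same pattern would presumably be to keep the fitted selector from the SelectKBest call and index the feature names with its mask, which avoids the nested loop and does not depend on exact floating-point equality between columns.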