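Core code
After a random forest has been fitted, its feature_importances_ attribute holds one importance score per feature, which can be used to rank the features: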
rf_model.feature_importances_
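To make the attribute concrete, here is a minimal sketch on synthetic data. The data, the names X, y, demo_model and ranking, and the choice of 50 trees are assumptions made for illustration only; they are not part of the example that follows. The target is built so that it depends almost entirely on the second column, so that column should come out on top.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data: only column 1 actually drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 5.0 * X[:, 1] + 0.1 * rng.normal(size=200)

demo_model = RandomForestRegressor(n_estimators=50, random_state=0)
demo_model.fit(X, y)

# One score per feature; the scores are normalized to sum to 1.
importances = demo_model.feature_importances_

# Feature indices ordered from most to least important.
ranking = np.argsort(importances)[::-1]
print(importances)
print(ranking)
```

The scores are relative: they say how much each feature contributed to the splits of this particular forest, not how strong the relationship is in absolute terms.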
Example
Create the data:
import pandas as pd
import numpy as np
x = pd.DataFrame(np.random.randint(0,100,size=(50, 3)))
y = pd.DataFrame(np.random.randint(0,5,size=(50, 1)))
Split the data into training and test sets:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3)
Train the random forest model:
from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(n_estimators=100)
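# y_train is a single-column DataFrame; flatten it to 1-D to avoid a DataConversionWarning
rf_model.fit(x_train, y_train.values.ravel())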
Extract the feature importances from the trained random forest:
predict = rf_model.predict(x_test)
features = x.columns
feature_importances = rf_model.feature_importances_
features_df = pd.DataFrame({'Features':features,'Importance':feature_importances})
features_df.sort_values('Importance',inplace=True,ascending=False)
The features ranked by importance:
features_df
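Because x and y in this example are drawn at random, the ranking above mostly reflects noise and will change from run to run. One rough way to see how stable a ranking is, sketched below, is to look at the spread of the importances across the individual trees in the forest. The names x_demo, y_demo, forest, per_tree and spread_df, the column labels f0 to f2, and the fixed seeds are assumptions added for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data with the same shape as the example: 50 rows, 3 random features.
rng = np.random.default_rng(0)
x_demo = pd.DataFrame(rng.integers(0, 100, size=(50, 3)), columns=['f0', 'f1', 'f2'])
y_demo = rng.integers(0, 5, size=50)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(x_demo, y_demo)

# Importance of each feature in each individual tree: shape (n_trees, n_features).
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])

spread_df = pd.DataFrame({
    'Features': x_demo.columns,
    'Importance': forest.feature_importances_,
    'Std': per_tree.std(axis=0),  # spread of the importance across trees
}).sort_values('Importance', ascending=False)
print(spread_df)
```

When the standard deviation is of the same order as the differences between the importances, the ordering should not be read as meaningful.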
Plot the result:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(rc={"figure.figsize": (21, 4)})
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=features_df['Features'][:10], y=features_df['Importance'][:10])
plt.ylabel('Importance')
# Data visualization: bar chart
sns.despine(bottom=True)
plt.show()