Official Explanation
In Python, xgboost exposes feature importance through get_fscore (a thin wrapper over get_score). First, the official description of this method:
get_score(fmap='', importance_type='weight')
Get feature importance of each feature. Importance type can be defined as:
'weight': the number of times a feature is used to split the data across all trees.
'gain': the average gain across all splits the feature is used in.
'cover': the average coverage across all splits the feature is used in.
'total_gain': the total gain across all splits the feature is used in.
'total_cover': the total coverage across all splits the feature is used in.
The definitions alone are not very intuitive, so below we train a simple model, print these importance metrics, and interpret them against the definitions.
Code in Practice
First, construct a dataset of 10 examples, each with two features and a label of 0 or 1 (a binary classification problem):
import numpy as np

sample_num = 10    # number of examples
feature_num = 2    # features per example
np.random.seed(0)  # fix the seed so results are reproducible
data = np.random.randn(sample_num, feature_num)