One cause of the error is the call rfc.fit(train_data, test_data). The second argument should be the training labels, not the test data.
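As a minimal sketch of the corrected call: the dataset below is synthetic, and the names train_labels and test_labels are assumptions for illustration, since your actual variables are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: 100 samples, 10 features)
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Split features and labels together so each feature matrix has matching labels
train_data, test_data, train_labels, test_labels = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rfc = RandomForestClassifier(n_estimators=50, random_state=0)

# fit() takes the training features and the training labels
rfc.fit(train_data, train_labels)

# The test features are only used afterwards, for prediction
predictions = rfc.predict(test_data)
```

The test data only enters at predict() (or score()), never at fit().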
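As for the plotting, you can try something like the code below. I assume you know that in this case k-fold CV is used only to select different training subsets. Since no prediction is made, the test indices are ignored:

import numpy as np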
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.datasets import make_classification
# dummy classification dataset
X, y = make_classification(n_features=10)
# dummy feature names
feature_names = ['F{}'.format(i) for i in range(X.shape[1])]
kf = KFold(n_splits=3)
rfc = RandomForestClassifier()
count = 1
# test data is not needed for fitting
for train, _ in kf.split(X, y):
    rfc.fit(X[train, :], y[train])
    # sort the feature index by importance score in descending order
    importances_index_desc = np.argsort(rfc.feature_importances_)[::-1]
    feature_labels = [feature_names[i] for i in importances_index_desc]
    # plot
    plt.figure()
    plt.bar(feature_labels, rfc.feature_importances_[importances_index_desc])
    plt.xticks(feature_labels, rotation='vertical')
    plt.ylabel('Importance')
    plt.xlabel('Features')
    plt.title('Fold {}'.format(count))
    count = count + 1
plt.show()
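
If one figure per fold is more than you need, a possible variation is to collect the importances from every fold and summarize them. The sketch below is only an illustration under the same dummy-data assumption. The names fold_importances, mean_importances and std_importances are made up for this example, and a fresh classifier is created per fold so the folds stay independent.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# dummy classification dataset and feature names, as above
X, y = make_classification(n_features=10, random_state=0)
feature_names = ['F{}'.format(i) for i in range(X.shape[1])]

kf = KFold(n_splits=3)
fold_importances = []

# test indices are not needed for fitting
for train, _ in kf.split(X, y):
    rfc = RandomForestClassifier(random_state=0)
    rfc.fit(X[train, :], y[train])
    fold_importances.append(rfc.feature_importances_)

# rows are folds, columns are features
fold_importances = np.array(fold_importances)
mean_importances = fold_importances.mean(axis=0)
std_importances = fold_importances.std(axis=0)

# print features from most to least important on average
for i in np.argsort(mean_importances)[::-1]:
    print('{}: {:.3f} +/- {:.3f}'.format(
        feature_names[i], mean_importances[i], std_importances[i]))
```

The mean and standard deviation can then be passed to a single plt.bar call (using its yerr argument) instead of drawing one figure per fold.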