Split the data into a training set, a validation set, and a test set.
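Here is a minimal sketch of such a three-way split, assuming scikit-learn's train_test_split is applied twice. The toy arrays X and y and the 60/20/20 proportions are illustrative assumptions and do not come from the original.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for a real feature table and labels
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# First hold out 20% as the test set, which stays untouched during model selection
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Then split the remaining 80% into training (75% of it) and validation (25% of it)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```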
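K-fold cross-validation is applied to the training and validation data combined. The data is split into K equal folds, and the model is fitted K times: each time, one fold serves as the validation set and the remaining K-1 folds serve as the training set. The mean of the K validation scores is the model's cross-validation score. Below is a simple example, where train_table and train_labels are an already-prepared dataset (features and labels).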
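The procedure above can also be written out by hand. This is a minimal sketch, assuming scikit-learn's KFold with the iris dataset as stand-in data (the original uses its own train_table/train_labels). Note that when cv is an integer and the estimator is a classifier, cross_val_score uses StratifiedKFold rather than plain KFold. Here, a KFold object is passed explicitly so that the manual loop and cross_val_score use identical folds.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# shuffle=True because iris is sorted by class; unshuffled folds would be badly class-imbalanced
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kf.split(X):
    # Fit on the K-1 training folds, then score accuracy on the held-out fold
    clf.fit(X[train_idx], y[train_idx])
    fold_scores.append(clf.score(X[val_idx], y[val_idx]))

print(np.mean(fold_scores))  # the model's cross-validation score

# cross_val_score with the same splitter runs the same loop (cloning the estimator per fold)
auto_scores = cross_val_score(clf, X, y, cv=kf)
print(np.allclose(fold_scores, auto_scores))
```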
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# A test set, if needed, is held out beforehand, e.g. with train_test_split(..., test_size=0.2, random_state=0)
# train_table / train_labels: the prepared features and labels (not defined here)
clf = RandomForestClassifier(n_estimators=100, max_depth=None, min_samples_split=2, random_state=0)
# cv=5: 5-fold cross-validation; returns one accuracy score per fold
scores = cross_val_score(clf, train_table, train_labels, cv=5)
# Mean accuracy and +/- two standard deviations across the folds
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.98 (+/- 0.02)
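Method 2: pass a splitter object as cv. ShuffleSplit draws n_splits independent random train/validation splits (here 30% of the samples are held out each time), so unlike K-fold, the validation sets may overlap.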
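from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score
iris = load_iris()
# 5 random splits, each holding out 30% of the samples for validation
cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
# clf is the RandomForestClassifier defined above
print(cross_val_score(clf, iris.data, iris.target, cv=cv))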
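To make the difference between the two splitters concrete, here is a minimal sketch that only inspects the validation indices. It assumes a hypothetical 20-sample array (only its length matters) and uses test_size=0.25 instead of 0.3 so the fold sizes come out as whole numbers.

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit

X = np.zeros((20, 1))  # hypothetical 20-sample dataset; only its length matters here

kf_val = [val for _, val in KFold(n_splits=5).split(X)]
ss_val = [val for _, val in ShuffleSplit(n_splits=5, test_size=0.25, random_state=0).split(X)]

# KFold: the 5 validation folds are disjoint and together cover every sample exactly once
print(sorted(np.concatenate(kf_val).tolist()) == list(range(20)))  # True
# ShuffleSplit: each split independently draws 25% (5 samples), so validation sets may overlap
print([len(v) for v in ss_val])  # [5, 5, 5, 5, 5]
```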