scikit-learn provides the cross_val_score() function for cross-validation, but you can also implement the same procedure by hand:
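from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

# Stratified 3-fold split without shuffling: the same folds cross_val_score(cv=3) uses.
# Recent scikit-learn raises a ValueError if random_state is set while shuffle=False.
skfolds = StratifiedKFold(n_splits=3)
for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)              # fresh, unfitted copy of the classifier
    X_train_folds = X_train[train_index]    # training folds (assumes NumPy arrays)
    y_train_folds = y_train_5[train_index]
    X_test_fold = X_train[test_index]       # held-out fold
    y_test_fold = y_train_5[test_index]
    clone_clf.fit(X_train_folds, y_train_folds)
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)  # number of correct predictions
    print(n_correct / len(y_pred))          # accuracy on this fold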
Output:
0.96295
0.9649
0.9501
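StratifiedKFold builds the folds by stratified sampling, so each fold contains every class in roughly the same proportion as the full training set. In each iteration the loop creates a clone of the classifier, trains the clone on the training folds, predicts on the held-out test fold, counts the correct predictions, and prints the fraction of predictions that are correct (the accuracy).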
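To see what stratification buys, here is a small sketch on hypothetical toy labels (270 negatives followed by 30 positives, i.e. 10% positives, standing in for an imbalanced target like y_train_5). It compares the positive-class fraction in each test fold for StratifiedKFold and for a plain, unshuffled KFold; the data and variable names are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels: 270 negatives then 30 positives (10% positive class)
y = np.array([0] * 270 + [1] * 30)
X = np.zeros((len(y), 1))  # features do not affect how the folds are built

# Stratified folds: each test fold keeps the 10% positive rate
strat_ratios = [y[test].mean() for _, test in StratifiedKFold(n_splits=3).split(X, y)]

# Plain KFold without shuffling: consecutive blocks, so all positives land in the last fold
plain_ratios = [y[test].mean() for _, test in KFold(n_splits=3).split(X, y)]

print("StratifiedKFold:", strat_ratios)
print("KFold:          ", plain_ratios)
```

With the labels sorted by class, the unstratified folds are badly skewed (some test folds contain no positives at all), while the stratified folds each mirror the overall class ratio.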
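With scikit-learn's cross_val_score() function, the same evaluation takes a single call. For a classifier, cv=3 means an unshuffled 3-fold StratifiedKFold, so the scores match the loop above: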
from sklearn.model_selection import cross_val_score
cross_val_score(sgd_clf,X_train,y_train_5,cv=3,scoring='accuracy')
Output:
array([0.96295, 0.9649 , 0.9501 ])
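The equivalence between the hand-written loop and cross_val_score() can be checked directly. The sketch below assumes synthetic, imbalanced data from make_classification in place of the MNIST X_train / y_train_5 arrays (which are not defined here), and an SGDClassifier with a fixed random_state; manual_cv_accuracy is a hypothetical helper that mirrors the loop above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced binary data standing in for X_train / y_train_5 (assumption)
X, y = make_classification(n_samples=600, n_features=20, weights=[0.9, 0.1], random_state=0)
sgd = SGDClassifier(random_state=42)  # fixed seed so every fit is reproducible


def manual_cv_accuracy(estimator, X, y, n_splits=3):
    """Hypothetical helper: hand-rolled stratified K-fold accuracy."""
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        model = clone(estimator)  # fresh, unfitted copy for each fold
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[test_idx])
        scores.append(np.mean(y_pred == y[test_idx]))  # fold accuracy
    return np.array(scores)


manual = manual_cv_accuracy(sgd, X, y)
builtin = cross_val_score(sgd, X, y, cv=3, scoring="accuracy")
print(manual)
print(builtin)
print(np.allclose(manual, builtin))
```

Because both versions use the same unshuffled stratified folds and clone an estimator with a fixed seed, the two score arrays should agree; writing the loop yourself is mainly useful when you need per-fold control that cross_val_score does not expose.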